26 March 2011

The Math Debate

For quite a while, especially in internet years, there has been something of a debate about Math and programming going on in the programmer blogosphere. Some notable examples are Alan Skorkin’s "You Don’t Need Math Skills To Be A Good Developer But You Do Need Them To Be A Great One", Jeff Atwood’s "Should Competent Programmers be "Mathematically Inclined"?", and of course Steve Yegge’s "Math Every Day" and "Math For Programmers".

The recent Watson victory on Jeopardy has pushed the Math intensive fields of Machine Learning and Data Science further into the programmer zeitgeist, which is especially evident on sites like DZone and Reddit/r/programming. This trend is also underscored by the fact that statistics is a very hot area for technology company hiring.

The above bloggers, with the exception of Jeff Atwood, emphatically advocate that programmers should take the initiative to improve their Math skills. I am very much in agreement with this approach and would recommend reading all of the above blog entries, especially those by Steve Yegge, as they are the most substantive. I should also add the disclaimer that this post and some of my future posts will mirror those entries, especially Steve Yegge’s. This is intentional, as I have parallel views and hope to refine and expand upon some ideas that he has addressed in his blog.

As you may or may not be aware, all programmers are already using Math every day. It’s right below the surface of what we do, and I am not just talking about simple arithmetic. If you know it, you may already see it: regular expressions and the compiler you use are built on Formal Language Theory and Automata Theory, functional languages are based in part on Lambda Calculus, Algorithm Analysis is pure Math, most data structures, if not all, can be described by Graph Theory, and SQL databases are described by the Relational Algebra. It does not end there.
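To make the link between regular expressions and automata a little more concrete, here is a minimal sketch in Java. It hand-codes a deterministic finite automaton (DFA) for the language described by the regex "ab*c" and compares its answers with java.util.regex. The class name AbStarCDfa, the state names and the sample strings are all my own hypothetical choices for illustration, and the sketch only compares results. It makes no claim about how java.util.regex is implemented internally.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: a hand-written DFA that accepts the same
// language as the regular expression "ab*c".
class AbStarCDfa {
    // The four states of the automaton. DEAD is a trap state: once entered,
    // the input can no longer be accepted.
    private static final int START = 0;
    private static final int SEEN_A = 1;
    private static final int ACCEPT = 2;
    private static final int DEAD = 3;

    // Transition function: given the current state and one input character,
    // return the next state.
    static int step(int state, char c) {
        switch (state) {
            case START:
                return c == 'a' ? SEEN_A : DEAD;
            case SEEN_A:
                if (c == 'b') return SEEN_A; // loop on any number of b's
                if (c == 'c') return ACCEPT;
                return DEAD;
            default:
                // Any character after ACCEPT, or anything in DEAD, is rejected.
                return DEAD;
        }
    }

    // Run the automaton over the whole input and check the final state.
    static boolean accepts(String input) {
        int state = START;
        for (int i = 0; i < input.length(); i++) {
            state = step(state, input.charAt(i));
        }
        return state == ACCEPT;
    }

    public static void main(String[] args) {
        Pattern regex = Pattern.compile("ab*c");
        String[] samples = {"ac", "abbbc", "abca", "", "bc"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" dfa=" + accepts(s)
                    + " regex=" + regex.matcher(s).matches());
        }
    }
}
```

The point of the sketch is only that a pattern like "ab*c" and a small state machine describe the same set of strings, which is the kind of Math sitting right below the surface.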

As Steve Yegge also points out, for most programming jobs today you don’t need much Math, but I think that may be about to change. The way things are trending, many of the low end programming jobs are moving offshore, and I worry that this could be a permanent change. After all, how many manufacturing jobs have returned? Of course you don’t need a factory to create software, which means that the low end programming jobs may just become low paying jobs worldwide. A little while back a blog entry showed up on DZone called "Will a Two Tier Market For Developers Emerge As a Result of Scala & Clojure?". I believe that this may be only part of the picture and, you guessed it, I think Math will be part of that equation. Increased language power is not a concept in a vacuum; its context is Math. Using that power to handle and model increasingly complex problems will require an increasingly advanced conceptual understanding of those problems.

I think that the field of software development is about to go from one of the least Math intensive fields to possibly the most Math intensive in the relatively near future. I also think that these changes will be sweeping, and that many now seemingly irrelevant disciplines of Math will be used in ways that will be both surprising and amazing. Additionally, I would not be surprised to see the field of Software Engineering evolve into a more formal discipline shaped by Math in much the same way that traditional engineering fields are, but with different types of Math.

So what’s wrong with us? Why is Math in our industry such an alien concept to most of us? It’s because of how and what is taught. The current Math curriculum is myopic, antiquated and wholly inadequate for an information driven society. The current Math educational trajectory is mainly intended for students planning on becoming an Electrical or Mechanical or other type of Engineer or a Scientist1. I would attribute this problem to four facets of the perspective of Math in regards to education. The first is that our educational curriculum mostly mirrors how humans discovered Math. My grade school, middle school and high school education, which included Calculus, was all Math that was discovered prior to the nineteenth century. However, most of the Math that is relevant to what I do now was discovered in the last 100-150 or so years, I’m thinking roughly since the era of Cantor’s Set Theory2. The second is a bias to the wrong side of what might be called a Mathematical Schism. The third, as I previously alluded to, is that the curriculum is still oriented to the needs of an analog manufacturing society. This creates an additional negative side effect for C.S. curricula: you essentially have to learn a lot of new basic Math concepts, so when you get to college you start out already behind on the Math you need, and since there is limited course time in a four year program it eats into time you could spend on other and more advanced topics. Of course this is all irrelevant to my current situation, and probably yours as well, in the sense that it’s too late. The fourth, which I attribute to Conrad Wolfram, is that Math is taught in terms of the mechanics of computation, which diminishes much of the important underlying conceptual nature.

I think Jeff Atwood’s comments are interesting in that they probably exemplify those typical of most developers. The problem is clearly perspective, and honestly I understand it; it’s not easy to make that leap. Math related programming like Machine Learning is easy to see, but the rest of it seems so non-Math related. I also think I have to disagree with the right brain comment, as I would think creating Math, not that I would know firsthand, is as creative an endeavor as programming and art are. Satyan Devadoss advocates the interrelationship between Math and art as a way to create new Math and talks about Math as a creative endeavor in his TTC lectures "The Shape of Nature." Erik Demaine’s Computational Origami work is part of MoMA’s permanent collection and is clearly creative work.

In regards to learning Math, the problem that I think many programmers face is what to learn, how to learn it, and when to find the time. As for the first two, Steve Yegge covers this, and I have specific future entries in the works to expand on it as well. As for the time, when do you find the time to learn anything else? The trick is to add Math into the mix, and like that four year curriculum you have to make choices. I am still trying to get to do some recreational programming with Scala and Clojure, but my Math obsession and my crazy idea to start a blog are taking time from those and many other ideas I want to pursue. Here are a couple of tips on finding the time. There are books and tons of things you can print from the internet to keep around to look at. I also usually try to have some reading material with me if I am potentially in any situation where I may have to wait for someone or something, like meeting up with friends or waiting on an appointment, and I usually have a Topology book in my car for emergency Math reading, like being stuck in traffic. Additionally, fellow Math enthusiast Antonio Cangiano has some good learning insights in "The pursuit of excellence in programming", which is programming related but can be applied to Math as well.

Unfortunately, my Math interest vastly exceeds my Math ability, so I have quite a number of things that I am still trying to grok. I also confess that I am not solely driven by programming. My interests in general Science are broad, and a deeper understanding of Fractal Geometry and Chaos Theory has, for a long time, been a goal of mine. Even though I consider myself to be incredibly dense at times, the sheer buzz of just figuring things out and seeing them in my mind has become its own reward. The ability to read and understand, sometimes all of, papers and doctoral theses in CS and Math can be pretty mind blowing as well. I haven’t really done much with Math yet in regards to programming, but I hope to do cool stuff with it someday soon. All in all I now feel the Math journey is worth it just for the journey itself because ultimately MATH TOTALLY BLOWS MY MIND3. I guess what I am saying is try to figure out a way to love it so that it becomes something that you want to do and not a burdensome chore. Stay tuned for some of my ideas on how to do this.

1 Discrete Math is increasingly being applied to many fields previously dominated primarily by Continuous Math, such as Physics, Chemistry, Biology and more.

2 Many disciplines were actually started well before this time. The work of Galois from the early nineteenth century is critically important as well, and one shouldn’t forget Euler’s eighteenth century work, which was the inception of Graph Theory and Topology. Nor should one forget Pascal and Fermat, whose work, along with that of many others, paved the way for the more recent period of heavy activity.

3 Steve Yegge whispers this idea. I am YELLING IT! I just think Math is really cool!

19 March 2011

What does your framework look like?

Paul Graham, entrepreneur, Lisp advocate, and general geek hero, posits1, and I am paraphrasing, that developers should develop in Lisp because otherwise they end up duplicating inherent Lisp functionality to build an advanced system. I currently work mostly with the Spring Framework. The initial and continuing goal of Spring is to simplify and "abstract out" a lot of the drudge work of building applications with J2EE. In fact, what first seduced me was the ease, especially with an ORM like Hibernate, with which you can create basic Web CRUD apps with almost no code. Of course creating quality apps is not completely simplified, and these technologies are non-trivial and open a whole new dimension of complexity, but in a good way, mostly.


During my current Spring and Hibernate tenure, I have come to realize that in order to build a DRY system with these technologies you really need to build your own framework on top of these frameworks. This sentiment was echoed to me in a job interview by an architect, who was lamenting that there was a perception within the organization that once you drop in Spring and Hibernate all of your infrastructure work is done. In fact, our concordance on this point was one of several reasons why I was offered and accepted the position.


So what does this framework look like? Well, in this context I would abstractly define a "framework" as a collection of components and rules. Its intent is to guide the structural development of a system towards DRY and consistent code, to create and maintain Conceptual Integrity in the system and throughout the development process, and to create clean high level interfaces that increase developer productivity. This framework will vary from system to system, and it will be driven by project specific technology decisions and design.


The two aspects of this approach are pretty intertwined, so I’ll start with the structural components, which can fall into a number of possible categories. Some components may be created to fill in deficiencies in the underlying technology components, while most will probably be more vertical convenience components that bridge the gap from application needs to the generalized underlying technology frameworks like Spring. A good example of this is security. Many projects I have worked on have had special security needs that in some cases required extending and overriding existing Spring Security classes. Essentially any code that can be used in multiple places in a system is a candidate for being "pulled down" to the framework layer. Components that can be commoditized into services can be thought of as part of this approach as well. In the Spring and Hibernate world two common framework patterns are the Generic Dao and the model base class.
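As a rough illustration of the Generic Dao and model base class patterns, here is a minimal sketch in plain Java. Everything in it is hypothetical: the names BaseModel, GenericDao, InMemoryDao and Customer are mine, and the Dao is backed by an in-memory map so the example runs on its own. In a real Spring and Hibernate system the implementation would delegate to the persistence layer instead, and your own version will likely look different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model base class: identity handling shared by every entity.
abstract class BaseModel {
    private Long id;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    // An entity without an id has not been saved yet.
    public boolean isNew() { return id == null; }
}

// Hypothetical generic Dao contract: written once, reused for every model type.
interface GenericDao<T extends BaseModel> {
    T save(T entity);
    Optional<T> findById(Long id);
    List<T> findAll();
    void delete(T entity);
}

// In-memory stand-in (an assumption for this sketch). A real implementation
// would hand these calls to the ORM rather than to a map.
class InMemoryDao<T extends BaseModel> implements GenericDao<T> {
    private final Map<Long, T> rows = new LinkedHashMap<>();
    private long nextId = 1;

    @Override
    public T save(T entity) {
        if (entity.isNew()) {
            entity.setId(nextId++); // assign an id on first save
        }
        rows.put(entity.getId(), entity);
        return entity;
    }

    @Override
    public Optional<T> findById(Long id) {
        return Optional.ofNullable(rows.get(id));
    }

    @Override
    public List<T> findAll() {
        return new ArrayList<>(rows.values());
    }

    @Override
    public void delete(T entity) {
        if (!entity.isNew()) {
            rows.remove(entity.getId());
        }
    }
}

// A sample model: it gets its id handling from the base class for free.
class Customer extends BaseModel {
    private final String name;

    Customer(String name) { this.name = name; }

    String getName() { return name; }
}

class GenericDaoDemo {
    public static void main(String[] args) {
        GenericDao<Customer> dao = new InMemoryDao<>();
        Customer saved = dao.save(new Customer("Ada"));
        String found = dao.findById(saved.getId())
                .map(Customer::getName)
                .orElse("not found");
        System.out.println(saved.getId() + " " + found);
    }
}
```

The design choice worth noticing is that application code depends only on GenericDao<Customer>, so the CRUD plumbing lives once in the framework layer rather than being repeated in every Dao.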


Ultimately I cannot define what your framework will look like. Mine tends to vary from project to project and generally includes the components mentioned above, as well as other components such as custom property editors, reusable comparators, Dao and Model audit functionality, special JSP tags, etc. Your framework may be quite different, and it might not be in Java; it might be in another language and framework like Ruby on Rails or Scala/Lift, for example. You should also plan on the framework being an organic, evolving entity which grows constantly through refactoring and harvesting. In this case I would call it "downward refactoring", where common and possibly redundant code is refactored down to the framework layer.


The rules aspect includes, of course, coding standards, but it should also guide developers in their choices about how to use the underlying technologies, as well as relatively mundane things like how the project is structured and naming. This is not a comprehensive list, and it may also vary over the course of the project. An example of rules would be to dictate how specific technologies are used. In Spring and Hibernate there are often multiple ways to write code which performs the same function, as well as competing configuration "modes", e.g. annotations vs. XML. One common coding variation in Hibernate systems is the HQL vs. Criteria API "debate". Many systems I have worked on tend to have a haphazard mix of both. A consistent system will have Dao classes that follow specific rules about when to choose the Criteria API over HQL, and thus not leave it to the developer’s whim. Rules should also be viewed as organic and evolving, especially if the developers are new to the technologies. The rules should also extend to how and when the framework components should be used.


Of course, as I was writing this, I came across several reuse skepticism articles, including "Reuse Myth - can you afford reusable code?", which was on Reddit, and following the links yielded "the imperial clothing crisis" and "Hidden Costs Of Code Reuse". At first, my reaction was that all of this was completely antithetical to my arguments; however, in reading them, I realized that they are good sources for refinement of my points.


First I should point out that the framework architecture is hopefully created by the architects and alpha developers. The alpha developers generally produce higher quality code at least three times faster than the average 80% of developers, so if you accept the argument that reusable code takes three times longer to write, your software creation rate is conceivably still at least one to one. I would also argue that in general it does in fact take more time to produce higher quality software than to produce lower quality software. Admittedly, most manager types usually don’t want to hear that!


It should be noted that framework code should be constructed in a deliberate and hopefully collaborative way, and that not everything should be made reusable. I am of the opinion that if you use something more than once it is a candidate for reuse refactoring; however, this needs to be tempered by the level of effort required to achieve reusability, and sometimes cut and paste makes more sense. Also, as the above reuse skepticism articles suggest, don’t go crazy trying to create a bunch of reusable code before you have the need to use it. Another somewhat ironic point is that the framework concept that I am promoting often consists in part of components that reduce the generality of the underlying frameworks, narrowing those APIs for more focused integration in order to define a cleaner interface with the higher level application code.


There is one additional "meta" component that I have used and seen during my career. It is sometimes referred to as a sample or template project, and the more recent name which I have encountered is bootstrap project. The bootstrap project often both exemplifies and ties together the rules and components aspects. It usually contains the configuration structure and sample code to guide how projects should be constructed, and it is often an actual running project which can be deployed to the target development environment.


1 This particular reference is actually a restatement of Greenspun's Tenth Rule, which Paul Graham references in his essay. When I first wrote this I remembered it as Paul Graham's; however, upon rediscovering the essay I realized that it was actually a reference. I decided to leave it this way because I think Paul Graham’s essay and body of work are worth reading, as are Philip Greenspun’s.