25 February 2012

Fibonacci: The Math of Agile



Agile planning poker1 uses the Fibonacci sequence, or in some cases a bastardized version of it.  I do not know the exact reason the series was chosen, but I do know the intent: there is a standard set of numbers, and using them deters the player from the temptation of false precision, like a value of 10.5 for example.  It also captures growth, so each next number is a "unit" of growth larger than the last.  I suspect that the sequence of powers of two (1, 2, 4, 8, 16, 32, 64, ...) would also work, and might be more apropos to the computer industry.  The two sequences are similar in that they both increase by a roughly constant factor: each subsequent power of two is two times larger, and in the Fibonacci sequence each term is roughly 1.618... times larger than the previous term.  Each also has a closed-form exponential formula for its terms, 2^n for the powers of two and Binet's formula for the Fibonacci sequence, see below.  I don't know if the Fibonacci sequence is the right way to go, but it seems to work, and it is fairly prevalent in nature and art, so it's probably not a bad choice.  I confess that while this is one of my math of programming series, I am using it primarily as an excuse to write a post about some interesting results relating to the Fibonacci sequence and the Golden Mean.  So if you are only interested in Agile you may want to treat this as a short post and stop here, but I encourage you to read on; the math is mostly high school level and it really is interesting.


The Fibonacci sequence is generally attributed to Leonardo Pisano Bigollo, who went by Fibonacci and did not actually discover the sequence; it appears in Indian mathematics centuries earlier, in work going back to Pingala, and it was probably known even earlier.  Ironically, Fibonacci's greatest mathematical contribution was the introduction of the Hindu-Arabic numeral system to Europe; just imagine trying to do multiplication and fractions with Roman numerals.


The Fibonacci sequence is what is known as a recurrence relation or recursive sequence: each term is defined as the sum of the two previous terms, where the first two terms are the generator or seed values F_0 = 0 and F_1 = 1.  The formula is:

F_n = F_{n-1} + F_{n-2} \quad \text{for } n > 1

It can alternatively be written as:

F_{n+1} = F_n + F_{n-1}

This form will be more useful to us in a moment.


The sequence is:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393,...
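

To make the recurrence concrete, here is a minimal sketch in Java of the iterative calculation from the seed values (the class and method names are just illustrative, and long will overflow past F(92)):

public class Fibonacci {

    // Iteratively computes F(n) from the seeds F(0) = 0 and F(1) = 1.
    public static long fib(int n) {
        long previous = 0; // F(0)
        long current = 1;  // F(1)
        if (n == 0) {
            return previous;
        }
        for (int i = 2; i <= n; i++) {
            long next = previous + current; // F(i) = F(i-1) + F(i-2)
            previous = current;
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 20; n++) {
            System.out.print(fib(n) + " "); // prints 0 1 1 2 3 5 8 13 ...
        }
    }
}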


The Fibonacci sequence has a very interesting relationship with a very interesting number known as the Golden Mean, which is sometimes referred to as phi and has the following value:

\varphi = \frac{1 + \sqrt{5}}{2}

An irrational number that is approximately: 1.6180339887498948482045868343656...


The ratio of a Fibonacci number to the previous number in the sequence converges to phi; the first few ratios are:


3/2 = 1.5
5/3 = 1.666666...
8/5 = 1.6
13/8 = 1.625
21/13 = 1.615384...
34/21 = 1.619047...
55/34 = 1.617647...
89/55 = 1.6181818...

As you can see it is already heading there.  It is interesting that a sequence of rational numbers converges to an algebraic irrational number.  The following equation describes this relationship, the limit of consecutive quotients:

\lim_{n \to \infty} \frac{F_{n+1}}{F_n} = \varphi

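
Here is a small, self-contained Java sketch (the class name is just illustrative) that prints the consecutive quotients so you can watch them settle toward phi:

public class GoldenRatioDemo {

    public static void main(String[] args) {
        long a = 1; // F(1)
        long b = 1; // F(2)
        for (int n = 3; n <= 20; n++) {
            long c = a + b; // F(n) = F(n-1) + F(n-2)
            // The ratio of consecutive Fibonacci numbers approaches phi = 1.6180339887...
            System.out.printf("F(%d)/F(%d) = %.10f%n", n, n - 1, (double) c / b);
            a = b;
            b = c;
        }
    }
}
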
Additionally the following is true:

\lim_{n \to \infty} \frac{F_{n-1}}{F_n} = \frac{1}{\varphi}

If we take the following formula:

F_{n+1} = F_n + F_{n-1}

Dividing it by F_n we get:

\frac{F_{n+1}}{F_n} = 1 + \frac{F_{n-1}}{F_n}

We then take the limit as n goes to infinity, replacing the terms using the two limit equations from above:

\varphi = 1 + \frac{1}{\varphi}

And if we multiply by phi:

\varphi^2 = \varphi + 1

Setting this equal to zero gives us:

\varphi^2 - \varphi - 1 = 0

Using the quadratic formula we get the following two roots (conjugates):

x = \frac{1 \pm \sqrt{5}}{2}

This is phi and its conjugate, which we will call tau:

\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.6180339887\ldots \qquad \tau = \frac{1 - \sqrt{5}}{2} \approx -0.6180339887\ldots

Now that we have defined phi and tau we can describe Binet's formula, mentioned above, which is a closed-form exponential formula:

F_n = \frac{\varphi^n - \tau^n}{\sqrt{5}}

Binet's formula is an easy way to compute Fibonacci sequence values without having to calculate all of the preceding values first.
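

As a quick sanity check, here is a sketch comparing Binet's formula against the iterative recurrence (the class and method names are just illustrative); note that the double based computation accumulates floating point round-off for large n, so the result is rounded to the nearest whole number:

public class BinetDemo {

    // Closed form: F(n) = (phi^n - tau^n) / sqrt(5), rounded to absorb round-off.
    static long fibBinet(int n) {
        double sqrt5 = Math.sqrt(5.0);
        double phi = (1.0 + sqrt5) / 2.0;
        double tau = (1.0 - sqrt5) / 2.0;
        return Math.round((Math.pow(phi, n) - Math.pow(tau, n)) / sqrt5);
    }

    // Iterative recurrence for comparison.
    static long fibIterative(int n) {
        long a = 0, b = 1; // F(0), F(1)
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 30; n++) {
            System.out.println(n + ": " + fibBinet(n) + " vs " + fibIterative(n));
        }
    }
}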


Additionally there is an interesting matrix based calculation:

\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n = \begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix}

The matrix on the left raised to the nth power gives the matrix of Fibonacci numbers on the right; the first few powers are:

\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^2 = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^3 = \begin{pmatrix} 3 & 2 \\ 2 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^4 = \begin{pmatrix} 5 & 3 \\ 3 & 2 \end{pmatrix}

The determinant of this matrix is (-1)^n, which is described by Cassini's Identity:

F_{n+1} F_{n-1} - F_n^2 = (-1)^n

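
Here is a sketch of the matrix calculation (naive repeated multiplication; a real implementation would use exponentiation by squaring), checking Cassini's Identity via the determinant at each step:

public class FibonacciMatrixDemo {

    // Multiplies two 2x2 matrices.
    static long[][] multiply(long[][] x, long[][] y) {
        return new long[][] {
            { x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1] },
            { x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1] }
        };
    }

    public static void main(String[] args) {
        long[][] base = { { 1, 1 }, { 1, 0 } };
        long[][] power = { { 1, 1 }, { 1, 0 } }; // base^1
        for (int n = 2; n <= 10; n++) {
            power = multiply(power, base); // base^n = [[F(n+1), F(n)], [F(n), F(n-1)]]
            long determinant = power[0][0] * power[1][1] - power[0][1] * power[1][0];
            // Cassini's Identity: F(n+1) * F(n-1) - F(n)^2 = (-1)^n
            System.out.println("F(" + n + ") = " + power[0][1] + ", determinant = " + determinant);
        }
    }
}

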
The Fibonacci sequence and the Golden Mean have many interesting identities and relationships with other mathematical disciplines and natural phenomena.  For example there is the famous relation to rabbit breeding, and it relates to bee ancestry, the packing of seeds in flowers, and other plant growth proportions.  The Fibonacci sequence occurs in the shallow diagonals of Pascal's Triangle, and it relates to Pythagorean Triples.  It relates to the similar Lucas Sequence and to the Fibonacci Spiral, which is a Logarithmic Spiral, and the list goes on; a lot of this can be found on the previously linked Wikipedia Page and this BBC radio program.


1 http://renaissancesoftware.net/papers/14-papers/44-planing-poker.html

20 February 2012

Is your Team LCD (Lowest Common Denominator)?



On some of my previous projects I have felt that the teams became severely dysfunctional and as a result catered to the lowest common denominator of the team.  This was often a result of a lack of both technical and overall leadership.  In these cases schisms formed in the team, and I was usually at the center of one side, advocating good design practices, a high degree of control, and a framework approach to reuse.  I should add here that while I am very passionate about trying to build quality software, I try to be as egoless as possible.  Sadly my opponents in these situations were the type of programmers that did not read, and in some cases they were woefully ignorant of basic software design concepts like the SOLID principles and design patterns.  In one case we had several "senior" Java developers who didn't understand generics or even inheritance; that situation really bordered on the absurd.


In these situations I have often felt that these developers were essentially being very selfish.  In some cases the developers wanted a wild west, chaotic approach to software development, so there were no constraints and they were free to tinker with new libraries and frameworks, what I like to call Resume Driven Development.  In other cases they didn't want to have to learn anything new or challenging, and in some cases it seemed to be pure ego driven stubbornness or inferiority complex driven compensatory behavior.


Regardless of the causes of the dysfunctional nature of the team, it always resulted in the dominance of the lowest common denominator.  This can apply to the technical level as well as the professional/maturity level; often it's both.  The lack of leadership and a fractured team often had negative effects on morale, especially for the better developers.  It often created an environment that produced suboptimal architecture and code.  Consensus was often impossible to achieve, and code duplication was rampant, sometimes deliberate.  Naturally code reviews were either not conducted, or if they were, they were done in such a perfunctory way that any feedback was completely disregarded.


The worst part of the LCD team is the inability to have open, objective criticism.  Criticism is hard to take, but anyone engaging in any activity, especially a creative one, in which they wish to improve can greatly benefit from it; this is described in detail in a great essay by Bobby Grossman.  To parallel that essay, we had one person on the team who, if you even hinted at being critical, immediately became super defensive.  He was a terrible programmer and will probably never improve.  To engineer a good system you need to have a constructive and positive way to deal with mistakes that occur in code, architecture and process.  Discordance on the team causes mistakes to be viewed as validation of an opposing viewpoint.  It's an extremely counterproductive and demoralizing environment.  It leads to less communication as many people will avoid the conflict, and in one case I felt that I had reached a level of fatigue and just stopped caring; I bailed shortly after.


The LCD team illustrates why so many software projects have problems; in the above case the project failed after I left, in part due to the dysfunctional nature of the team.  My goal here is not to harp on the negative but to use this as negative space for what you really want.  The opposite of this is a team that aspires to be great and to be more than the sum of its parts.  I know it sounds like a cliché, but I really believe a good team is one where the stronger developers help the weaker and less experienced developers become better, because everyone wants to.  You want a team that can openly discuss and criticize aspects of the system and discover and fix mistakes early.  I think this can happen with a team of professionals who can put their egos aside to do good work and a strong leader who trusts the team, leads with fairness and wisdom, and is open to hearing input from the team but willing to make firm decisions.  Human nature in most cases probably precludes such a utopian software team; I think the goal is to optimize and aspire to it.

05 February 2012

The Software Reuse Complexity Curve

I have come to the conclusion that complexity and reuse have a nonlinear relationship with a roughly parabolic shape, shown below, where complexity increases on the Y axis and reuse increases on the X axis.  Code complexity is high both with very low reuse and with very high reuse, forming a minimum which might be viewed as an optimization point for reusability.  Now remember this is a very simple two dimensional graph; a real optimization scenario for this type of problem may include many dimensions and probably would not have a nice, clearly defined single minimum.

[Graph: code complexity on the Y axis versus reuse on the X axis, a roughly parabolic curve with a single minimum at a moderate level of reuse]

Scenario #1 Low Reuse


In a low reuse situation, everything that could have been implemented using reusable code is duplicated and may also be implemented inconsistently.  The code base will be larger, which requires a maintainer to learn and know more code.  Also, since there will be variations of varying degree between the similar components, the maintainer would have to be constantly aware of these differences, which might be subtle; one might even find them confusing at times.


As we have previously seen, reuse relates to entropy, so a low reuse system would have a large, random, "uncompressed" codebase, which has high entropy.  I have seen this type of system in my career and it is something I hate to work on.


Skill vs. Complexity


Before I go to the opposite end of the spectrum, which is probably more interesting, and perhaps more speculative on my part, I'd like to talk about the relationship between code complexity and the skill of the developer maintaining the code.


I believe, and this really shouldn't be a stretch, that there are some types of complex code that require certain higher level proficiencies and theoretical knowledge; an obvious example is parsers and compiler construction.  A developer who works in this area needs to have a good understanding of a number of disciplines, which include but are not limited to Formal Language Theory, Automata Theory, and possibly some Abstract Algebra, which sadly many developers lack.  If you had a project with components that are any type of compiler or parser and needed someone to create and subsequently maintain the code, they would need to know these things; they would need to understand this complexity in an organized, theoretical way.  The alternative would be an ad hoc approach to solving these problems, which would probably yield significant complexity issues of its own.


These types of things can be seen at a simpler scale: a system that makes use of state machines or heavy use of regular expressions will exhibit more compact and more complex code.  The maintainers have to have the skill level to deal with this more complex approach; sadly not all developers do.
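

As a contrived illustration of that trade off (this example is mine, not from any particular codebase), compare a verbose hand-rolled check against a compact regular expression; the regex version is shorter and easier to reuse, but the maintainer has to be comfortable reading regular expressions:

import java.util.regex.Pattern;

public class CompactVsVerbose {

    // Verbose, hand-rolled check: is the string a simple decimal number like "42", "-3.14" or ".5"?
    static boolean isNumberVerbose(String s) {
        if (s == null || s.isEmpty()) return false;
        int i = 0;
        if (s.charAt(0) == '-') i++; // optional leading minus
        boolean sawDigit = false;
        boolean sawDot = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                sawDigit = true;
            } else if (c == '.' && !sawDot) {
                sawDot = true;
            } else {
                return false;
            }
        }
        return sawDigit;
    }

    // Compact version: roughly the same intent expressed in one regular expression.
    private static final Pattern NUMBER = Pattern.compile("-?(\\d+(\\.\\d+)?|\\.\\d+)");

    static boolean isNumberCompact(String s) {
        return s != null && NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "42", "-3.14", ".5", "abc", "1.2.3" }) {
            System.out.println(s + ": " + isNumberVerbose(s) + " / " + isNumberCompact(s));
        }
    }
}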


Now you can probably find numerous code bases where these types of approaches, and others, were not used where they should have been.  These codebases will be larger and will require a case by case effort to learn something that could have been "compressed" and generalized with better theoretical understanding.  It's a trade off: if you cannot hire or retain people who have the advanced skills, an advanced codebase can also be a maintenance liability.


Scenario #2 High Reuse


Now let's create a hypothetical example: say we have a system that has many similar components, and a brilliant programmer manages to distill this complexity into compact, highly reusable code.  The problem is that our brilliant programmer is the only one who can really understand it.  Ideas similar to this are pondered by John Cook in his post "Holographic Code".  Now it might be the case that the code is really only understandable to our brilliant programmer, or he may simply be the only one in the organization with that level of proficiency.  In this case we will assume that the complexity was not artificially introduced, i.e. it is not unnecessary complexity, which is also a real problem that arises in software systems.  We will also assume that we know a less complex solution exists, one that is more verbose with less reuse, and in fact the solution of the maintaining developer might be to completely rewrite it in a less complex fashion.



As we have previously seen, complete compressibility of a code base can never be achieved, which relates heavily to aspects of both Information Theory and Algorithmic Information Theory; this results in diminishing returns in terms of shrinking an algorithm or codebase.  My speculation here, based on observation, is that as a codebase is "compressed" its complexity increases, as shown above.  Essentially you need to be increasingly clever to continue to increase reuse.  Now some of these reuse complexity issues may be due to our lack of understanding of complexity in relation to software construction.  It is feasible that things that we find hard to gain reuse with today may be easier in the future with advances in this understanding.


Another idea from the research covered in my previous post is that different domains exhibit different degrees of favorability/resistance to reuse.  I would also suspect that the types of reusable components would vary across different domains.  This makes me think that perhaps in the future reusability in different domains can be guided both qualitatively and quantitatively, by either empirical or theoretical means or both.  This means that you might have foreknowledge of the "compressibility" of a "domain solution" and then be able to guide the development effort appropriately.


I feel the ideas here are a step back from a possibly absolutist approach to reuse.  Reuse, like any important practice in software, for example test coverage, which can also exhibit diminishing returns, needs to be tempered and viewed as an optimization as to the extent of its practice.  In many respects that's what good engineering is: working within constraints and optimizing your outcome within those constraints.