Elegant Coding: October 2011

22 October 2011

A Confederacy of Cargo Cult Coders

I hate to say it, but I feel that one of the biggest problems with Software Engineering is the prevalence of programmers who are Cargo Cult Coders. I know this may seem a bit extreme, but it is my feeling that many software projects suffer in part because most developers are mediocre at best, and quite a few developers, some of them more senior, are "faking it". So before you call me a cynic or worse, think about this: it’s pretty well known that good developers are rare and really good developers are really rare. Let’s assume for the sake of argument that developer ability falls on the Normal Distribution (I am not asserting that it does). That means the vast majority of developers fall into the average category. It also runs along the lines of the 80/20 rule, in that most developers range from average to bad.

I have seen these types of developers a fair amount, and this is one possible down side of frameworks such as Spring and Hibernate. If you read my blog you know that I am a big fan of frameworks and the concept in general, but they can be abused, as can any other technique or methodology. The plus side of technologies like Spring and Hibernate is that good developers can quickly create large, complex, well engineered systems. Of course mediocre and bad programmers can, by blindly implementing framework patterns, create large complex monstrosities. I have seen cases where, by cutting and pasting example code, bad programmers can create working systems with a limited understanding of the underlying technologies. I often use googled code snippets myself, but at least I understand the fundamentals, and if I don’t fully understand what I am doing at the time I try to go back and learn more about how the underlying technology works. Essentially these Cargo Cult Programmers are mostly just configuring boilerplate code and code snippets by trial and error within the confines of a framework.

On one of my previous contracts I had the misfortune to work with perhaps one of the worst developers I have ever met. He was truly a consummate cargo cult coder: a musician turned programmer¹ with about eight years of professional experience. He had zero understanding of Computer Science fundamentals and seemingly no understanding of good Software Engineering principles, and yet during his career he had accumulated enough basic knowledge of Web Design, JavaScript, Java, Spring, Hibernate, SQL and various other technologies and components to cobble together enterprise applications. Ironically his ability to create these applications was actually somewhat impressive, as long as you didn’t look at the code. I think the scariest thing about this developer is that he had garnered a false sense of the level of his abilities, which he would project, and which in turn would cause others to falsely believe that he was a highly competent developer.

I use this particular developer as an example, but this is a trend that I am seeing: developers who now describe themselves as "senior level" because they have gained proficiency as cargo cult coders. When you see their non boilerplate code, it will often demonstrate an egregious lack of knowledge of basic concepts like cohesion, coupling, inheritance, basic OOP design, threading, the underlying workings of the framework technologies themselves, etc. I think the biggest danger is the complacency, and perhaps self delusion or naivety, which can even manifest itself as hubris, that these developers acquire from this limited perspective. It is ultimately self limiting behavior, because all new technologies and languages are then viewed in the same narrow context, usually as another resume bullet. These are practitioners of what I call RDD (Resume Driven Development).

I believe that you can boil down what makes a good developer to two relatively simple things: one is ability and the other is desire. Ability is a complex web of inherent skill, intellect (including a good memory), experience, etc. Desire is the hunger for knowledge, the aspiration to improve one’s skill, and the passion to find better ways to do things. In some ways the two go hand in hand, but not always. The "consummate cargo cult coder" from above had a high degree of passion, but he was stubborn, did not work well with others, and was unable to take advantage of the opportunity to benefit from the knowledge of others. I actually found his situation to be somewhat tragic. The other side is people who are talented but lack desire. Ironically I have been accused of this one. It’s usually not due to a lack of desire; I just sometimes find what I am working on to be boring, which in turn can cause me to lag in terms of productivity. Fortunately this is fairly aberrant behavior for me.

Sadly I have recently been on some pretty dysfunctional teams, and I feel the biggest tragedy is that on those teams there were developers who I could see could be more than they were on the project. They mainly lacked guidance, which I could not provide due to the team’s dynamics. This type of situation is really a double tragedy: it is tragic for the developer, in that a better opportunity is missed, and it is a missed opportunity for the project, the team, and the management of the team, as they could have had a better developer who was producing better quality code. I guess what I am saying is that there should be a way to avoid wasting developer potential. I think this is some of what Agile Process tries to achieve: "Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done." I think this should be taken further, to try to maximize each developer’s skill, inspire their inherent passion, and kindle increased passion, in order to maximize the efficiency of the team.

If you are lucky, the cargo cult coders on your team are just plodding along, hopefully not messing up your codebase too much, and are the ones who just need that extra guidance or encouragement. Unfortunately they can do far worse damage. There are many out there who have been getting by for many years and have formed overrated opinions of themselves. These are the worst: not only can they disrupt the software structure, they can often disrupt the team dynamics and create more substantial problems. In my Driven Development post I mentioned developers who I was in conflict with, creating a CDD (Cognitive Dissonance Development) environment. They were cargo cult coders who had taken very defensive and recalcitrant positions. You couldn’t reason with them, because if you tried to talk about good design principles or standard practices you were speaking a language they did not understand; they only saw software development as learning some new technology that they could then claim to be experts in. They will pollute your system with redundant, inconsistent, shoddy code that will degrade its quality, performance and maintainability. Worse yet, they will be excessively defensive in the face of criticism, which makes code reviews hard if not impossible, and since they don’t read or understand good engineering principles they often argue against them. They will also often poison the team dynamics with their defensive and sometimes arrogant behavior. This can result in the team becoming highly dysfunctional, which can create a toxic working environment that is hostile to good developers. It is the opposite of the optimal environment: rather than functioning as a whole that is greater than the sum of its parts, the team will function as a whole that is less than the sum of its parts.

¹ I have worked with plenty of non-technical/non-CS converts and there are some who are good and go the extra mile to learn the basics and beyond.

18 October 2011

O(log(n))

I find O(log(n)) to be a very interesting algorithm complexity. It’s also highly desirable, and if you achieve it you are probably doing well, or you are just using a couple of common time tested algorithms and data structures such as Binary Search or [Self] Balanced Binary Trees like Red Black or AVL trees. The order of Big O Complexities is:

O(1)	Constant
O(log log n)	Double Logarithmic
O(log n)	Logarithmic
O(n^m) 0 < m < 1	Fractional power
O(n)	Linear
O(n log n) = O(log n!)	Loglinear
O(n²)	Quadratic
O(n³)	Cubic
O(n^m) m > 1	(General) polynomial (m is a fixed constant greater than 1; e.g. n⁴, n⁵)
O(mⁿ)	Exponential (m >= 2; e.g. 2ⁿ, 3ⁿ)
O(n!)	Factorial

When I was growing up, hand held calculators were quickly evolving, much like smart phones are now, although perhaps not as rapidly. My father would bring these calculators home and I would play with them; this was my first exposure to logarithms. When I got to high school I learned more about them, including some of the identities. Actually I find the Logarithmic Identities to be quite interesting, and have found them important in understanding logarithms, which are useful in understanding other areas of science and math. I even used to use log base 10 to calculate the number of digits of numbers for print formatting, way back before we had better ways to do number formatting^†: ⌊log₁₀(n)⌋ + 1 is the number of digits of any positive base 10 integer n, where ⌊x⌋ is the floor function (for example, ⌊log₁₀(100)⌋ + 1 = 3). Many common measurements such as the Richter Scale, pH and the Decibel are also logarithmic scales, and we have previously encountered logs in relation to information entropy. Some useful logarithmic identities are:

y = log_b(x) if and only if x = b^y
log_b(1) = 0
log_b(b) = 1
-log_b(x) = log_b(1/x)
log_b(x*y) = log_b(x) + log_b(y)
log_b(x/y) = log_b(x) - log_b(y)
log_b(xⁿ) = n log_b(x)
log_b(x) = log_b(c) * log_c(x)
log_b(x) = log_c(x) / log_c(b)
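
A few of these identities can be checked numerically. The following is only a minimal sketch: the class `LogIdentities` and its methods `logBase` and `digitCount` are hypothetical names introduced for illustration, and floating point results will agree only approximately. It also includes the digit-counting use mentioned earlier, written as ⌊log₁₀(n)⌋ + 1 so that exact powers of ten are counted correctly.

```java
// Hypothetical helper class; the names are illustrative and not from the post.
class LogIdentities {

    // Change of base (identity #9): log_b(x) = ln(x) / ln(b).
    static double logBase(double b, double x) {
        return Math.log(x) / Math.log(b);
    }

    // Number of decimal digits of a positive int: floor(log10(n)) + 1.
    static int digitCount(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return (int) Math.floor(Math.log10(n)) + 1;
    }

    public static void main(String[] args) {
        double x = 8.0;
        double y = 32.0;
        // Identity #5: log_b(x*y) = log_b(x) + log_b(y), with b = 2.
        System.out.println(logBase(2, x * y) + " vs " + (logBase(2, x) + logBase(2, y)));
        // Identity #8: log_b(x) = log_b(c) * log_c(x), with b = 2 and c = 10.
        System.out.println(logBase(2, x) + " vs " + (logBase(2, 10) * Math.log10(x)));
        // Digit counts on either side of a power of ten.
        System.out.println(digitCount(999) + " and " + digitCount(1000));
    }
}
```

Each printed pair should agree up to rounding error, which is why a comparison in real code would use a small tolerance rather than `==`.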

The first one demonstrates the relation between the functions of logarithm and exponentiation. As you can see, the value y, which is log base b of the value x, is the exponent to which b must be raised to get the value x, so a logarithm is just a way to get the exponent. Log base 10 of 100 is 2, log₁₀(100) = 2, since 10 to the second power is 100, 10² = 100. Log base b of x, y = log_b(x), is the inverse function of raising b to the y power, x = b^y, so:

b^log_b(x) = x

Another interesting observation that one can draw from the logarithmic identities is the answer to the question: What is the difference between O(log₂(x)) and O(log₁₀(x))? It’s a little trick question that I like. The answer is they are the same. Using the second to last identity (#8):

O(log₂(x)) = O(log₂(10) * log₁₀(x))

Since log₂(10) is a constant:

O(log₂(x)) = O(log₁₀(x))

Pretty cool!

This would also work with the last identity (#9). Remember that Big O notation is about [asymptotic] growth, so equality in Big O notation is not the same as ordinary equality: log₂(x) ≠ log₁₀(x). Also, for small quantities like the examples in this article, the base does make a difference.
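
The constant factor can be seen directly by dividing one logarithm by the other. This is a small sketch under the assumption that x > 1 (so neither logarithm is zero); `LogBaseRatio` and `ratio` are hypothetical names used only for illustration.

```java
// Hypothetical demo class; not part of the original post.
class LogBaseRatio {

    // log2(x) / log10(x). By the change-of-base identity this is log2(10),
    // roughly 3.32, for every x > 1.
    static double ratio(double x) {
        double log2 = Math.log(x) / Math.log(2);
        return log2 / Math.log10(x);
    }

    public static void main(String[] args) {
        for (double x : new double[] {16, 1_000, 1_000_000, 1e12}) {
            System.out.println("x = " + x + ", log2(x)/log10(x) = " + ratio(x));
        }
    }
}
```

The ratio stays the same no matter how large x gets, which is exactly the kind of constant multiplier that Big O notation discards.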

Similarly, exponentiation has its identities as well:

b⁰ = 1
b¹ = b
b^(-n) = 1/bⁿ
b^m * bⁿ = b^(m+n)
b^m / bⁿ = b^(m-n)
(b^m)ⁿ = b^(mn)
(a/b)ⁿ = aⁿ/bⁿ
(b/a)^(-n) = (a/b)ⁿ

The binary tree structure (a rooted acyclic graph) visually illustrates O(log(n)) and the inverse function relationship between logarithms and exponentiation. The diagram above shows a complete^‡ and therefore balanced binary tree structure. As you can see, the number of items in each row grows exponentially as powers of two, shown in red, and the total number of elements in the tree as each row is added grows in the same exponential fashion, denoted in square braces. So in a balanced tree you will fill all of the n rows up to the 2ⁿ - 1 item, and when that number is exceeded (greater than or equal to 2ⁿ) a new (n + 1) row will be added. The growth is not necessarily exponential, since you can add items at any rate, but the structure in which items are stored is broken down along “exponential row boundaries”. So to search 2⁴ - 1 (15) items for the value 7 we would traverse 8-4-6-7, shown in green, which is 4 nodes. This is the maximum number of nodes visited for this tree; a search can also visit 1, 2, or 3 nodes depending on the depth of the item. 4 is the exponent of the total size of the tree, which is [2⁴ - 1], and therefore approximately its log: [log₂(2⁴ - 1) = 3.9…], almost 4, our max traversal, and O(log(2ⁿ - 1)) = O(log(2ⁿ)). Since we know that these two functions are inverses, this illustrates O(log(n)) in visual terms.
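
The row arithmetic described above can be sketched in a few lines of Java. This is an illustration only, assuming a complete binary tree filled row by row; `TreeLevels`, `levels` and `capacity` are hypothetical names, and `levels` computes ⌊log₂(n)⌋ + 1 using the bit length of n rather than a floating point logarithm.

```java
// Hypothetical helper class; assumes a complete binary tree filled row by row.
class TreeLevels {

    // Rows needed to hold n items: floor(log2(n)) + 1, computed as the bit length of n.
    static int levels(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    // Items held by a tree with the given number of completely filled rows: 2^rows - 1.
    // Valid for 0 <= rows <= 30 with int arithmetic.
    static int capacity(int rows) {
        return (1 << rows) - 1;
    }

    public static void main(String[] args) {
        System.out.println(capacity(4)); // prints 15
        System.out.println(levels(15));  // prints 4
        System.out.println(levels(16));  // prints 5
    }
}
```

Going from 15 to 16 items is the point where a fifth row is added, which matches the “exponential row boundaries” in the text.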

The Binary Search algorithm has similar log base 2 characteristics. If you remember the algorithm, it takes a list, which has to be ordered, and it repeatedly halves the remaining items and checks the element at each new midpoint. If it is equal you are done; if it is not, then depending on whether the search value is less than or greater than the midpoint value you continue with the bottom or top half of the list respectively and repeat the process. This is better illustrated by the following Java code:

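    // Returns the index of value in the sorted array, or -1 if it is not present.
    public static int binarySearch(int value, int[] array) {
        int min = 0;
        int max = array.length - 1;
        while (min <= max) {
            int mid = min + (max - min) / 2; // overflow-safe midpoint
            if (value < array[mid])
                max = mid - 1; // continue in the lower half
            else if (value > array[mid])
                min = mid + 1; // continue in the upper half
            else
                return mid;
        }
        return -1;
    }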

The following recursive code example is both more elegant and gives a more intuitive feel for the Divide and Conquer nature of the algorithm. Also it is this Divide and Conquer behavior that breaks the list apart logarithmically.

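    // Entry point: search the whole sorted array.
    public static int binarySearch(int value, int[] array) {
        return binarySearch(value, array, 0, array.length - 1);
    }

    // Searches array[min..max] (inclusive); returns the index of value, or -1 if absent.
    public static int binarySearch(int value, int[] array, int min, int max) {
        if (max < min)
            return -1; // empty range: not found
        int mid = min + (max - min) / 2; // overflow-safe midpoint
        if (array[mid] > value)
            return binarySearch(value, array, min, mid - 1); // lower half
        else if (value > array[mid])
            return binarySearch(value, array, mid + 1, max); // upper half
        return mid;
    }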

Let's view a search visually:

In our search for the value 7 in our 16 item list we first go to 8, which is too large, then to 4, which is too small, then to 6, and then to 7. At each point in our search the size of what remains to be searched is half of the size of the previous search space.

In our visual example we needed to search 4 positions (8, 4, 6, 7), and log₂(16) = 4, which is O(log(n)).
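
The sequence of positions examined can be reproduced with a small variation of the iterative search that records each value it looks at. This is a sketch under the assumption that the 16 item list holds the values 1 through 16 in order; `BinarySearchTrace` and `probes` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracing variant of the iterative search; not part of the original post.
class BinarySearchTrace {

    // Runs the same loop as the iterative binary search, but returns the
    // values examined along the way instead of an index.
    static List<Integer> probes(int value, int[] array) {
        List<Integer> seen = new ArrayList<>();
        int min = 0;
        int max = array.length - 1;
        while (min <= max) {
            int mid = min + (max - min) / 2;
            seen.add(array[mid]);
            if (value < array[mid]) {
                max = mid - 1;
            } else if (value > array[mid]) {
                min = mid + 1;
            } else {
                break; // found
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] array = new int[16];
        for (int i = 0; i < array.length; i++) {
            array[i] = i + 1; // assumed contents: 1..16
        }
        System.out.println(probes(7, array)); // prints [8, 4, 6, 7]
    }
}
```

Each recorded value is the midpoint of a range half the size of the previous one, which is where the logarithmic behavior comes from.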

An interesting side note is that many binary search implementations are broken due to possible integer overflow when computing the midpoint, as pointed out in a blog post by Joshua Bloch. The code above accounts for the issue by computing the midpoint as min + (max - min) / 2; you can read more about it here.
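
The overflow is easy to demonstrate with two large indices. The following is a minimal sketch: `MidpointOverflow` and its methods are hypothetical names, and the index values are chosen only so that their sum exceeds `Integer.MAX_VALUE`.

```java
// Hypothetical demo class; the index values are illustrative.
class MidpointOverflow {

    // Naive midpoint: min + max can exceed Integer.MAX_VALUE and wrap to a negative number.
    static int naiveMid(int min, int max) {
        return (min + max) / 2;
    }

    // The form used in the search code above: max - min cannot overflow when 0 <= min <= max.
    static int safeMid(int min, int max) {
        return min + (max - min) / 2;
    }

    // An alternative: the unsigned shift treats the wrapped sum as an unsigned value.
    static int shiftMid(int min, int max) {
        return (min + max) >>> 1;
    }

    public static void main(String[] args) {
        int min = 1_500_000_000;
        int max = 2_000_000_000;
        System.out.println(naiveMid(min, max)); // prints -397483648
        System.out.println(safeMid(min, max));  // prints 1750000000
        System.out.println(shiftMid(min, max)); // prints 1750000000
    }
}
```

A negative midpoint used as an array index would throw an `ArrayIndexOutOfBoundsException`, so the bug only shows up on very large arrays.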

^† Due to computing a logarithm, this may not be optimally efficient.

^‡ Complete in this context means that all positions up to and including position 2ⁿ - 1 are filled, i.e. have a node. It should also be noted that Self Balancing algorithms may not yield such a well balanced tree, and it might have more rows for 15 items, but it is still O(log(n)).