05 February 2012

The Software Reuse Complexity Curve

I have come to the conclusion that Complexity and Reuse have a nonlinear relationship that has a parabolic shape, shown below, Complexity increases on the Y axis and reuse increases on the X axis.  Code complexity is high both with very low reuse and very high reuse forming a minima which might be viewed as an optimization point for reusability, now remember this is a very simple two dimensional graph, a real optimization scenario for this type of problem may include many dimensions and probably would not have a nice clearly defined single minima.

Scenario #1 Low Reuse

In a low reuse situation everything that could have been implemented using reusability is duplicated and may also be implemented inconsistently, so the code base will be larger which would require a maintainer to learn and know more code, also since there will be variations of varying degree between the similar components the maintainer would have to be constantly aware of these differences, which might be subtle, one might even find them confusing at times.

As we have previously seen reuse relates to entropy, so a low reuse system would have a large random, “uncompressed” codebase, which has high entropy, I have seen this type of system in my career and it is something I hate to work on.

Skill vs. Complexity

Before I go to the opposite end of the spectrum, which is probably more interesting, and perhaps more speculative on my part, I’d like to talk about the relationship of code complexity to maintainer developer skill.

I believe, and this really shouldn’t be a stretch, that there are some types of complex code that require certain higher level proficiencies and theoretical knowledge, an obvious choice is parsers and compiler construction.  A developer who works in this area needs to have a good understanding of a number disciplines, which include but are not limited to: Formal Language Theory and Automata Theory and possibly some Abstract Algebra, which sadly many developers lack.  If you had a project that had components that are any type of compiler or parser and needed someone to create and subsequently work on the maintenance of the code, they would need to know these things, they would need to understand this complexity in an organized theoretical way.  The alternative would be an ad hoc approach to solving these problems which would probably yield significant complexity issues of its own.

These types of things can be seen at simpler scale, a system that makes use of state machines or heavy use of regular expressions will exhibit more compact and more complex code.  The maintainers have to have a skill level to deal with this more complex approach, sadly not all developers do.

Now you can probably find numerous code bases where these types of approaches and others were not used where they should have been, these codebases will be larger and will require a case by case effort to learn something that could have been “compressed” and generalized  with better theoretical understanding.  It’s a trade off if you cannot hire or retain people who have the advanced skills an advanced codebase can also be a maintenance liability.

Scenario #2 high reuse

Now let’s create a hypothetical example, say we have a system that has many similar components and a brilliant programmer manages to distill this complexity into compact highly reusable code.  The problem is that our brilliant programmer is the only one who can really understand it.  Ideas similar to this are pondered by John Cook in his post "Holographic Code".  Now it might be the case that it code that is really only understandable to our brilliant programmer, or he may be the only one in the organization with that level of proficiency, also in this case we will assume that complexity was not artificially introduced, i.e. not unnecessary complexity, which is also a real problem that arises in software systems.  Also we will assume that we know that a less complex solution that is more verbose with less reuse exists, and in fact the solution of the maintainer developer might be to completely rewrite it in a less complex fashion.

As we have previously seen, the complete compressibility of a code base can never be achieved which relates heavily to aspects of both Information Theory and Algorithmic Information theory this results in  diminishing returns in terms of shrinking an algorithm or codebase.  My speculation here, based on observation, is that as a codebase is "compressed" its complexity increases as shown above.  Essentially you need to be increasingly clever to continue to increase reuse.  Now some of these reuse complexity issues may be due to our lack of understanding of complexity in relation to software construction.  It is feasible that things that we find hard to gain reuse with today may be easier in the future with advances in this understanding.

Another idea from the research covered in my previous post is that different domains exhibit different degrees of Favorability/Resistance to reuse. I would also suspect that the types of reusable components would also vary across different domains this makes me think that perhaps in the future reusability in different domains can be guided both qualitatively and quantitatively by either empirical or theoretical means or both. This means that you might have fore-knowledge of the "compressibility" of a "domain solution" and then you would be able to guide the development effort appropriately.

I feel the ideas here are a step back from a possible absolutist thinking to reuse.  Reuse like any important approach in software, for example test coverage, which can also exhibit diminishing returns, needs to be tempered and viewed as an optimization as to the extent of its practice.  In many respects that’s what good engineering is, working within constraints and optimizing your outcome within those constraints.

No comments:

Post a Comment