Elegant Coding: August 2011

A couple of years ago, Tom Demarco published a brief article titled: "Software Engineering:An Idea Whose Time Has Come and Gone?". Now I know this is old news, and in it he claims that the role of measurement and hence control has not been a factor in some notable successful projects, he cites Google Earth and Wikipedia, now I cannot comment on those or many famous successful projects as I have no firsthand knowledge, however, I suspect that control has been important it just was not employed as he has anticipated and as he points out he is no longer involved in hands on development. So perhaps the premise of the famous quote "You can’t control what you can’t measure" is what is flawed. I believe that the majority of successful software projects and successful companies built around successful software have exhibited at least some if not a large degree of control over the both the software construction process and the software structure. I admit my assertion is tenuous at best as I have very little data to support this but I do have what I consider some interesting anecdotal support.

My own experience also tells me this is true, I am not well versed in the world of software metrics but what we have see m to be largely unwieldy and even potentially dubious (KSLOC) not to mention a lack of tooling and methodologies to apply them. Still I feel that I can look at code base and make "aesthetic" judgments about good and bad, cohesion and coupling are fairly abstract, understanding them can shed light on how to refactor code to improve it and to give a general indication of quality. In looking at code, both good and bad patterns emerge and can be utilized to assess and improve a code base. One thing I can often do is reduce code size by refactoring general code into a framework and refactor it to more effectively use API’s like the Java API and API’s like Apache’s StringUtil and DateUtil classes or things like effectively employing the Spring Framework’s data binding. Also code can often be better abstracted by composition, inheritance and general modularization and the use of patterns like the SOLID principles. These are all things that even the best developers occasionally miss, I often apply these refactoring to my own code as well, and if you are a good practitioner, what I am saying here is probably what you do every day.

While my personal control has mostly been limited to small and midsize projects and I can tell you that I have received some complements about my work to create cleaner more structured systems, my professional experience pales in comparison to that of others and I find their stories more interesting as you probably do also. I’ll start with Steve Yegge’s "Done, and Gets Things Smart" post where he talks, not about his role, but his observations about what went on at Google:

... but I've realized that one of the Google seed engineers (exactly one) is almost singlehandedly responsible for the amazing quality of Google's engineering culture. And I mean both in the sense of having established it, and also in the sense of keeping the wheel spinning. ... Done, and Gets Things Smart folks aren't necessarily your friends. They're just people you're lucky enough to have worked with.

At first it's entirely non-obvious who's responsible for Google's culture of engineering discipline: the design docs, audited code reviews, early design reviews, readability reviews, resisting introduction of new languages, unit testing and code coverage, profiling and performance testing, etc. You know. The whole gamut of processes and tools that quality engineering organizations use to ensure that code is open, readable, documented, and generally non-shoddy work.
But if you keep an eye on the emails that go out to Google's engineering staff, over time a pattern emerges: there's one superheroic dude who's keeping us all in line.

The black bolding is his and the blue bolding is my highlighting. Clearly this demonstrates a pretty high degree of control. The next comes from Martin Fowler’s "Who Needs an Architect?":

Architectus Oryzus [is the] kind of architect must be very aware of what's going on in the project, looking out for important issues and tackling them before they become a serious problem.
...
In many ways, the most important activity of Architectus Oryzus is to mentor the development team, to raise their level so that they can take on more complex issues. Improving the development team's ability gives an architect much greater leverage than being the sole decision maker and thus running the risk of being an architectural bottleneck. This leads to the satisfying rule of thumb that an architect's value is inversely proportional to the number of decisions he or she makes.

A similar sentiment is found in Lean Software Development: An Agile Toolkit By Mary Poppendieck and Tom Poppendieck:

Master Developers
In an extensive study of large system design, _[22] Bill Curtis and his coauthors found that for most large systems, a single or small team of exceptional designers emerge to assume primary responsibility for the design. Exceptional designers exercise leadership through their superior knowledge rather than bestowed authority. Their deep understanding of both the customers and the technical issues gain them the respect of the development team. Exceptional designers are people who are extremely familiar with the application domain and are skilled at communicating their technical vision to the development team. They are usually consumed with the success of their systems.
^[22]Curtis, Kransner, and Iscoe, "A field Study of the Software Design Process for Large Systems," 1272.

Another person who seems to support this is Joshua Block in his talk about API design, I have already blogged about that and the relevant quote is here. Also John Carmack of Id Software talks about controlling the software development process, among other things, and he makes many points that I think jibe with my control argument.

I my opinion these all show that good control does occur but it’s probably done more intuitively and it is more art than science and probably occurs without traditional metrics. That is why software projects can be insanely successful without measurement. Actually it is my experience that control and success is not a binary relation in that control does not unconditionally imply success but rather it is a continuum such that control increases the probability of success. Also I am not saying that software metrics is an idea whose time has come and gone, actually I believe quite the opposite, but that is another topic for another time. Another takeaway from these quotes is that having the right people and empowering them in that very critical role is a very important point in software management, again another entire topic.

In the ORM world, e.g. Hibernate, the term Object Graph is used quite a bit and it is a powerful notion, however, I suspect that Object Graph often gets used without much thought to some of the underlying concepts. Generally the term refers to an Object, usually a Domain Object which often contains other Domain Objects and/or Collections of other Domain Objects via composition. With an ORM like Hibernate very complex Object Graphs can be read, altered and persisted with very little work, it is a effective concept that can make building applications both easy and elegant especially if you have the proficiency and forethought to design your mappings properly.

In order to talk more intelligently about some of the underlying concepts regarding the Object Graph it is helpful to get a few fundamentals of Graph Theory out of the way, if you know them feel free to skip ahead. Actually this is a good opportunity to put them in a context that may be more intuitive for many developers. Graph Theory is an abstraction, a framework if you will, built on top of Set Theory, however, we don’t need to get bogged down in a Set based definition here, I do encourage you to pursue this on your own as it is extremely seminal and becoming increasingly essential. We just need to get four fairly simple graph concepts out of the way which can be explained easily and visually, they are: The difference between a simple graph and a multi graph, the difference between a cyclic graph and an acyclic graph, the definition of a directed graph, and the definition of a tree. Let’s start with the graphs:

As you can see from the above diagram we have several different types of graphs, now it should be noted that these attributes can overlap, the Undirected Graph contains cycles, so it is cyclic as does the Directed Graph, the Multi-Graph¹, by definition contains cycles so it is cyclic this one is also undirected. The Acyclic Graph is undirected. So these attributes can combine, except for an acyclic multi-graph which by definition can’t exist, I think.

In order to limit complexity for our discussion we are going to limit our object graph to the case of a directed acyclic graph, which is also known as a DAG. More specifically we will be considering a specific case of this known as a directed rooted tree, where edges have a direction away from the root. The following example shows a "rooted" DAG aka a tree:

As you can see the above tree is also a domain object graph it is also it is a directed graph where the direction indicates containment via composition, which can be thought of as a "has-a" relation. Now in some cases using ORM’s like hibernate one might include a mapping from Address back to Person so the ORM knows how to manage the relation and this is a cycle but it is an artifact created by the ORM since all accesses to this object graph by the application are from the root (Person node). Now that is not to say that there might not be other structural reasons for such a cycle, perhaps an application needs to hold a list of addresses, but needs to get some of the Person attributes, then in this use case the Address node becomes the "relative" root from which the application access occurs so the object graph would then have two roots and need a cycle to maintain that duality.

So in many cases in regards to the ORM entity model the object graph can be thought of as a tree, a rooted directed acyclic graph (DAG). There is one important divergence from traditional graph theory which is the presence of collections in graphs, these form a special node that can have combinatorial enumerative properties, in the case of the various types of collections also it can have certain mapping properties in the case of a Map object. An example of a modified person object that might hold a collection of addresses is:

There are a number of ways to expand on these ideas further but I think I will leave those for future posts.

¹ This is the graph that Euler came up with for famous the Seven Bridges of Königsberg problem. The bridges are shown in the top image.

Elegant Coding

21 August 2011

You Can Control What You Can’t Measure

19 August 2011

The Object Graph

About Me