Showing posts with label Software Frameworks. Show all posts
Showing posts with label Software Frameworks. Show all posts

13 May 2012

Software System Morphology

On the Importance of Naming in Software Systems, part III

In my previous posts on naming "A Survey of the Conventional Wisdom on Software Naming" and "More Thoughts on Formal Approaches to Naming in Software" I discussed several ideas about defining a way to talk about software system naming, I feel the name "Software System Morphology" captures this "big picture" concept. My objective is to develop a comprehensive way to define and analyze the semantic nature (morphology) of the construction of a software system. It’s an ambitious undertaking and whether or not I succeed in this endeavor I am hoping that these ideas will contribute to a better solution to this problem.

To review, I am defining three attributes that a name of signifier can have, the name scope context, i.e. where the name is used or occurs, the type or class of the system referent signified by the name, and the lexical structure of the name i.e. the n-gram of name units.  I feel that these three qualities can be used to determine and possibly measure aspects of the morphological structure of software and by determining the morphological structure of a software system that structure might be "comprehended" and refined which could yield higher quality software. 

In Linguistics a morpheme is the smallest unit of meaning and it is considered, by some, to be a complete sub-discipline of the study of Linguistics. I admit that I am in the process of learning more about it, I am a few chapters into What is Morphology by Mark Aronoff and Kirsten Fudeman it is an excellent book for linguistic neophytes like me.  Software System Morphology will both be divergent and dependent on natural language morphological concepts, which are quite complex.  However a general understanding of linguistic morphology will make these ideas more accessible, I suggest looking into it on your own, it’s fascinating and worth it solely on its own merits.  A quick example of English Morphology can be given with the morphologically complex word autobiography which consists of the morphemes: auto/bio/graph/y, where [auto] means "self", [bio] means "life", [graph] means "drawn" and [y] means "characterized by".

In my first post of this series I explored various ideas proposed by a few prominent software engineers, one idea that I discussed was the use of what were denoted as qualifiers which included the scope qualifier, the type qualifier and the computed-value qualifier. These qualifiers which are prefixed or suffixed to names can be thought of more generally as morphemes. So we can now incorporate those ideas into a more general way of looking at meaning in software naming. Now the natural conclusion, if you have been paying attention, may be to assume that the name unit concept will map to the morpheme concept, it does not, at least in terms of only software morphology and this is in interesting distinction, software morphemes may consist of n-grams of name units which can be natural language morphemes two examples are CommandPattern and MedicalRecord where separating these into individual name units potentially become meaningless in certain software contexts.

Software System Morphology is primarily influenced by three things, the semantic nature of the problem domain(s), the semantic nature of the solution domain(s), which include languages ORMs, paradigms, design patterns, data structures, and finally the natural language(s) in which both the problem domain and solution domains are expressed. The previous example items CommandPattern and MedicalRecord illustrate these ideas as well.  

Ok now let’s take our concepts further with a more detailed example.  Assume we have a Hospital Medical system which has the following problem domain concepts: [Medical Record, Patient, Visit, Doctor, Hospital, Address, Invoice]. Obviously there are more. For the solution domain we will assume a JEE/Spring style implementation which might include the following concepts: [Entity, Dao, ServiceLayer, Controller, Map, View].  Now some expected names (signifiers) might be: MedicalRecordEntity, PatientEntity, PatientServiceLayer, PatientDao, PatientController, PatientView, VisitEntity, VisitDao, VisitServiceLayer, VisitController, VisitView, PatientVisitMap, PatientHospitalVisitor,... Now as you can see the names are composed of morphemes from both the problem domain and the solution domain.  This is to be expected as each name tells what it is/does in the problem domain and what it is/does in the solution domain. In fact any developer with knowledge of JEE Spring Web application developer can look at these names and know what to expect and where they fit in the system.  Additionally it is easy to look at the above set of domain concepts and anticipate named system referents that I omitted.

Previously I pointed out that the idea referred to as a qualifier is really a morpheme and while this is true the term qualifier seems to denote a more contextual setting in terms of the domain concepts.  In the above examples the solution domain morphemes act as qualifiers telling where and how each named system referent fits into the solution Entity tells that it is a domain object, Dao tells that it relates to persistence, ServiceLayer tells that one would expect to see a separation of concerns that would put the business logic here. Structurally one would expect the domain objects (with the Entity suffix) to be passed back from the Dao’s and ServiceLayer’s and there might be transformative manipulation or aggregation of the domain objects in the ServiceLayer.  The idea of qualification is meaning based but in this case it also indicates structural aspects of the solution domain. In this example I intentionally use the Entity suffix, in many systems the domain concepts would be used alone i.e. [Patient, Visit], I have mixed feelings about this in one sense it can be viewed as potentially unnecessary, in another sense it represents a qualifier that can possibly improve comprehension. Qualifiers do make names longer and while I tend to favor longer names it can become problematic if they get too long.  These ideas towards naming are powerful for building comprehensible maintainable code as you can see from the above name examples that they follow a consistent pattern, admittedly these are fairly easy naming tasks, but you would be surprised how many developers fail to even do these types of things consistently.

Unlike others who advocate specific practices in regards to naming I feel that naming should be dealt with in a flexible framework that can be adapted to a specific project’s needs.  While I personally agree and disagree with ideas previously discussed, my philosophy is that naming should be done consistently and comprehensively reflecting deliberate choices using conceptual understanding and analysis of the Domains. Also it should be tracked in a Lexicon, an idea that I am working on and hope to explore in a later post.  Also as I have said before I feel that tooling could be used to allow better conceptual understanding and navigation and the use of analytics and possibly visualization could be used to ease naming decisions in practice by providing improvements to both static analysis and interactive autosuggest/intellisense aids.

I feel that these ideas are an attempt to create a more "formalized" approach to what Eric Evan’s does in Domain Driven Design which also covers a number of other structural ideas, I hope to extend my ideas here to better integrate some of those as well.  The next steps for me are to undertake an analytic approach I also hope to explore the application of ideas like formal concept analysis and spectral graph theory techniques which may add some mathematical rigor to these ideas and perhaps allow some empirical validation and probably refinement. Now it gets really hard so it might be a while for those follow-ups.


29 April 2012

What is Generic Programming?



Over the course of my career I have strived to write reusable code, always looking for more general patterns and trying to refactor code to more flexible more generic versions sometimes with mixed success.  Lately it has become something of an obsession of mine in that I both try to employ it and have been endeavoring to learn more about it, what it really is, and how to do it right, as I have previously pondered I believe there is a complexity cost with increased reuse.  Over the last few years, which I have been programming in Java, generics has become a central tool in achieving reuse.  So such a concept comes to mind but it is just one idea in a larger framework of ideas on generic programming.


There is a fair amount of research and discussion of the ideas pertaining to generic programming, the title of this post is taken directly from a paper on the topic, which perhaps not so coincidently was presented at the same OOPSLA (2005) as the paper on Software Reuse and Entropy that I previously discussed.  It just seems like every idea I explore is super-interrelated to everything else and this post will be no exception, I can conceptually connect these two papers using a Kevin Bacon like approach using just two other research papers, ideas I have for other posts.  I will try to keep this one as focused as possible.  Also the referenced paper is, like the Software Framework paper, very math heavy but different math, Category Theory,Order Theory, Universal Algebra, etc. This post will focus on more high level concepts and not the details of math that I wish I could claim that I fully understand.


So back to our question, what is generic programming?   For me it is about reuse, and while the information entropy approach to understanding reuse was about reuse from perspective of measure and the reuse (measurable) variation on different domains, I feel that the generic programming concept is about the structure of reuse and how it is implemented in various programming paradigms but the concept itself transcends language specific interpretations and implementations.


As you can tell I am not sure how to exactly answer this question so I will defer to the experts.


"What is generic programming?" in part outlines the paradigmatic distinctions between mainstream OO languages (Java Generics, C++ with STL) and Functional languages (Haskel, Scheme):


As for most programming paradigms, there are several definitions of generic programming in use. In the simplest view generic programming is equated to a set of language mechanisms for implementing type-safe polymorphic containers, such as List<T> in Java. The notion of generic programming that motivated the design of the Standard Template Library (STL) advocates a broader definition: a programming paradigm for designing and developing reusable and efficient collections of algorithms. The functional programming community uses the term as a synonym for polytypic and type-indexed programming, which involves designing functions that operate on data-types having certain algebraic structures.

In "Generic Programming" written in 1988 by David A. Musser and Alexander A. Stepanov use Ada and Scheme and describe generic programming as:


By generic programming, the definition of algorithms and data structures at an abstract or generic level, thereby accomplishing many related programming tasks simultaneously.  The central notion is that of generic algorithms, which are parameterized procedural schemata that are completely independent of the underlying data representation and are derived from concrete efficient algorithms.

...

A discipline that consists of the gradual lifting of concrete algorithms abstracting over details, while retaining the algorithm semantics and efficiency.


That last sentence has a very Agile/Refactoring feel to it.


Ralf Hinze in "Generic Programs and Proofs" describes it as: 


A generic program is one that the programmer writes once, but which works over many different data types.

...

Broadly speaking, generic programming aims at relieving the programmer from repeatedly writing functions of similar functionality for different user-defined data types. A generic function such as a pretty printer or a parser is written once and for all times; its specialization to different instances of data types happens without further effort from the user. This way generic programming greatly simplifies the construction and maintenance of software systems as it automatically adapts functions to changes in the representation of data.

...

So, at first sight, generic programming appears to add an extra level of complication and abstraction to programming. However, I claim that generic programming is in many cases actually simpler than conventional programming.  The fundamental reason is that genericity gives you "a lot of things for free" ...


This last statement resonates with me as it parallels my thoughts on the desirability of using software frameworks and I feel that I continuously have this conversation (argument) with junior level programmers and also senior level cargo cult coders who don’t get it.


Design Patterns are a common generic concept in mainstream programming.  It turns out that a number of design patterns can be described in the framework laid out by these papers and a number of papers do exactly that including "Design Patterns as Higher-Order Datatype-Generic Programs " part of extensive research on the topic by Jeremy Gibbons:


Design patterns, as the subtitle of the seminal book has it, are ‘elements of reusable object-oriented software’. However, within the confines of existing mainstream programming languages, these supposedly reusable elements can only be expressed extra-linguistically: as prose, pictures, and prototypes. We believe that this is not inherent in the patterns themselves, but  evidence of a lack of expressivity in the languages of today. We expect that, in the languages of the future, the code parts of design patterns will be expressible as directly-reusable library components. The benefits will be considerable: patterns may then be reasoned about, typechecked, applied and reused, just as any other abstractions can. Indeed, we claim that the languages of tomorrow will suffice; the future is not far away. All that is needed, in addition to what is provided by essentially every programming language, are higher-order (parametrization by code) and datatype-generic (parametrization by type constructor) features. Higher-order constructs have been available for decades in functional programming languages such as ML and Haskell. Datatype genericity can be simulated in existing programming languages [2], but we already have significant experience with robust prototypes of languages that support it natively [2]. We argue our case by capturing as higher-order datatypegeneric programs a small subset ORIGAMI of the Gang of Four (GOF) patterns. (For the sake of rhetorical style, we equate ‘GOF patterns’ with ‘design patterns’.) These programs are parametrized along three dimensions: by the shape of the computation, which is determined by the shape of the underlying data, and represented by a type constructor (an operation on types); by the element type (a type); and by the body of the computation, which is a higher-order argument (a value, typically a function).  Although our presentation is in a functional programming style, we do not intend to argue that functional programming is the paradigm of the future (whatever we might feel personally!).  Rather, we believe that functional programming languages are a suitable test-bed for experimental language features - as evidenced by parametric polymorphism and list comprehensions, for example, which are both now finding their way into mainstream programming languages such as Java and C#.  We expect that the evolution of programming languages will continue to follow the same trend: experimental language features will be developed and explored in small, nimble laboratory languages, and the successful experiments will eventually make their way into the outside world. Specifically, we expect that the mainstream languages of tomorrow will be broadly similar to the languages of today — strongly and statically typed, object-oriented, with an underlying imperative mindset — but incorporating additional features from the functional world — specifically, higher-order operators and datatype genericity

Lately I have been making more of an effort to learn Scala. Bruno Oliveira and Jeremy Gibbons make the case in "Scala for Generic Programmers" that it might be one of or even the best language for expressing Generic Programming concepts in part due to its hybridization of object oriented and functional paradigms. Here is the abstract in its entirety:


Datatype-generic programming involves parametrization by the shape of data, in the form of type constructors such as ‘list of’.  Most approaches to datatype-generic programming are developed in the lazy functional programming language Haskell. We argue that the functional object-oriented language Scala is in many ways a better setting. Not only does Scala provide equivalents of all the necessary functional programming features (such parametric polymorphism, higher-order functions, higher-kinded type operations, and type- and constructor-classes), but it also provides the most useful features of object-oriented languages (such as subtyping, overriding, traditional single inheritance, and multiple inheritance in the form of traits). We show how this combination of features benefits datatype-generic programming, using three different approaches as illustrations.

So far for me Scala has been a joy.  Its features and flexibility are beautifully elegant of course I am tainted coming from the clumsy, clinking, clanking, clattering caliginous world of Java, BTW Java we need to talk, it’s not you, it’s me. I’ve met someone else.


Jeremy Gibbons Makes the following point in "Datatype-Generic Programming":


Generic programming is about making programming languages more flexible without compromising safety. Both sides of this equation are important, and becoming more so as we seek to do more and more with computer systems, while becoming ever more dependent on their reliability.

This idea is similar to ideas that Gerald Sussman explores in his talk at Strangeloop 2011 "We Really Don’t Know How To Compute".  However, his general and perhaps philosophical approach to the idea of generic programming is very different from the previous ideas:    


[8:40]We spend all of our time modifying existing code, the problem is our code is not adequately evolvable it is not adequately modifiable to fit the future in fact what we want is to make systems that have the property that they are good for things that the designer didn’t even think of or intend.  ... That’s the real problem. ...  The problem is that when we build systems whatever they are we program ourselves into holes. ...  It means that we make decisions early in some process that spread all over our system the consequences of those decisions such that the things that we want to change later are very difficult to change because we have to change a very large amount of stuff.  ... I want to figure ways that we can organize systems so that the consequence of the decisions we make are not expensive to change, we can’t avoid making decisions but we can try to figure out ways to minimize the cost of changing decisions we have made.

...

[11:40]Dynamically extensible generic operations  ...  I want to dynamically extend things while my program is running  ... I want a program to compute up what it is going to do.


He shows an example[19:00] which is "the definition of the method of computing Lagrange Equations from a Lagrangian given a configuration space path". This is apparently easy stuff, to him at least.  Fortunately you don’t really need to understand the details of this math to understand his points here:


[20:45]This is a little tiny feeling of what is it needed to make things more flexible consider the value here, the value is that I am dynamically allowed to change the meaning of all my operators to add new things that they can do. Dynamically. Not at compile time, at runtime.

...

[21:15]It gives me tremendous flexibility. Flexibility because the program I wrote to run on  a bunch of numbers with only a little bit of tweaking suddenly runs on matrices so long as I didn’t make mistake of commuting something wrong. ... So this makes proving the theorems very hard In fact maybe impossible the cost and the benefits are very extreme. I can pay  correctness or proofs of correctness or belief in correctness, I can pay that for tremendous flexibility. ... Is correctness the essential thing I want, I am not opposed to proofs, I love proofs I am an old mathematician. The problem is that putting us into the situation that Mr. Dijkstra got us into ... where you are supposed to prove everything to be right before you write the program getting into that situation puts us in a world where we have to make the specifications for the parts as tight as possible because it’s hard to prove general things, except occasionally it’s easier but it’s usually harder to prove a general theorem about something so you make a very specialized thing that works for a particular case you build this very big tower of stuff and boy is that brittle change the problem a little and it all falls over.


His ideas tend towards adaptable and self modifying code at one point he talks about self modifying code in both assembly language and the use of eval in Lisp.  He also touches on a number of parallels in the biological world including genetics and the adaptability of the human immune system to fight off mutating parasites.  Interestingly some of his ideas seem similar to work done by Stephanie Forrest on using evolutionary/genetic algorithms for program repair which she speaks about at the keynote of SPLASH 2010, formerly known as OOPSLA.  The genetic repair approach at the conceptual level is striving for the same results as what Gerald Sussman describes.  Also did you notice that the Levenshtein Distance between "generic programming" and "genetic programming" is (one substitution), pretty weird!


Seriously though, the objectives of all of these approaches are the same: building better more flexible software that handles more situations. These ideas are reflected in how Glenn Vanderburg describes software engineering:


Software engineering is the science and art of designing and making, with economy and elegance, systems so that they can readily adapt to situations which they may be subjected.

I feel this Software Engineering insight reinforces my opinion that we will eventually see a real math backed discipline of software engineering and I think that the research work discussed  here both supports my conjecture on this point and most likely lays part of that foundation.  I also think some of these ideas reflect ideas I discussed in "The Software Reuse Complexity Curve" which implies that programmers will need to become more aware and more adept at more advanced theoretically based concepts like for example GADT’s (Generalized algebraic datatypes), etc. and of course math in general such as Abstract Algebra.


Clearly the idea of Generic programming has some very specific ideas and some very general objectives. There seem to be two approaches here which might described Predictive vs. Adaptive, much like the idea of Separation of Design and Construction described by Martin Fowler.  The Gibbons et al more calculational approach is predictive whereas the Sussman/Forrest ideas describe an adaptive approach. I suspect that this will just be another dimension of program design that will have to be considered and assessed as a tradeoff in its application, which Sussman also mentions.

25 March 2012

More Thoughts on Formal Approaches to Naming in Software

On the Importance of Naming in Software Systems, Part II


In my last post on naming I surveyed and attempted to categorize the conventional wisdom in regards to naming. Here I will expand on some of my ideas and extend my naming domain vocabulary.


A software system is composed of many things and most of them if not all have names.  In order to talk about this I realized that I need a term to describe the things that have names, Steve McConnell makes the comment "a variable and a variable’s name are essentially the same thing".  While this is effectively true, named things in a system have different types, uses, etc.  In trying to come up with a name for these named things, I felt that I could do better than the phrase "named things". For the word "things" I contemplated item, object, entity, artifact, and more. These all had various connotations and possible contexts which confused the meaning. I turned to linguistics for the answer, and found the following on Wikipedia under intension, the bolding is mine:


The meaning of a word can be thought of as the bond between the idea or thing the word refers to and the word itself. Swiss linguist Ferdinand de Saussure contrasts three concepts:



So the word referent seems to be a good choice as it probably has no context with relation to software engineering.  To take this further, I will use the expression Named System Referent or possibly just System Referent to refer to those things in a software system that have a name.  To be clear here a name is what is referred to above as a signifier but I will just use name.  A System Referent is the thing (referent) in the Software System that the name refers to.  If this is confusing hopefully some examples will help.  Some System Referents include but are not limited to the following1:

  • Classes
  • Variables (local, instance, static)
  • Methods
  • Method Parameters
  • Packages
  • Database Tables, Columns, Triggers, Stored Procedures, etc.
  • HTML Files, CSS Files, Javascript Files, Config Files, all files
  • Directories
  • Urls
  • Documents
  • XML Elements and Attributes
  • CSS Classes

And the list goes on.


Another observation we can make is that System Referents can have a context or a grouping, for example System Referents which include classes, packages, variables, methods, etc. may have different conventions from other System Referent groupings such as database System Referents.


Now that we have a way to talk about named items, we can explore the very important idea of naming conventions.  The name of a System Referent often consists of several components, from a linguistic stand point these components are similar to, but not the same as a lexical unit or a morpheme, for this I am choosing the term Name Unit, for example an identifier "personId" consists of two Name Units, "person" and "id". Name units imply a separator mechanism. In this example the use of CamelCase, another example is "person_id" where the name units are separated by the separator "_".  So a name unit is that atomic piece of a name that is separated by a separator and a separator is either explicit e.g. "_" or implied e.g. CamelCase. In my previous post we looked at different prefix and suffix qualifiers, this is a generalization of the qualifier concept which deals with the whole name and not just one part of it.


I wish to compare, from the structural perspective, two common conventions one for the Java System Referent grouping and that of an SQL Database grouping.  For the Java Naming convention we will assume the standard Java naming convention.  For the database we will limit ourselves to Tables and Columns and assume uppercase Names, name units that consist only of upper case characters, the separator will be an underscore "_".


We can define formal a more formal approach to the lexical structure of a naming convention using a BNF, or in this case my own variant extended BNF, for example our BNF for variable names in Java might look like the following:


name-unit-terminal ::= [A-Z]+[a-zA-Z0-9]*

name-unit-prefix-terminal ::= [a-z]+[a-zA-Z0-9]*

<name> ::=  name-unit-prefix-terminal | name-unit-prefix-terminal <name-unit>

<name-unit> ::= name-unit-terminal | name-unit-terminal <name-unit>


The Naming convention for our database column names and table names:


name-unit-terminal ::= [A-Z]+[A-Z0-9]*

<name> ::=  name-unit-terminal | name-unit-terminal "_" <name>


Two Concrete examples illuminating these are as follows:


public class Person {

private Long id;

private String lastName;

private String firstName;

private String middle;

private String phone;

private String email;

public Long getId() {

return id;

}

public void setId(Long value) {

id = value;

}

public String getLastName() {

return lastName;

}

public void setLastName(String value) {

lastName = value;

}

public String getFirstName() {

return firstName;

}

public void setFirstName(String value) {

firstName = value;

}

public String getMiddle() {

return middle;

}

public void setMiddle(String value) {

middle = value;

}

public String getPhone() {

return phone;

}

public void setPhone(String value) {

phone = value;

}

public String getEmail() {

return email;

}

public void setEmail(String email) {

this.email = email;

}

}


 


CREATE TABLE PERSON (

ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,

LAST_NAME VARCHAR(40) NULL,

FIRST_NAME VARCHAR(20) NULL,

MIDDLE VARCHAR(20) NULL,

PHONE VARCHAR(20) NULL,

EMAIL VARCHAR(256) NOT NULL

);


Naming Conventions consist of both a Lexical Component and a Semantic Component. In this post we have focused on the lexical aspects of naming, from the intension definition above we focused on the signifier and the referent. To really create a comprehensive approach the Semantic aspect cannot be ignored. In the intension definition above this can be viewed as focusing on the signified and the referent. This is an area that is explored in the idea of the Ubiquitous Language within the realm of  Domain Driven Design.


The Semantic Elements of a system and of a naming convention are more complex and ambiguous. I do think that software systems should have a System Lexicon which would be a more concrete formalization and extension of the System’s Vocabulary or Ubiquitous Language this would also incorporate ideas encapsulated in a Data Dictionary.  A System Lexicon would be an entity of some complexity and would require some commitment to build also applying it could be problematic and might require tooling such as an IDE interactive name suggestion feature or a static analysis component or both. It might even require an ontology or taxonomy oriented graphical tool for constructing and maintaining the System Lexicon depending on the size and complexity of the system.  I will get into some ideas that extend the lexical aspects and move into the Semantic aspects in future posts.  I know that this might seem pretty grandiose but I think this powerful stuff that could lead to better approaches to create higher quality software.


1This list is biased to Java Web development, but these ideas should be general enough to span any software system.

19 March 2012

A Survey of the Conventional Wisdom on Software Naming

On the Importance of Naming in Software Systems, Part I





Naming is a very important aspect of software, however, its importance and relevance to building quality software is often overlooked or ignored all together.  I feel that naming needs to be part of a larger more formal framework for building software.  In previous posts I have advocated a framework approach to capturing reuse and standardizing software design and implementation and I feel that naming is a crucial part of such an approach. Previously I included the following quote about naming in an API design talk by Joshua Bloch:


Names matter a lot, there are some people that think that names don’t matter and when you sit down and say well this isn’t named right, they say don’t waste your time let’s just move on, it’s good enough. No! Names, in an API, that are going to be used by anyone else that includes yourself in a few months mater an awful lot. The idea is that every API is kind of a little language and people who are going to use your API needs to learn that language and then speak in that language and that means that names should be self explanatory, you should avoid cryptic abbreviations so the original Unix names, I think, fail this one miserably.

He is not the only prominent software engineer to advocate an attention to naming, in this post I will summarize and discuss several sources.  I will start with his material which is part of his talk on API design which I summarized in my previous post.  Actually I feel that most if not all of what he says on API design is applicable to software in general including what he has to say on naming. The summary points are as follows:


Names should be self explanatory.

Avoid cryptic abbreviations.

Don’t have multiple words meaning that same thing.

Names should be consistent – same word means same thing.

Be regular–strive for symmetry. This implies a larger context of naming and a relationship between concepts.

Code should read like prose. (As a result of good API design)


Chapter two titled "Meaningful Names" of Robert Martin’s Clean Code, written by Tim Ottinger, is dedicated to naming. A summary of the advice is as follows:


Use Intention Revealing names

Avoid Disinformation – This is a corollary to Use Intention Revealing names. This means that the components of the name actually conceptually match the intent.

Make Meaningful Distinctions – This means Pick One Word per Concept.

Use Pronounceable Names. This means Avoid cryptic abbreviations.

Use Searchable Names. This is mainly about not using single character names and magic constants.

Avoid Encodings. This means do not use suffixes or prefixes to encode type or status as a member "m_".  He also discourages using "I" as a prefix to an interface. This includes avoiding Hungarian Notation.

Avoid Mental Mapping. This is an anti pattern. It is a corollary to Use Intention Revealing Names.

Class Names should consist of nouns of verb Phrases.

Method Names should consist of verbs or verb Phrases.

Pick One Word per Concept. 

Use Solution Domain Names. Solution domain names refer to things like design patterns or more distinct concepts such as Queue or Binary Tree

Use Problem Domain Names – This is pretty straight forward, if your system processes medical records, you might expect a class called MedicalRecord.

Add Meaningful Context – This deals with the idea of adding suffixes or prefixes to denote additional context information in the name.  The example given is use addrState vs. state.

Don’t Add Gratuitous Context – This talks about the idea of adding unnecessary or too specific suffixes or prefixes to denote context.

Choosing good names requires good descriptive Skills and Shared cultural background


Steve McConnell’s Code Complete contains the following advice1:


A name should fully and accurately describe what the variable represents.

A good mnemonic name generally speaks to the problem rather than the solution. A good name tends to express the what more than the how. In general, if a name refers to some aspect of computing rather than to the problem, it's a how rather than a what. Avoid such a name in favor of a name that refers to the problem itself.

Use pronounceable names.

Avoid misleading names or abbreviations and use consistent abbreviations.

Avoid names that are misleading.

Avoid names with similar meanings.

Avoid variables with different meanings but similar names

Name Specific Types of Data using Standardized Prefixes – Use prefixes to denote (encode) type such as Hungarian Notation.

Use computed-value qualifiers if needed at the end of the name. Examples are: Total, Sum, Average, Max, Min, Record, String, or Pointer.

Use a Naming Convention and that naming convention should be compatible standard conventions for the language.   Naming convention should distinguish among local, class, and global data.  Naming convention should distinguish among type names, named constants, enumerated types, and ­variables.

Good variable names are a key element of program readability. Specific kinds of variables such as loop indexes and status variables require specific considerations.

Names should be as specific as possible. Names that are vague enough or general enough to be used for more than one purpose are usually bad names.

Naming conventions distinguish among local, class, and global data. They distinguish among type names, named constants, enumerated types, and variables.

Abbreviations are rarely needed with modern programming languages. If you do use abbreviations, keep track of abbreviations in a project dictionary or use the standardized prefixes approach.



I did reword some of these to make them more stand alone. You may want to pursue the sources to gain a fuller picture of each author’s perspective.  The interesting thing here is while there are several key common themes to these approaches there are some very explicit contradictions. For example Clean Code recommends against Type encoding like Hungarian Notation while Code Complete recommends it.  Conversely Code Complete seems to implicitly recommend against using Solution Domain Names "speaks to the problem rather than the solution" whereas Clean Code advocates their use.

I feel that when talking about naming it is easy to conflate ideas pertaining to structural (lexical) concerns and ideas pertaining to domain (semantic) concerns.  Unfortunately while these ideas are separate they are also intertwined and an intelligent approach to naming needs to recognize this.  

One structural theme is the idea of qualifiers, which Code Complete explicitly names, but the idea appears implicitly in Clean Code as well.  Qualifiers in this context are generally prefixes and suffixes that qualify the name. Qualifiers mentioned above include scope qualifiers which denote the scope of a variable such as the prefixes "m_" for member or "global_" for global. Hungarian Notation in this context2 is an example of a type qualifier. Additionally Code Complete defines computed-value qualifier suffixes to denote values which are computed values these include: Total, Sum, Average, Max, Min, Record, String, or Pointer.  Interestingly qualifiers are a structural element of a name which denotes additional specific types of semantic meaning to the name. The structural and semantic construction of names is a topic I intend to continue exploring in subsequent posts.

The previous ideas address both structural (lexical) and domain (semantic) concerns now I wish to look at a targeted summary of more semantic oriented conceptual advice in Eric Evan’s Domain Driven Design:


Domain model terms are part the UBIQUITOUS LANGUAGE.

In regards to an INTENTION-REVEALING INTERFACE, name classes and operations to describe their effect and purpose without reference to the means by which they do what they promise. This relieves the client developer of the need to understand the internals. These names should conform to the UBIQUITOUS LANGUAGE so that team members can quickly infer their meaning.

In regards to MODULES aka Packages, If your model is telling a story your MODULES are chapters. The name of the MODULE conveys its meaning.  These names enter the UBIQUITOUS LANGUAGE.

Name each BOUNDED CONTEXT and make the names part of the UBIQUITOUS LANGUAGE.


In Domain Driven Design the idea of conceptually developing and implementing the Domain iteratively is explored.  This book reveals many approaches and concepts pertaining to that topic, the ideas that I am interested in which are targeted in the quotes above pertain to naming.  A common theme in naming which would be hard to dispute is to Use Problem Domain Names. The idea of a UBIQUITOUS LANGUAGE is promoted in the book as a way to help define the domain so that all stakeholders have a common way to talk about the domain and its representation in the resulting software.  Domain Driven Design promotes using the UBIQUITOUS LANGUAGE as a theme to defining the system and unifying the system documentation.  To me a UBIQUITOUS LANGUAGE implies the need to track, define and attempt to codify it over time which I think implies a more formal way to document it like a lexicon is needed, an idea I will follow up on more in a future post.

Another issue with talking about software naming is the idea of context.  Names pretty much always exist in a context, in Java a variable can be in the context of a method or a class. Both methods and classes themselves can exist in the context of classes and classes exist in the context of classes and packages and packages are themselves hierarchical. In fact software systems have developed the concept of Namespace to in part deal with this problem.  

Each name exists in a context, and in order to talk about naming we need a language just as you need to define a language for a domain. Actually naming is a domain, a meta-domain perhaps, a domain that describes other domains.  This means we need a "UBIQUITOUS LANGUAGE" for the domain of naming. So in that language I am going to define name scope context to define to context or "hierarchical place" in which a name occurs. I am purposefully not using namespace since that has other explicit meanings.  I know using the words scope and context seem redundant but I felt using "scope" alone didn’t work because it has a specific meaning and context by itself was too vague, using name scope context can refer to a broader set of circumstances such as an attribute’s position in an xml document relative to its parent element or a SQL Column name relative to its parent table, a file’s directory and so on.

An interesting thing I have noticed about name scope context is that it can be redundant.  For example let’s assume that we have an object named a database table named PERSON and a corresponding object named Person.  I have seen the following approach to naming these:


CREATE TABLE PERSON (

    PERSON_ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,

    LAST_NAME VARCHAR(40) NULL,

    FIRST_NAME VARCHAR(20) NULL,

    MIDDLE VARCHAR(20) NULL,

    PHONE VARCHAR(20) NULL,

    EMAIL VARCHAR(256) NOT NULL,

    DELETED CHAR(1) NOT NULL DEFAULT 'N',

    CREATE_DATE TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    MODIFY_DATE TIMESTAMP NULL

)


 


class Person {

    private int personId;

    private String lastName;

    private String firstName;

    private String middle;

    private String phone;

    private String email;

    private boolean deleted;

    private Date createDate;

    private Date modifyDate;

...


Now in these examples the database table PERSON contains the column named PERSON_ID and the class Person contains the corresponding field named personId.  In both of these cases ID or id respectively would be probably be better as the use of person is repeating the name used within the name scope context of the member variable.  The down side of this type of non-duplication is the loss of the ability to search for the names, which I think implies that software name search probably needs more sophisticated mechanisms over that of simple text searching, another idea I will expand on in the future.

This is my first post on naming, I have three more planned and I will get into more ideas some of which come from linguistics. I feel that Domain Driven Design gets into a number of linguistic and linguistic oriented concepts, maybe what we are going for here is Semantic Driven Design, but that would be just another YADD.


1 Code Complete provides a lot of ideas here are few more:


The name should long enough that you don't have to puzzle it out.

Avoid misspelled words in names.

Avoid words that are commonly misspelled.

Don't differentiate variable names solely by capitalization.

Creating Short Names That Are Readable. These are guidelines for shortening names when you need to, so this is a special case not a general one.  I know I have run into this with Oracle 11g’s 30 Char limit.

Avoid unnecessary abbreviations.

Avoid multiple natural languages.

Avoid names that could be misread or mispronounced.

Avoid names that are different by only one or two characters.

Avoid names that sound similar.

Avoid names that conflict with standard library routine names or with predefined variable names.

Format names for readability.

Avoid excessively long names.

Code is read far more times than it is written. Be sure that the names you choose favor read-time convenience over write-time convenience.


2 This use of Hungarian notation is apparently not how it was intended to be used see "Making Wrong Code Look Wrong".



31 July 2011

Software Frameworks: Resistance isn’t Futile

As I have previously discussed, in my opinion there are three main framework components that can be described succinctly as, libraries, rules, and templates. It is the library component that I wanted to talk about here, perhaps in a context that might be seen as more evidence in support of frameworks in certain cases. Now to recap, building a framework does not make sense for all projects, the two main scenarios that I have seen that are extremely conducive to it are an organization with multiple similar projects with similar problem domains. The second case is a medium to large project with a lot of commonality that would favor reusability across the project. One of my complaints about the Framework debate is that it is debated as a black and white argument. Either frameworks are absolutely required or the worst thing you can do. Now I am sure that many developers can anecdotally cite either side of this argument, which is what I feel really drives this debate and there is no doubt that I do this as well, but the goal, in my opinion, is to step back and look at this problem from a bigger perspective.

One place that I have found some interesting perspective is in a paper by Todd L. Veldhuizen titled: "Software Libraries and Their Reuse:Entropy, Kolmogorov Complexity, and Zipf’s Law", there is a slide version here, now a word of caution this paper is very math intensive, but it should be possible to read it and gain some insights without understanding the math, for example in the paper he states the following:

A common theme in the software reuse literature is that if we can only get the right environment in place— the right tools, the right generalizations, economic incentives, a "culture of reuse" — then reuse of software will soar, with consequent improvements in productivity and software quality. The analysis developed in this paper paints a different picture: the extent to which software reuse can occur is an intrinsic property of a problem domain, and better tools and culture can have only marginal impact on reuse rates if the domain is inherently resistant to reuse.

I think this is a good observation, many projects that I have worked on have exhibited characteristics that are favorable to reuse, but I have read a number of counter arguments especially by people in fast paced startups where the flux of the system evolution is potentially resistant to reusability, actually in this case it’s probably the SDLC, not necessarily the problem domain that is resistant. Also I would suspect the probability of reuse is in part inversely proportional to system size, so for small systems it’s less likely or at least will have a smaller set of reusable components, so an investment in reuse may not be seen as justified.

Another interesting observation from the paper is:

Under reasonable assumptions we prove that no finite library can be complete: there are always more components we can add to the library that will allow us increase reuse and make programs shorter. To make this work we need to settle a subtle interplay between the Kolmogorov complexity notion of compressibility (there is a shorter program doing the same thing) and the information theoretic notion of compressibility (low entropy over an ensemble of programs).

This is especially interesting if you have some familiarity with Information Theory, and if you don’t I recommend learning more about it. Here he is comparing characteristics of both Algorithmic Information Theory [Kolmogorov complexity] and Shannon’s Information Theory [information theoretic notion].  Roughly, Algorithmic Information Theory is concerned with the smallest algorithm to generate data and Shannon’s Information Theory is about how to represent the data in the most compact form.  These concepts are closely related to data compression and in the paper this is paralleled to the idea that reusing code, will make the system smaller in terms of lines of code, or more specifically: symbols, which effectively "compresses" the codebase.  In Algorithmic Information theory you can never really know if you have the smallest algorithm so I may be taking some liberty here, but I think the takeaway is that when trying to create reuse you can probably do it forever so one needs to temper this desire with practicality. In other words there is probably a point where any subsequent work towards reuse is a diminishing return.

I find the paper compelling and I confess that perhaps I am being bamboozled by math that I still do not fully understand, but intuitively these ideas feel right to me.  Also the application of Zipf’s law is interesting and should be pretty intuitive, once again roughly, Zipfs law relates to power curve distributions, also related to the 80/20 rule, the prime example is the frequency of English words in text, words like [and, the, some, in, etc.] are much, perhaps orders of magnitude, more common than words like [deleterious, abstruse, etc.].  This distribution shows up in things like the distribution of elements in the universe, think hydrogen vs. platinum, the wealth distribution of people you vs. Bill Gates, how many followers people have on twitter, etc. and to a smaller scale, the curve is scale invariant, in software, often some components will have a fair amount of reuse, things string copy functions, entity base classes, etc., whereas others may only have a couple of reuses.

On the lighter side, Power curves relate to Chaos Theory, I have seen a number of people including Andy Hunt draw parallels between Agile and Chaos theory, although these are usually pretty loose, it does strike me that one way to model chaos is through iterated maps, which is reminiscent of the iterative process of agile, also the attractor and orbit concepts seem to parallel the target software system as well.

Another place, and this is one that most developers will find more accessible, I would have lead with this but the title and flow leaned the other way, is Joshua Bloch’s "How To Design A Good API and Why it Matters".  Actually I think this a presentation that every developer should watch especially if you are involved in high level design and architecture related work, and don’t worry, there is no math to intimidate the average developer.  A summary article version can be found here, but I would still recommend watching the full video at least once if not multibple times. Slides to an older version of the talk can be found here.

In his presentation he talks about getting the API right the first time because you are stuck with it. I think this helps illuminate one very important and problematic aspect of framework design and software development.  The problem can be illustrated with a linguistic analogy, in linguistics there are two ways to define rules for a language: prescriptive, is where you apply (prescribe) the rules on to the language vs. descriptive where the rules describe the language as it exists, I strongly favor the descriptive.  One of the most famous English language rules: "not to end sentences with prepositions" is a prescriptive rule thought to be from Latin which was introduced by John Dryden and famously mocked by Winston Churchill "This is the sort of nonsense up with which I will not put.", pointing out that it really doesn’t always fit a Germanic language like English.  I know I’m a bit off topic again with my "writer’s embellishment", not to mention that the same idea is discussed in depth by Martin Fowler which he terms "Predictive versus Adaptive". 

It is a common problem which Agile attempts to address and it is common in general software design and construction, it also occurs API and framework design.  Software construction as we know is an organic process and I feel that frameworks are best developed in part out of that process though Martin Fowler’s Harvesting which can be termed as descriptive framework creation.  What Joshua Bloch in part describes and to some degree cautions against can be described as a prescriptive approach to API/Frameworks.  I think many developers including me have attempted to create framework components and API’s early in a project usually driven by a high level vision of what the resulting system will look like only to find out later that certain assumptions were not valid or certain cases were not accounted for1.  What he talks about is a pretty rigid prescriptive model which is in many ways at odds with an adaptive agile approach, I feel that the more adaptive agile approach is really what is needed for the framework approach and we do see this via versioning, for example the differences between Spring 1.x and Spring 3.x are substantial and no one would want to use 1.x now, but there are apps that are tied to it now.  Also this approach of complete backwards compatibility was used with Java Generics, specifically Type Erasure which has lead to a significant weakening of the implementation of that abstraction.  It is my understanding that Scala has recently undergone some substantial changes from version to version leading some to criticize its stability while others cite that it is the only way to avoid painting yourself into a corner like with Java.  The harvesting approach will often involve refinement and changes to the components which are extracted and this can lead to the need for refactorings that can potentially affect large amounts of already existing code. It’s a real chicken and egg problem.

He starts his presentation with the following Characteristics of a Good API:

Characteristics of a Good API
  • Easy to learn
  • Easy to use, even without documentation
  • Hard to misuse
  • Easy to read and maintain code that uses it
  • Sufficiently powerful to satisfy requirements
  • Easy to extend
  • Appropriate to audience

He also makes the following point:

APIs can be among a company's greatest assets

I think this is sentiment is reflected in Paul Graham’s "Beating the Averages" of course that is more about Lisp but underlying principle is the same, actually an interesting language agnostic point comes from Peter Norvig, I can’t find the reference, but he said he had a similar attitude towards Lisp until he got to Google and saw good programmers who were incredibly productive in C++. I feel that this is all just the framework argument, maximizing reusability by building reusable high level abstractions within your problem domain that allow you to be more productive and build new high level components more quickly, it’s all about efficiency.

In regards to API’s he adds:

API Should Be As Small As Possible But No Smaller

To which he adds:

  • When in doubt leave it out.
  • Conceptual weight is more important than the bulk - The number of concepts.
  • The most important way to reduce weight is reusing interfaces.

He attributes this sentiment to Einstein, but he wasn’t sure about it, I did some follow up the paraphrasing is "Everything should be made as simple as possible, but no simpler." or "Make things as simple as possible, but not simpler." More about that can be found here. These are good cautions about over-design or over-engineering APIs, Frameworks, and Software in general, I have definitely been guilty of this at times, once again this is something that needs to be balanced.

The following is what I consider to be seminal advice:

All programmers are API designers because good programming is inherently modular and these inter modular boundaries are API’s and good API’s tend to get reused.

As stated in his slides:

Why is API Design Important to You?
  • If you program, you are an API designer
    • Good code is modular–each module has an API
  • Useful modules tend to get reused
    • Once module has users, can’t change API at will
    • Good reusable modules are corporate assets
  • Thinking in terms of APIs improves code quality

The next two quotes are pretty long, and it was no easy task transcribing them, he talks really fast, I tried to accurately represent this as best as possible, also I feel that he really nails some key ideas and I wanted to have a written record of it to reference, since I am not aware of any other:

Names matter a lot, there are some people that think that names don’t matter and when you sit down and say well this isn’t named right, they say don’t waste your time let’s just move on, it’s good enough. No! Names, in an API, that are going to be used by anyone else that includes yourself in a few months mater an awful lot.  The idea is that every API is kind of a little language and people who are going to use your API needs to learn that language and then speak in that language and that means that names should be self explanatory, you should avoid cryptic abbreviations so the original Unix names, I think, fail this one miserably.

He augments these ideas by adding consistency and symmetry:

You should be consistent, it is very important that the same word means the same thing when used repeatedly in your API and you don’t have multiple words meaning that same thing so let us say that you have a remove and a delete in the same API that is almost always wrong what’s the difference between remove and delete, Well I don’t know when I listen to those two things they seem to mean the same thing if they do mean the same thing then call them both the same thing if they don’t then make the names different enough to tell you how they differ if they were called let’s say delete and expunge I would know that expunge was a more permanent kind of removal or something like that. Not only should you strive for consistency you should strive for symmetry so if you API has two verbs add and remove and two nouns entry and key, I would like to see addEntry, addKey, removeEntry, removeKey if one of them is missing there should be a very good reason for it I am not saying that all API’s should be symmetric but the great bulk of them should. If you get it right the code should read like prose, that’s the prize.

From the Slides:

Names Matter–API is a Little Language
  • Names Should Be Largely Self-Explanatory
    • Avoid cryptic abbreviations
  • Be consistent–same word means same thing
    • Throughout API, (Across APIs on the platform)
  • Be regular–strive for symmetry
  • Code should read like prose

I feel that this hits some essential concepts which really resonate with me, in fact I have follow on posts planned to further deconstruct and develop these ideas more generally.  From the framework perspective this also gets at some of the variety of the Framework code components, they can be the Paul Graham’s functional Lisp abstractions, they can be DSL’s, they can be Object Oriented like Spring, Hibernate and Java API, etc.   Any framework built for a domain will have their conceptual vocabularies or API languages that are a higher level abstraction of the problem domain, Domain Specific Abstractions, and they all benefit from concepts like consistency and symmetry as appropriate to the domain.

The following is a very common and widespread problem that often inhibits reuse and leads to less efficient production of lower quality software:

Reuse is something that is far easier to say than to do. Doing it requires both good design and very good documentation. Even when we see good design, which is still infrequently, we won’t see the components reused without good documentation.

- D. L. Parnas, Software Aging. Proceedings of the 16th International Conference on Software Engineering, 1994.

He adds:

Example Code should be Exemplary, one should spend ten times as much time on example code than production code.

He references a paper called "Design fragments" by George Fairbanks also here which looks interesting but I have not had time to read it yet.

This is also, I believe, to be a critical point that is possibly symptomatic of problems with many software projects. I feel that projects never explicitly allow for capturing reuse in terms of planning, schedule and developer time.  Often reuse is a lucky artifact if you have proactive developers who make the extra effort to do it and it can often be at odds with the way I have seen projects run (mismanaged).  I have some follow up planned for this topic as well.

In terms of design he adds this interesting point:

You need one strong design lead to that can ensure that the api that you are designing is cohesive and pretty and clearly the work of one single mind or at least a single minded body and that’s always a little bit of a trade off being able to satisfy the needs of many costumers and yet produce something that is beautiful and cohesive.

I have to confess that I do not recall how I came across Todd Veldhuizen’s paper or Joshua Bloch’s talk, but I felt that they were really about similar ideas, in writing this and finding all of the references again I realized that my association of these two was not coincidental at all.  For they are both part of the Library-Centric Software Design LCSD'05 workshop for Object-Oriented Programming, Systems, Languages and Applications (OOPSLA'05) with Joshua Bloch delivering the same talk as the Keynote Address.

Now I admit that one goal of mine is to put these ideas, much of which was borrowed from the two referenced works, into a written form that I can reference back to, I hope this stands up by itself but is really written for my future entries. Also, for the record this is not the first time I have "leached" off of Joshua Bloch’s work.

Ultimately my ideas will diverge from some of those in regards to API design, Joshua Bloch speaks from a position of greater responsibility in terms of APIs. I see them as part of a framework continuum used to construct software and I think many of his ideas apply directly and more generally to that.  Also I see framework and the system as interlocked and feel that frameworks can drive structure and consistency for example Object-Oriented API’s aka frameworks can in turn drive the structure of software, the Spring Framework is a good example of this, Spring is build heavily around IOC which is one of the SOLID principles.

There will always be arguments against frameworks, the classic one is it creates more code that is more complex and it requires more learning, my counter argument to this is twofold: First any code that is in production probably has the need to be known and understood which might require it to be learned regardless of how efficiently it is created.  Also if a framework creates reusability and consistency the initial learning curve will be higher but each subsequent encounter with a codebase that constructed this way should be easier. Also highly redundant inconsistent code is potentially (much) more difficult to learn and maintain because there is more of it.  The second is if your framework API is well defined and well documented it should make the resultant code much easier to understand and maintain aka "read like prose".  This will be due to the fact that much of the "generic" complexity is "abstracted" downwards into the framework level.  For example compare the code to implement a Spring Controller to the underlying classes that do the work such as AnnotationMethodHandlerAdapter Now if you have defects and issues at that lower level they will be harder to fix and changing common code can have side effects to other dependant code, it’s not a perfect world.

I think the issue with reuse and the framework approach is asking the right question: How resistant (or favorable) is you domain and your SDLC to reuse?  I think most domains have some if not a fair amount of favorability to reuse and I see reuse as increased efficiency in software construction. 

1Perhaps: "...certain cases for were not accounted." Dryden blows.

19 March 2011

What does your framework look like?

Paul Graham, entrepreneur, Lisp advocate, and general geek hero, posits1, and I am paraphrasing, that developers should develop in Lisp because otherwise they end up duplicating inherent Lisp functionality to build an advanced system. I currently work mostly with the Spring Framework, the initial and continuing goal of Spring is to simplify and "abstract out" a lot of the drudge work of building applications with J2EE, in fact what first seduced me was the ease, especially with an ORM like Hibernate, that you can very easily and with almost no code create basic Web CRUD apps. Of course creating quality apps is not completely simplified and these technologies are non-trivial and open a whole new dimension of complexity, but in a good way, mostly.


During my current Spring and Hibernate tenure, I have come to realize that in order to build a DRY system with these technologies you really need to build your own framework on top of these frameworks, this sentiment was echoed to me in a job interview by an architect, who was lamenting that there was a perception within the organization that once you drop in Spring and Hibernate all of your infrastructure work is done, in fact our concordance on this point was one of several reasons why I was offered and accepted the position.


So what does this framework look like? Well in this context I would abstractly define a "framework" as a collection of components and rules. Its intent is to guide the structural s development of a system towards DRY and consistent code and to create and maintain Conceptual Integrity in the system and throughout the development process and to create clean high level interfaces to increase developer productivity. This framework will vary from system to system and it will be driven by project specific technology decisions and design.


The two aspects of this approach are pretty intertwined, so I’ll start with the structural components, which can include a number of possible component categories, some components may be created to fill in deficiencies with the underlying technology components, while most will probably be more vertical convenience components which bridge the gap from application needs to the generalized underlying technology frameworks like Spring. A good example of this is security. Many projects I have worked on have had special security needs that in some cases require extending and overriding existing Spring Security classes. Essentially any code that can be used in multiple places in a system is a candidate for being "pulled down" to the framework layer. Also components that can be commoditized into services can be thought of as part of this approach also. In the Spring and Hibernate world two common framework patterns are the Generic Dao and the model base class.


Ultimately I cannot define what your framework will look like, as mine tends to vary from project to project and generally includes the components mentioned above, as well as other components such as custom property editors, reusable comparators, Dao and Model audit functionality, special JSP tags, etc. Your framework may be quite different, and it might not be in Java, as it might be in another language and framework like Ruby on Rails or Scala/Lift for example. Also you should plan on the framework being an organic evolving entity which grows constantly through refactoring and harvesting, and in this case I would call it "downward refactoring" where common and possibly redundant code is refactored down to the framework layer.


The rules aspect includes, of course, coding standards, but it should also guide developers on their choices covering how to use underlying technologies, and relatively mundane things like how the project is structured, and naming, also this is not a comprehensive list and also may vary over the project event space. An example of rules would be to dictate how specific technologies are used, in Spring and Hibernate there are often multiple ways to write code which performs the same function as well as competing configuration "modes" e.g. annotations vs. XML. One common coding variation in Hibernate systems is the HQL vs Criteria API "debate" many systems I have worked on tend to have a haphazard mix of both, a consistent system will have Dao classes that have specific rules about when to choose the Criteria API over HQL and thus not leave it to the developer’s whim. Rules should also be viewed organic and evolving, especially if the developers are new to technologies. Also the rules should extend to how and when the framework components should be used.


Of course as I was writing this, I came across several reuse skepticism articles including "Reuse Myth - can you afford reusable code?" which was on the Reddit and following the links yielded: "the imperial clothing crisis" and "Hidden Costs Of Code Reuse". At first, my reaction was that all of this was completely antithetical to my arguments; however, in reading them, I realized that they are good sources for refinement of my points.


First I should point out that the framework architecture is hopefully created by the architects and alpha developers, also the alpha developers generally produce higher quality code at least three times faster than average 80%, so conceivably your software creation rate is still at least one to one, based on the argument about reusable code takes three times longer, also I would argue that in general it does in fact take more time to produce higher quality software than to produce lower quality software. Admittedly, most manager types usually don’t want to hear that!


It should be noted that framework code should be constructed in a deliberated and hopefully collaborative way, and that not everything should be made reusable, however, I am of the opinion that, if you use it more than once it is a candidate for reuse refactoring, however, this needs to be tempered by the level of effort to achieve reusability, sometimes cut and paste makes more sense. Also, as the above reuse skepticism articles suggest, don’t go crazy trying to create a bunch of reusable code before you have the need to use it. Another somewhat ironic point is that the framework concept that I am promoting often in part consists of components that reduce the generality of the underlying frameworks narrowing those API’s for more focused integration to define a cleaner interface with the higher level application code.


There is one additional "meta" component that I have used and seen during my career, it is sometimes referred to a sample or template project, the more recent name which I have encountered is bootstrap project. The bootstrap project often both exemplifies and ties the rules and components aspects together. It usually contains the configuration structure and usually sample code to guide how projects should be constructed, and it is often an actual running project which can be deployed to the target development environment.


1 This particular reference is actually restatement of Greenspun's Tenth Rule, which Paul Graham references in his essay, when I first wrote this I remembered it as Paul Graham's, however, upon rediscovering the essay I realized that it was actually a reference, I decided to leave it this way because I think Paul Graham’s essay and body of work is worth reading as is Philip Greenspun’s.