13 May 2012

Software System Morphology

On the Importance of Naming in Software Systems, part III

In my previous posts on naming "A Survey of the Conventional Wisdom on Software Naming" and "More Thoughts on Formal Approaches to Naming in Software" I discussed several ideas about defining a way to talk about software system naming, I feel the name "Software System Morphology" captures this "big picture" concept. My objective is to develop a comprehensive way to define and analyze the semantic nature (morphology) of the construction of a software system. It’s an ambitious undertaking and whether or not I succeed in this endeavor I am hoping that these ideas will contribute to a better solution to this problem.

To review, I am defining three attributes that a name of signifier can have, the name scope context, i.e. where the name is used or occurs, the type or class of the system referent signified by the name, and the lexical structure of the name i.e. the n-gram of name units.  I feel that these three qualities can be used to determine and possibly measure aspects of the morphological structure of software and by determining the morphological structure of a software system that structure might be "comprehended" and refined which could yield higher quality software. 

In Linguistics a morpheme is the smallest unit of meaning and it is considered, by some, to be a complete sub-discipline of the study of Linguistics. I admit that I am in the process of learning more about it, I am a few chapters into What is Morphology by Mark Aronoff and Kirsten Fudeman it is an excellent book for linguistic neophytes like me.  Software System Morphology will both be divergent and dependent on natural language morphological concepts, which are quite complex.  However a general understanding of linguistic morphology will make these ideas more accessible, I suggest looking into it on your own, it’s fascinating and worth it solely on its own merits.  A quick example of English Morphology can be given with the morphologically complex word autobiography which consists of the morphemes: auto/bio/graph/y, where [auto] means "self", [bio] means "life", [graph] means "drawn" and [y] means "characterized by".

In my first post of this series I explored various ideas proposed by a few prominent software engineers, one idea that I discussed was the use of what were denoted as qualifiers which included the scope qualifier, the type qualifier and the computed-value qualifier. These qualifiers which are prefixed or suffixed to names can be thought of more generally as morphemes. So we can now incorporate those ideas into a more general way of looking at meaning in software naming. Now the natural conclusion, if you have been paying attention, may be to assume that the name unit concept will map to the morpheme concept, it does not, at least in terms of only software morphology and this is in interesting distinction, software morphemes may consist of n-grams of name units which can be natural language morphemes two examples are CommandPattern and MedicalRecord where separating these into individual name units potentially become meaningless in certain software contexts.

Software System Morphology is primarily influenced by three things, the semantic nature of the problem domain(s), the semantic nature of the solution domain(s), which include languages ORMs, paradigms, design patterns, data structures, and finally the natural language(s) in which both the problem domain and solution domains are expressed. The previous example items CommandPattern and MedicalRecord illustrate these ideas as well.  

Ok now let’s take our concepts further with a more detailed example.  Assume we have a Hospital Medical system which has the following problem domain concepts: [Medical Record, Patient, Visit, Doctor, Hospital, Address, Invoice]. Obviously there are more. For the solution domain we will assume a JEE/Spring style implementation which might include the following concepts: [Entity, Dao, ServiceLayer, Controller, Map, View].  Now some expected names (signifiers) might be: MedicalRecordEntity, PatientEntity, PatientServiceLayer, PatientDao, PatientController, PatientView, VisitEntity, VisitDao, VisitServiceLayer, VisitController, VisitView, PatientVisitMap, PatientHospitalVisitor,... Now as you can see the names are composed of morphemes from both the problem domain and the solution domain.  This is to be expected as each name tells what it is/does in the problem domain and what it is/does in the solution domain. In fact any developer with knowledge of JEE Spring Web application developer can look at these names and know what to expect and where they fit in the system.  Additionally it is easy to look at the above set of domain concepts and anticipate named system referents that I omitted.

Previously I pointed out that the idea referred to as a qualifier is really a morpheme and while this is true the term qualifier seems to denote a more contextual setting in terms of the domain concepts.  In the above examples the solution domain morphemes act as qualifiers telling where and how each named system referent fits into the solution Entity tells that it is a domain object, Dao tells that it relates to persistence, ServiceLayer tells that one would expect to see a separation of concerns that would put the business logic here. Structurally one would expect the domain objects (with the Entity suffix) to be passed back from the Dao’s and ServiceLayer’s and there might be transformative manipulation or aggregation of the domain objects in the ServiceLayer.  The idea of qualification is meaning based but in this case it also indicates structural aspects of the solution domain. In this example I intentionally use the Entity suffix, in many systems the domain concepts would be used alone i.e. [Patient, Visit], I have mixed feelings about this in one sense it can be viewed as potentially unnecessary, in another sense it represents a qualifier that can possibly improve comprehension. Qualifiers do make names longer and while I tend to favor longer names it can become problematic if they get too long.  These ideas towards naming are powerful for building comprehensible maintainable code as you can see from the above name examples that they follow a consistent pattern, admittedly these are fairly easy naming tasks, but you would be surprised how many developers fail to even do these types of things consistently.

Unlike others who advocate specific practices in regards to naming I feel that naming should be dealt with in a flexible framework that can be adapted to a specific project’s needs.  While I personally agree and disagree with ideas previously discussed, my philosophy is that naming should be done consistently and comprehensively reflecting deliberate choices using conceptual understanding and analysis of the Domains. Also it should be tracked in a Lexicon, an idea that I am working on and hope to explore in a later post.  Also as I have said before I feel that tooling could be used to allow better conceptual understanding and navigation and the use of analytics and possibly visualization could be used to ease naming decisions in practice by providing improvements to both static analysis and interactive autosuggest/intellisense aids.

I feel that these ideas are an attempt to create a more "formalized" approach to what Eric Evan’s does in Domain Driven Design which also covers a number of other structural ideas, I hope to extend my ideas here to better integrate some of those as well.  The next steps for me are to undertake an analytic approach I also hope to explore the application of ideas like formal concept analysis and spectral graph theory techniques which may add some mathematical rigor to these ideas and perhaps allow some empirical validation and probably refinement. Now it gets really hard so it might be a while for those follow-ups.

No comments:

Post a Comment