On the Importance of Naming in Software Systems, Part I
Naming is a very important aspect of software; however, its importance and relevance to building quality software is often overlooked or ignored altogether. I feel that naming needs to be part of a larger, more formal framework for building software. In previous posts I have advocated a framework approach to capturing reuse and standardizing software design and implementation, and I feel that naming is a crucial part of such an approach. Previously I included the following quote about naming from an API design talk by Joshua Bloch:
Names matter a lot, there are some people that think that names don’t matter and when you sit down and say well this isn’t named right, they say don’t waste your time let’s just move on, it’s good enough. No! Names, in an API, that are going to be used by anyone else that includes yourself in a few months matter an awful lot. The idea is that every API is kind of a little language and people who are going to use your API needs to learn that language and then speak in that language and that means that names should be self explanatory, you should avoid cryptic abbreviations so the original Unix names, I think, fail this one miserably.
He is not the only prominent software engineer to advocate attention to naming; in this post I will summarize and discuss several sources. I will start with his material, which comes from the talk on API design that I summarized in my previous post. In fact, I feel that most if not all of what he says on API design, including what he says on naming, applies to software in general. The summary points are as follows:
Names should be self explanatory.
Avoid cryptic abbreviations.
Don’t have multiple words meaning the same thing.
Names should be consistent – same word means same thing.
Be regular – strive for symmetry. This implies a larger context of naming and a relationship between concepts.
Code should read like prose. (As a result of good API design)
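To make the consistency and symmetry points concrete, here is a small Java sketch. The `TaskQueue` type and its methods are hypothetical, invented only for illustration: each word is used for exactly one concept (`add`/`remove`, never a mix of `add`/`delete`/`push`), boolean queries consistently start with `is` or `has`, and the calling code ends up reading roughly like prose.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical example: one word per concept and symmetric operation pairs.
final class TaskQueue {
    private final Deque<String> tasks = new ArrayDeque<>();

    // "add" and "remove" form a symmetric pair; no "insert", "push", or "delete" synonyms.
    void addTask(String task) {
        tasks.addLast(task);
    }

    // Returns null when there is no next task.
    String removeNextTask() {
        return tasks.pollFirst();
    }

    // Boolean queries consistently use "is"/"has" prefixes.
    boolean isEmpty() {
        return tasks.isEmpty();
    }

    boolean hasTask(String task) {
        return tasks.contains(task);
    }

    int taskCount() {
        return tasks.size();
    }
}

final class TaskQueueDemo {
    public static void main(String[] args) {
        TaskQueue queue = new TaskQueue();
        queue.addTask("write draft");
        queue.addTask("review draft");
        // Reads almost like a sentence: while the queue is not empty, remove the next task.
        while (!queue.isEmpty()) {
            System.out.println(queue.removeNextTask());
        }
    }
}
```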
Chapter two of Robert Martin’s Clean Code, titled "Meaningful Names" and written by Tim Ottinger, is dedicated to naming. A summary of the advice is as follows:
Use Intention-Revealing Names
Avoid Disinformation – This is a corollary to Use Intention Revealing names. This means that the components of the name actually conceptually match the intent.
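Make Meaningful Distinctions – Names that differ should differ in meaning; avoid number-series names such as a1 and a2 and noise words such as Info or Data (ProductInfo vs. ProductData). This is closely related to Pick One Word per Concept.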
Use Pronounceable Names. This means Avoid cryptic abbreviations.
Use Searchable Names. This is mainly about not using single character names and magic constants.
Avoid Encodings. This means do not use prefixes or suffixes to encode type or scope, such as "m_" for a member variable. He also discourages using "I" as a prefix for an interface name. This includes avoiding Hungarian Notation.
Avoid Mental Mapping. Mental mapping, where readers must translate a name into some other concept they already know, is an anti-pattern. This is a corollary to Use Intention-Revealing Names.
Class Names should consist of nouns or noun phrases.
Method Names should consist of verbs or verb Phrases.
Pick One Word per Concept.
Use Solution Domain Names. Solution domain names refer to computer science terms such as design pattern names or concepts like Queue or Binary Tree.
Use Problem Domain Names – This is pretty straightforward: if your system processes medical records, you might expect a class called MedicalRecord.
Add Meaningful Context – This deals with adding prefixes or suffixes to a name to supply context it would otherwise lack. The example given is addrState vs. state.
Don’t Add Gratuitous Context – This talks about the idea of adding unnecessary or too specific suffixes or prefixes to denote context.
Choosing good names requires good descriptive skills and a shared cultural background.
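The following sketch, an invented example rather than one taken from the book, contrasts a cryptic method with a version that applies several of these guidelines: intention-revealing and pronounceable names, a searchable named constant in place of the magic number 30, and a verb phrase for the method name. Both methods compute the same result; only the names differ.

```java
// Invented example: the same computation before and after applying the naming guidelines.
final class OverdueInvoices {

    // Before: cryptic method and parameter names, single-letter variables, magic number 30.
    static int calc(int[] d) {
        int t = 0;
        for (int i = 0; i < d.length; i++) {
            if (d[i] > 30) t++;
        }
        return t;
    }

    // After: a searchable named constant replaces the magic number.
    static final int PAYMENT_TERM_IN_DAYS = 30;

    // After: a verb-phrase method name and intention-revealing, pronounceable variable names.
    static int countOverdueInvoices(int[] daysOutstandingPerInvoice) {
        int overdueInvoiceCount = 0;
        for (int daysOutstanding : daysOutstandingPerInvoice) {
            if (daysOutstanding > PAYMENT_TERM_IN_DAYS) {
                overdueInvoiceCount++;
            }
        }
        return overdueInvoiceCount;
    }
}
```

Searching the code base for `PAYMENT_TERM_IN_DAYS` finds every use of the payment term, whereas searching for `30` would also turn up unrelated numbers.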
Steve McConnell’s Code Complete contains the following advice (see footnote 1):
A name should fully and accurately describe what the variable represents.
A good mnemonic name generally speaks to the problem rather than the solution. A good name tends to express the what more than the how. In general, if a name refers to some aspect of computing rather than to the problem, it's a how rather than a what. Avoid such a name in favor of a name that refers to the problem itself.
Use pronounceable names.
Avoid misleading names or abbreviations and use consistent abbreviations.
Avoid names that are misleading.
Avoid names with similar meanings.
Avoid variables with different meanings but similar names
Name Specific Types of Data using Standardized Prefixes – Use prefixes to denote (encode) type such as Hungarian Notation.
Use computed-value qualifiers if needed at the end of the name. Examples are: Total, Sum, Average, Max, Min, Record, String, or Pointer.
Use a naming convention, and make that convention compatible with the standard conventions for the language. The naming convention should distinguish among local, class, and global data, and among type names, named constants, enumerated types, and variables.
Good variable names are a key element of program readability. Specific kinds of variables such as loop indexes and status variables require specific considerations.
Names should be as specific as possible. Names that are vague enough or general enough to be used for more than one purpose are usually bad names.
Abbreviations are rarely needed with modern programming languages. If you do use abbreviations, keep track of abbreviations in a project dictionary or use the standardized prefixes approach.
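As a sketch of how such a convention might look in Java, the hypothetical `SalesReport` below follows the standard Java conventions: UpperCamelCase for types, UPPER_SNAKE_CASE for named constants and enum values, and lowerCamelCase for fields and local variables. Local variables carry computed-value qualifiers such as Total, Max, and Average at the end of the name. Java has no true global data, so the sketch shows only class and local data; the domain and all names are invented for illustration.

```java
import java.util.List;

// Type names: UpperCamelCase. Enumerated values: UPPER_SNAKE_CASE.
enum SalesRegion { NORTH, SOUTH, EAST, WEST }

final class SalesReport {
    // Named constant: UPPER_SNAKE_CASE.
    static final String REPORT_TITLE = "Daily revenue";

    // Class (instance) data: a private lowerCamelCase field.
    private final SalesRegion region;

    SalesReport(SalesRegion region) {
        this.region = region;
    }

    // Local data: lowerCamelCase, with computed-value qualifiers (Total, Max, Average) at the end.
    String summarize(List<Integer> revenueByDay) {
        if (revenueByDay.isEmpty()) {
            throw new IllegalArgumentException("revenueByDay must not be empty");
        }
        int revenueTotal = 0;
        int revenueMax = Integer.MIN_VALUE;
        for (int revenue : revenueByDay) {
            revenueTotal += revenue;
            revenueMax = Math.max(revenueMax, revenue);
        }
        double revenueAverage = (double) revenueTotal / revenueByDay.size();
        return REPORT_TITLE + " (" + region + "): total=" + revenueTotal
                + ", average=" + revenueAverage + ", max=" + revenueMax;
    }
}
```

Putting the qualifier last keeps related variables (`revenueTotal`, `revenueMax`, `revenueAverage`) grouped together when sorted or autocompleted.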
I did reword some of these to make them more standalone. You may want to consult the sources to gain a fuller picture of each author’s perspective. Interestingly, while there are several key common themes across these approaches, there are also some very explicit contradictions. For example, Clean Code recommends against type encodings like Hungarian Notation, while Code Complete recommends them. Conversely, Code Complete seems implicitly to recommend against Solution Domain Names (a good name "speaks to the problem rather than the solution"), whereas Clean Code advocates their use.
I feel that when talking about naming it is easy to conflate ideas pertaining to structural (lexical) concerns with ideas pertaining to domain (semantic) concerns. Unfortunately, while these ideas are separate, they are also intertwined, and an intelligent approach to naming needs to recognize this.
One structural theme is the idea of qualifiers, which Code Complete names explicitly but which also appears implicitly in Clean Code. Qualifiers in this context are generally prefixes and suffixes that qualify the name. The qualifiers mentioned above include scope qualifiers, which denote the scope of a variable, such as the prefixes "m_" for member or "global_" for global. Hungarian Notation as used here (see footnote 2) is an example of a type qualifier. Additionally, Code Complete defines computed-value qualifier suffixes to denote computed values; these include Total, Sum, Average, Max, Min, Record, String, and Pointer. Interestingly, qualifiers are a structural element of a name that adds specific kinds of semantic meaning to it. The structural and semantic construction of names is a topic I intend to continue exploring in subsequent posts.
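To illustrate the idea that qualifiers are a structural layer wrapped around a semantic base, the sketch below splits a name into an optional scope-qualifier prefix, a base name, and an optional computed-value-qualifier suffix. `NameParts` is a hypothetical helper (it uses a Java 16+ record), and its prefix and suffix lists simply reuse the qualifiers mentioned above; a real naming convention would need a richer, project-specific list.

```java
import java.util.List;

// Hypothetical sketch: a name viewed as [scope qualifier] + base + [computed-value qualifier].
record NameParts(String scopeQualifier, String base, String computedValueQualifier) {

    // Scope prefixes mentioned in the text.
    private static final List<String> SCOPE_PREFIXES = List.of("m_", "global_");

    // Code Complete's computed-value qualifier suffixes.
    private static final List<String> COMPUTED_VALUE_SUFFIXES =
            List.of("Total", "Sum", "Average", "Max", "Min", "Record", "String", "Pointer");

    static NameParts parse(String name) {
        String scope = "";
        String rest = name;
        for (String prefix : SCOPE_PREFIXES) {
            if (rest.startsWith(prefix)) {
                scope = prefix;
                rest = rest.substring(prefix.length());
                break;
            }
        }
        String suffix = "";
        for (String candidate : COMPUTED_VALUE_SUFFIXES) {
            // Require a non-empty base so a bare name like "Total" is not split into "" + "Total".
            if (rest.endsWith(candidate) && rest.length() > candidate.length()) {
                suffix = candidate;
                rest = rest.substring(0, rest.length() - candidate.length());
                break;
            }
        }
        return new NameParts(scope, rest, suffix);
    }
}
```

For example, `NameParts.parse("m_revenueTotal")` separates the scope qualifier `m_` and the computed-value qualifier `Total` from the semantic base `revenue`.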
The previous ideas address both structural (lexical) and domain (semantic) concerns. Now I wish to look at a targeted summary of more semantically oriented advice from Eric Evans’s Domain Driven Design:
Domain model terms are part of the UBIQUITOUS LANGUAGE.
In regards to an INTENTION-REVEALING INTERFACE, name classes and operations to describe their effect and purpose without reference to the means by which they do what they promise. This relieves the client developer of the need to understand the internals. These names should conform to the UBIQUITOUS LANGUAGE so that team members can quickly infer their meaning.
Regarding MODULES (also known as packages): if your model is telling a story, your MODULES are chapters. The name of a MODULE conveys its meaning, and these names enter the UBIQUITOUS LANGUAGE.
Name each BOUNDED CONTEXT and make the names part of the UBIQUITOUS LANGUAGE.
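As a sketch of an INTENTION-REVEALING INTERFACE, consider the hypothetical `HospitalStay` class below; the domain and its terms are invented for illustration, echoing the medical records example above. Client code calls `admit`, `discharge`, and `lengthOfStayInDays`, words a domain expert might use, rather than something like `setStatusFlag(2)` that exposes how the state is stored.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical domain class: "admit", "discharge", and "length of stay" are assumed
// to be terms in the team's ubiquitous language.
final class HospitalStay {
    private final LocalDate admissionDate;
    private LocalDate dischargeDate; // null until the patient is discharged

    private HospitalStay(LocalDate admissionDate) {
        this.admissionDate = admissionDate;
    }

    // Named for its effect in the domain, not for the mechanism (no "createRow" or "setStatusFlag").
    static HospitalStay admit(LocalDate admissionDate) {
        return new HospitalStay(admissionDate);
    }

    void discharge(LocalDate dischargeDate) {
        if (dischargeDate.isBefore(admissionDate)) {
            throw new IllegalArgumentException("Discharge date precedes admission date");
        }
        this.dischargeDate = dischargeDate;
    }

    boolean isDischarged() {
        return dischargeDate != null;
    }

    long lengthOfStayInDays() {
        if (!isDischarged()) {
            throw new IllegalStateException("Patient has not been discharged");
        }
        return ChronoUnit.DAYS.between(admissionDate, dischargeDate);
    }
}
```

A call such as `stay.discharge(today)` can be read aloud in a conversation with a domain expert, which is the point of aligning code names with the ubiquitous language.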
Domain Driven Design explores the idea of conceptually developing and implementing the domain iteratively. The book presents many approaches and concepts pertaining to that topic; the ideas I am interested in here, targeted by the quotes above, pertain to naming. A common theme in naming that would be hard to dispute is Use Problem Domain Names. The book promotes the idea of a UBIQUITOUS LANGUAGE as a way to help define the domain so that all stakeholders have a common way to talk about the domain and its representation in the resulting software. Domain Driven Design also promotes the UBIQUITOUS LANGUAGE as a unifying theme for defining the system and its documentation. To me, a UBIQUITOUS LANGUAGE implies the need to track, define, and attempt to codify it over time, which in turn suggests that a more formal way to document it, such as a lexicon, is needed; this is an idea I will follow up on in a future post.
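One minimal way to start codifying a UBIQUITOUS LANGUAGE is a lexicon that maps each term to an agreed definition and can flag identifier words that are not yet defined. The `Lexicon` class below is a hypothetical sketch, not something described in Domain Driven Design; it assumes camelCase identifiers and does not handle acronyms such as `HTTPServer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical project lexicon: term -> agreed definition, kept in sorted order.
final class Lexicon {
    private final Map<String, String> definitions = new TreeMap<>();

    void define(String term, String definition) {
        definitions.put(term.toLowerCase(Locale.ROOT), definition);
    }

    String definitionOf(String term) {
        return definitions.get(term.toLowerCase(Locale.ROOT));
    }

    // Splits a camelCase identifier into words and returns the words the lexicon does not define.
    List<String> undefinedWordsIn(String identifier) {
        List<String> undefined = new ArrayList<>();
        for (String word : identifier.split("(?<=[a-z0-9])(?=[A-Z])")) {
            String normalized = word.toLowerCase(Locale.ROOT);
            if (!definitions.containsKey(normalized)) {
                undefined.add(normalized);
            }
        }
        return undefined;
    }
}
```

Run against a code base, a check like `undefinedWordsIn` could surface names such as `medicalRecordTmp` whose words have drifted away from the agreed vocabulary.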
Another issue with talking about software naming is the idea of context. Names pretty much always exist in a context: in Java, a variable can be in the context of a method or a class. Methods and classes can themselves exist in the context of classes, classes exist in the context of packages, and packages are themselves hierarchical. In fact, software systems developed the concept of a namespace in part to deal with this problem.
Each name exists in a context, and in order to talk about naming we need a language, just as you need to define a language for a domain. In fact, naming is a domain, a meta-domain perhaps: a domain that describes other domains. This means we need a "UBIQUITOUS LANGUAGE" for the domain of naming. So, in that language, I am going to define name scope context to mean the context, or "hierarchical place", in which a name occurs. I am purposely not using namespace, since that has other explicit meanings. I know that using both scope and context seems redundant, but "scope" alone didn’t work because it has a specific meaning, and "context" by itself was too vague. Name scope context can refer to a broader set of circumstances, such as an attribute’s position in an XML document relative to its parent element, a SQL column name relative to its parent table, a file’s directory, and so on.
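A minimal sketch of this idea, assuming a name scope context can be modeled as a chain of enclosing names: the hypothetical `ScopedName` record below (Java 16+) pairs a simple name with the context it occurs in, so the same structure covers a Java field in a class, a column in a table, or a file in a directory. Only the separator used to print the qualified name differs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a name together with its name scope context (null at the root).
record ScopedName(ScopedName context, String name) {

    static ScopedName root(String name) {
        return new ScopedName(null, name);
    }

    ScopedName child(String childName) {
        return new ScopedName(this, childName);
    }

    // Walks up the chain of enclosing contexts and joins the names with the given separator.
    String qualifiedName(String separator) {
        Deque<String> parts = new ArrayDeque<>();
        for (ScopedName n = this; n != null; n = n.context()) {
            parts.addFirst(n.name());
        }
        return String.join(separator, parts);
    }
}
```

For instance, `ScopedName.root("PERSON").child("ID").qualifiedName(".")` yields `PERSON.ID`, and a file path can be built the same way with `"/"` as the separator.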
An interesting thing I have noticed about name scope context is that it can be redundant. For example, let’s assume that we have a database table named PERSON and a corresponding class named Person. I have seen the following approach to naming these:
```sql
CREATE TABLE PERSON (
    PERSON_ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    LAST_NAME VARCHAR(40) NULL,
    FIRST_NAME VARCHAR(20) NULL,
    MIDDLE VARCHAR(20) NULL,
    PHONE VARCHAR(20) NULL,
    EMAIL VARCHAR(256) NOT NULL,
    DELETED CHAR(1) NOT NULL DEFAULT 'N',
    CREATE_DATE TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    MODIFY_DATE TIMESTAMP NULL
);
```

```java
import java.util.Date;

class Person {
    private int personId;
    private String lastName;
    private String firstName;
    private String middle;
    private String phone;
    private String email;
    private boolean deleted;
    private Date createDate;
    private Date modifyDate;
    // ...
}
```
In these examples the database table PERSON contains the column PERSON_ID, and the class Person contains the corresponding field personId. In both cases ID or id, respectively, would probably be better, because the "person" prefix repeats the name already supplied by the name scope context of the column or member variable. The downside of this kind of de-duplication is the loss of the ability to search for the names as plain text, which I think implies that searching for names in software probably needs more sophisticated mechanisms than simple text search, another idea I will expand on in the future.
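Applying that suggestion to the Java class gives the sketch below, where `personId` becomes `id` (the table column would likewise become `ID`, referenced as `PERSON.ID` in queries). The hypothetical `QualifiedNames` helper uses standard reflection to rebuild qualified names such as `Person.id`, a crude illustration of searching by name plus context rather than by raw text. Note that `getDeclaredFields()` returns fields in no particular order.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// The Person class with the redundant "person" qualifier removed from the identifier field.
class Person {
    private int id;
    private String lastName;
    private String firstName;
    private String middle;
    private String phone;
    private String email;
    private boolean deleted;
    private Date createDate;
    private Date modifyDate;
}

// Hypothetical helper: recovers a searchable qualified form such as "Person.id"
// by combining each declared field name with the simple name of its class.
final class QualifiedNames {
    static List<String> fieldNamesOf(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (!field.isSynthetic()) {
                names.add(type.getSimpleName() + "." + field.getName());
            }
        }
        return names;
    }
}
```

A search tool built on this idea could match a query for `Person.id` even though the bare text `id` appears in many unrelated places.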
This is my first post on naming; I have three more planned, and in them I will get into more ideas, some of which come from linguistics. I feel that Domain Driven Design gets into a number of linguistic and linguistics-oriented concepts; maybe what we are going for here is Semantic Driven Design, but that would be just another YADD.
1 Code Complete provides many more ideas; here are a few:
The name should be long enough that you don't have to puzzle it out.
Avoid misspelled words in names.
Avoid words that are commonly misspelled.
Don't differentiate variable names solely by capitalization.
Creating Short Names That Are Readable. These are guidelines for shortening names when you need to, so this is a special case rather than a general one. I have run into this with Oracle 11g’s 30-character limit on identifier names.
Avoid unnecessary abbreviations.
Avoid multiple natural languages.
Avoid names that could be misread or mispronounced.
Avoid names that are different by only one or two characters.
Avoid names that sound similar.
Avoid names that conflict with standard library routine names or with predefined variable names.
Format names for readability.
Avoid excessively long names.
Code is read far more times than it is written. Be sure that the names you choose favor read-time convenience over write-time convenience.
2 This use of Hungarian notation is apparently not how it was originally intended to be used; see Joel Spolsky’s "Making Wrong Code Look Wrong".
Thanks for your treatment of the naming rules. A similar, though highly condensed version also appears in AgileInAFlash and at its related blog (http://agileinaflash.com/2009/02/meaningful-names.html).
It's nice to see the differences and similarities called out.
Thanks for caring enough to write this up. Best wishes.
Tim Ottinger
Hi Tim,
Ironically we are wrestling with some naming issues on my current project. It really is hard to do well, and I find that it is sometimes burdensome and can even be a little distracting. I hope to someday see better methodologies and tools to help alleviate this problem. I have two to three follow-up posts planned to expand on some of these ideas.
Also I like the history of the book chapter.
Thanks for commenting
Geoff
Have you seen jbrains' model for improving names: http://blog.thecodewhisperer.com/2011/06/15/a-model-for-improving-names