Elegant Coding: More Thoughts on Formal Approaches to Naming in Software

On the Importance of Naming in Software Systems, Part II

In my last post on naming I surveyed and attempted to categorize the conventional wisdom regarding naming. Here I will expand on some of my ideas and extend my naming domain vocabulary.

A software system is composed of many things, and most if not all of them have names. To talk about this I realized that I need a term for the things that have names. Steve McConnell makes the comment "a variable and a variable’s name are essentially the same thing". While this is effectively true, named things in a system have different types, uses, etc. In trying to come up with a name for these named things, I felt that I could do better than the phrase "named things". For the word "things" I contemplated item, object, entity, artifact, and more. These all had various connotations and possible contexts which confused the meaning. I turned to linguistics for the answer and found the following on Wikipedia under intension (the bolding is mine):

The meaning of a word can be thought of as the bond between the idea or thing the word refers to and the word itself. Swiss linguist Ferdinand de Saussure contrasts three concepts:

the signifier — the "sound image" or string of letters on a page that one recognizes as a sign.

the signified — the concept or idea that a sign evokes.

the referent — the actual thing or set of things a sign refers to. See Dyadic signs and Reference (semantics).

So the word referent seems to be a good choice, as it probably carries no existing connotation in software engineering. To take this further, I will use the expression Named System Referent, or possibly just System Referent, to refer to those things in a software system that have a name. To be clear, a name is what is referred to above as a signifier, but I will just use name. A System Referent is the thing (referent) in the software system that the name refers to. If this is confusing, hopefully some examples will help. System Referents include but are not limited to the following¹:

Classes
Variables (local, instance, static)
Methods
Method Parameters
Packages
Database Tables, Columns, Triggers, Stored Procedures, etc.
HTML Files, CSS Files, JavaScript Files, Config Files, all files
Directories
URLs
Documents
XML Elements and Attributes
CSS Classes

And the list goes on.

Another observation is that System Referents can have a context or a grouping. For example, the grouping of System Referents that includes classes, packages, variables, methods, etc. may have different conventions from other groupings, such as database System Referents.

Now that we have a way to talk about named items, we can explore the very important idea of naming conventions. The name of a System Referent often consists of several components. From a linguistic standpoint these components are similar to, but not the same as, a lexical unit or a morpheme. For this I am choosing the term Name Unit. For example, the identifier "personId" consists of two Name Units, "person" and "id". Name Units imply a separator mechanism, in this example the use of CamelCase. Another example is "person_id", where the Name Units are separated by the separator "_". So a Name Unit is an atomic piece of a name delimited by a separator, and a separator is either explicit, e.g. "_", or implied, e.g. CamelCase. In my previous post we looked at different prefix and suffix qualifiers; the Name Unit is a generalization of the qualifier concept that deals with the whole name and not just one part of it.
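To make the idea of a Name Unit and its separator concrete, here is a minimal sketch in Java. The class name NameUnits and the method split are hypothetical names chosen for illustration, and the sketch assumes that an implied CamelCase separator sits wherever a lowercase letter or digit is followed by an uppercase letter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of any library.
final class NameUnits {

    // Splits a name into Name Units using either the explicit "_" separator
    // or the implied CamelCase separator (assumed here to be a transition
    // from a lowercase letter or digit to an uppercase letter).
    static List<String> split(String name) {
        List<String> units = new ArrayList<>();
        for (String part : name.split("_")) {
            if (part.isEmpty()) {
                continue; // skip empty pieces produced by repeated or leading "_"
            }
            // Zero-width split: nothing is consumed, so each unit keeps its letters.
            for (String unit : part.split("(?<=[a-z0-9])(?=[A-Z])")) {
                units.add(unit);
            }
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(split("personId"));  // prints [person, Id]
        System.out.println(split("person_id")); // prints [person, id]
    }
}
```

Note that the sketch keeps the original casing of each unit, so "personId" yields "Id" rather than "id". Whether case is part of the Name Unit or part of the separator mechanism is a design choice this sketch does not try to settle.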

I wish to compare, from a structural perspective, two common conventions: one for the Java System Referent grouping and one for an SQL database grouping. For Java we will assume the standard Java naming convention. For the database we will limit ourselves to tables and columns and assume uppercase names: Name Units consist only of uppercase characters, and the separator is an underscore "_".

We can take a more formal approach to defining the lexical structure of a naming convention using BNF, or in this case my own extended BNF variant. For example, our BNF for variable names in Java might look like the following:

name-unit-terminal ::= [A-Z]+[a-zA-Z0-9]*

name-unit-prefix-terminal ::= [a-z]+[a-zA-Z0-9]*

<name> ::= name-unit-prefix-terminal | name-unit-prefix-terminal <name-unit>

<name-unit> ::= name-unit-terminal | name-unit-terminal <name-unit>

The naming convention for our database table names and column names:

name-unit-terminal ::= [A-Z]+[A-Z0-9]*

<name> ::= name-unit-terminal | name-unit-terminal "_" <name>
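One way to put these grammars to work is to translate them into regular expressions and check candidate names against them. The following is a rough sketch under that assumption; the class NamingConventions and its method names are hypothetical. Because the Java terminals allow any mix of letters and digits after their first character, the Java grammar as written accepts exactly the names that start with a lowercase letter followed by letters or digits, so the sketch uses that simpler equivalent pattern.

```java
import java.util.regex.Pattern;

// Hypothetical validator for illustration; the patterns are my reading of the grammars above.
final class NamingConventions {

    // Java variable names: a lowercase first character, then letters or digits.
    // This accepts the same strings as the two-terminal grammar above.
    static final Pattern JAVA_VARIABLE = Pattern.compile("[a-z][a-zA-Z0-9]*");

    // Database names: one or more uppercase Name Units joined by a single "_".
    static final Pattern DATABASE_NAME =
            Pattern.compile("[A-Z][A-Z0-9]*(?:_[A-Z][A-Z0-9]*)*");

    static boolean isJavaVariableName(String name) {
        return JAVA_VARIABLE.matcher(name).matches();
    }

    static boolean isDatabaseName(String name) {
        return DATABASE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isJavaVariableName("lastName")); // prints true
        System.out.println(isJavaVariableName("LastName")); // prints false
        System.out.println(isDatabaseName("LAST_NAME"));    // prints true
        System.out.println(isDatabaseName("last_name"));    // prints false
    }
}
```

A check like this only covers the lexical structure. It says nothing about whether the Name Units themselves are meaningful.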

Two concrete examples illustrate these conventions:

// A simple bean whose field and accessor names follow the Java convention.
public class Person {

    private Long id;
    private String lastName;
    private String firstName;
    private String middle;
    private String phone;
    private String email;

    public Long getId() {
        return id;
    }

    public void setId(Long value) {
        id = value;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String value) {
        lastName = value;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String value) {
        firstName = value;
    }

    public String getMiddle() {
        return middle;
    }

    public void setMiddle(String value) {
        middle = value;
    }

    public String getPhone() {
        return phone;
    }

    public void setPhone(String value) {
        phone = value;
    }

    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        this.email = email;
    }
}

CREATE TABLE PERSON (
    ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    LAST_NAME VARCHAR(40) NULL,
    FIRST_NAME VARCHAR(20) NULL,
    MIDDLE VARCHAR(20) NULL,
    PHONE VARCHAR(20) NULL,
    EMAIL VARCHAR(256) NOT NULL
);
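The two examples line up Name Unit by Name Unit: lastName corresponds to LAST_NAME, id to ID, and so on. As a rough sketch of that correspondence, the following hypothetical helper (ColumnNames and toColumnName are names I made up for illustration) rewrites the implied CamelCase separator as an explicit "_" and then uppercases the result. It assumes the same CamelCase boundary rule as before, a lowercase letter or digit followed by an uppercase letter, so runs of capitals such as acronyms are not split.

```java
import java.util.Locale;

// Hypothetical converter for illustration; not a general-purpose mapping tool.
final class ColumnNames {

    // Inserts "_" at each implied CamelCase boundary, then uppercases the name.
    static String toColumnName(String fieldName) {
        return fieldName
                .replaceAll("(?<=[a-z0-9])(?=[A-Z])", "_")
                .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toColumnName("lastName")); // prints LAST_NAME
        System.out.println(toColumnName("id"));       // prints ID
    }
}
```

The conversion is purely lexical. It works here only because both conventions happen to use the same Name Units for the same referents.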

Naming conventions consist of both a lexical component and a semantic component. In this post we have focused on the lexical aspects of naming; in terms of the intension definition above, we focused on the signifier and the referent. A truly comprehensive approach cannot ignore the semantic aspect, which in terms of that definition means focusing on the signified and the referent. This area is explored in the idea of the Ubiquitous Language within Domain-Driven Design.

The semantic elements of a system and of a naming convention are more complex and ambiguous. I do think that software systems should have a System Lexicon, which would be a more concrete formalization and extension of the system’s vocabulary or Ubiquitous Language, and which would also incorporate ideas encapsulated in a Data Dictionary. A System Lexicon would be an entity of some complexity and would require some commitment to build. Applying it could also be problematic and might require tooling, such as an interactive name suggestion feature in an IDE, a static analysis component, or both. Depending on the size and complexity of the system, it might even require an ontology- or taxonomy-oriented graphical tool for constructing and maintaining the System Lexicon. I will get into some ideas that extend the lexical aspects and move into the semantic aspects in future posts. I know that this might seem pretty grandiose, but I think this is powerful stuff that could lead to better approaches for creating higher quality software.

¹This list is biased toward Java web development, but these ideas should be general enough to span any software system.

25 March 2012
