Elegant Coding: Linguistics

Showing posts with label Linguistics. Show all posts

13 May 2012

Software System Morphology

On the Importance of Naming in Software Systems, part III

In my previous posts on naming "A Survey of the Conventional Wisdom on Software Naming" and "More Thoughts on Formal Approaches to Naming in Software" I discussed several ideas about defining a way to talk about software system naming, I feel the name "Software System Morphology" captures this "big picture" concept. My objective is to develop a comprehensive way to define and analyze the semantic nature (morphology) of the construction of a software system. It’s an ambitious undertaking and whether or not I succeed in this endeavor I am hoping that these ideas will contribute to a better solution to this problem.

To review, I am defining three attributes that a name of signifier can have, the name scope context, i.e. where the name is used or occurs, the type or class of the system referent signified by the name, and the lexical structure of the name i.e. the n-gram of name units. I feel that these three qualities can be used to determine and possibly measure aspects of the morphological structure of software and by determining the morphological structure of a software system that structure might be "comprehended" and refined which could yield higher quality software.

In Linguistics a morpheme is the smallest unit of meaning and it is considered, by some, to be a complete sub-discipline of the study of Linguistics. I admit that I am in the process of learning more about it, I am a few chapters into What is Morphology by Mark Aronoff and Kirsten Fudeman it is an excellent book for linguistic neophytes like me. Software System Morphology will both be divergent and dependent on natural language morphological concepts, which are quite complex. However a general understanding of linguistic morphology will make these ideas more accessible, I suggest looking into it on your own, it’s fascinating and worth it solely on its own merits. A quick example of English Morphology can be given with the morphologically complex word autobiography which consists of the morphemes: auto/bio/graph/y, where [auto] means "self", [bio] means "life", [graph] means "drawn" and [y] means "characterized by".

In my first post of this series I explored various ideas proposed by a few prominent software engineers, one idea that I discussed was the use of what were denoted as qualifiers which included the scope qualifier, the type qualifier and the computed-value qualifier. These qualifiers which are prefixed or suffixed to names can be thought of more generally as morphemes. So we can now incorporate those ideas into a more general way of looking at meaning in software naming. Now the natural conclusion, if you have been paying attention, may be to assume that the name unit concept will map to the morpheme concept, it does not, at least in terms of only software morphology and this is in interesting distinction, software morphemes may consist of n-grams of name units which can be natural language morphemes two examples are CommandPattern and MedicalRecord where separating these into individual name units potentially become meaningless in certain software contexts.

Software System Morphology is primarily influenced by three things, the semantic nature of the problem domain(s), the semantic nature of the solution domain(s), which include languages ORMs, paradigms, design patterns, data structures, and finally the natural language(s) in which both the problem domain and solution domains are expressed. The previous example items CommandPattern and MedicalRecord illustrate these ideas as well.

Ok now let’s take our concepts further with a more detailed example. Assume we have a Hospital Medical system which has the following problem domain concepts: [Medical Record, Patient, Visit, Doctor, Hospital, Address, Invoice]. Obviously there are more. For the solution domain we will assume a JEE/Spring style implementation which might include the following concepts: [Entity, Dao, ServiceLayer, Controller, Map, View]. Now some expected names (signifiers) might be: MedicalRecordEntity, PatientEntity, PatientServiceLayer, PatientDao, PatientController, PatientView, VisitEntity, VisitDao, VisitServiceLayer, VisitController, VisitView, PatientVisitMap, PatientHospitalVisitor,... Now as you can see the names are composed of morphemes from both the problem domain and the solution domain. This is to be expected as each name tells what it is/does in the problem domain and what it is/does in the solution domain. In fact any developer with knowledge of JEE Spring Web application developer can look at these names and know what to expect and where they fit in the system. Additionally it is easy to look at the above set of domain concepts and anticipate named system referents that I omitted.

Previously I pointed out that the idea referred to as a qualifier is really a morpheme and while this is true the term qualifier seems to denote a more contextual setting in terms of the domain concepts. In the above examples the solution domain morphemes act as qualifiers telling where and how each named system referent fits into the solution Entity tells that it is a domain object, Dao tells that it relates to persistence, ServiceLayer tells that one would expect to see a separation of concerns that would put the business logic here. Structurally one would expect the domain objects (with the Entity suffix) to be passed back from the Dao’s and ServiceLayer’s and there might be transformative manipulation or aggregation of the domain objects in the ServiceLayer. The idea of qualification is meaning based but in this case it also indicates structural aspects of the solution domain. In this example I intentionally use the Entity suffix, in many systems the domain concepts would be used alone i.e. [Patient, Visit], I have mixed feelings about this in one sense it can be viewed as potentially unnecessary, in another sense it represents a qualifier that can possibly improve comprehension. Qualifiers do make names longer and while I tend to favor longer names it can become problematic if they get too long. These ideas towards naming are powerful for building comprehensible maintainable code as you can see from the above name examples that they follow a consistent pattern, admittedly these are fairly easy naming tasks, but you would be surprised how many developers fail to even do these types of things consistently.

Unlike others who advocate specific practices in regards to naming I feel that naming should be dealt with in a flexible framework that can be adapted to a specific project’s needs. While I personally agree and disagree with ideas previously discussed, my philosophy is that naming should be done consistently and comprehensively reflecting deliberate choices using conceptual understanding and analysis of the Domains. Also it should be tracked in a Lexicon, an idea that I am working on and hope to explore in a later post. Also as I have said before I feel that tooling could be used to allow better conceptual understanding and navigation and the use of analytics and possibly visualization could be used to ease naming decisions in practice by providing improvements to both static analysis and interactive autosuggest/intellisense aids.

I feel that these ideas are an attempt to create a more "formalized" approach to what Eric Evan’s does in Domain Driven Design which also covers a number of other structural ideas, I hope to extend my ideas here to better integrate some of those as well. The next steps for me are to undertake an analytic approach I also hope to explore the application of ideas like formal concept analysis and spectral graph theory techniques which may add some mathematical rigor to these ideas and perhaps allow some empirical validation and probably refinement. Now it gets really hard so it might be a while for those follow-ups.

25 March 2012

More Thoughts on Formal Approaches to Naming in Software

On the Importance of Naming in Software Systems, Part II

In my last post on naming I surveyed and attempted to categorize the conventional wisdom in regards to naming. Here I will expand on some of my ideas and extend my naming domain vocabulary.

A software system is composed of many things and most of them if not all have names. In order to talk about this I realized that I need a term to describe the things that have names, Steve McConnell makes the comment "a variable and a variable’s name are essentially the same thing". While this is effectively true, named things in a system have different types, uses, etc. In trying to come up with a name for these named things, I felt that I could do better than the phrase "named things". For the word "things" I contemplated item, object, entity, artifact, and more. These all had various connotations and possible contexts which confused the meaning. I turned to linguistics for the answer, and found the following on Wikipedia under intension, the bolding is mine:

The meaning of a word can be thought of as the bond between the idea or thing the word refers to and the word itself. Swiss linguist Ferdinand de Saussure contrasts three concepts:

the signifier — the "sound image" or string of letters on a page that one recognizes as a sign.

the signified — the concept or idea that a sign evokes.

the referent — the actual thing or set of things a sign refers to. See Dyadic signs and Reference (semantics).

So the word referent seems to be a good choice as it probably has no context with relation to software engineering. To take this further, I will use the expression Named System Referent or possibly just System Referent to refer to those things in a software system that have a name. To be clear here a name is what is referred to above as a signifier but I will just use name. A System Referent is the thing (referent) in the Software System that the name refers to. If this is confusing hopefully some examples will help. Some System Referents include but are not limited to the following¹:

Classes
Variables (local, instance, static)
Methods
Method Parameters
Packages
Database Tables, Columns, Triggers, Stored Procedures, etc.
HTML Files, CSS Files, Javascript Files, Config Files, all files
Directories
Urls
Documents
XML Elements and Attributes
CSS Classes

And the list goes on.

Another observation we can make is that System Referents can have a context or a grouping, for example System Referents which include classes, packages, variables, methods, etc. may have different conventions from other System Referent groupings such as database System Referents.

Now that we have a way to talk about named items, we can explore the very important idea of naming conventions. The name of a System Referent often consists of several components, from a linguistic stand point these components are similar to, but not the same as a lexical unit or a morpheme, for this I am choosing the term Name Unit, for example an identifier "personId" consists of two Name Units, "person" and "id". Name units imply a separator mechanism. In this example the use of CamelCase, another example is "person_id" where the name units are separated by the separator "_". So a name unit is that atomic piece of a name that is separated by a separator and a separator is either explicit e.g. "_" or implied e.g. CamelCase. In my previous post we looked at different prefix and suffix qualifiers, this is a generalization of the qualifier concept which deals with the whole name and not just one part of it.

I wish to compare, from the structural perspective, two common conventions one for the Java System Referent grouping and that of an SQL Database grouping. For the Java Naming convention we will assume the standard Java naming convention. For the database we will limit ourselves to Tables and Columns and assume uppercase Names, name units that consist only of upper case characters, the separator will be an underscore "_".

We can define formal a more formal approach to the lexical structure of a naming convention using a BNF, or in this case my own variant extended BNF, for example our BNF for variable names in Java might look like the following:

name-unit-terminal ::= [A-Z]+[a-zA-Z0-9]*

name-unit-prefix-terminal ::= [a-z]+[a-zA-Z0-9]*

<name> ::= name-unit-prefix-terminal | name-unit-prefix-terminal <name-unit>

<name-unit> ::= name-unit-terminal | name-unit-terminal <name-unit>

The Naming convention for our database column names and table names:

name-unit-terminal ::= [A-Z]+[A-Z0-9]*

<name> ::= name-unit-terminal | name-unit-terminal "_" <name>

Two Concrete examples illuminating these are as follows:

public class Person {

private Long id;

private String lastName;

private String firstName;

private String middle;

private String phone;

private String email;

public Long getId() {

return id;

}

public void setId(Long value) {

id = value;

}

public String getLastName() {

return lastName;

}

public void setLastName(String value) {

lastName = value;

}

public String getFirstName() {

return firstName;

}

public void setFirstName(String value) {

firstName = value;

}

public String getMiddle() {

return middle;

}

public void setMiddle(String value) {

middle = value;

}

public String getPhone() {

return phone;

}

public void setPhone(String value) {

phone = value;

}

public String getEmail() {

return email;

}

public void setEmail(String email) {

this.email = email;

}

CREATE TABLE PERSON (

ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,

LAST_NAME VARCHAR(40) NULL,

FIRST_NAME VARCHAR(20) NULL,

MIDDLE VARCHAR(20) NULL,

PHONE VARCHAR(20) NULL,

EMAIL VARCHAR(256) NOT NULL

);

Naming Conventions consist of both a Lexical Component and a Semantic Component. In this post we have focused on the lexical aspects of naming, from the intension definition above we focused on the signifier and the referent. To really create a comprehensive approach the Semantic aspect cannot be ignored. In the intension definition above this can be viewed as focusing on the signified and the referent. This is an area that is explored in the idea of the Ubiquitous Language within the realm of Domain Driven Design.

The Semantic Elements of a system and of a naming convention are more complex and ambiguous. I do think that software systems should have a System Lexicon which would be a more concrete formalization and extension of the System’s Vocabulary or Ubiquitous Language this would also incorporate ideas encapsulated in a Data Dictionary. A System Lexicon would be an entity of some complexity and would require some commitment to build also applying it could be problematic and might require tooling such as an IDE interactive name suggestion feature or a static analysis component or both. It might even require an ontology or taxonomy oriented graphical tool for constructing and maintaining the System Lexicon depending on the size and complexity of the system. I will get into some ideas that extend the lexical aspects and move into the Semantic aspects in future posts. I know that this might seem pretty grandiose but I think this powerful stuff that could lead to better approaches to create higher quality software.

¹This list is biased to Java Web development, but these ideas should be general enough to span any software system.

19 March 2012

A Survey of the Conventional Wisdom on Software Naming

On the Importance of Naming in Software Systems, Part I

Naming is a very important aspect of software, however, its importance and relevance to building quality software is often overlooked or ignored all together. I feel that naming needs to be part of a larger more formal framework for building software. In previous posts I have advocated a framework approach to capturing reuse and standardizing software design and implementation and I feel that naming is a crucial part of such an approach. Previously I included the following quote about naming in an API design talk by Joshua Bloch:

Names matter a lot, there are some people that think that names don’t matter and when you sit down and say well this isn’t named right, they say don’t waste your time let’s just move on, it’s good enough. No! Names, in an API, that are going to be used by anyone else that includes yourself in a few months mater an awful lot. The idea is that every API is kind of a little language and people who are going to use your API needs to learn that language and then speak in that language and that means that names should be self explanatory, you should avoid cryptic abbreviations so the original Unix names, I think, fail this one miserably.

He is not the only prominent software engineer to advocate an attention to naming, in this post I will summarize and discuss several sources. I will start with his material which is part of his talk on API design which I summarized in my previous post. Actually I feel that most if not all of what he says on API design is applicable to software in general including what he has to say on naming. The summary points are as follows:

Names should be self explanatory.
Avoid cryptic abbreviations.
Don’t have multiple words meaning that same thing.
Names should be consistent – same word means same thing.
Be regular–strive for symmetry. This implies a larger context of naming and a relationship between concepts.
Code should read like prose. (As a result of good API design)

Chapter two titled "Meaningful Names" of Robert Martin’s Clean Code, written by Tim Ottinger, is dedicated to naming. A summary of the advice is as follows:

Use Intention Revealing names
Avoid Disinformation – This is a corollary to Use Intention Revealing names. This means that the components of the name actually conceptually match the intent.
Make Meaningful Distinctions – This means Pick One Word per Concept.
Use Pronounceable Names. This means Avoid cryptic abbreviations.
Use Searchable Names. This is mainly about not using single character names and magic constants.
Avoid Encodings. This means do not use suffixes or prefixes to encode type or status as a member "m_". He also discourages using "I" as a prefix to an interface. This includes avoiding Hungarian Notation.
Avoid Mental Mapping. This is an anti pattern. It is a corollary to Use Intention Revealing Names.
Class Names should consist of nouns of verb Phrases.
Method Names should consist of verbs or verb Phrases.
Pick One Word per Concept.
Use Solution Domain Names. Solution domain names refer to things like design patterns or more distinct concepts such as Queue or Binary Tree
Use Problem Domain Names – This is pretty straight forward, if your system processes medical records, you might expect a class called MedicalRecord.
Add Meaningful Context – This deals with the idea of adding suffixes or prefixes to denote additional context information in the name. The example given is use addrState vs. state.
Don’t Add Gratuitous Context – This talks about the idea of adding unnecessary or too specific suffixes or prefixes to denote context.
Choosing good names requires good descriptive Skills and Shared cultural background

Steve McConnell’s Code Complete contains the following advice¹:

A name should fully and accurately describe what the variable represents.
A good mnemonic name generally speaks to the problem rather than the solution. A good name tends to express the what more than the how. In general, if a name refers to some aspect of computing rather than to the problem, it's a how rather than a what. Avoid such a name in favor of a name that refers to the problem itself.
Use pronounceable names.
Avoid misleading names or abbreviations and use consistent abbreviations.
Avoid names that are misleading.
Avoid names with similar meanings.
Avoid variables with different meanings but similar names
Name Specific Types of Data using Standardized Prefixes – Use prefixes to denote (encode) type such as Hungarian Notation.
Use computed-value qualifiers if needed at the end of the name. Examples are: Total, Sum, Average, Max, Min, Record, String, or Pointer.
Use a Naming Convention and that naming convention should be compatible standard conventions for the language. Naming convention should distinguish among local, class, and global data. Naming convention should distinguish among type names, named constants, enumerated types, and variables.
Good variable names are a key element of program readability. Specific kinds of variables such as loop indexes and status variables require specific considerations.
Names should be as specific as possible. Names that are vague enough or general enough to be used for more than one purpose are usually bad names.
Naming conventions distinguish among local, class, and global data. They distinguish among type names, named constants, enumerated types, and variables.
Abbreviations are rarely needed with modern programming languages. If you do use abbreviations, keep track of abbreviations in a project dictionary or use the standardized prefixes approach.

I did reword some of these to make them more stand alone. You may want to pursue the sources to gain a fuller picture of each author’s perspective. The interesting thing here is while there are several key common themes to these approaches there are some very explicit contradictions. For example Clean Code recommends against Type encoding like Hungarian Notation while Code Complete recommends it. Conversely Code Complete seems to implicitly recommend against using Solution Domain Names "speaks to the problem rather than the solution" whereas Clean Code advocates their use.

I feel that when talking about naming it is easy to conflate ideas pertaining to structural (lexical) concerns and ideas pertaining to domain (semantic) concerns. Unfortunately while these ideas are separate they are also intertwined and an intelligent approach to naming needs to recognize this.

One structural theme is the idea of qualifiers, which Code Complete explicitly names, but the idea appears implicitly in Clean Code as well. Qualifiers in this context are generally prefixes and suffixes that qualify the name. Qualifiers mentioned above include scope qualifiers which denote the scope of a variable such as the prefixes "m_" for member or "global_" for global. Hungarian Notation in this context² is an example of a type qualifier. Additionally Code Complete defines computed-value qualifier suffixes to denote values which are computed values these include: Total, Sum, Average, Max, Min, Record, String, or Pointer. Interestingly qualifiers are a structural element of a name which denotes additional specific types of semantic meaning to the name. The structural and semantic construction of names is a topic I intend to continue exploring in subsequent posts.

The previous ideas address both structural (lexical) and domain (semantic) concerns now I wish to look at a targeted summary of more semantic oriented conceptual advice in Eric Evan’s Domain Driven Design:

Domain model terms are part the UBIQUITOUS LANGUAGE.
In regards to an INTENTION-REVEALING INTERFACE, name classes and operations to describe their effect and purpose without reference to the means by which they do what they promise. This relieves the client developer of the need to understand the internals. These names should conform to the UBIQUITOUS LANGUAGE so that team members can quickly infer their meaning.
In regards to MODULES aka Packages, If your model is telling a story your MODULES are chapters. The name of the MODULE conveys its meaning. These names enter the UBIQUITOUS LANGUAGE.
Name each BOUNDED CONTEXT and make the names part of the UBIQUITOUS LANGUAGE.

In Domain Driven Design the idea of conceptually developing and implementing the Domain iteratively is explored. This book reveals many approaches and concepts pertaining to that topic, the ideas that I am interested in which are targeted in the quotes above pertain to naming. A common theme in naming which would be hard to dispute is to Use Problem Domain Names. The idea of a UBIQUITOUS LANGUAGE is promoted in the book as a way to help define the domain so that all stakeholders have a common way to talk about the domain and its representation in the resulting software. Domain Driven Design promotes using the UBIQUITOUS LANGUAGE as a theme to defining the system and unifying the system documentation. To me a UBIQUITOUS LANGUAGE implies the need to track, define and attempt to codify it over time which I think implies a more formal way to document it like a lexicon is needed, an idea I will follow up on more in a future post.

Another issue with talking about software naming is the idea of context. Names pretty much always exist in a context, in Java a variable can be in the context of a method or a class. Both methods and classes themselves can exist in the context of classes and classes exist in the context of classes and packages and packages are themselves hierarchical. In fact software systems have developed the concept of Namespace to in part deal with this problem.

Each name exists in a context, and in order to talk about naming we need a language just as you need to define a language for a domain. Actually naming is a domain, a meta-domain perhaps, a domain that describes other domains. This means we need a "UBIQUITOUS LANGUAGE" for the domain of naming. So in that language I am going to define name scope context to define to context or "hierarchical place" in which a name occurs. I am purposefully not using namespace since that has other explicit meanings. I know using the words scope and context seem redundant but I felt using "scope" alone didn’t work because it has a specific meaning and context by itself was too vague, using name scope context can refer to a broader set of circumstances such as an attribute’s position in an xml document relative to its parent element or a SQL Column name relative to its parent table, a file’s directory and so on.

An interesting thing I have noticed about name scope context is that it can be redundant. For example let’s assume that we have an object named a database table named PERSON and a corresponding object named Person. I have seen the following approach to naming these:

CREATE TABLE PERSON (

PERSON_ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,

LAST_NAME VARCHAR(40) NULL,

FIRST_NAME VARCHAR(20) NULL,

MIDDLE VARCHAR(20) NULL,

PHONE VARCHAR(20) NULL,

EMAIL VARCHAR(256) NOT NULL,

DELETED CHAR(1) NOT NULL DEFAULT 'N',

CREATE_DATE TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

MODIFY_DATE TIMESTAMP NULL

)

class Person {

private int personId;

private String lastName;

private String firstName;

private String middle;

private String phone;

private String email;

private boolean deleted;

private Date createDate;

private Date modifyDate;

...

Now in these examples the database table PERSON contains the column named PERSON_ID and the class Person contains the corresponding field named personId. In both of these cases ID or id respectively would be probably be better as the use of person is repeating the name used within the name scope context of the member variable. The down side of this type of non-duplication is the loss of the ability to search for the names, which I think implies that software name search probably needs more sophisticated mechanisms over that of simple text searching, another idea I will expand on in the future.

This is my first post on naming, I have three more planned and I will get into more ideas some of which come from linguistics. I feel that Domain Driven Design gets into a number of linguistic and linguistic oriented concepts, maybe what we are going for here is Semantic Driven Design, but that would be just another YADD.

¹ Code Complete provides a lot of ideas here are few more:

The name should long enough that you don't have to puzzle it out.
Avoid misspelled words in names.
Avoid words that are commonly misspelled.
Don't differentiate variable names solely by capitalization.
Creating Short Names That Are Readable. These are guidelines for shortening names when you need to, so this is a special case not a general one. I know I have run into this with Oracle 11g’s 30 Char limit.
Avoid unnecessary abbreviations.
Avoid multiple natural languages.
Avoid names that could be misread or mispronounced.
Avoid names that are different by only one or two characters.
Avoid names that sound similar.
Avoid names that conflict with standard library routine names or with predefined variable names.
Format names for readability.
Avoid excessively long names.
Code is read far more times than it is written. Be sure that the names you choose favor read-time convenience over write-time convenience.

² This use of Hungarian notation is apparently not how it was intended to be used see "Making Wrong Code Look Wrong".

05 December 2011

Responding to "Eleven Equations" Comments, Math Notation, and More

Wow that was quite an experience, I have been looking at sites like Hacker News, Reddit and DZone for years but to see my own post go to #1 on Hacker News and rise high on Reddit and DZone was quite a thrill, with over 70K hits at the time of this writing. It has completely changed my life the phone is ringing off the hook with endorsement offers, it looks like I will only be wearing Nike for the next year.

The best part was the over 200 comments on my Blog, Hacker News, and Reddit, it spawned some spirited discussion on all three locations and many people added their own equations some that I was unfamiliar with and I learned a few new things, excellent! The various discussions for the most part were great and very enjoyable and I’d like to thank everyone who contributed. In the post I commented that I wondered how I would feel about the list in a year and I already think I would change the list by one equation. I was tempted to respond to the comments on all of the forums, I know I should at least respond on my own blog, I just figured I’d do it as a post that way I could hit a number of topics.

One thing that people objected to was the title, which induced some inferences in the tone of the title. One redditer summed it up as:

Someone presuming, yet again, to know what "every" Computer Scientist should know? A title like "my favourite computer science equations" might be better. This list contains a lot of ideas which are only relevant in very narrow domains.

Yes, probably on the title comment, but the title was an intentional "spoof" if you will of a title of an article on Wired, and if you read my post you would have gotten that. Actually I confess that you might even accuse me of "Hacker News Post Popularity Hacking" as I saw the Wired post rise up and figured that there was a high probability that a similar CS related post could achieve a similar placement, actually I think it exceeded my expectations and went even higher, it shot to #1 in under two hours of my submitting it and it stayed on the front page for almost a day. I felt the tone I was setting, by using the term "rip off" and the Spinal tap joke (This one goes to eleven) that I don’t think anyone got, was subtly light hearted, perhaps it was too subtle. I recall reading that the title is key to getting people to read posts, and clearly this title was somewhat controversial which probably helped it, actually most of my titles are fairly boring or references that make them less popular, but I generally title things the way want not for popularity, well with this one exception.

Another thing that people took umbrage with was the use of the equations, as I just explained since I had set my title accordingly my format was now set. One commenter, in two different comments on a thread on Hacker News added:

I think this is a pretty silly post, to be honest. CS covers so much, and everytime I see a list of "things you should know", I have to resist the urge to roll my eyes and ignore it. But then I read it, and inevitably roll my eyes anyway.

...

I disagree. Programming is math. Highly advanced math, in fact. It's just a different type of math. And the 11 equations in the OP's article just barely touches on what CS is about. There is far more to it than that.

I mostly agree with this commenter. I admit that the sentiment that this not really about the equations but the general domain knowledge is implied, again perhaps too subtle. At one point someone mentions memorizing these equations which again means you are missing the point. It’s about knowing what these equations mean and how to use them. I feel that the math domain knowledge for our field is growing and becoming more important. Do I use any of this for my day job? Well no, but I hope to one day. As to the broad domain I actually have a post "Math For Programmer TNG" that covers a very broad domain of math that I think is potentially relevant. Also not all areas of math are going to be used in every math related job.

As to the narrowness insinuated by the two comments above, the post touched on the following domains: Combinatorics, Logic, Boolean Algebra, Set Theory, Lattice Theory, Probability Theory, Statistics, Category Theory, Number Theory including Complex Numbers, Relational Algebra, Abstract Algebra, Linear Algebra, Shannon Information Theory, Algorithmic Information Theory, Kolmogorov complexity, Graph Theory, Formal Language Theory, Automata Theory, Combinatory Logic, Lambda Calculus, Machine Learning, Data Mining, Encryption, Limits, Topology, Fourier Analysis and The Simpsons. Not exactly narrow, or am I missing something here? In fact the list was picked in part for diversity.

The Pumping Lemma turned out to be fairly controversial, I do admit that the Pumping Lemma was perhaps a bit "contrived" as an Equation and as one person put it: "that's the most hideous formulation of the pumping lemma I have ever seen". That formulation came from Wikipedia. As I mentioned I really did want something from Formal Language Theory or Automata Theory and it’s not so much about determining which language is regular but knowing the difference between a Regular Language and a Context Free Language so that you are not the cargo cult coder who tries to parse html with regular expressions because you understand these distinctions. I think the following exchange summed up what I was going for:

(except the Pumping Lemma, WTF?)

Reply:"Seriously, you never learnt theoretical computer science? As in, theory of computation? Automata theory? Complexity theory? Nothing like that?"

There were a number of comments about the use of mathematical notation including the following:

"Shannon's Information Theory, Eigenvector, DeMorgan's Laws, etc. None of those names are meaningful or descriptive. And then the greek letters and made up symbols. Math could learn something from Computer Science:

"https://www.google.com/search?q=readable+code

This commenter seemed to have an issue with the actual names of things in math. I found this to be very narrow minded. Math is many things including being a domain of knowledge, that’s like complaining about the names: Ytterbium in Chemistry, Bose–Einstein Condensate in Physics, Borneo in world Geography, or Impressionism in art. Really!?!

Now the complaints about the notation are more understandable but still unreasonable in my opinion, math in some respects is a language, perhaps somewhere between a natural language and programming language, it has a notation and to really be able to understand it you need to learn this notation (language). Someone remarked that symbols were "unfriendly". Well I find Chinese, Arabic¹, etc. to consist of unfriendly symbols, but that’s because I don’t know them. Also like natural languages you have inconsistencies, and multiple meanings for symbols. One person complained about my notation for the Set version of De Morgan's laws, this may be a nonstandard notation, I saw it in some course notes on probability and set theory and I liked it and I just used it without thinking about it. I do think it’s more elegant. This makes a good point about math. If you learn it and read from diverse sources you will encounter notational differences. This has been my experience, in fact if you look on the Wikipedia page for De Morgan's laws you will find the following example from Modal Logic:

I am used to the notation used in Hughes and Cresswell:

That’s just how things work. You can get hung up on it and complain or you can accept it and move on. Don’t get me wrong it would be nice if there was a notational standard that every one followed for all math.

Like natural languages Math has its equivalences of "Homographs", for example the following symbols can have multiple context dependent meanings: |b| can mean absolute value of b if it is a number or cardinality of b if it is a set or length of b if it is string, ∂ can mean Partial Derivative or Topological Boundary, and as we saw ∧ and ∨ can mean meet and join or "logical and" and "logical or".

Just as Math has "Homographs", as in natural languages it also has its equivalences of "Polysemes", meaning that there are multiple ways to express the same ideas. For example set difference can be expressed as (S – T) or (S \ T), the Powerset notation can be either 2^S or P(S), and meet and join can be expressed as the either of the following sets of symbols:

The list of both math "Homographs" and "Polysemes" is much longer.

For De Morgan's laws, someone added the following generalization using the Existential and Universal Quantifiers, which is really interesting and also illustrates the duality:

It's not much of a generalization, but I prefer the predicate version of De Morgan's:
~∀x p(x) ≡∃x ~p(x)
~∃x p(x) ≡∀x ~p(x)

If you read the Wikipedia Article it includes this generalization as well:

The above equations do generalize De Morgan's laws within Logic and Set Theory respectively but my point was to further generalize them in terms of Lattices which is a different concept all together and gets at the deeper intuition about the interrelation of Logic and Set Theory, quoting from Combinatorics the Rota way: by Rota and Kung, an amazing book I am trying to read, emphasis on the word trying:

"As John Von Neumann put it, the theory of Boolean Algebras is "pointless" set theory.

This set version would be written using the complement exponentiation notation as follows:

I really do prefer this notation, I originally encountered it in "Introduction to Probability Theory" by By Ali Ghodsi and Cosmin Arad, lecture 1(pdf). This notation is also used in Combinatorics the Rota way. So this notation is now the official notation of my blog, get used to it or hit the road. ;)

A commenter (who was also the demorgans notation complainer) responded with the following:

...
For the rest, they seem more likely to be used by someone doing something related to machine learning/AI/information theory than you run of the mill I-spend-all-day-parsing-user-input programmer.

Thank you for making me feel like I'm not a 'true' computer scientist. Next would you like to tell me about how I'm not a 'man' because I don't like to work on cars?

Well I hate to break it to you, but if this list makes you feel that way then chances are you are like me: a software developer with a CS Degree and that does not make you or me a Computer Scientist. This is from Wikipedia:

"The general public sometimes confuses computer scientists with other computer professionals having careers in information technology, or think that computer science relates to their own experience with computers, ...

Oh, and yes you are not a man if you do not at least know how to change your oil, including the filter. ;)

Also a number of people remarked that I did not explain the equations in enough detail and I did not provide code examples. It was meant as high level post, if I were to explain all of these equations the post would have been excessively long, maybe even a whole book. As to the code examples again it would be too long, if you want code examples that relate to math, I have posts for that, I recommend the following posts on my blog, some were linked in the original post: "O(log(n))", "Triangles, Triangular Numbers, and the Adjacency Matrix", "The Combinatorial and Other Math of the Java Collections API", "Math You Can Use : Refactoring If Statements with De Morgan's Laws", and "Monoid for the Masses".

Regrettably some people responded with the sentiment of: I’m not good at math or that math is too hard, I don’t think I can get a CS degree. Fortunately there were a number of comments of encouragement to counter these sentiments, my favorite was this one:

Don't let this be intimidating to you - instead of asking yourself "how come I don't know this?" ask yourself "how can I learn more about this?". This might sound cheesy and simplified but it's as simple as "nobody was born knowing this". I'm 31 and I wish I had the money to go back in education and learn all of this with the official way but for now I'm just picking resources on the web and who knows? It just might happen...

I have a post planned to talk about my experiences and thoughts about learning math, I will say if you are a programmer you are doing math in a sense. You just need to formalize your understanding of the underlying concepts. If it is any consolation I find Math to be very hard at times, I pretty much suck at it in the traditional sense, but it is so beautiful and when you grok a concept, at least for me, it is an incredible buzz. Every new concept that you learn changes how you see everything, I see so much math in programming and the world now it’s scary, I just wish I could figure out how to express it. Everything is math. Over the last few years my ability to understand things has greatly increased and continues to do so. Yes it’s hard but it is so worth it.

A number of commenters added their own equations to the discussion, and this was the best part, which included a few new things for me. In general my list was mostly directed at more pure math oriented equations that I felt were fairly central to a specific discipline and encapsulated certain principles. Here are a few recommendations and thoughts:

One person pointed out the absence of P=NP or P ≠ NP, to which I would respond "D’oh!" If I was doing this over I might even replace the Pumping Lemma with P=NP.

One poster recommended triangular numbers:

Here's a real simple and practical equation: 1+2+3+4 . . . N = N(N+1)/2
This equation represents the number ofedges on a complete graph with N+1 vertices or the number of possible pairings given N+1 objects. Useful whenestimating O(N) for certain algorithms that involve comparing an item to everyother item in a set.

I love Triangular Numbers and this is a great equation, I wrote a whole post "Triangles, Triangular Numbers, and the Adjacency Matrix" on exactly what he is mentioning, and considered that equation but rejected it because triangular numbers are sort of a special case of the Binomial Coefficient equation:

Here are some of the other equations and theorems that were suggested, I wouldn’t really consider these because they didn’t fit my theme and criteria but that doesn’t mean they aren’t interesting:

One commenter accused me of shoehorning in Euler’s Identity. This is partially true. He also accused me of "Name Dropping". That’s not true. A defender supplied the following course notes:

Analytic Combinatorics: A Primer

I followed up on that and found that you can download a copy of Analytic Combinatorics:
by Flajolet and Sedgewick. Warning: pretty advanced stuff.

As the popularity of my post was waning on Hacker News a math oriented post called "Fuzzy string search" was making its way up the ranks. It’s an interesting post and I think it helps illustrate a couple of points I have been making here, first of all it uses mathematical notation, which is not uncommon when writing about these types of topics. Additionally the post includes the equation for the Triangle Inequality:

An equation that crossed my mind when writing my equations post. It is not the equation itself, but the concept, if you become familiar with the idea of a Metric or Metric Spaces, you instantly recognize this equation concept without having to read the associated text, not that you shouldn’t read the associated text. Knowing these ideas gives you the ability to better understand more complex ideas that are based on these concepts. I think this is the case in all knowledge domains. Also I noticed that there was not one comment on that post complaining about math notation.

A lot of what went on is part of an ongoing conversation, I blogged about it in "The Math Debate" which links to other famous blogs of the same ilk. Ultimately the equations probably don’t apply to most programming jobs, the title contained as some commenters pointed out "Computer Science Geeks" not "Corporate Software Developers". A few people castigated the commenters who seemed to be advocating positions of ignorance, thanks, BTW. I personally find it tragic, the following comment sums this up:

"Someone with a CS degree who knows nothing of De Morgan's law should have their credit removed and be demoted to making websites or performing tech support tasks.

Reply: "There goes 98% of your CS graduates then. I wish I was joking, but alas.."

Of course it’s a bit I ironic that as I am wrapping this up, "A co-Relational Model of Data for Large Shared Data Banks" by Erik Meijer, Gavin Bierman just popped up on Hacker News, the relevant quote is:

Every programmer is familiar with the two dual forms of De Morgan’s laws

The writing quality of the post was criticized. I don't know what people will think of this post, I feel like I just wrote a clip show. I admit that previous post was not my best effort, while it was fun putting it together it was not easy, I had hoped to learn all of the equations in more detail, I did to some degree, I confess I still don’t fully get the Y-Combinator, but I will. I finally decided to just push it out and move on. This post, while being relevant to my mission of blogging about math was something of a social experiment in popularity, it was interesting but I will be returning to the true mission of my blog which is my journey through math and the pursuit of a real Software Engineering discipline.

Ultimately I am interested in the readers, like a young colleague of mine who was too busy to comment because he was following the links to learn new things he didn't know, not the complainers and naysayers. There were many positive comments and again, thanks. I did feel a needed to refute some of the negative comments. The bottom line is if you want to read about math, you need to get used to the notation, it probably won't help you to build CRUD Web apps but it will hopefully help you take your career to the next level.

¹Actually this is not true I find them to be quite beautiful, especially Arabic.