02 May 2011

Monoid for the Masses

If you don’t know what a Monoid is, it is an Algebraic Structure, and before we go any further, let me just say: Don’t Panic! If you have a negative or skeptical reaction because the idea of Algebra makes you think of something that you learned in middle school and don’t really use, then you’ll be relieved to hear that you mostly don’t even need any prior knowledge of Algebra to understand the concept that I want to talk about here. In fact if you are programmer you probably already know it, you just don’t know that you know it.

So what is this mysterious Monoid that you already know? It is String concatenation. Take the following Java code for example:

String string1 = "aaaa";

String string2 = "bbbb";

String string3 = string1 + string2;

See I told you that you already knew it.

If your CS Math education was similar to mine, which I hope for your sake it wasn’t, you may have learned about Group Theory and it seemed completely abstract and unrelated to the programming related classes. Well it turns out it is relevant. Also whether you learned this or not I am hoping that you can follow what I am presenting here without any prior knowledge.

A Monoid can be thought of as a simpler version of a Group, and we will define it more formally, later. Our Monoid, strings under the operation of concatenation uses the plus sign [+] operator and a String will be bounded by quote marks ["], pretty standard stuff, and it isn’t even hard.

To talk about this properly we will restrict the character set to strings that only contain "a" or a "b", so that any string will only contain these two characters, in fact think of all strings that are all the possible combinations of only these two characters, so the Set is all strings that can have an "a" or "b" in any position, some examples are:

"a","b","ab","abb", "aaa", "bbb", "baba",...

You get the idea. This forms a Set of all possible "ab" string combinations. This Set is Closed under + (concatenation), which means if you concatenate any two strings in the set you get a string that is still in the set. So every new string is still made up of only a’s and b’s, for example:

"aa" + "abab" = "aaabab"

Concatenation is Associative, for example:

("aa" + "bb") + "aba" = "aa" + ("bb" + "aba") ="aabbaba"

Concatenation has an identity element, an Identity means that when you combine it with any element of the set you get the same element back, it’s like zero in addition, the Identity element is the empty string "", think of it like the string version of zero, after all it has zero length, for example:

"ab" + "" = "ab"

One thing of note is that the Set of all "ab" strings is actually all strings with either "a" or "b" plus (Union with) the empty string meaning that it also includes the empty string "", this is an important distinction. I did not mention it earlier to simplify the initial explanation.

To recap, Strings under concatenation form a Monoid and a Monoid is defined as:

  • A Set and a Binary Operation where the Operation is Closed on the Set.
  • The Operation is Associative.
  • The Set contains an identity element under the Operation.

So why isn’t this a Group? A Group requires an Inverse, in a sense a Group is a Monoid with an Inverse and there is none here, for example can you think of an X for:

"ab" + X = ""

Additionally you should note that concatenation is not Commutative, for example:

("aa" + "bb" ≠ "bb" + "aa") ⇔ ("aabb" ≠ "bbaa")

However, the concatenation of the identity is Commutative, it also referred to as a double sided identity:

"" + "ab" = "ab" + "" = "ab"

Why only [a,b]? Well to keep it simple, this is referred to as the alphabet and you can define your alphabet as anything you want such as [0,1], [a-zA-Z], all ASCII characters, etc.

The really cool thing about this Monoid is that it’s the basis for Formal Language Theory, and actually this is really a first lesson in Formal Language Theory and it is also good introduction to a basic concept from Abstract Algebra. Not to mention this could probably be taught in middle or even grade school.

3 comments:

  1. Nice explanation. I missed some of this concepts in my CS college, and need to recover it yet. Thanks for the article!

    Obs: you misspelled 'moniod' some times.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete