Elegant Coding: Lattice Theory

11 March 2013

Programming and Order Theory

Covariance, Contravariance and Order Theory

In this post I make the observation that covariance and contravariance in programming are what are known as order duals. I am not the first person to make this observation, but these ideas tend to be buried in academic research papers like "Adding Axioms to Cardelli-Wegner Subtyping" by Anthony J H Simons. Don’t get me wrong, I love these types of papers; they give me hope and inspiration that software engineering will someday become a first-class engineering citizen. Unfortunately, they tend to be too theoretical and thus not very accessible for the average developer. That is a shame, because the idea of covariance and contravariance as order duals both puts these concepts into the mathematical context of order theory and possibly gives programmers some native context for order theory. So hopefully this post will make these ideas more programmer friendly.

I previously wrote a post about lattice theory, which is part of the more general order theory, where I talked about some basic order theory ideas such as duality. Order theory occurs quite a lot in software and programming and this is part of a series of posts to talk about those occurrences.

Covariance and contravariance receive a fair amount of attention, as they should, in software blogs, and some of these posts include interesting observations. One, perhaps slightly off topic, is "Liskov Substitution Principle is Contravariance", which is an interesting post if you overlook the disdainful tone towards OO. Another, more relevant, post is "Covariance and Contravariance in Scala", which relates these ideas to category theory. That is relevant especially since apparently you can think of "Category Theory as Coherently Constructive Lattice Theory"; be warned, that paper is heavy going.

Defining Covariance and Contravariance

To me one of the most striking and perhaps apropos examples of order theory in software is that of covariance and contravariance which Eric Lippert defines on his blog as:

The first thing to understand is that for any two types T and U, exactly one of the following statements is true:

T is bigger than U.

T is smaller than U.

T is equal to U.

T is not related to U.

For example, consider a type hierarchy consisting of Animal, Mammal, Reptile, Giraffe, Tiger, Snake and Turtle, with the obvious relationships. (Mammal is a subclass of Animal, etc.) Mammal is a bigger type than Giraffe and smaller than Animal, and obviously equal to Mammal. But Mammal is neither bigger than, smaller than, nor equal to Reptile, it’s just different.

He has an eleven-part series on covariance and contravariance. His posts cover some C# implementation details, but the ideas are generally applicable, and looking at one language’s details can help with comparing and contrasting to other languages.

Wikipedia includes the following definition (the animal example is pretty popular):

Covariant: converting from wider (Animals) to narrower (Cats).

Contravariant: converting from narrower (Triangles) to wider (Shapes).

Invariant: Not able to convert.

Including this is slightly redundant but this definition captures the conversion aspect and defines the relationships explicitly.

Covariance and Contravariance as Order Duals

The above are definitions that have order theory written all over them. In fact, reading "smaller than or equal to" as ≤, this is pretty much a textbook definition of an order relation in that it is reflexive, transitive, and antisymmetric. It is reflexive since Animal ≤ Animal, transitive since Cat ≤ Mammal ≤ Animal implies Cat ≤ Animal, and antisymmetric since Mammal ≤ Animal and Animal ≤ Mammal could both hold only if the two were the same type. The animal example also has cases of both comparability and incomparability, as you would find in a partial order.

As you can see from the above definitions, both sets of terms, wider/narrower and bigger/smaller, which mean the same thing, define an order and its dual for comparison. To write it more formally, let C be the set of classes in an OO hierarchy. Covariant is then represented by less than or equals, ≤, and contravariant by greater than or equals, ≥, and the set of classes with these order relations can be written in mathematical notation as (C, ≤) = (C, ≥)^d.
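
Scala's variance annotations give a small, concrete view of the two directions. The following is only a sketch: Box and Handler are hypothetical names invented for this illustration, and the Animal classes are a cut-down version of the hierarchy quoted above.

```scala
object VarianceDemo {
  class Animal
  class Mammal extends Animal
  class Giraffe extends Mammal

  // Covariant in A: Box[Giraffe] is a subtype of Box[Mammal],
  // the same direction as Giraffe ≤ Mammal.
  final case class Box[+A](value: A)

  // Contravariant in A: Handler[Animal] is a subtype of Handler[Mammal],
  // the reversed (dual) direction.
  trait Handler[-A] {
    def handle(a: A): String
  }

  val giraffeBox: Box[Giraffe] = Box(new Giraffe)
  val mammalBox: Box[Mammal] = giraffeBox // allowed by covariance

  val animalHandler: Handler[Animal] = new Handler[Animal] {
    def handle(a: Animal): String = "handled an animal"
  }
  val mammalHandler: Handler[Mammal] = animalHandler // allowed by contravariance
}
```

Both assignments compile only because of the + and - annotations; removing either one makes the corresponding line a type error.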

Types as Sets of Fields

It was in researching this post that I came across the paper "Adding Axioms to Cardelli-Wegner Subtyping". These kinds of discoveries are one of the reasons I write these posts. In that paper they quote another paper "On understanding types, data abstraction and polymorphism" by Luca Cardelli and Peter Wegner:

a type A is included in, or is a subtype of another type B when all the values of type A are also values of B, that is, exactly when A, considered as a set of values, is a subset of B

The ideas about types and subtypes covered in these papers extend beyond which fields a class or object has; however, I thought it would be interesting and beneficial to limit the discussion to that case. One reason is that if you take the fields of an object or class, then every collection of fields that defines a type is found in the powerset, every subtype relation is a subset relation, and these can be drawn as my favorite lattice (yes, I have a favorite lattice), the powerset lattice. In this case covariance and contravariance are simply the subset and superset relations on the powerset lattice.

Types as Sets of Fields in the Real World

Now I always feel that a real example helps quite a bit, so I have created a set of example classes, Scala traits actually, which illustrate the above ideas using a quasi-real-world example. Please note that these code examples are designed to illustrate these ideas and may contain design choices that one would not make in the real world, but they should be close enough to bridge the conceptual gap, if you will. This first example should also be applicable to dynamically typed languages that use duck typing or structural typing.

The following Scala traits define a possible domain object hierarchy that could be used to persist data to a database and render it back to a web page among other possible uses:

    trait BaseDomain {
      override def equals(that: Any): Boolean
      override def hashCode: Int
    }

    trait PersonInfo extends BaseDomain {
      var firstName: String
      var lastName: String
    }

    trait Address extends BaseDomain {
      var street: String
      var street2: String
      var city: String
      var state: String
      var country: String
      var zipCode: String
    }

    trait PhoneNumber extends BaseDomain {
      var phoneNumber: String
      var extension: String
    }

    trait Person extends BaseDomain {
      var personInfo: PersonInfo
      var address: Address
      var phoneNumber: PhoneNumber
    }

These traits yield the following field powerset lattice:
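
The points of that lattice can also be enumerated directly. This is a minimal sketch that treats the three field names of Person as plain strings, which is an assumption made purely for illustration; FieldPowerset and its members are hypothetical names.

```scala
object FieldPowerset {
  // The three fields of Person, written as plain strings for illustration.
  val fields: Set[String] = Set("personInfo", "address", "phoneNumber")

  // Every subset of the fields is one point of the powerset lattice.
  val points: Vector[Set[String]] = fields.subsets().toVector

  // Group the points by size; each group is one horizontal level of the lattice.
  val levels: Map[Int, Vector[Set[String]]] = points.groupBy(_.size)

  // The lattice order is plain set inclusion.
  def below(a: Set[String], b: Set[String]): Boolean = a.subsetOf(b)
}
```

With three fields there are 2³ = 8 points, arranged in levels of 1, 3, 3 and 1 sets.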

Now suppose we wanted to define a type for each of the above lattice points. We probably would not do this, but there may be cases for doing similar things in the real world. Let’s define the following Scala traits that wrap the above domain object hierarchy elements:

    trait PersonInfoTrait extends BaseDomain {
      var personInfo: PersonInfo
    }

    trait AddressTrait extends BaseDomain {
      var address: Address
    }

    trait PhoneNumberTrait extends BaseDomain {
      var phoneNumber: PhoneNumber
    }

    trait PersonInfoAddressTrait extends AddressTrait with PersonInfoTrait

    trait AddressPhoneNumberTrait extends AddressTrait with PhoneNumberTrait

    trait PersonInfoPhoneNumberTrait extends PhoneNumberTrait with PersonInfoTrait

    trait PersonTrait extends PersonInfoAddressTrait with AddressPhoneNumberTrait with PersonInfoPhoneNumberTrait

Since Scala traits support multiple inheritance we can define the above type hierarchy which can be drawn as the powerset lattice that it is:

Again we can see covariant and contravariant types defined on this lattice and each relation is actually the subset superset relation on fields.
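
To make the substitution concrete, here is a small sketch using simplified stand-ins for the wrapper traits. The fields are plain strings with made-up placeholder values, and label is a hypothetical helper; none of this is part of the hierarchy above.

```scala
object FieldLatticeDemo {
  // Simplified stand-ins: each wrapper exposes a single read-only String field.
  trait PersonInfoTrait { def personInfo: String }
  trait AddressTrait { def address: String }
  trait PhoneNumberTrait { def phoneNumber: String }
  trait PersonInfoAddressTrait extends AddressTrait with PersonInfoTrait
  trait PersonTrait extends PersonInfoAddressTrait with PhoneNumberTrait

  // Accepts any type whose field set includes `address`.
  def label(a: AddressTrait): String = a.address

  // Placeholder values, invented for the example.
  val person: PersonTrait = new PersonTrait {
    val personInfo = "example name"
    val address = "example address"
    val phoneNumber = "example number"
  }

  // A type with a larger field set can stand in for one with a smaller field set.
  val asAddress: AddressTrait = person
}
```

Here label accepts person because the field set of PersonTrait is a superset of the field set of AddressTrait.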

I feel the basic observations and the above examples on order duals and covariance and contravariance make the ideas pretty straightforward in the field-set context. In writing this I delved into a number of papers on types in programming and type theory, adding some of them to my "to read list", and I feel there are probably some deeper insights and implications of all this.

08 October 2012

The Ubiquitous Patterns of Pascal’s Triangle

Pascal’s Triangle is one of the most fascinating and intriguing objects in math. I was first introduced to it in the sterile context of high school algebra, as an aside. Only decades later did I discover its true grandeur, when I rediscovered it in the context of combinatorics and number theory. I will attempt to provide some hopefully enlightening images, but I feel that mathematical notation is both important and beautifully concise and allows one to have a transcendent experience with math, so we will start with a formula, one of my “Eleven Equations”. The more I learn, the more I feel it is more than justified to be in that list. I also feel that its centrality in combinatorics and probability theory makes it an equation that every math or computer science geek or enthusiast should be able to both recite off the top of their head and apply; if you don’t agree, maybe this post will inspire you to think differently. The equation is the binomial coefficient formula, and it describes each element of Pascal’s Triangle:
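
Written in the r and c letters used throughout this post (the same formula usually appears with n and k), the standard binomial coefficient formula is:

```latex
\binom{r}{c} = \frac{r!}{c!\,(r-c)!}
```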

The above formula evaluates under the rules that 0 ≤ c ≤ r and 0! = 1. I prefer to use the letters r and c over the traditional n and k, as in n choose k; one can as easily say r choose c. The reason I prefer r and c is that r means row and c means column, and these values can be viewed as a way to address positions within Pascal’s Triangle. For example, using the binomial coefficient notation from the above formula, Pascal’s Triangle can be represented as:

As you can see in the above representation each position is the address (r,c) given by the coefficient notation. Evaluating the expressions above with the Binomial Coefficient formula yields the more common representation of Pascal’s Triangle:
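
Going from addresses to values is easy to sketch in code. The names choose and row are hypothetical; the multiplicative form of the formula is used here only to avoid computing large factorials.

```scala
object PascalRows {
  // r choose c, computed multiplicatively; every intermediate value is itself
  // a binomial coefficient, so the division is always exact.
  def choose(r: Int, c: Int): BigInt =
    if (c < 0 || c > r) BigInt(0)
    else (1 to c).foldLeft(BigInt(1))((acc, i) => acc * (r - c + i) / i)

  // Row r holds the entries at addresses (r, 0) through (r, r).
  def row(r: Int): Vector[BigInt] = (0 to r).toVector.map(c => choose(r, c))

  def main(args: Array[String]): Unit =
    (0 to 5).foreach(r => println(row(r).mkString(" ")))
}
```

Running main prints rows 0 through 5, ending with 1 5 10 10 5 1.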

Binomial Theorem

If I recall correctly, and it has been a while, the high school algebra introduction to Pascal’s triangle was mostly tied to the Binomial Theorem, which is written with sigma (summation) notation as:

For our purposes we are interested in the one variable version which can be written as:

This can be expanded to:

This can be written in "Pascal’s Triangle Form" as:
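
For reference, the standard statements are sketched below using the traditional n and k (the choice of letters is an assumption): the two variable theorem, the one variable version, and its first few expansions, whose coefficients are the rows of the triangle.

```latex
\begin{align*}
(x + y)^n &= \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k} \\
(1 + x)^n &= \sum_{k=0}^{n} \binom{n}{k} x^{k} \\
(1 + x)^0 &= 1 \\
(1 + x)^1 &= 1 + x \\
(1 + x)^2 &= 1 + 2x + x^2 \\
(1 + x)^3 &= 1 + 3x + 3x^2 + x^3
\end{align*}
```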

Additive Structure

These types of introductions often focus on the additive properties as well. Each element is the sum of the two elements above it, which can be illustrated as:

The logic of this representation assumes that there is a zero on each side to add, which propagates the ones on both sides. We can rewrite this using the binomial coefficient notation as:

I admit this is where it starts to get more interesting to me. Not to denigrate its importance, but I never really got that excited by the whole binomial theorem thing, although the series for e is pretty cool. If we pick 5 choose 3 in the above representation, it can be written as the following sum:

If you look at the above representation, you can see that this can be done for all entries that have two parents. This can be generalized as:
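
In the r and c notation, the 5 choose 3 example and its standard generalization read:

```latex
\begin{align*}
\binom{5}{3} &= \binom{4}{2} + \binom{4}{3} = 6 + 4 = 10 \\
\binom{r}{c} &= \binom{r-1}{c-1} + \binom{r-1}{c}, \qquad 0 < c < r
\end{align*}
```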

This formula, which is a recurrence relation, is known as Pascal’s Identity; it is one of several formulas known as combinatorial identities.
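
The recurrence also gives a way to build the triangle with nothing but addition. A minimal sketch, with hypothetical names next and rows, that pads each row with a zero on both sides as described above:

```scala
object PascalByAddition {
  // Pad the row with a zero on each side, then add neighbouring pairs.
  def next(row: Vector[BigInt]): Vector[BigInt] =
    (BigInt(0) +: row).zip(row :+ BigInt(0)).map { case (left, right) => left + right }

  // The first n rows, starting from row 0, which is just (1).
  def rows(n: Int): Vector[Vector[BigInt]] =
    Iterator.iterate(Vector(BigInt(1)))(next).take(n).toVector
}
```

For example, rows(6) ends with the row 1 5 10 10 5 1, where the entry 10 at column 3 is the sum 6 + 4 from the row above.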

Powers of Two

Another interesting and extremely important pattern in Pascal’s Triangle is the fact that the values of each row sum to two raised to the power of the row number, which the following representation depicts:

This can be written as the following formula, which I think also qualifies as an identity:
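
In the r and c notation the row sum is the standard identity below; it follows from setting x = 1 in the one variable binomial theorem. Row 4 is included as a worked check.

```latex
\begin{align*}
\sum_{c=0}^{r} \binom{r}{c} &= 2^{r} \\
1 + 4 + 6 + 4 + 1 &= 16 = 2^{4}
\end{align*}
```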

Sets

As I mentioned at the beginning, the binomial coefficient formula is read as r choose c. I was trying to get back to that but I took the long way. Looking at our examples above, 5 choose 3 means: if I have five things, how many ways can I choose three of them? The answer is 10 ways. This is useful for counting. In combinatorics and probability theory I could take a deck of 52 cards, count how many 5 card hands I could choose, and calculate the probabilities of getting various hands; these are pretty standard applications of these types of counting problems. However, if we step back and look at this counting, where each row represents a set of r items, then each column gives the number of subsets of size c. This is another reason I prefer c here: it means both column and cardinality of the subset. We can create a representation of Pascal’s Triangle to show this, but it becomes unwieldy pretty quickly due to the exponential growth of each row. We would get:

In this representation each number represents a set element; these can map to any sets with these cardinalities. Each column is wrapped by parentheses. The ones on the left are created by the empty set and the ones on the right are created by the set itself. So each row in Pascal’s triangle gives the structure of the powerset of a set with the cardinality of the row number. Also it is not a coincidence that each row sums to a power of two, since the cardinality of the powerset is 2 raised to the power of the cardinality of the set.
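
This correspondence can be checked by brute force. The sketch below, with the hypothetical name countsBySize, enumerates the subsets of each size for a set of n elements and counts them.

```scala
object SubsetCounts {
  // Number of subsets of each size c (0 to n) of an n-element set,
  // found by enumerating the subsets rather than by formula.
  def countsBySize(n: Int): Vector[Int] = {
    val items = (1 to n).toSet
    (0 to n).toVector.map(c => items.subsets(c).size)
  }
}
```

For n = 4 the counts are 1 4 6 4 1, which is row 4 of the triangle, and they add up to 16, the size of the powerset.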

Powerset Lattice

Since I’ve had lattice theory on my mind lately, I couldn’t help calling on our old friend the powerset lattice. As you can see in the context of the above, each powerset lattice is going to have a row from Pascal’s triangle imprinted on it: each level of the lattice is an antichain consisting of sets of the same cardinality, and the number of sets in the antichain maps back to a column:

Principle of Inclusion and Exclusion (P.I.E)

The principle of inclusion and exclusion is used to count the size (cardinality) of a union of sets based on the cardinality of the sets and their intersections. The most common form, which appears in many probability texts, is:

|A ∪ B| = |A| + |B| - |A ∩ B|

The reason for this is that if you add the sizes of two sets, any elements they have in common are counted twice, so subtracting the size of the intersection removes the duplicate count. This type of problem, counting things more than once and then needing to remove the duplicates, comes up in other counting problems as well. The Venn diagram below shows the intersection of two sets; these are the elements that get counted twice in |A| + |B|:

Now as a disclaimer, this Pascal’s triangle correlation is strictly my own observation, so it might be wrong, but it seems to hold. If you take the formula from above and add a term for the empty set, which is not needed as it evaluates to zero, you get |A ∪ B| = |∅| + |A| + |B| - |A ∩ B|. Now you can see that the number of terms involving 0, 1 and 2 sets is 1, 2, 1, which is row 2 of Pascal’s triangle. For three sets you have 1 3 3 1:

|A ∪ B ∪ C| =
	|∅| +	(1)
	|A| + |B| + |C|	(3)
	- |A ∩ B| - |A ∩ C| - |B ∩ C|	(3)
	+ |A ∩ B ∩ C|	(1)

And for four we have (1 4 6 4 1)

|A ∪ B ∪ C ∪ D| =
	|∅| +	(1)
	|A| + |B| + |C| + |D|	(4)
	- |A ∩ B| - |A ∩ C| - |A ∩ D| - |B ∩ C| - |B ∩ D| - |C ∩ D|	(6)
	+ |A ∩ B ∩ C| + |A ∩ B ∩ D| + |A ∩ C ∩ D| + |B ∩ C ∩ D|	(4)
	- |A ∩ B ∩ C ∩ D|	(1)

This does not prove it, but I would bet the pattern continues. Like I said, this is my own observation, so it might be wrong; if you know either way, please let me know.
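
The term counts can at least be checked mechanically. In the general form of inclusion and exclusion, level k has one term for each way of choosing k of the n sets, so the count at level k is n choose k. The sketch below uses a hypothetical function unionSize that returns both the union size and the number of terms at each level k ≥ 1.

```scala
object InclusionExclusion {
  // Returns (size of the union, number of terms at levels k = 1..n).
  def unionSize[A](sets: Vector[Set[A]]): (Int, Vector[Int]) = {
    val levels = (1 to sets.size).toVector.map { k =>
      // One term per choice of k distinct sets: the size of their intersection.
      val terms = sets.indices.combinations(k).toVector
        .map(idx => idx.map(i => sets(i)).reduce(_ intersect _).size)
      // Odd levels are added, even levels are subtracted.
      val sign = if (k % 2 == 1) 1 else -1
      (sign * terms.sum, terms.size)
    }
    (levels.map(_._1).sum, levels.map(_._2))
  }
}
```

For any four sets the term counts come out as 4 6 4 1, and adding the single empty set term gives the full row 1 4 6 4 1.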

Binomial Distribution

The binomial distribution is given by the following formula:
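
In the r and c notation, the standard probability mass function and its symmetric special case p = 1/2 are as follows; the random variable name X is an assumption.

```latex
\begin{align*}
P(X = c) &= \binom{r}{c}\, p^{c} (1 - p)^{r - c}, \qquad c = 0, 1, \dots, r \\
P(X = c) &= \binom{r}{c} \frac{1}{2^{r}} \qquad \text{when } p = \tfrac{1}{2}
\end{align*}
```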

I converted it to my r,c nomenclature. I may be going too far using it here, but I wanted to keep consistent; remember r=n and c=k. It is a discrete probability distribution for the number of successes in r independent yes/no experiments, each of which yields success with probability p. If you set p=1/2, the symmetric case, then the entries of row r, each divided by 2^r, give you the binomial distribution. By the central limit theorem this symmetric case approaches the normal distribution as r increases.

Central Binomial Coefficient and Symmetry

The above representation shows the symmetry in Pascal’s Triangle. Each row is colored in a way that highlights the numbers that are the same: the first and last columns, consisting of ones, obviously match each other, but in each row the second column also matches the second to last column, the third column matches the third to last, and so on. This gives rise to, you guessed it, another combinatorial identity:

The columns highlighted in the gray box are a special case called the central binomial coefficient. If you notice, each even numbered row has an odd number of columns, with the central one occurring only once; it still follows the above identity, but the identity just says the value equals itself. The central binomial coefficient is given by the following form (I use n here since we don’t care about row/column in this case):
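
For reference, the standard forms of the symmetry identity and the central binomial coefficient are:

```latex
\begin{align*}
\binom{r}{c} &= \binom{r}{r-c} \\
\binom{2n}{n} &= \frac{(2n)!}{(n!)^{2}}
\end{align*}
```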

Figurate Numbers

There is an area of math that deals with figurate numbers. These are numbers formed by arranging dots into polygons, which form sequences as you expand the size of each polygon. I looked at some interesting properties of triangular numbers in a previous post; for example, here are the first 5 triangular numbers:

Triangular numbers are created by adding up the natural numbers, but there is a binomial formula which evaluates to a closed formula:

The idea of figurate numbers can be extended into higher dimensions, so not only do you have numbers that create polygons you can create polyhedrons as well. When you add up triangular numbers, you get the tetrahedral number which can be described by an analogous formula:
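
The standard forms for the two sequences are sketched below; the symbols T_n and Te_n for the n-th triangular and tetrahedral numbers are names assumed here for illustration.

```latex
\begin{align*}
T_n &= \sum_{k=1}^{n} k = \binom{n+1}{2} = \frac{n(n+1)}{2} \\
Te_n &= \sum_{k=1}^{n} T_k = \binom{n+2}{3} = \frac{n(n+1)(n+2)}{6}
\end{align*}
```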

This can be visually depicted as stacking up the triangular number to form tetrahedrons:

Pascal’s Triangle has a number of relations with figurate numbers, but the most notable are the triangular and tetrahedral numbers, shown in red and blue respectively:

Also, if you look at the binomial expressions for these, you will notice that the column is fixed: for triangular numbers it is c=2 and for tetrahedral numbers it is c=3. This maps our addressable form to the second and third columns as shown.

The above tabular form is based on a table from Concrete Mathematics, 2nd Edition by Ronald L. Graham, Donald E. Knuth and Oren Patashnik. Again you can see that column one, where c=1, is the natural numbers, column two is the triangular numbers (the summation of the natural numbers), and column three is the tetrahedral numbers (the summation of the triangular numbers). This would imply that this pattern continues to progress, and it does. The next column is the pentatope numbers, which describe a series of four-dimensional polytopes called pentachorons (a polytope is an extension of a polyhedron into higher dimensions), formed by stacking three-dimensional tetrahedrons in four dimensions. These numbers are given by the formula:

And these numbers progress into higher dimensions, each summing the numbers immediately below them. These are generalized as the pyramidal numbers and are described by the following table:

Number	Summation Form	Binomial Form	Closed Form
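
The summation and binomial columns of such a table can be regenerated and compared in code. This is a sketch under the assumption that the n-th entry of the column with fixed c is (n + c - 1) choose c, which matches the triangular (c=2) and tetrahedral (c=3) cases above; runningSums and column are hypothetical names.

```scala
object FigurateNumbers {
  // r choose c, computed multiplicatively.
  def choose(r: Int, c: Int): BigInt =
    if (c < 0 || c > r) BigInt(0)
    else (1 to c).foldLeft(BigInt(1))((acc, i) => acc * (r - c + i) / i)

  // Running sums: element i of the result is xs(0) + ... + xs(i).
  def runningSums(xs: Vector[BigInt]): Vector[BigInt] =
    xs.scanLeft(BigInt(0))(_ + _).tail

  // Summation form: each sequence is the running sum of the previous one.
  val naturals: Vector[BigInt] = (1 to 8).toVector.map(BigInt(_))
  val triangular: Vector[BigInt] = runningSums(naturals)
  val tetrahedral: Vector[BigInt] = runningSums(triangular)
  val pentatope: Vector[BigInt] = runningSums(tetrahedral)

  // Binomial form: the n-th number (n starting at 1) in the column with fixed c.
  def column(c: Int, n: Int): BigInt = choose(n + c - 1, c)
}
```

Comparing, say, pentatope with column(4, n) for n from 1 to 8 shows the two forms agree on this range.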

Powers of Eleven

There is another interesting pattern in Pascal’s Triangle: each successive row is equivalent to a power of eleven, as you can see in the following table:

0	11⁰	1
1	11¹	11
2	11²	121
3	11³	1331
4	11⁴	14641
5	11⁵	161051

Oops, I fibbed a little here: 11⁵ does not equal the row (1 5 10 10 5 1). The pattern breaks down because of the need to carry once an entry reaches ten. However, the pattern will hold if you push up the number base. For example, 11 hexadecimal, which I will write as 11₁₆, raised to the 5th power is (11₁₆)⁵ = 15AA51₁₆; remember A equals 10, so now the pattern holds, and it will continue to hold as long as you keep upping the number base. But remember (11₁₆)⁵ ≠ 11⁵.
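
This is quick to try out, since "11" in any base is just the base plus one. A minimal sketch with the hypothetical name rowAsDigits:

```scala
object PowersOfEleven {
  // The digits of (base + 1)^r written in the given base.
  // Scala writes digits above 9 as lowercase letters.
  def rowAsDigits(r: Int, base: Int): String =
    BigInt(base + 1).pow(r).toString(base)
}
```

rowAsDigits(4, 10) gives "14641", rowAsDigits(5, 10) gives "161051", where the carries spoil the pattern, and rowAsDigits(5, 16) gives "15aa51".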

Sierpinski Triangle

There is an interesting relation to another mathematically interesting triangle, the Sierpinski Triangle, aka the Sierpinski gasket. It is a fractal that can be constructed by taking an equilateral triangle, removing the center, which is an equilateral triangle of ¼ the area, and then repeating this on each of the three remaining triangles, and so on. It can also be created by applying modulo 2 to each element in Pascal’s Triangle as shown below. The zeros are grayed out to highlight the pattern:
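
A text version of that picture can be sketched by building the rows with Pascal’s rule and keeping only each entry’s parity. The names parityRows and render are hypothetical, and the zeros are printed as blanks rather than grayed out.

```scala
object SierpinskiRows {
  // Build rows with Pascal's rule, keeping only each entry modulo 2.
  def parityRows(n: Int): Vector[Vector[Int]] =
    Iterator.iterate(Vector(1)) { row =>
      (0 +: row).zip(row :+ 0).map { case (a, b) => (a + b) % 2 }
    }.take(n).toVector

  // Centre each row; odd entries print as '*', even entries as blanks.
  def render(n: Int): String =
    parityRows(n).zipWithIndex.map { case (row, i) =>
      " " * (n - 1 - i) + row.map(p => if (p == 1) "*" else " ").mkString(" ")
    }.mkString("\n")
}
```

Printing render(16) or render(32) shows the nested triangles; a power of two for the row count ends the picture on a complete triangle.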

And the List Goes On...

As we saw, we ventured into several areas of mathematics: combinatorics, number theory, probability theory, fractal geometry, geometry (topology), and lattice theory. There are many other patterns, such as relations to polytopes, cellular automata, a pattern with prime numbers and more. The Wikipedia page has many interesting patterns, some of which I used here; two other interesting sites are Christopher Olah’s post and this.