Covariance, Contravariance and Order Theory
In this post I make the observation that covariance and contravariance in programming are what are known as order duals. I am not the first person to make this observation; however, these ideas often tend to be buried in academic research papers like "Adding Axioms to Cardelli-Wegner Subtyping" by Anthony J H Simons. Don't get me wrong, I love these types of papers; they give me hope and inspiration that software engineering will someday become a first-class engineering citizen. Unfortunately, such papers tend to be too theoretical and thus not very accessible to the average developer. That is a shame, because viewing covariance and contravariance as order duals both puts these concepts into the mathematical context of order theory and gives programmers some native context for order theory. So hopefully this post will make these ideas more programmer friendly.
I previously wrote a post about lattice theory, which is part of the more general order theory, where I talked about some basic order theory ideas such as duality. Order theory occurs quite a lot in software and programming, and this post is part of a series about those occurrences.
Covariance and contravariance receive a fair amount of attention, as they should, in software blogs, and some of these posts include interesting observations. One, perhaps slightly off-topic, observation is "Liskov Substitution Principle is Contravariance", an interesting post if you overlook its disdainful tone towards OO. Another, more relevant, post is the nice "Covariance and Contravariance in Scala", which relates these ideas to category theory; that is especially relevant since apparently you can think of "Category Theory as Coherently Constructive Lattice Theory" (warning: heavy going in that paper).
Defining Covariance and Contravariance
To me one of the most striking and perhaps apropos examples of order theory in software is that of covariance and contravariance which Eric Lippert defines on his blog as:
The first thing to understand is that for any two types T and U, exactly one of the following statements is true:
- T is bigger than U.
- T is smaller than U.
- T is equal to U.
- T is not related to U.
For example, consider a type hierarchy consisting of Animal, Mammal, Reptile, Giraffe, Tiger, Snake and Turtle, with the obvious relationships. (Mammal is a subclass of Animal, etc.) Mammal is a bigger type than Giraffe and smaller than Animal, and obviously equal to Mammal. But Mammal is neither bigger than, smaller than, nor equal to Reptile, it’s just different.
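To make Lippert's four-way classification concrete, here is a small Scala sketch; the empty traits are my own stand-ins mirroring the hierarchy in the quote, and the compiler's `<:<` evidence witnesses the "smaller than or equal" relation at compile time:

```scala
// Empty traits mirroring the hierarchy in the quote (illustrative only).
trait Animal
trait Mammal extends Animal
trait Reptile extends Animal
trait Giraffe extends Mammal
trait Tiger extends Mammal
trait Snake extends Reptile
trait Turtle extends Reptile

object SubtypeChecks {
  // implicitly[A <:< B] compiles exactly when A is smaller than
  // (a subtype of) or equal to B.
  val giraffeLeMammal = implicitly[Giraffe <:< Mammal]
  val mammalLeAnimal  = implicitly[Mammal <:< Animal]
  val mammalLeMammal  = implicitly[Mammal <:< Mammal]   // reflexive
  // implicitly[Mammal <:< Reptile]  // does not compile: unrelated types
}
```

The fourth case, "not related", is the one that makes this a partial rather than a total order: the compiler simply finds no evidence in either direction.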
He has an eleven-part series on covariance and contravariance. His posts cover some C# implementation details, but the ideas are generally applicable, and looking at one language's details can help with comparing and contrasting other languages.
Wikipedia includes the following definition; the animal example is pretty popular:
- Covariant: converting from wider (Animals) to narrower (Cats).
- Contravariant: converting from narrower (Triangles) to wider (Shapes).
- Invariant: Not able to convert.
Including this is slightly redundant, but this definition captures the conversion aspect and defines the relationships explicitly.
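In Scala these conversions fall out of variance annotations on type constructors; a small sketch, where `Cat` and the `describe` functions are my own stand-in names:

```scala
trait Animal { def name: String }
case class Cat(name: String) extends Animal

// Covariance: List[+A], so a List[Cat] converts to a List[Animal].
val cats: List[Cat] = List(Cat("Felix"))
val animals: List[Animal] = cats

// Contravariance: Function1[-A, +B], so a handler of the wider type
// Animal converts to a handler of the narrower type Cat.
val describeAnimal: Animal => String = a => "some animal named " + a.name
val describeCat: Cat => String = describeAnimal
```

The invariant case is everything else: for example, an `Array[Cat]` is not an `Array[Animal]` in Scala, because `Array[T]` is invariant in `T`.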
Covariance and Contravariance as Order Duals
The above definitions have order theory written all over them. In fact, that is pretty much a textbook definition of an order relation, in that it is reflexive, transitive, and antisymmetric. It is reflexive since Animal = Animal; transitive since Animal ≤ Mammal ≤ Cat implies Animal ≤ Cat; and antisymmetric since Animal ≤ Mammal and Mammal ≤ Animal together would imply Animal = Mammal. In the animal example there are cases of both comparability and incomparability, as you would find in a partial order.
As you can see from the above definitions, both pairs of terms, wider/narrower and bigger/smaller, name the same relation and define an order dual for comparison. To write it more formally, let C be the set of classes in an OO hierarchy. Covariance is then represented by less-than-or-equals ≤ and contravariance by greater-than-or-equals ≥, and a set of classes with these order relations can be written in mathematical notation as (C, ≤) = (C, ≥)d.
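The duality itself is easy to demonstrate in code; here is a sketch using a total order for brevity (the idea is the same for partial orders): reversing an ordering yields its dual, in which x ≤ y holds exactly when y ≤ x held originally.

```scala
// An ordering and its order dual.
val asc: Ordering[Int] = Ordering.Int
val desc: Ordering[Int] = asc.reverse

// The dual flips the direction of every comparison:
// asc.lteq(1, 2) and desc.lteq(2, 1) are both true.
```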
Types as Sets of Fields
It was in researching this post that I came across the paper "Adding Axioms to Cardelli-Wegner Subtyping". These kinds of discoveries are one of the reasons I write these posts. In that paper they quote another paper "On understanding types, data abstraction and polymorphism" by Luca Cardelli and Peter Wegner:
a type A is included in, or is a subtype of another type B when all the values of type A are also values of B, that is, exactly when A, considered as a set of values, is a subset of B
The ideas about types and subtypes covered in these papers extend beyond which fields a class or object has; however, I thought it would be interesting and beneficial to limit the discussion to that case. One reason is that if you take the fields of an object or class, then all subtype collections of fields will be found in the powerset, every subtype relationship will be a subset relation, and these can be drawn as my favorite lattice (yes, I have a favorite lattice): the powerset lattice. In this case covariance and contravariance become the subset and superset relations on the powerset lattice.
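A quick sketch of that claim, with placeholder field names: generating the powerset of a three-field set yields the eight points of the powerset lattice, ordered by ⊆.

```scala
// Powerset via a fold: each element either joins an existing subset or not.
def powerset[A](s: Set[A]): Set[Set[A]] =
  s.foldLeft(Set(Set.empty[A]))((acc, x) => acc ++ acc.map(_ + x))

val fields  = Set("firstName", "lastName", "address")
val lattice = powerset(fields)   // 2^3 = 8 subsets

// Covariant direction: Set("firstName") ⊆ Set("firstName", "address");
// the contravariant direction is the same pair read as ⊇.
```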
Types as Sets of Fields in the Real World
Now I always feel that a real example helps quite a bit, so I have created a set of example classes, Scala traits actually, which illustrate the above ideas using a quasi-real-world example. Please note that these code examples are designed to illustrate these ideas and may contain design issues that one would not implement in the real world, but they should be close enough to bridge the conceptual gap, if you will. Also, this first example should be applicable to dynamically typed languages that might use duck typing or structural typing.
The following Scala traits define a possible domain object hierarchy that could be used to persist data to a database and render it back to a web page among other possible uses:
trait BaseDomain {
  override def equals(that: Any): Boolean
  override def hashCode: Int
}

trait PersonInfo extends BaseDomain {
  var firstName: String
  var lastName: String
}

trait Address extends BaseDomain {
  var street: String
  var street2: String
  var city: String
  var state: String
  var country: String
  var zipCode: String
}

trait PhoneNumber extends BaseDomain {
  var phoneNumber: String
  var extension: String
}

trait Person extends BaseDomain {
  var personInfo: PersonInfo
  var address: Address
  var phoneNumber: PhoneNumber
}
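As a quick sanity check that these traits are usable, here is a minimal concrete implementation; the `SimplePersonInfo` name is hypothetical, and the two traits are repeated so the sketch stands alone:

```scala
// Repeated from above so this sketch is self-contained.
trait BaseDomain {
  override def equals(that: Any): Boolean
  override def hashCode: Int
}

trait PersonInfo extends BaseDomain {
  var firstName: String
  var lastName: String
}

// A case class supplies the vars plus a structural equals/hashCode,
// which satisfies the BaseDomain contract. The name is illustrative.
case class SimplePersonInfo(var firstName: String, var lastName: String)
  extends PersonInfo

val ada = SimplePersonInfo("Ada", "Lovelace")
// ada == SimplePersonInfo("Ada", "Lovelace") holds by case-class equality
```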
These traits yield the following field powerset lattice:
Now suppose we would want to define a type for each of the above lattice points, which we probably would not do but there may be cases to do similar types of things in the real world. Let’s define the following Scala traits that wrap the above domain object hierarchy elements:
trait PersonInfoTrait extends BaseDomain {
  var personInfo: PersonInfo
}

trait AddressTrait extends BaseDomain {
  var address: Address
}

trait PhoneNumberTrait extends BaseDomain {
  var phoneNumber: PhoneNumber
}

trait PersonInfoAddressTrait extends AddressTrait with PersonInfoTrait {
}

trait AddressPhoneNumberTrait extends AddressTrait with PhoneNumberTrait {
}

trait PersonInfoPhoneNumberTrait extends PhoneNumberTrait with PersonInfoTrait {
}

trait PersonTrait extends PersonInfoAddressTrait with AddressPhoneNumberTrait with PersonInfoPhoneNumberTrait {
}
Since Scala traits support multiple inheritance we can define the above type hierarchy which can be drawn as the powerset lattice that it is:
Again we can see covariant and contravariant types defined on this lattice, where each relation is actually the subset/superset relation on fields.
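To see the subset/superset reading in action, here is a trimmed-down sketch; the field types are simplified to String and `renderAddress` is a hypothetical consumer. A function written against the wider, fewer-field type happily accepts the narrower type that carries a superset of its fields:

```scala
// Simplified stand-ins for the traits above: PersonTrait has a
// superset of AddressTrait's fields, so it is the narrower type.
trait AddressTrait { var address: String }
trait PersonTrait extends AddressTrait { var phoneNumber: String }

// A consumer of the wider type...
def renderAddress(a: AddressTrait): String = "Address: " + a.address

// ...accepts the narrower type covariantly.
val person = new PersonTrait {
  var address = "1 Main St"
  var phoneNumber = "555-0100"
}
val rendered = renderAddress(person)  // "Address: 1 Main St"
```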
I feel the basic observations and the above examples on order duals and covariance and contravariance make the ideas pretty straightforward in the field-set context. In writing this I delved into a number of papers on types in programming and type theory, adding some of them to my "to read list", and I feel there are probably some deeper insights and implications in all of this.
Decent post. As for the concluding paragraph, if you want to see order theory applied to computer programming on steroids, check out abstract interpretation: http://web.mit.edu/afs/athena.mit.edu/course/16/16.399/www/ Abstract interpretation is a powerful generalization of data flow analysis.
The basic idea is that each variable in the program lies in some domain, which becomes a lattice when we consider the powerset of all possible values for variables in that domain. Consequently, we can view the operators in the program (such as the integer addition operator) as being operators defined upon lattices. Abstract interpretation is concerned with overapproximating all possible behaviors of a given program. By "overapproximating", we mean that, if "all possible behaviors of a given program" is represented by a lattice element b, then we want to compute some lattice element that is greater than or equal to b. In such a way, we are able to render decisions about the program that are "sound", i.e. that they take into account all possible behaviors of the program, if not "complete", i.e. that they are approximate and hence account for behaviors that are not possible at runtime.
We do this by computing an approximation of the program's "collecting semantics", which loosely means "the set of all values that each variable might take on at each program point". The collecting semantics is formulated in a similar way to data flow equations in compiler theory (although dually): that the set of all possible values at any program point p is "at least as big as" (order-theoretically) the values at the locations immediately leading to p. This takes the form of a recursively-defined equation, which uses the aforementioned lattice transformers (e.g. for the addition operator) and the lattice meet/join operators. We seek the least solution to this equation, since the least correct solution describes the smallest set of values, whereas the greatest correct solution would be top (all states are possible at all points) and hence useless. The solution to this equation does exist mathematically (see e.g. Tarski's fixedpoint theorem), although it is not computable. That's where abstract interpretation comes into play.
We establish a Galois connection between the powerset lattice representing the concrete values, called the "concrete domain", and some "abstract domain" that overapproximately represents those concrete values. For example, one abstract domain representing an integer variable is the "interval lattice"; it represents the integral lower and upper bound that a variable might assume at that point. Next, we design abstract lattice transformers that mimic the concrete ones: for example, the concrete addition lattice operator must be re-designed for the abstract domain such that it operates on interval values instead of sets of integers. Then, we set up a recursively-defined equation identical to the concrete collecting semantics equation, except using the abstract transformers and abstract meet/join in place of the concrete ones. And this abstract equation not only has a solution (if certain properties are ensured, such as completeness of the abstract lattice), but the solution is computable using chaotic iteration.
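[A toy illustration of the interval domain described above; all names are mine, and widening is omitted. It sketches the interval join and an abstract addition transformer, with soundness showing up as the abstract sum containing every concrete sum.]

```scala
// A toy interval abstract domain over Int: no bottom element and no
// widening, just enough to show join and one abstract transformer.
case class Interval(lo: Int, hi: Int) {
  require(lo <= hi)
  // Least upper bound in the interval lattice.
  def join(that: Interval): Interval =
    Interval(math.min(lo, that.lo), math.max(hi, that.hi))
  // Abstract transformer mirroring concrete integer addition.
  def +(that: Interval): Interval =
    Interval(lo + that.lo, hi + that.hi)
  def contains(n: Int): Boolean = lo <= n && n <= hi
}

val x = Interval(0, 5)    // all values variable x might hold here
val y = Interval(10, 20)
val sum = x + y           // Interval(10, 25)
// Overapproximation: any concrete sum of a value from x and a value
// from y lies inside the abstract sum.
```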
That's the basic idea. There are more complexities to abstract interpretation, such as how to ensure termination of the fixedpoint computation when dealing with incomplete lattices, designing abstract domains that describe relationships between multiple variables, and other topics.
It's serious business; abstract interpretation is used in the static analyzer Astree to prove the absence of runtime errors in safety-critical systems (e.g. flight-control software).