27 November 2011

Eleven Equations True Computer Science Geeks Should (at Least Pretend to) Know




This idea is a complete rip-off of an article that appeared in Wired a little while ago, which got me thinking: what would my list for Computer Science look like?  Plus I thought it might be a fun post, and unlike the Wired list this one goes to eleven.  So here they are, in no particular order:



Binomial Coefficient
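$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$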



The Binomial Coefficient equation generates Pascal's Triangle and gives you the coefficients for the Binomial Theorem.  These ideas are often attributed to Pascal, but in fact they have been known, at least in part, for over a millennium.


As I mentioned, this list is in no particular order, and I don't wish to play favorites, but if there is one equation here that you should really consider learning and committing to memory, this is it. It is central to Combinatorics, which has connections to many areas of math. I guess I should qualify that: if you are serious about computer science and math-related programming topics, then you should be able to recite and apply it from memory, even when you are drunk, sleep deprived, or being held at gunpoint. Ok, that might be a bit much, but you get the idea.  If you are a programmer and you haven't learned it, or need a refresher, I have a post that relates it to the Java Collections API.
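If you want to see it in action, here is a quick sketch in Python (my own toy code, not from the Java post) that uses the formula to print the first few rows of Pascal's Triangle:

    from math import factorial

    def binomial(n, k):
        # C(n, k) = n! / (k! * (n - k)!)
        return factorial(n) // (factorial(k) * factorial(n - k))

    # Row n of Pascal's Triangle is just C(n, 0) ... C(n, n).
    for n in range(5):
        print([binomial(n, k) for k in range(n + 1)])
    # [1]
    # [1, 1]
    # [1, 2, 1]
    # [1, 3, 3, 1]
    # [1, 4, 6, 4, 1]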



De Morgan's Laws


Logical form:
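$$\sim\!(P \land Q) \;\equiv\; \sim\!P \,\lor\, \sim\!Q$$
$$\sim\!(P \lor Q) \;\equiv\; \sim\!P \,\land\, \sim\!Q$$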


Set Form:
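$$\overline{A \cap B} = \overline{A} \cup \overline{B}$$
$$\overline{A \cup B} = \overline{A} \cap \overline{B}$$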



Ok, so that's like four "equations" for De Morgan's Laws.  One can't help but be struck by the similarity between the two sets, and this is not a coincidence: these two sets of formulas are essentially the same thing.  They are special cases of complemented distributive lattices, which means technically it's really just two formulas:
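$$\neg(x \lor y) = \neg x \land \neg y$$
$$\neg(x \land y) = \neg x \lor \neg y$$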




In this case the ∨ symbol means the lattice join operation, the ∧ symbol means the lattice meet operation, and the dash with the right downward mark (¬) means lattice complementation; I used this to differentiate it from the tilde used for Boolean complementation.  Also, these two formulas are categorical duals of each other, which means that if one were smart and had a good grasp of Category Theory, one could probably be really OCD about it and get it down to one formula.


Set Theory and Boolean Algebra are very important basic concepts in our profession, and the Boolean version can make you a better programmer, as you can use it to refactor logical if statements.
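For example (a tiny illustrative snippet in Python, with made-up names):

    def needs_attention(is_valid, is_ready):
        # Before refactoring: negation applied to a compound condition.
        return not (is_valid and is_ready)

    def needs_attention_refactored(is_valid, is_ready):
        # After applying De Morgan's Law: ~(P and Q) == ~P or ~Q.
        return (not is_valid) or (not is_ready)

    # The two forms agree on every input.
    for v in (True, False):
        for r in (True, False):
            assert needs_attention(v, r) == needs_attention_refactored(v, r)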


Eigenvector and Eigenvalue
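$$A\,\vec{v} = \lambda\,\vec{v}$$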



This equation and these concepts are not only central to Linear Algebra; these Linear Algebraic ideas extend into many other areas of math, including Linear Transformations and Graphs viewed in terms of matrices, such as the Adjacency Matrix, and much more.


Linear Algebra is becoming increasingly central to the types of things we are doing in our profession; it is used heavily in Statistics and Machine Learning, not to mention that the PageRank Algorithm is based in part on this equation.
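If you want to play with the idea, here is a minimal sketch using NumPy (the tiny link matrix is my own made-up example, not anything from the PageRank papers):

    import numpy as np

    # A tiny column-stochastic "link" matrix: column j gives the probability
    # of following a link from page j to each page i.
    M = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])

    # Solve M v = lambda v; the eigenvector for the largest eigenvalue (1 for a
    # stochastic matrix) plays the role of a simplified PageRank vector.
    eigenvalues, eigenvectors = np.linalg.eig(M)
    dominant = eigenvectors[:, np.argmax(eigenvalues.real)].real
    print(dominant / dominant.sum())  # normalized ranking scores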



Pumping Lemma for Regular Languages
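$$(\forall L \subseteq \Sigma^*)\ \Big(\text{regular}(L) \Rightarrow \big((\exists p \ge 1)\ (\forall w \in L)\ (|w| \ge p \Rightarrow (\exists x,y,z \in \Sigma^*)\ (w = xyz \,\land\, |y| \ge 1 \,\land\, |xy| \le p \,\land\, (\forall i \ge 0)(x y^{i} z \in L)))\big)\Big)$$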



No Comp Sci list would be complete without at least one formula from either Formal Language Theory or Automata Theory.  The problem is that these two are pretty different from other areas of math in terms of "equations."  I tried to think of some basic ideas and the Pumping Lemma came to mind, and I found the above formula concisely expressed on Wikipedia.  This version of the Pumping Lemma is a more limited version of another theorem that is used to check whether a Language is Regular. Also, this is maybe not the best choice as it is pretty dense; if you want a better explanation I recommend this.


While this "equation" is pretty specialized, it is actually a special case of the more general Iteration Theorem; the real point here is about more general domain knowledge.  It is central to tools you use every day when you are programming, such as compilers and regular expressions, not to mention the almost ubiquitous Kleene Star.
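As a quick illustration of how the lemma gets used, here is the standard textbook example: take L = {aⁿbⁿ : n ≥ 0}. If L were regular with some pumping length p, then w = aᵖbᵖ would split as w = xyz with |xy| ≤ p and |y| ≥ 1, so y consists only of a's; pumping to xy²z yields more a's than b's, a string not in L, so L cannot be regular.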


Information Entropy


I confess, after all my machinations to try to reduce De Morgan's Laws down to fewer equations, I am totally cheating here by pulling a twofer and including two formulas.  The first is from Shannon's Information Theory:
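$$H(X) = -\sum_{i=1}^{n} p(x_i)\,\log_2 p(x_i)$$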



And the formula for Chaitin’s Constant, which is associated with Algorithmic Information Theory and Kolmogorov Complexity:
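$$\Omega = \sum_{p \,\in\, P_{\text{halt}}} 2^{-|p|}$$

(here the sum runs over the self-delimiting programs p that halt, each weighted by its length |p|)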



Interestingly, this area has a parallel in the physical world in terms of Thermodynamic Entropy, which also parallels the Boltzmann Equation mentioned in the Wired article.  Seth Lloyd draws some interesting connections between computation and the physical world, including Entropy and Boltzmann's Equation.


In programming and computer science information is our business and these ideas are central to many things that we deal with especially the storage, transmission, computational transformation (like compression), and computation of information.  We’ve already seen the possible relationship between both of these theories and Reusability and Software Frameworks.
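To make the entropy formula concrete, here is a small sketch in plain Python (my own toy function, nothing framework-specific) that measures the entropy of a string in bits per symbol:

    from collections import Counter
    from math import log2

    def entropy(message):
        # Shannon entropy: H = sum(p * log2(1/p)), equivalent to -sum(p * log2(p)).
        counts = Counter(message)
        total = len(message)
        return sum((c / total) * log2(total / c) for c in counts.values())

    print(entropy("aaaa"))      # 0.0 -- no surprise at all
    print(entropy("abab"))      # 1.0 -- one bit per symbol
    print(entropy("abcdefgh"))  # 3.0 -- eight equally likely symbols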



Bayes' Theorem
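$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$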



Of all of the equations on the list, Bayes' Theorem may stand out on its own merits as one of the most recognizable, and it might even qualify as a poster child for the statistical revolution sweeping our industry.  It rose from obscurity to being used for spam detection and in such applications as classifiers, Machine Learning, Data Mining, and more.


Mathematically it is part of a rift in statistics and probability, and I confess I may not yet call myself "an initiate of the Bayesian conspiracy," but it is hard to deny its utility.  Plus there seems to be a more axiomatic basis that relates to Boolean Logic and Set Theory, which makes it all the more alluring.
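As a toy illustration of the spam-filtering use (the numbers below are completely made up, just to show the mechanics):

    # Made-up probabilities for a toy spam filter.
    p_spam = 0.2              # P(spam): prior probability a message is spam
    p_word_given_spam = 0.6   # P("winner" | spam)
    p_word_given_ham = 0.05   # P("winner" | not spam)

    # Total probability of seeing the word at all (law of total probability).
    p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

    # Bayes' Theorem: P(spam | "winner") = P("winner" | spam) * P(spam) / P("winner")
    p_spam_given_word = p_word_given_spam * p_spam / p_word
    print(round(p_spam_given_word, 3))  # 0.75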



Fermat’s Little Theorem
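$$a^{p-1} \equiv 1 \pmod{p} \qquad (p \in \mathbb{P},\ a \perp p)$$
$$a^{p} \equiv a \pmod{p} \qquad (p \in \mathbb{P})$$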




These two equations of Fermat's Little Theorem are really the same equation written in two different forms, where (a ⊥ p) means a and p are coprime and (p ∈ P) means p is prime.


This is a beautiful little theorem in Number Theory which can be generalized and written even more succinctly using Euler's Totient Function, which is represented by φ(n):
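$$a^{\varphi(n)} \equiv 1 \pmod{n} \qquad (a \perp n)$$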



These ideas are crucial for understanding encryption algorithms like RSA and there’s this cool result.
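You can check the theorem numerically with Python's built-in modular exponentiation (toy numbers of my own choosing; real RSA uses the same operation with enormous primes):

    p = 101  # a prime
    for a in (2, 3, 57):
        # Fermat's Little Theorem: a^(p-1) mod p == 1 when a is coprime to p.
        print(pow(a, p - 1, p))  # prints 1 three times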


Natural Join
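$$R \bowtie S = \pi_{A}\big(\sigma_{C}(R \times S)\big)$$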



Natural Join, like all of the Relational Algebra join operations, is a composition of the more basic operations: Selection (σ), Projection (π), Cartesian Product (×), Set Difference (−), and Union (∪).  Natural Join is the Cartesian Product of two tables, selecting matching rows and eliminating duplicate columns.  (In the formula above, the projection (π) keeps the set (A) of attributes with duplicates removed, and the selection (σ) keeps the rows where the set (C) of common attributes have equal values in both relations.)


The Relational Algebra is, in my opinion, a bit of an odd beast when it comes to algebraic structures, but if you use SQL you are already using an implementation of it.  As data persistence becomes more heterogeneous, I think understanding these concepts will become more important.  Additionally, the Relational Algebra has strong ties to Abstract Algebra and Category Theory.
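If you want to see the operation outside of SQL, here is a minimal sketch in plain Python over lists of dicts (the function and data are my own toy example):

    def natural_join(r, s):
        # Join two non-empty relations (lists of dicts) on their shared attribute names.
        common = set(r[0]) & set(s[0])  # C: the attributes present in both relations
        return [{**row_r, **row_s}      # projection: duplicate columns collapse in the merged dict
                for row_r in r
                for row_s in s
                if all(row_r[c] == row_s[c] for c in common)]  # selection: matching rows only

    employees = [{"dept": 1, "name": "Ada"}, {"dept": 2, "name": "Alan"}]
    departments = [{"dept": 1, "title": "Math"}, {"dept": 2, "title": "CS"}]
    print(natural_join(employees, departments))
    # [{'dept': 1, 'name': 'Ada', 'title': 'Math'}, {'dept': 2, 'name': 'Alan', 'title': 'CS'}]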



The Fixed-Point (Y) Combinator
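$$Y = \lambda f.\,(\lambda x.\,f\,(x\ x))\,(\lambda x.\,f\,(x\ x))$$

and its defining fixed-point property:

$$Y\,g = g\,(Y\,g)$$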



Just as we needed an "equation" from Formal Language Theory or Automata Theory, we really should have one related to Lambda Calculus; in this case the equation is in the untyped lambda calculus.  Even though the Fixed-Point Combinator stands on its own, it is fairly well known, at least in name, due to its use by Paul Graham for his incubator (Y Combinator).


The Fixed-Point Combinator allows one to implement anonymous recursion, which is a powerful programming technique.  It also ties into some deeper theoretical aspects of computer science and logic, such as Combinatory Logic.
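To make the idea concrete, here is a small sketch in Python; since Python evaluates eagerly it uses the eta-expanded variant (often called the Z combinator) rather than Y itself:

    # Strict-evaluation variant of the fixed-point combinator (the Z combinator);
    # the plain Y combinator would recurse forever under eager evaluation.
    Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

    # Anonymous recursion: a factorial that never refers to itself by name.
    factorial = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
    print(factorial(5))  # 120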



O(n)


Many CS majors may cringe at this, and another possible reaction is that it's not much of an equation.  There's actually quite a bit to it; for example, did you know there are some substantial and quite beautiful equations associated with this seemingly simple formula?  They are:
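$$f(x) = O(g(x)) \ \text{as}\ x \to \infty \iff \exists\,M > 0,\ \exists\,x_0:\ |f(x)| \le M\,|g(x)| \ \text{for all}\ x > x_0$$

or, equivalently,

$$f(x) = O(g(x)) \iff \limsup_{x \to \infty} \left|\frac{f(x)}{g(x)}\right| < \infty$$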




Also, the following is the definition of the Limit Superior (lim sup):
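$$\limsup_{n \to \infty} x_n = \lim_{n \to \infty}\Big(\sup_{m \ge n} x_m\Big) = \inf_{n \ge 0}\ \sup_{m \ge n} x_m$$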



Where inf is the infimum and sup is the supremum, two concepts that seem to show up quite a bit, especially when you are concerned with bounds, as with Lattices, and they also show up in relation to Topology.


And we all know why we should care about this one: Algorithms, Algorithms, Algorithms.



Euler’s Identity
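$$e^{i\pi} + 1 = 0$$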



Euler's Identity is also listed in Wired's article, and I have to agree that it is one of the most beautiful and intriguing formulas in math.


Now this isn't really as relevant to CS as the other formulas, but if you are a true computer geek you are hopefully some kind of math geek, and this is such a great formula, not to mention that it appears in The Simpsons' Homer³. It's actually a specific case (where θ = π) of Euler's Formula:
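$$e^{i\theta} = \cos\theta + i\sin\theta$$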



These formulas will give one a better understanding of complex numbers, which are needed in other areas of advanced math.  These formulas also relate to Fourier Analysis, which does have applications in Computer Science.


I am curious to see how well this list holds up; how will I feel about it in a year?  Also, in writing this I realized that I still have quite a bit more to learn about at least half of this list, if not all of it, which made it hard to write, as it became quite a little (unfinished) research project. I have already written individual articles on several of these equations and have at least two in the works for Linear Algebra, at least two on the Relational Algebra, and one planned for Pascal's Triangle, and those are just the ideas I've already had; who knows what the future holds.  I hope you enjoyed reading it as much as I enjoyed writing it.


The Set version may not apply to all cases in set theory; it assumes Union and Intersection over the Power Set of a Set Universe, which is closed under these operations and includes the complements of all of its elements.  The Set Universe is the upper bound and the empty set is the lower bound.