23 July 2014

A Survey of Graph Theory and Applications in Neo4J


A while ago I had the idea to do a graph theory talk at our local Neo4J Meetup, the Baltimore Washington Graph Database Meetup, especially since Neo4J and graph databases are based on graph theory.  So I finally sat down and spent quite a bit of time putting a talk together. I decided to go with the idea of an overview of graph theory with a primary focus on the theoretical aspects and with some applications like network theory and some specific examples of applications to Neo4J.  My talk is titled: "A Survey of Graph Theory and Applications in Neo4J" and I presented it on June 24, see below for video, slides, and references. This post is my thoughts on it and things leading up to its creation.

Some of my blog posts are created from a desire to make math more accessible and more relevant especially those areas of math that seem to be neglected by the traditional educational system.  One huge gaping hole in my math education both in our American institutional learning facilities and in my college computer science curriculum was that I was never exposed to graph theory.   Once you start studying graph theory you realize that many of these concepts could be taught in middle and elementary schools.  As I was preparing my presentation, I saw this: "Math for seven-year-olds: graph coloring, chromatic numbers, and Eulerian paths and circuits", reaffirming what I already knew.  Sadly many of these concepts were new to me as an adult, and I’d guess the same was probably true of much of my audience.  

Graph theory has been a topic that I have included in my blog, I have written two posts about it one is a general math post: "Triangles, Triangular Numbers, and the Adjacency Matrix" and the other post: "The Object Graph" attempts to put it in part in a programming context .  One post or series of posts that I have wanted to do is an introduction/overview of graph theory.  I was thinking that my talk could be turned into those posts.  However, I am not sure if and when I will get a chance to do that, so I figured until then I will make the talk itself that post and that post is this post.

I found myself torn about publishing the video of my presentation, on one hand I feel that it could be much better but on the other, as a friend pointed out, people might find it informative and enjoy watching it.  That being said, I reserve the right to replace it with a better version if I ever do one.

Writing a post about the talk provides me with an opportunity make a few corrections, add some more source material and mention a couple of things that I would have liked to include but that got cut in the interest of time.

I’ll start by adding some additional source material.  I followed the introductory remarks with a brief historical overview.  For that, especially 20th century contributors I made use these power point slides of John Crowcroft’s "Principles of Communications" course.  I also referenced Graph Theory, 1736-1936 by Norman L. Biggs, E. Keith Lloyd, and Robin J. Wilson.  I mentioned the Watts and Strogatz Model (pdf of original nature paper).  I also included Fan Chung who is married to Ronald Graham and is known for her work in graph theory especially spectral graph theory.   I mentioned snarks not to be confused with Lewis Carroll‘s snarks from which he coined the name.  When I was talking about the topology part I mentioned Eulers Gem.  Since the World Cup was going on during the talk I have to include these topology links "The Topology and Combinatorics of Soccer Balls(pdf)" and some more "soccer ball math".

Due to time constraints I cut a few topics,  two of them were the ideas of graph isomorphisms and graph automorphisms, also ideas like: Graph families defined by their automorphisms and the automorphism group.  Another area that I didn’t really get into, except the linear algebra parts was the area of algebraic graph theory.

I did want to offer up a couple of corrections: I called a truncated icosahedron a dodecahedron, how embarrassing. I called the Travelling Salesman Problem, the Travelling Salesman Program, doh! When I said a bipartite graph has no connections between edges, it should have been vertices.  And finally I misstated Kirchoff’s Theorem, it should be n-1 non-zero Eigenvalues, obviously you only use the non-zero ones!  There are probably more and if I decide to replace these videos in the future, I will just replace this paragraph as well.

On a lighter note, when I was creating my "Graphs are Everywhere" images and sharing with my friend Wes, I created this one for fun:


Finally I’d like to thank our meetup sponsor neotechnology refreshments. The part where we thanked them at the meetup didn’t get recorded.


Videos


Part 1




Part 2





References and Further Reading



15 March 2014

Are Your Developers Treating Your Project as Their Own Personal Science Project?


Years ago I worked on a project that had a custom role based security system and for this system we needed a way to load menus according to the users’ assigned roles.  Two of the dominant developers on the project decided that this was a perfect use for AJAX mainly because they wanted to learn and use AJAX.  So they build a separate deployable menu application to serve up the menu by emitting XML from a Spring Controller URL, it was very comparable to a simple Restful service. It was called directly from JavaScript in the application’s web page.  Since there would be a lot of calls for any given user’s menu they built a cache for each user’s menu options, it was a custom cache where objects never expired.  This work was completed as I joined the project and as I got acquainted with this project my first thought was why do all this work?  Why not just load the menu when the user is authenticated and then cache it into the HTTP Session, I raised the issue but was ignored.  It was their pet project and it was sacrosanct. 


Later, while testing the apps that used the AJAX menu call the testers ran into a problem, the cache was interfering with testing because it was keeping the database changes from be propagated and testing the applications created a need to continuously restart the AJAX menu app.  So a new admin interface was added to the AJAX menu application to allow the testers to clear the cache.  This created a whole separate application that the client would have to support just to enable this unnecessary AJAX menu delivery mechanism.  So instead of just caching the data in the HTTP Session which is easier and any cache problem would be fixed by just logging out and logging back in, these two developers built a pet project which  consists of an extra application and codebase that has to be understood and maintained.  Ultimately the biggest problem this creates boils down to simple economics, it more expensive to maintain a solution that requires a whole separate application.


I think there is a tendency for all developers both good and bad to sometimes want to make projects more interesting and there can be a lot of temptations.  It is often tempting to create a custom implementation of something that could be found already written because it’s a fun project or the temptation to shoehorn a new hot technology into a project or even the temptation to add a solution to a problem that doesn’t need to be solved or just strait up over engineering.    I am guilty of some of these transgressions myself over the years. 


My anecdote about the menu app is just one of at least a dozen of egregious examples of these types of software project blunders that I have seen over the years.  My anecdote shows a manifestation of this problem and it also shows that these types of software project issues have a real tangible cost.  They create more code and often more complex code.  In some cases they create the need for specialized technical knowledge that wasn’t even necessary for the project.  While these approaches may only result in a small cost increases during the construction of the software the real consequence will be the long term maintenance costs of a software system that was needlessly overcomplicated.   In some cases these costs will be staffing costs but they can also be time costs in that changes and maintenance take considerably longer.  These problems may also result in costs to business as unwieldy systems cause the loss of customers to competitors and can impede the ability to gain new customers.


These types of decisions can be viewed as software project blunders.  In some cases the developers know better but choose to be selfish in that they put their own desires above the project, in these cases these decisions can potentially be viewed as unethical.  The two developers from the anecdote were extreme examples and were pushing into unethical territory since they treated every new project as a personal opportunity to use some new technique or library or some custom idea that they were interested in.  However, I think most of these blunders are simply that, mistakes in judgment or lack of good oversight.  It is always easier in hindsight to see these problems, but when one is in the heat of the moment on the project one can lose perspective.  The real challenge is trying to avoid these blunders. Really these are problems that fall into a larger spectrum of software project management problems and sadly I have mostly seen terrible software project management throughout my career.


I wish I could say I have a solution to this problem.  I think developers have some responsibility here especially senior developers, well at least the good ones.  Software development is really about decisions.  Developers make many decisions every day including naming, project structure, tools, libraries, frameworks, languages, etc.  Too often these decisions are made in isolation without any review and sometimes they can have serious long term consequences.  When making so many decisions in such a complex domain mistakes are inevitable.  Good software developers and good teams make an effort to deliberate and review their choices.


11 December 2013

Haskell DC





I am proud and excited to announce the meetup group Haskell DC.  Our first meetup agenda is open so I envision some discussion and maybe some basic hacking perhaps using Real World Haskell or Learn You a Haskell for Greater Good. Both are freely available online.  In this post I will put forth some ideas that we might consider for future meetups.  I hope to see this grow into a community and am looking forward to contributions from members especially those that are currently working with Haskell.  I hope to see a mix of practical and theoretical discussions and topics. I will also try to grow the group in terms of sponsorship.  We actually have our first sponsor, the local coworking space company Uberoffices is providing us space for our meetups.  I recently added our meetup to haskell.org under East Coast, which makes it seem a little more official.  I have setup social media accounts on Twitter @HaskellDC also use #HaskellDC for HaskellDC tweets.  I created a Google Plus account with the email haskelldc0 at gmail,  haskelldc was taken. I have also created a Community in Google Plus for HaskellDC. I hope to do a Google Hangout for our meetups which was requested by one of our remote members.  I even created a Facebook account if for no other reason but to claim it.  I admit to not being great at the social media stuff but will do my best or perhaps I will get help there from other members.


Learning Haskell has been on my list of things that I want to do for quite a while.  Haskell is attractive to me because it was developed in academia and embodies some mathematical concepts and now it seems to be gaining ground for commercial use.  So it is interesting to both the computer scientist and the working programmer.  I have a few ideas about future meetup topics both practical and theoretical that I wanted to list out in this post. Of course the future is wide open and I am hoping that as a group we develop future ideas with opportunities for anyone to present or lead a coding session.  In researching this post I became overwhelmed by the copious amount of material that there is out there for Haskell.  I plan to write at least one if not two more posts about Haskell in terms of resources, research, and the ecosystem.


The following are some lists of Programming Language Features, Software Development Related Topics, Practical applications, Advanced Programming Areas and Theoretical Topics. This list is derived from areas of my own interests along with areas that I feel are useful to practical software development.  I put these forth as possible topics for consideration for future meetups:


Haskell Language



Ecosystem


  • IDE Support
  • Building and Deploying, Production deployment
  • Continuous Integration
  • Package Management
  • Testing
  • Implementations, GHC, etc.

Practical Uses


  • Web Development
  • XML, HTML Processing and parsing
  • Network Client/Server, Web Services, Restful, JSON, SOAP
  • Desktop GUI
  • Graphics support
  • Embedded programming
  • Relational Database Access
  • MongoDB
  • Neo4j
  • Hadoop/HBase
  • Lucene, Elastic Search, Solr
  • Other NoSQL: Casandra, CouchDB, Riak, etc.

Advanced Topics and Uses



Theoretical



Interesting (Haskell Related) Publications




Ok, that’s quite a list. I tried to categorize these as best as I could. Monads show up under theory and language features, although I am not sure but I think they may be considered more of a language idiom.  The list is clearly ambitious.  If we have one meetup per month it would probably take several years to cover all of these topics if that were the plan.  As some of the theoretical topics are pretty advanced, and I hope to someday understand them.  Hopefully as a group maybe we make some progress as the theoretical aspects are important for better application of the practical.


I have heard many positive things said about Haskell in regards to good software development practices and that it facilitates the creation of better quality software which is what I, and most likely all good software developers, are striving for.   One topic that is of great interest to me is software reuse, I have written about it in relation to complexity and generic programming.  The generic programming post takes its title from a paper of the same name, listed above as a Haskell related paper.  Interestingly Edward Kmett mentions in his Haskellcast about his Lens Library that using Haskell facilitates a higher ease of reuse that would require significant discipline, especially for a team, when using an OO language like C++.  Another area is that of testing, while I think testing is important, my experience with TDD is limited, but it seems that TDD increases the amount of code that has to be written and understood.  I have read that Haskell’s type system, like any statically typed language, allows for less testing because of static type checking.  This rings true to me as dynamically typed languages like Python, Javascript, and Ruby need tests to enforce things that a type system would give you. Also, purity and referential transparency eliminate side effects. State still exists but it is managed more cleanly.  Haskell’s concise syntax is another potential benefit which allows more complexity to be expressed and read more easily.  Powerful features like list comprehensions, pattern matching, and abstract datatypes (ADTs) allow for smaller idiomatic code.  Of course this requires programmers that understand all of this.


CS research publications are a primary area of interest for me and I have encountered many that use and mention Haskell.  The list above provides some papers covering some fairly pragmatic topics that should be of interest to working programmers like Generic Programming, Design Patterns, and Data Structures.     I also include Brent Yorgey’s Typeclassopedia which gets a lot of mentions from people giving advice about learning Haskell and starts with the following advice:


 “The standard Haskell libraries feature a number of type classes with algebraic or category-theoretic underpinnings. Becoming a fluent Haskell hacker requires intimate familiarity with them all, yet acquiring this familiarity often involves combing through a mountain of tutorials, blog posts, mailing list archives, and IRC logs. The goal of this article is to serve as a starting point for the student of Haskell wishing to gain a firm grasp of its standard type classes. The essentials of each type class are introduced, with examples, commentary, and extensive references for further reading.

In the course of my own general research and what I did for this post, I can attest to the large amount and overwhelming Haskell related topics both theoretical and practical that are available online.


In my above list of possible theoretical interests to the meetup group, I list Homotopy Type Theory.  The relevance here is of course that Type Theory is relevant to Haskell.  The theory and the book are new, coming out this year.  I wanted to give it special mention because there is a lot of excitement about it, including claims that the theory is potentially revolutionary to both mathematics and computer science.  The book is a daunting 600 pages and very heavy going, however, Robert Harper has done a series of lectures.  I have watched the first lecture and found it very enlightening so I would recommend it, so far anyway.     


So for the meetup, as I mentioned we will probably start off with some easy learning and hacking. I thought it might be fun to try to implement something simple, maybe Fizz Buzz.  Ultimately it would be nice to work to get set up with the ecosystem to do productive things in Haskell. Maybe eventually a group project or some higher level hacking, I have thought it might be interesting to something like work though some of the problems in Think Stats possibly using a Haskell statistics package.


Well those are my ideas, feel free to suggest others or correct me if I got anything wrong, this is all new to me.


11 March 2013

Programming and Order Theory





Covariance, Contravariance and Order Theory


In this post I make the observation that covariance and contravariance in programming are what are known as Order Duals.  I am not the first person to make this observation, however, these ideas often tend to be buried in academic research papers like "Adding Axioms to Cardelli-Wegner Subtyping" by Anthony J H Simons, don’t get me wrong I love these types of papers, they give me hope and inspiration that software engineering will someday become a first class engineering citizen.  Unfortunately, these types of papers tend to be too theoretical and thus not very accessible for the average developer.  This is unfortunate as the idea of covariance and contravariance as order duals both puts these concepts into the mathematical context of order theory and possibly gives programmers some native context for order theory.  So hopefully this post will make these ideas more programmer friendly.


I previously wrote a post about lattice theory, which is part of the more general order theory, where I talked about some basic order theory ideas such as duality.   Order theory occurs quite a lot in software and programming and this is part of a series of posts to talk about those occurrences.


Covariance and contravariance receive a fair amount of attention, as they should, in software blogs and some of these posts include some interesting observations. One, perhaps slightly off topic observation, is "Liskov Substitution Principle is Contravariance" which is an interesting observation and interesting post if you overlook the disdainful tone towards OO.  Another more relevant post which is a nice post about "Covariance and Contravaiance in Scala" relates these ideas to category theory which is relevant especially since apparently you can think of "Category Theory as Coherently Constructive Lattice Theory", warning heavy going in that paper.


Defining Covariance and Contravariance


To me one of the most striking and perhaps apropos examples of order theory in software is that of covariance and contravariance which Eric Lippert defines on his blog as:


The first thing to understand is that for any two types T and U, exactly one of the following statements is true:


  • T is bigger than U.
  • T is smaller than U.
  • T is equal to U.
  • T is not related to U.

For example, consider a type hierarchy consisting of Animal, Mammal, Reptile, Giraffe, Tiger, Snake and Turtle, with the obvious relationships. (Mammal is a subclass of Animal, etc.) Mammal is a bigger type than Giraffe and smaller than Animal, and obviously equal to Mammal. But Mammal is neither bigger than, smaller than, nor equal to Reptile, it’s just different.


He has an eleven part series on covariance and contravariance, his posts cover some C# implantation details but the ideas are generally applicable and looking at one language’s details can help with comparing and contrasting to other languages.


Wikipedia includes the following definition, the animal example is pretty popular:


  • Covariant: converting from wider (Animals) to narrower (Cats).
  • Contravariant: converting from narrower (Triangles) to wider (Shapes).
  • Invariant: Not able to convert.

Including this is slightly redundant but this definition captures the conversion aspect and defines the relationships explicitly. 


Covariance and Contravariance as Order Duals


The above are definitions that have order theory written all over them.  In fact that is pretty much a text book definition of an order Relation in that it is reflexive, transitive, and antisymmetric. It is reflexive since Animal = Animal, transitive since Animal ≤ Mammal ≤ Cat implies Animal ≤ Cat, and antisymmetric since Animal ≤ Mammal implies not Animal ≥ Mammal and in the animal example there are cases of both comparability and incomparability as you would find in a partial order.


As you can see from the above definitions both sets of terms, wider/narrower or bigger/smaller, which are the same, define an order dual for comparison.  To write it more formally we will call the set C classes of various types in an OO hierarchy. So covariant would be represented by less than or equals ≤ and contravariant would be represented by greater than or equals ≥ and a set of classes with these order relations can be written with mathematical notation as (C, ≤) = (C, ≥)d .


Types as Sets of Fields


It was in researching this post that I came across the paper "Adding Axioms to Cardelli-Wegner Subtyping". These kinds of discoveries are one of the reasons I write these posts.  In that paper they quote another paper "On understanding types, data abstraction and polymorphism" by Luca Cardelli and Peter Wegner:


a type A is included in, or is a subtype of another type B when all the values of type A are also values of B, that is, exactly when A, considered as a set of values, is a subset of B


The ideas about types and subtypes covered in these papers extend beyond which fields a class or object has, however, I thought it would be interesting and beneficial to limit the discussion to that case.  One reason is that if you take the fields of an object or class then all subtype collections of fields will be found in the powerset and all subtypes will be a subset relation and these can be drawn as my favorite lattice, yes I have favorite lattice, the powerset lattice.  Also in this case covariance and contravariance are now defined as the subset and superset operations on the powerset lattice.


Types as Sets of Fields in the Real World


Now I always feel that a real example helps quite a bit so I have created a set of example classes, Scala traits actually, which illustrate the above ideas using a quasi-real-world example.  Please note that these code examples are designed for the purposes of illustrating these ideas and may contain design issues that one would not implement in the real world, but they should be close enough to bridge the conceptual gap, if you will.  Also this first example is should be applicable to dynamically typed languages that might use duck typing or structural typing.


The following Scala traits define a possible domain object hierarchy that could be used to persist data to a database and render it back to a web page among other possible uses:



trait BaseDomain {

override def equals(that: Any) : Boolean

override def hashCode : Int

}

 


trait PersonInfo extends BaseDomain {

var firstName : String

var lastName : String

}

 


trait Address extends BaseDomain {

var street : String

var street2 : String

var city : String

var state : String

var country : String

var zipCode : String

}

 


trait PhoneNumber extends BaseDomain {

var phoneNumber : String

var extension : String

}

 


trait Person extends BaseDomain {

var personInfo : PersonInfo

var address : Address

var phoneNumber : PhoneNumber

}



These traits yield the following field powerset lattice:




Now suppose we would want to define a type for each of the above lattice points, which we probably would not do but there may be cases to do similar types of things in the real world.  Let’s define the following Scala traits that wrap the above domain object hierarchy elements:



trait PersonInfoTrait extends BaseDomain {

var personInfo : PersonInfo

}

 

trait AddressTrait {

var address : Address

}

 

trait PhoneNumberTrait extends BaseDomain {

var phoneNumber : PhoneNumber

}

 

trait PersonInfoAddressTrait extends AddressTrait with PersonInfoTrait {

}

 

trait AddressPhoneNumberTrait extends AddressTrait with PhoneNumberTrait {

}

 

trait PersonInfoPhoneNumberTrait extends PhoneNumberTrait with PersonInfoTrait {

}

 

trait PersonTrait extends PersonInfoAddressTrait with AddressPhoneNumberTrait with PersonInfoPhoneNumberTrait {

}



Since Scala traits support multiple inheritance we can define the above type hierarchy which can be drawn as the powerset lattice that it is:



Again we can see covariant and contravariant types defined on this lattice and each relation is actually the subset superset relation on fields.


I feel the basic observations and the above examples on order duals and covariance and contravariance make the ideas pretty strait forward for field set context.  In writing this I delved into a number of papers, adding some of them to my "to read list", on Types in programming and Type Theory and I feel there are probably some deeper insights and implications of all this.