05 March 2012

The Software Design Debate and the Futility of Opinion


A Tale of Two Classes






A while back when I was at a previous contract this article showed up on various link sites and one of the architects at that company sent it out. As a result, another developer and I, with the help of a googled code snippet, came up with the following code and added it to the common library:


public class ServiceResponse <ERROR extends ErrorReturn,


VALUE extends ValueReturn> {


private ServiceResponse() {


}


public ERROR error() {


return null;


}


public VALUE value() {


return null;


}


public final static class Error<ERROR extends ErrorReturn>


extends ServiceResponse<ERROR, ValueReturn> {


private final ERROR error;


public Error(ERROR error) {


this.error = error;


}


public ERROR error() {


return error;


}


public boolean isError() {


return true;


}


}


public final static class Value<VALUE extends ValueReturn>


extends ServiceResponse<ErrorReturn, VALUE> {


private final VALUE value;


public Value(VALUE value) {


this.value = value;


}


public VALUE value() {


return value;


}


public boolean isError() {


return false;


}


}


}


To be fair the original code was not documented, like this snippet, and there were a few other things that were a little less elegant but not hugely so, also all code here has been changed as not to infringe on intellectual property and to ensure anonymity.  As a result of the architecture and project schedule the other developer who needed the class ended not using it and also due to the aggressive project schedule there was no other follow up such as finishing the documentation or writing tests, this was partially my fault.


A few months later the architect needed this class and discovered our implementation he rewrote it as follows:


public class ServiceResponse<ERROR extends ErrorReturn,


VALUE extends ValueReturn> {


private ERROR error;


private VALUE value;


private boolean isError;


public ServiceResponse(ERROR error, VALUE value,


boolean isError) {


this.value = value;


this.error = error;


this.isError = isError;


}


public ERROR error() {


return this.error;


}


public VALUE value() {


return this.value;


}


public boolean isError() {


return isError;


}


}


Upon committing it he sent out an email to announce that he felt the original code was too complicated and he had simplified it to the above code.  I responded explaining that I felt that there were some issues with the instantiation of his new implementation and the real issue here was the lack of documentation.  I was shocked by his response in which he stuck to his argument that the new class was simpler to implement and didn’t require a lot of thought and we just need a wrapper class.  I confess I was initially perturbed by this as I felt that he was both wrong and taking a cargo cult attitude which was atypical for him, I later found out that he had been suffering through a bad cold.  Since I knew I would soon be leaving the project, I left it there even though I was pretty sure that his new design was significantly worse than the original.   I am intentionally omitting the details of the debate because I want to explore these in more detail momentarily. 


Throughout my career I have all too often found myself in these types of debates, and the thing I find so frustrating about it is the lack of real ways to accurately analyze these types of situations and the lack of any formal structure to the arguments themselves, they frequently become as constructive as a political or religious debate.  So it got me thinking, these are two pretty simple classes, how can we constructively analyze this situation and perhaps establish dispassionately and objectively which solution is better and what opportunities can this exploration present in a larger context?


First let’s look at the problem in the context of the current prominent thinking, let’s start with Joshua Bloch’s API ideas, which I have and will continue to reference.  Now remember the class in question is part of an API so these points are very relevant.  One point he makes is that an API:


  • Easy to use, even without documentation
  • Hard to misuse

So let’s assume that we have the following two classes and references:


ErrorReturn error;


ValueReturn value;


The original design has the two correct ways of creating it, omitting for the moment creating each with null:


new ServiceResponse.Error<ErrorReturn>(error);


new ServiceResponse.Value<ValueReturn>(value);


The new design has the following correct instantiations:


new ServiceResponse<ErrorReturn, ValueReturn>(error, null, false);


new ServiceResponse<ErrorReturn, ValueReturn>(null, value, true);


Also it is pretty obvious that the new design opens up several other ways that this object can be created.  It seems that new design more easily allows value and error to be mixed up.  The new design puts the responsibility of setting the Boolean isError on the user which opens it up to possibly being set incorrectly.  This breaks the hard to misuse rule.  In the old design it is an artificial value that created by a pure function.   So from the API design perspective it seems that the original design is better. 


Now another piece of conventional wisdom is the minimizing the number of parameters, in Clean Code Bob Martin talks about how you really shouldn’t have more than three parameters, well both instantiations fall under that, but by this logic is not the old design with one parameter then preferable to the new design with three? 


Another possible pro argument for the original design is the use of the "Disjoint Union" Either pattern, now this is not a GOF pattern but it is a pattern, perhaps an even more mathematical pattern, but regardless it is a specific pattern with precedence.


So that’s three pro arguments for the original design based on the current conventional software design wisdom promulgated by the leading software engineers of the field. So the original design is probably better.  Unfortunately I have found that even these arguments can be thwarted by the most recalcitrant cargo cult coder who doesn’t read and will always invoke arguments like "it’s too complex, it’s too abstract".


So I started to think, how could one look at this situation more analytically?  Going back to Joshua Blocks API talk, at some point and I don’t have this documented anywhere, he makes reference to an object’s state space, but he doesn’t go into detail about the concept.  This got me thinking, in this case we could pretty easily enumerate, at least at high level, the state space for each object and then compare them.  I am not sure if this is a known technique, I wouldn’t be surprised if someone has thought of or done this before.  In this case I am going to refer to this as "Enumerative Object State Space Analysis" or just "Object State Space Analysis" for lack of a better name.


So for this exercise we will interest ourselves in the possible values of a value or null for each parameter, where the error value is an instance of ErrorReturn and the value parameter is an instance of ValueReturn or true or false for the Boolean parameters, this is the following sets: {<Error>, null}, {<Value>, null}, and {true, false}. 


For the old design the possible instantiations are:


new ServiceResponse.Error<ErrorReturn>(error);


new ServiceResponse.Value<ValueReturn>(value);


new ServiceResponse.Error<ErrorReturn>(null);


new ServiceResponse.Value<ValueReturn>(null);


There are 2 possible Error instantiations and 2 possible Value instantiations leading to an object state space of 2 + 2 = 4 possible states, this is an application of the Sum Rule.  The new class has 2 possible values per constructor parameter, similarly from the Product Rule of combinatorics we can calculate that this leads to 2*2*2 = 8 possible states in the object state space.


We can enumerate (list them out) the states for the original class as follows:


Object State Comment Correctness Status

{Value, false} Return state for successful operation Correct intended state

{Error, true} Return state for operation with an error. Correct intended state
{null, false} Return for a successful operation with no return value State is a side effect of design, allows null, could be correct or could be considered a degenerate state.
{null, true} Return for an operation with an error with no information State is a side effect of design, allows null, could be correct or could be considered a degenerate state.

We can enumerate the states for the new class as follows:


Object State Comment Correctness Status
{null, Value, false} Return state for successful operation Correct intended state
{Error, null, true} Return state for operation with an error. Correct intended state
{null, null, false} Return for a successful operation with no return value State is a side effect of design, allows null, could be correct or could be considered a degenerate state.

{null, null, true} Return for an operation with an error with no information State is a side effect of design, allows null, could be correct or could be considered a degenerate state.
{Error, Value, false} Return state for successful operation State is correct from a usage standpoint but ambiguous from a programmer maintenance standpoint is this what was intended or is it reversed?
{Error, Value, true} Return state for operation with an error. State is correct from a usage standpoint but ambiguous from a programmer maintenance standpoint is this what was intended or is it reversed?

{null, Value, true} Incorrect Usage This is incorrect but the design allows this state to exist.
{Error, null, false} Incorrect Usage This is incorrect but the design allows this state to exist.


Assuming a uniform distribution the probability for each state in the original class is 25% and 12.5% in the new class, this means that the first design has 50% correct states and 50% null states.  The new design has 25% correct states, 25% null states, 25% ambiguous states, and 25% incorrect states. Of course a uniform distribution would probably not be the case as the correct usage would always be more likely and the incorrect usage would be the most unlikely.  Still, based on this analysis I am inclined to further conclude that the original design is better, perhaps even definitively so as the old design has no incorrect states and the expanded state space, which is twice as large, exponentially larger, also increases the complexity. 


Another possibility would be to eliminate the setting of the Boolean flag on the new design and allow it to be determined by a pure function as in the original design, but that opens up problems with respect to the ambiguous states that would exist.  In reflecting I also realized that the one of ambiguous states of the new design allow a state {Error, Value, false} which might be construed as success with information, which leads me to think that both designs are lacking and it might be better to always return an object that contains the value and some type of status object that allows error or success with info.


Now I know this is a lot and I am probably over thinking this, but I am treating this as an experiment:  Is it possible to come up with more analytical approaches using Enumerative and Combinatorial techniques to guide object design or other aspects of software design in a practical way?

No comments:

Post a Comment