Saturday, July 25, 2009

Bootlegging

Remember back in the days of Prohibition, when a person who wanted a beer had to know the secret knock to get into the back room of a club concealed by a false bookcase? Me neither. I’m talking about permitted bootlegging in the business sense of the word.

As I mentioned in my bio, I spent 8 or 9 years of my career managing a team of developers. One of the first policies I implemented as a manager was permitted bootlegging: I encouraged every one of my developers to spend 10% of their time working on a software project of their choice that had nothing to do with the standard product we were building. I got the idea from my dad, who worked for 3M. Software developers can be odd ducks (to other people – we developers think you are the odd duck), and like most development organizations we had several kinds of developers. I’m going to use a broad brush and group them into 3 categories.

We had people that insisted on pounding out code 12 hours a day, 7 days a week. These folks could do amazing things in short periods of time. They could get a late project back on track or whip out a feature in no time. It is a luxury to have one or two of these people on your team, as nobody can reasonably ask anyone to work these kinds of hours. The difficulty in managing this sort of person was that if you pointed them in the wrong direction, look out! They were a bit like rhinos in that they would very quickly blaze a trail through the code base doing all kinds of “damage”. Check in with these folks early and often to make sure they are doing what you think they are doing. They don’t like to undo hours upon hours of work because of a miscommunication, and who can blame them? It is also quite a challenge keeping them busy. We’ll call this group the rhinos.

Another group of developers wanted to be told exactly what to do, when to have it done, and then to be left alone. They will let you know if they have a problem and won’t tend to try to move forward with something they don’t understand – it is just not in their nature. They don’t care too much about what they are assigned to work on as long as it is in their comfort zone. These are the people that tend to fly under the radar, so make sure to share credit for your success with these folks. You need a lot of these people on your team. We’ll call this group the workhorses.

The final group of people I want to talk about could probably best be described as the mad scientists. These are the kinds of people that are not happy unless they are challenged. These are the folks you want working on critical parts of the framework. You would go over a problem with them in painful detail and turn them loose. They would disappear into their dark laboratory only to return days, weeks, or months later when they were done. Schedule means very little to these folks because…well…it is not done until it is done! You want one or maybe two of these people on your team, but you cannot afford more – you’d never get anything built. The challenge with managing this type of person was that it was hard to schedule their activities and you didn’t always have an interesting problem for them to work on. Let’s call this group the Einsteins.

By way of an apology for pigeonholing these good people, I will admit that the personality of every individual I managed really fit into all 3 of these groups to one degree or another (well, maybe not the Einstein group). My job was to figure out a way to get this crew to build software. I had to ask the Einsteins to write mundane code, I had to keep the rhinos from running further ahead than we could test, and I had to ask the workhorses to do work they weren't comfortable with. So how do I keep the Einsteins challenged when they are doing mundane work, how do I keep the rhinos out of the code during the definition phase, and how do I get the workhorses comfortable doing something they have never been asked to do before? That is where bootlegging comes in.

The role that bootlegging played for the Einsteins was that it kept things interesting all the time. I never had to encourage these folks to spread their wings and learn something new. I used bootlegging as a way to keep them from leaving the company. They knew that, I knew that, they appreciated it, and most importantly they never left the team. For the rhinos I used bootlegging as a way to keep them busy during slow times. At the end of the day, what drives the rhinos is that they just want to be busy doing something useful. Their bootleg projects were usually related to productivity tools – tools that made the development team more productive but had nothing to do with our product. Workhorses were not wild about bootlegging; I would really have to push a project onto a workhorse. I would try to recognize a new technology or feature coming up in the project and use the bootlegging task as a way to introduce them to it. To be honest, the bootleg projects didn’t mean much to these folks, but the option was always there if they wanted to use it.

As a manager I was very proud of the fact that I never had a developer leave my team to join another team. I think giving each team member the freedom to work on whatever software project they wanted with 10% of their time was a big part of it.

Friday, July 17, 2009

Design Review and the Open/Closed Principle

In my last blog I talked about the Single Responsibility Principle. SRP is an easy concept to understand, but breaking down a system into the “correct” objects is difficult to do well. It is imprecise, there is a lot of room for opinions, and you seldom know the correct breakdown until you are finished. In this blog I want to take a look at the OCP, or Open/Closed Principle. The Open/Closed Principle states that software entities (classes, modules, functions, etc.) should be open for extension but closed for modification (“Agile Principles, Patterns, and Practices in C#”, Robert C. Martin and Micah Martin). In other words, we would like to be able to change the behavior of existing entities without changing their code or binary assembly. This concept is not as easy to understand. Did I just say to change something without changing it?

Let’s use a configurable rule engine as an example. Assume the simple rule engine has 2 types of entities, a Condition and an Action. The Action is simple: execute some logic and return true on success or false on failure. The Condition is also simple: it checks some binary condition and returns true or false. The rule engine takes a list of conditions and actions and executes them. If an Action returns false, the rules stop executing and a rollback is performed. If a Condition returns true, the next action is executed. If the Condition is false, the next action is skipped and the following action is executed. Pretty simple, so let’s look at some sample code:

Define our Conditions, Actions, and Rule Type with enums:

public enum Actions
{
    CreateOrder,
    CreateBackorder,
    CloseOrder,
    ShipOrder,
    StoreOrder,
    ReduceInventory
}

public enum Conditions
{
    IsComplete,
    IsInStock,
    CanShip
}

public enum RuleType
{
    Condition,
    Action
}

Here is a very crude implementation of a rule engine:

public class RuleEngine
{
    public void ExecuteRules(int rulesId)
    {
        //Gather all the conditions and actions
        //assume we have a Ruleset table that looks like this:
        //ID - Int
        //Index - Int
        //RuleType - Int (corresponds to the RuleType enum)
        //Rule - Int (corresponds to the Conditions or Actions enum)
        //Note: transaction logic, creating the @ID parameter,
        //error handling, and safe DataReader.Read() logic
        //have been omitted for brevity
        using (SqlConnection connection = new SqlConnection("ConnectString"))
        {
            using (SqlCommand cmd = new SqlCommand
                ("SELECT RuleType, Rule FROM Ruleset WHERE ID = @ID ORDER BY [Index]"
                , connection))
            {
                connection.Open();
                SqlDataReader rules = cmd.ExecuteReader();
                bool executing = true;
                while (executing)
                {
                    rules.Read();
                    if ((int)rules[0] == (int)RuleType.Action)
                    {
                        executing = ExecuteAction((int)rules[1]);
                    }
                    else
                    {
                        if (ExecuteCondition((int)rules[1]))
                        {
                            executing = true;
                        }
                        else
                        {
                            //Condition failed, skip the next action
                            rules.Read();
                            executing = true;
                        }
                    }
                    //Implement some exit strategy here
                }
            }
        }
    }

    public bool ExecuteCondition(int theCondition)
    {
        switch (theCondition)
        {
            case (int)Conditions.IsComplete:
                //Is the order complete?
                return true;            //or false
            case (int)Conditions.CanShip:
                //Can we ship the order?
                return true;            //or false
            case (int)Conditions.IsInStock:
                //Is this item in stock?
                return true;            //or false
            default:
                throw new Exception("Unsupported Condition");
        }
    }

    public bool ExecuteAction(int theAction)
    {
        switch (theAction)
        {
            case (int)Actions.CreateOrder:
                //Execute order create logic
                return true;            //or false
            case (int)Actions.CreateBackorder:
                //Execute backorder logic
                return true;            //or false
            case (int)Actions.ShipOrder:
                //Execute shipping logic
                return true;            //or false
            case (int)Actions.StoreOrder:
                //Send to warehouse
                return true;            //or false
            case (int)Actions.CloseOrder:
                //Execute order close logic
                return true;            //or false
            case (int)Actions.ReduceInventory:
                //Remove item from inventory
                return true;            //or false
            default:
                throw new Exception("Unsupported Action");
        }
    }
}

Never mind the problems with error handling, transactions, and the lack of support for nested conditions; the point of the blog is OCP, not creating a rule engine. What we have will work, but does it satisfy the OCP? No. There are a couple of problems with this type of implementation that will make it difficult to maintain. The first problem is that the logic for the rule engine, Conditions, and Actions is all in one class and therefore one assembly; any system that wants to use any of this logic will be tied to all of this logic. The second problem is that any time you want the rule engine to do something new, you have to modify this assembly, which is a violation of the Open/Closed Principle.

Let’s take a look at a more robust design.

We will still use an enum to distinguish between Actions and Conditions:

public enum RuleType
{
    Condition,
    Action
}

Now let’s declare an interface:

public interface ISupportRules
{
    bool Execute();
    RuleType TypeOfRule();
}

We will put both the enum and the Interface in an assembly called Rule.Types.

Now let’s add a few classes:

public class CreateOrderAction : ISupportRules
{
    #region ISupportRules Members
    public bool Execute()
    {
        //Order Create Logic Here
        return true;            //or false
    }
    public RuleType TypeOfRule()
    {
        return RuleType.Action;
    }
    #endregion
}

We will put this in an assembly called Rule.Actions.CreateOrder.

public class CanShipCondition : ISupportRules
{
    #region ISupportRules Members
    public bool Execute()
    {
        //Execute Can Ship Logic here
        return true;            //or false
    }
    public RuleType TypeOfRule()
    {
        return RuleType.Condition;
    }
    #endregion
}

We will put this class in an assembly called Rule.Conditions.CanShip.

In fact we will create a separate assembly for each condition and action we defined in the enums.

Here is our new rules engine which goes in the Rule.Engine assembly:

public class RefactoredRulesEngine
{
    public void ExecuteRules(List<ISupportRules> rules)
    {
        bool executing = true;
        int ruleIndex = 0;
        while (executing && ruleIndex < rules.Count)
        {
            if (rules[ruleIndex].TypeOfRule() == RuleType.Action)
            {
                executing = rules[ruleIndex].Execute();
                ruleIndex++;
            }
            else
            {
                if (rules[ruleIndex].Execute())
                {
                    //Condition passed, run the next rule
                    ruleIndex++;
                    executing = true;
                }
                else
                {
                    //Condition failed, skip the next action
                    ruleIndex += 2;
                    executing = true;
                }
            }
            //Implement some other exit strategy here if needed
        }
    }
}

Notice that the ExecuteRules method takes a generic list of type ISupportRules as a parameter but has no reference to any of the conditions or actions. Also notice that the condition and action classes have no reference to each other or to the rules engine. This is key to both code reuse and extensibility: the refactored rule engine, condition, and action classes are completely independent of each other. All they share is a reference to Rule.Types. Some other system may use any of these assemblies independently, with the only caveat being that it will need to reference the Rule.Types assembly. The other thing we gained with this approach is that we can now extend the rule engine (make it execute new conditions and actions) by simply adding a new Condition or Action that implements ISupportRules and passing it into the ExecuteRules method as part of the generic list. We can do all of this without recompiling the RefactoredRulesEngine, which is the goal of the OCP. By the way, this design approach is called the Strategy Pattern.
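To make the usage concrete, here is a small, hypothetical caller; the composition of the list is mine for illustration (in practice it would come from the factory discussed below):

// Hypothetical usage sketch: compose the rules in execution order and
// hand them to the engine. Supporting a brand-new rule only requires a
// new ISupportRules implementation added to this list.
var rules = new List<ISupportRules>
{
    new CanShipCondition(),   // condition: false skips the next action
    new CreateOrderAction()   // action: false stops execution
};

new RefactoredRulesEngine().ExecuteRules(rules);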

If you haven’t noticed yet, I’m leaving out one major piece of the puzzle: how does the generic list of rules get generated? I’m going to wave my hands here a little bit and save the details for another blog. We would use a creational pattern (one of the factory patterns). If we assume we are consuming the table outlined in our first solution, this factory would accept a Ruleset ID and magically return the generic List<ISupportRules> of rules. The factory could be written in such a way that each time you add a Condition or an Action the factory needs to be recompiled, or we could use a provider pattern with a configuration file, allowing us to add new Conditions and Actions without a recompile.
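As a rough illustration of the configuration-file flavor, here is a minimal sketch. The RuleFactory name, the idea of mapping Ruleset rows to type names, and the use of Activator are my assumptions, not part of the original design:

using System;
using System.Collections.Generic;

// Hypothetical sketch of a configuration-driven factory. It assumes the
// Ruleset rows have already been translated into assembly-qualified type
// names (for example, read from a config file keyed by the Rule column).
public class RuleFactory
{
    public List<ISupportRules> CreateRules(IEnumerable<string> ruleTypeNames)
    {
        var rules = new List<ISupportRules>();
        foreach (string typeName in ruleTypeNames)
        {
            // Creating the rule from a type name means a brand-new
            // Condition or Action assembly can be dropped in without
            // recompiling the factory or the engine.
            Type ruleType = Type.GetType(typeName, true);
            rules.Add((ISupportRules)Activator.CreateInstance(ruleType));
        }
        return rules;
    }
}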

To summarize things a bit: conceptually we have this rules engine that is relatively complex (much more complex than I have written) and we want to write it, test it, and leave it alone. At the same time, we need to enhance the system by adding more rules. By using the Strategy Pattern we now have a stable rule execution engine that can execute any condition or action that implements the ISupportRules interface. Because we inject a list of conditions and actions into the ExecuteRules method, we can do all of this without recompiling the refactored rules engine. Another approach we might have taken to satisfy the OCP is the Template Method pattern. In the Template Method pattern we would make use of an abstract class to define the skeleton of an algorithm, then allow the concrete subclasses to implement the subclass-specific operations.
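For comparison, here is a minimal Template Method sketch; the RuleBase and ExecuteCore names are mine, and the base-class behavior is only illustrative:

using System;

// Hypothetical Template Method sketch: the abstract base class owns the
// skeleton of the algorithm (cross-cutting concerns such as logging could
// live here), and each concrete rule supplies only the varying step.
public abstract class RuleBase
{
    public bool Execute()
    {
        Console.WriteLine("Executing " + GetType().Name); // invariant part
        return ExecuteCore();                              // subclass-specific part
    }

    protected abstract bool ExecuteCore();
}

public class CanShipRule : RuleBase
{
    protected override bool ExecuteCore()
    {
        //Can we ship the order?
        return true;            //or false
    }
}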

My design review checklist

One of my favorite design pattern web sites: http://www.dofactory.com/Patterns/Patterns.aspx

Sunday, July 12, 2009

Design Review and the Single Responsibility Principle

Back when I first started managing a team of developers, one of the things I instituted was code reviews. We created some coding standards and a list of best practices to follow, including naming conventions, error handling, and performance optimizations, and we inserted a review step into our development process. After a while, writing code that followed our standards became second nature to the team, so we would go into the reviews, validate the code, and wrap up the meeting in a very short period of time. Where the process fell short was that we were focusing on the code, not the design. Writing good, clean, uniform, and optimized code is great, but what is probably more important is to have a good design.

A good design has a lot of characteristics, but one of the most important signs of a good design is a solid breakdown of class responsibilities. The Single Responsibility Principle states that a class should have only one reason to change (“Agile Principles, Patterns, and Practices in C#”, Robert C. Martin and Micah Martin).

Here is an example requirement:

When a file is dropped into a folder, attach some meta-data to it and move it to a secure location (say a file stream).

From a high level there are a few things we need to do:

  • Monitor a folder looking for new files.
  • Create meta-data.
  • Associate the meta-data to the file.
  • Stream the file to the database.

A first pass at defining the classes might be as follows:

  • Create a FileHandler class responsible for monitoring the folder and moving the file to the secure location
  • Create a Document class responsible for creating the meta-data and associating it to the file stream.

For the sake of this blog I’m only going to look at the file handling logic. I do think the FileHandler class defined above is a defensible breakdown of tasks: the FileHandler object takes care of everything having to do with the files we want to process. What I don’t like about the breakdown is that we have put the file monitoring logic and the file transferring logic in the same class. If we change the requirements down the road so that we only look for files with certain extensions, we would have to change the FileHandler implementation. If we change the requirements and need to support a different secure storage location (such as a third-party document management system that doesn’t support streaming), we would again have to change the FileHandler. With the high-level design I defined above we would have 2 reasons to change the class:

  1. because the file monitoring logic changes
  2. because the file transfer logic changes.

This is a violation of the SRP.

Still not convinced? Let’s look at another change that in my opinion tips the scales in favor of separating the file monitoring and transferring logic. What if we want to allow documents to be imported into the system via a web service? We don’t want to duplicate the file transferring logic so we would want to employ the services of the FileHandler object. We certainly don’t want or need the file monitoring logic in our web service, therefore I favor putting the file monitoring logic and file transferring logic in different classes.
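To show the separation I am advocating, here is a minimal sketch; the FolderMonitor and FileTransfer names (and the use of FileSystemWatcher) are my own illustration, not the actual design:

using System.IO;

// Hypothetical sketch: each class now has exactly one reason to change.
public class FileTransfer
{
    public void MoveToSecureStore(string filePath)
    {
        // Attach the meta-data and stream the file to the database here.
        // A web service can call this directly, with no folder monitoring
        // logic coming along for the ride.
    }
}

public class FolderMonitor
{
    private readonly FileSystemWatcher watcher;
    private readonly FileTransfer transfer;

    public FolderMonitor(string path, FileTransfer transfer)
    {
        this.transfer = transfer;

        // Watch the drop folder and hand each new file to the transfer class.
        watcher = new FileSystemWatcher(path);
        watcher.Created += (sender, e) => this.transfer.MoveToSecureStore(e.FullPath);
        watcher.EnableRaisingEvents = true;
    }
}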

My design review checklist

 

Monday, July 6, 2009

Refactoring code to break circular references

A few months ago I started a 10-week project for a new client. The two main tasks I had to complete were upgrading the app to the latest .NET Framework and adding some document management capabilities. The application was a 5-year-old VB.NET WinForms app, maybe 80K lines of code in 30+ assemblies. Naturally I wanted to get the source code, load it up, and take it for a test drive. Everything was organized in projects and solutions were not used, so the first thing I did was make a solution for each layer, then make a solution of solutions so I could step from the UI layer all the way into the DAL. It was not a huge surprise to find out I could not compile the solution because I had not configured the project dependencies. I started going through the pain of configuring project dependencies until I hit a brick wall – there were circular references! How the heck did the assemblies get built?

It turns out there was a custom utility written to manage the compilation of all of the assemblies. To add clarity to the picture I’m trying to draw, assume we have four projects named A, B, C, and D. Assembly A references nothing; assembly B references assemblies A and C; assembly C references assemblies B and A; and finally assembly D references A, B, and C. Assemblies B and C reference each other – the circular reference. Before this project I would have said that is impossible unless you use reflection – it will not build. When the project was first started the circular reference did not exist, so creating assemblies A, B, C, and D was no problem. Once previously built copies of the assemblies exist to reference, you can introduce the circular reference and get things to build, as long as you don’t manage the build with a solution, you keep the old assemblies around (for reference), and you do not introduce a public change to assembly B that C depends on at the same time you introduce a public change to assembly C that B depends on. Clear as mud? The utility would compile A, B, C, and finally D, in that order. What you would often see was that the compilation of assembly B would fail while the rest compiled, but if you ran the build again, B would build (because any public change to C that B depends on got into assembly C the last time the build utility was run). With four assemblies and the dependencies I have defined you would have to run the utility two times at most to get everything to build, but with 30+ projects you can imagine that the build utility might have to be run four or five times to get everything to build successfully. If you follow all of these rules it is possible to make this work, but if you are working in a team environment it would be very difficult to pull off, and make no mistake about it – I am not advocating this approach.

Back to my problem: here I am, day one or two at a new client, and as far as I’m concerned I have uncovered a major problem with the architecture of the app that I’m supposed to be enhancing. It has always been my motto to try not to throw out a problem without having some possible solutions in mind. Rather than hitting a customer with all bad news and causing a panic, I prefer to state the problem and its seriousness, but then, rather than coming across as doom and gloom, I like to show them that the sky is firmly in place by giving them an estimate of what it will take to fix the issue. From here forward, rather than discussing assemblies B and C, let’s talk about an Order class (in an Order assembly) and an Item and Items class (in an Item assembly). Now, finally, the point of the blog.

Problem statement

  • There are circular references in the application, which makes it difficult and potentially impossible to build. (You could also state this as: I want to introduce a circular dependency into my code but .NET won’t let me.)
  • Debugging and maintenance is difficult because you cannot run multiple projects in a single solution.
  • Building the application is burdensome because sometimes I have to execute the build utility 3 – 5 times to get all the assemblies to build.

[Diagram: circular references between the assemblies]

Other Factors

  • There is virtually no QA time budgeted for regression tests.
  • There are no unit tests.
  • There is no time budgeted for any refactoring other than getting the code to work with the latest .Net framework.
  • I had to convince the other architect (also a contractor, not on staff) that we could pull this off quickly without introducing a bunch of bugs, and by the way, I have no idea if he was responsible for the problem or inherited it.
  • There is no development staff on site. The DBA can go in and make straightforward changes but is not comfortable making major changes.
  • Every few years several contractors are brought in to upgrade and maintain the application.

Following are the solutions I proposed.

1) Combine a bunch of assemblies into a single assembly.

Pros: Relatively easy; probably no fundamental code changes, but a lot of housekeeping-type changes such as modifying declarations to use fully qualified names and that sort of thing.

Cons: The resultant assembly would get pretty large and presumably contain a lot of unrelated objects. You would lose the more logically organized smaller assemblies. Having multiple developers working on a single assembly at the same time can also be a pain; therefore I believe maintaining one large assembly is more difficult than maintaining several smaller, more logically organized assemblies.

2) Use interfaces and a simple Factory. Create the following interfaces: IItem, IItems, and IOrder. Item would implement IItem, Items would implement IItems, and Order would implement IOrder. The assembly containing Item would not directly reference Order; rather, it would reference IOrder. The assembly containing Order would not reference Item; rather, it would reference IItem and IItems. The Factory would have to be in charge of instantiating Item, Items, and Order, and therefore the assemblies containing Order and Item cannot reference the Factory assembly, so we also need an IFactory interface. The difficult thing is that since Item and Order cannot instantiate the Factory (we'd have the circular reference back), the Factory would need to be created and passed in from a top-level object such as a service or executable. A code fragment might be as follows:

Public Interface IOrder
    '...
End Interface


' In the Order assembly
Public Class Order
    Implements IOrder

    Dim mFactory As IFactory

    ' Constructor receives a factory
    Public Sub New(ByVal factory As IFactory)
        mFactory = factory
    End Sub

    ' A property that creates Items
    Public ReadOnly Property Items() As IItems
        Get
            Return mFactory.CreateItems(Order id...)
        End Get
    End Property
End Class


Public Interface IItems

End Interface


' In the Item assembly
Public Class Items
    Implements IItems

    Dim mFactory As IFactory

    ' Constructor receives a factory
    Public Sub New(ByVal factory As IFactory)
        mFactory = factory
    End Sub
End Class


Public Interface IItem

End Interface


' In the Item assembly
Public Class Item
    Implements IItem

    Dim mFactory As IFactory

    ' Constructor receives a factory
    Public Sub New(ByVal factory As IFactory)
        mFactory = factory
    End Sub

    ' A property that creates an Order
    Public ReadOnly Property Order() As IOrder
        Get
            Return mFactory.CreateOrder(Order id...)
        End Get
    End Property
End Class


Public Interface IFactory
    Function CreateOrder(Order id...) As IOrder
    Function CreateItems(Order id...) As IItems
    Function CreateItem(Item id...) As IItem
End Interface


Public Class Factory
    Implements IFactory

    Public Function CreateOrder(Order id...) As IOrder Implements IFactory.CreateOrder
        ' Creation logic
        Return CType(TheNewOrder, IOrder)
    End Function

    Public Function CreateItems(Order id...) As IItems Implements IFactory.CreateItems
        Return CType(TheNewItems, IItems)
    End Function

    Public Function CreateItem(Item id...) As IItem Implements IFactory.CreateItem
        Return CType(TheNewItem, IItem)
    End Function
End Class

[Diagram: simple factory UML]

Pros: We get the looser coupling we are looking for, and we keep the many smaller assemblies versus the one large assembly, thereby simplifying the build and maintenance of the application.

Cons: The top-level object would have to be rebuilt any time the Factory is rebuilt, which would be any time a dependent business object changed – not that big of a deal. Probably a larger code change is required than for option one or three.

Derivatives: Using the Builder, Factory Method, or Abstract Factory pattern, or even an IOC container, may be more appropriate; it might also make sense to have multiple factories, but that is really an implementation detail.

3) Create some of the interfaces and some portion of the Factory as defined above. We would break the assemblies' circular references by selectively using reflection in the Factory to instantiate the objects that are created least often. In the above scenario we might decide that creating Items from an Order is common but creating an Order from an Item is rare. We would allow the Order assembly to continue to reference the Item assembly and leave the code as is, with the exception that the Order object would implement IOrder. Then, in the Item assembly, we remove the reference to the Order assembly and only reference IOrder and the Factory. The Factory still cannot reference the Order assembly, so we would have to use reflection to instantiate the Order.
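To make the reflection piece concrete, here is a minimal sketch, shown in C# for brevity (the same Type/Activator calls are available from VB.NET); the assembly-qualified type name and the method shape are my assumptions, not the client's actual code:

// Hypothetical sketch of the reflection branch inside the Factory.
// The Factory has no compile-time reference to the Order assembly, so it
// creates the Order from a type name at run time.
public IOrder CreateOrder(IFactory factory)
{
    // "MyApp.Order, MyApp.Order" is purely illustrative; in practice the
    // assembly-qualified name would come from a constant or a config file.
    Type orderType = Type.GetType("MyApp.Order, MyApp.Order", true);

    // The Order constructor in the fragments above takes an IFactory,
    // so the factory passes one along as the constructor argument.
    return (IOrder)Activator.CreateInstance(orderType, factory);
}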

Pros: The code changes would be modest and fairly safe.

Cons: This is basically a hack; it does not solve the circular reference problem, it simply works around it. Applications using reflection tend to perform more slowly and are prone to more runtime errors, as many problems that are normally caught at compile time can no longer be caught until run time.

Derivatives: Again, multiple factories might make more sense.

As you can tell from the other factors section, I had to consider a bunch of things, including politics, when I proposed the solution. I didn’t think taking a couple of weeks and introducing an IOC container would fly. I knew we had to implement something that was simple enough for the DBA to still feel comfortable going in and making small changes. I knew we needed to fix the problem in a way that didn’t destabilize the existing application. We went with a combination of option one and option two. A few of the assemblies were very small and related to other assemblies, so we combined assemblies where it made sense. The rest of the circular references were broken using the simple factory technique. Using a lot of copy/paste and search/replace, the changes were implemented in a few days. There were a few bugs introduced, but they were quickly found and turned out to be errors in the copy/paste “recipe”. Things went so well, in fact, that when I asked for two more days to clean up the 500+ compiler warnings I was given the OK without resistance. Why I think you should clean up all of the compiler warnings is a topic for another blog. If you would have done something differently, I’d love to hear from you.

Tim