Monday, July 6, 2009

Refactoring code to break circular references

A few months ago I started a 10 week project for a new client. The two main tasks I had to complete were upgrading the app to the latest .Net framework and to add some document management capabilities. The Application was a 5 year old VB.Net Win Form app, maybe 80K lines of code in 30+ assemblies. Naturally I wanted to get the source code, load it up, and take it for a test drive. Everything was organized in projects and solutions were not used so the first thing I did was make a solution for each layer, then make a solution of solutions so I could step from the UI layer all the way into the DAL. It was not a huge surprise to find out I could not compile the solution because I did not configure the project dependencies. I started going through the pain of configuring project dependencies until I hit a brick wall – there were circular references! How the heck did the assemblies get built?

It turns out there was a custom utility written to manage the compilation of all of the assemblies. To add clarity to the picture I’m trying to draw, assume we have four projects named A, B, C, and D. Assembly A references nothing; Assembly B references assembly A and assembly C. Assembly C references assembly B and A, and finally assembly D references A, B, and C. Assembly B and C reference each other – the circular reference. Before this project I would have said that is impossible unless you use reflection – it will not build. When the project was first started the circular reference did not exist so creating assembly A, B, C, and D was no problem. Now that a reference assembly exists you can introduce the circular reference and get things to build as long as you don’t manage the build with a solution, you have the old assemblies around (for reference), and you do not introduce a public change to assembly B that C depends on at the same time you introduce a public change to assembly C that B depends on. Clear as mud? The utility would compile A, B, C, and finally D in that order. What you would often see was that the compilation of assembly B would fail but the rest would compile but if you run the build again, B will build (because any public change to C that B depends on got into assembly C the last time the build utility was run). With four assemblies and the dependencies I have defined you would have to run the utility two times at most to get everything to build but with 30 + projects you can imagine that the build utility might have to be run four or five times to get everything to build successfully. If you follow all of these rules it is possible to make this work but if you are working in a team environment this would be very difficult to pull off and make no mistake about it – I am not advocating this approach.

Back to my problem, here I am day one or two at a new client and as far as I’m concerned I have uncovered a major problem with the architecture of the app that I’m supposed to be enhancing. It has always been my motto to try not to throw out a problem without having some possible solutions in mind. Rather than hitting a customer with all bad news and causing a panic I prefer to state the problem and seriousness of the problem but then rather than coming across as doom and gloom I like to show them that the sky is firmly in place by being able to give them an estimate of what it will take to fix the issue. From here forward rather than discussing assembly B and C let’s talk about an Order class (in an Order assembly) and an Item and items class (in assembly Item). Now, finally, the point of the blog.

Problem statement

  • There are circular references in the application which makes it difficult and potentially impossible to build. (you could also state this as I want to introduce a circular dependency into my code but .Net won’t let me)
  • Debugging and maintenance is difficult because you cannot run multiple projects in a single solution.
  • Building the application is burdensome because sometimes I have to execute the build utility 3 – 5 times to get all the assemblies to build.

Circular

Other Factors

  • There is virtually no QA time budgeted for regression tests.
  • There are no unit tests.
  • There is no time budgeted for any refactoring other than getting the code to work with the latest .Net framework.
  • I had to convince the other architect (also a contractor-not on staff) that we could pull this off quickly without introducing a bunch of bugs and by the way I have no idea if he was responsible for the problem or inherited the problem.
  • There is no development staff on site. The DBA can go in and make straight forward changes but is not comfortable making major changes.
  • Every few years several contractors are brought in to upgrade and maintain the application.

Following are the solutions I proposed.

1) Combine a bunch of assemblies into a single assembly.

Pros: Relatively easy, probably no fundamental code changes but a lot of housekeeping type changes such as modifying declarations using fully qualified names and that sort of thing.

Cons: The resultant assembly would get pretty large and presumably contain a lot of unrelated objects. You would lose the more logically organized smaller assemblies. Having multiple developers working on a single assembly at the same time can also be a pain therefore I believe maintaining one large assembly is more difficult than maintaining several smaller more logically organized assemblies.

2) Use interfaces and a simple Factory. Create the following interfaces; IItem, IItems, and IOrder. Item would implement IItem, Items would implement IItems and Order would implement IOrder. The assembly containing Item would not directly reference Order, rather it would reference IOrder, The assembly containing Order would not reference Item rather it would reference IItem and IItems. The Factory would have to be in charge of instantiating Item, Items, and Order and therefore the assemblies containing Order and Item cannot reference the Factory assembly so we also need an IFactory interface. The difficult thing is that since Item and Order cannot instantiate the Factory (we'd have the circular reference back), the Factory would need to be created and passed in from a top level object such as a service or executable. A code fragment might be as follows:

{
Public Interface IOrder
...
End Interface
}

{
Public Class Order implements IOrder
Dim mFactory As IFactory
''Xtor receives a factory
Public Sub Order(ByVal factory As IFactory)
mFactory = factory
End Sub
''a property that creates Items
Public ReadOnly Property Items() As IItems
return mFactory.CreateLicences(Order id...)
End Property
End Class
}

{
Public Interface IItems

End Interface
}

{
Public Class Items implements IItems
Dim mFactory As IFactory
''Xtor receives a factory
Public Sub Items(ByVal factory As IFactory)
mFactory = factory
End Sub
End Class
}

{
Public Interface IItem

End Interface
}

{
Public Class Item implements IItem
Dim mFactory As IFactory
''Xtor receives a factory
Public Sub Item(ByVal factory As IFactory)
mFactory = factory
End Sub
''a property that creates an Order
Public ReadOnly Property Order() As IOrder
return mFactory.CreateOrder(Order id...)
End Property
End Class
}

{
Interface IFactory
Function CreateOrder(Order id...) as IOrder
Function CreateItems(Order id...) as IItems
Function CreateItem(Item id...) as IItem
}

{
Public Class Factory implements IFactory
Public Function CreateOrder(Order id...) as IOrder
'Creation logic
Return CType(TheNewOrder, IOrder)
End Function
Public Function CreateItems(Order id...) as IItems
Return CType(TheNewItems, IItems)
End Function
Public Function CreateItem(Item id...) as IItem
Return CType(TheNewItem, IItem)
End Function
End Class
}

SimpleFactoryUML

Pros: We get the looser coupling we are looking for and we'd keep the many smaller assemblies versus the one large assembly thereby simplifying the build and maintenance of the application.

Cons: The top level object would have to be rebuilt any time the Factory is rebuilt which would be any time a dependent Business object changed - not that big of deal. Probably a larger code change required than option one or three.

Derivatives: Using the Builder, Factory Method, Abstract Factory pattern, or even an IOC container may be more appropriate, it might also make sense to have multiple factories but that is really an implementation detail.

3) Create some of the interfaces and some portion of the Factory as defined above. We would break the Assemblies' circular references by selectively using reflection in the Factory to instantiate the objects that are created least often. In the above scenario we might decide that creating Items from an Order is common but creating an Order from a Item is rare. We would allow the Order assembly to continue to reference the Item assembly and leave the code as is with the exception that the Order object would implement IOrder. Now in the Item assembly we remove the reference to the Order assembly and only reference IOrder and Factory. Factory still cannot reference the Order assembly so we would have to use reflection to instantiate the Order.

Pros: The code changes would be modest and fairly safe.

Cons: This is basically a hack, it does not solve the circular reference problem it simply works around it. Applications using reflection tend to perform slower and are prone to see more runtime errors as many problems that are normally caught at compile time can no longer be caught until run time.

Derivatives: Again, multiple factories might make more sense.

As you can tell from the other factors section I had to consider a bunch of things including politics when I proposed the solution. I didn’t think taking a couple weeks and introducing an IOC would fly. I knew we had to implement something that was simple enough for the DBA to still feel comfortable going in and making small changes. I knew we needed to fix the problem in a way that didn’t destabilize the existing application. We went with a combination of option one and option two. A few of the assemblies were very small and related to other assemblies so we combined assemblies where it made sense. The rest of the circular references were broken using the simple factory technique. Using a lot of copy/paste and search/replace the changes were implemented in a few days. There were a few bugs introduced but they were quickly found and turned out to be errors in the copy/paste “recipe”. Things went so well, In fact, that when I asked for two more days to clean up the 500 + compiler warnings I was given the OK without resistance. Why I think you should clean up all of the compiler warnings is a topic for another blog. If you would have done something differently I’d love to hear from you.

Tim

No comments:

Post a Comment