"Simplicity is the ultimate sophistication."
Leonardo da Vinci
There has been much discussion about the benefits of Agile Software Development and how to make it work. This is all good process and organizational stuff, but nobody ever discusses how to actually design agile software. I will introduce an architecture that I call Malleable Systems Architecture (MSA) which provides system architects and developers a physical framework that builds on Agile Software Development concepts. The MSA framework has many benefits over other frameworks for building business applications. Some of these benefits are:
- Allows a system to quickly and easily adapt to changing business needs
- Maximum code reuse, minimum waste
- Reduced system complexity
- Reduced data redundancy
- Reduced development time and cost
- Increased quality, fewer bugs, reduced testing time
- Increased performance, scalability, reliability and robustness
All good stuff, right? But how does it all work? It all starts with some basic design principles.
The Malleable Systems Architecture starts with underlying principles that guide the physical design of the system. I will discuss these principles and how they are used to build a typical MSA system.
- Separate Code from Content
- Remove process from the code
- Strive for simplicity and efficiency
- Ensure data integrity
- Create table driven designs
- Know your data (what is static, what is dynamic)
- Define efficient data access methods and standards
Separate Code from Content
The basic idea behind separating code from content is to separate the business from the programming; distinguish between what and how. Decouple the tool from the task. For example, don't just build a daily invoice faxing solution. It is much better to build a way to fax documents in batches that runs daily. The subtle difference is that the first method faxes only invoices, while the second approach can fax any document or report, which inherently includes invoices. In this example, the "code" is the ability to fax documents. The "content" is the actual invoice. By deliberately making a distinction between the tool or feature and the business content, it is much easier to reuse the technology and much easier to make changes, since you are not tied to a specific implementation or process.
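To make the faxing example concrete, here is a minimal Python sketch. Everything in it is hypothetical and only illustrates the idea: `Document`, `send_batch` and `fake_fax` are invented names, and `fake_fax` merely stands in for whatever real fax gateway a system would use. The point is that the batch sender knows nothing about invoices.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Document:
    """Any content that can be sent: an invoice, a report, a statement."""
    title: str
    recipient_fax: int  # stored as plain digits, no formatting
    body: str


def send_batch(documents: Iterable[Document],
               transmit: Callable[[Document], bool]) -> List[str]:
    """Send every document through `transmit`; return the titles that failed."""
    failed = []
    for doc in documents:
        if not transmit(doc):
            failed.append(doc.title)
    return failed


def fake_fax(doc: Document) -> bool:
    """Hypothetical stand-in for a fax gateway; refuses empty documents."""
    return bool(doc.body)


# The "content" is just data handed to the tool. Invoices are one case of many.
batch = [
    Document("Invoice 1001", 9055551212, "Amount due: 250.00"),
    Document("Monthly sales report", 4125551212, "Sales summary"),
    Document("Blank page", 4125551212, ""),
]
failures = send_batch(batch, fake_fax)
print(failures)
```

Swapping `fake_fax` for an e-mail or print function would reuse the same batch logic without touching it, which is the kind of reuse the principle is after.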
This concept can also be extended to the persisting of data. For this analogy, data is the code. The content is the format of the data for consumption by the user or another system. It is usually a bad idea to store the data formatted because it becomes awkward to "unformat" the data for use in different systems and formatted data usually occupies more storage space.
If format is included with the data, preparing formatted data for use in other systems is a two-step process. First you have to strip the formatting away, and then reformat it for the specific application. If data is stored unformatted, not only does it take less storage space, but formatting for different systems is only a one-step process, since the formatting doesn't have to be removed first. An example of this is a telephone number. A formatted telephone number might be stored as (905) 555-1212, while storing just the data means storing only the number 9055551212. This is discussed in more detail below in the Simplicity section.
Remove Process from the Code
Computers are great at crunching data into information and at storing and retrieving that information, but they are terrible at solving problems. People are good at solving problems, but terrible at crunching numbers. Our grey matter computers can also be inconsistent at the storage and retrieval of data. Design your computer systems to do what they do best and leave the problem solving to the humans.
This really made sense to me when the year 2000 crisis loomed. People were talking about elevators not working, sewage backing up in the streets, massive power outages and plagues of locusts. Having built computer systems for the past decade, I was aware that there may be some truth to these worries. My wife, who is not very computer literate, had a simple answer to all these worries. She said "That's why there are people". People control the machines, not the other way around (we survived August 29th, 1997).
People are generally flexible; computers and software are inherently not. Let people control the process. The computers, software and systems are just tools to help people get their jobs done. We have all seen examples where a process has been built into a system and it is obsolete before it even gets implemented. Often, the first thing users do with a new process is figure out ways around it. Don't pretend that your users will use the system as you intended; they are generally much more resourceful than that.
Instead of building processes into your systems, focus on building data integrity and let the users control the process that works for them. Remove business rules where possible to keep the system simple and flexible. Design systems to empower users, not restrict them. Don't rely on a process to ensure good data. Use data integrity at the database level to prevent bad data from entering the system due to processes that don't work.
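As a rough illustration of integrity enforced by the database rather than by a process, the sketch below uses Python's built-in `sqlite3` module as a stand-in for whatever database a real system would run on. The `customer` and `invoice` tables and the `try_insert` helper are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      NUMERIC NOT NULL CHECK (amount >= 0)
    );
""")
conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme')")


def try_insert(customer_id, amount):
    """Return True if the database accepted the row, False if it refused."""
    try:
        conn.execute(
            "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, 250.00),   # valid row
    try_insert(99, 100.00),  # refused: no such customer
    try_insert(1, -5.00),    # refused: negative amount
]
print(results)
```

No matter which application or workflow submits the rows, the same rules apply, so users stay free to choose their own process without being able to corrupt the data.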
Strive for Simplicity and Efficiency
Ironically, it is usually easier to build a more complicated solution than a simpler one. Almost always, the simpler solution is the better one. Simpler designs have fewer pieces to maintain, fewer lines of code and fewer sources of bugs. In order to create a simpler design, the requirements are distilled down to what is actually needed, which builds a more flexible component that is easier to reuse. It may take longer to engineer a simple solution, but in the long run you will save time and money when it comes to implementing, maintaining and reusing the code.
While a solution is evolving, I like to do what I call the "Convergence Test". If the solution is becoming more and more complicated as it is being developed, and fixes to problems are escalating, the solution is diverging. This is a sign of trouble. Complexity is breeding more complexity and you are getting further away from the best solution. It is time to take a different approach. If the solution is eliminating complexity and with it potential problems, the solution is converging to the best answer.
For an example, consider a simple telephone number. Most systems persist the data in a character field of perhaps 20 characters. Only a certain subset of characters is valid (0 to 9, brackets, dashes). Other characters need to be prevented from getting persisted in the telephone number field, or somebody will use it to store the name of their clients' dog if given a chance. Not only does a telephone number have a limited set of valid characters, but the position of the characters is important as well.
One solution is to assume the database will only receive valid numbers. Each data access point would then have to ensure the data is correct. In the world of Service Oriented Architecture, this can prove to be a challenge. Data can come from many different sources. Legacy data dumps, Web applications, Web services and Windows applications can all be thrown into the mix, and each method and service would have to have its own complex validation method; a coding, maintenance and testing nightmare. This is also not a good solution because the database, by design, should not allow bad data to get into it, and here it leaves itself open to whatever logic the other systems provide.
A better solution would be to put some data acceptance code on a database trigger for inserts and updates to eliminate or reject bad characters and formatting. This would protect the database from bad data and provide a single point of logic to parse telephone numbers. That would work, but the database trigger logic could become complicated for different phone number lengths (just extensions, no area codes, etc.). In the SQL Server world, T-SQL programming can get cumbersome for string parsing and validation. SQL Server 2005 provides the ability to use custom functions written in the language of your choice and compiled from Visual Studio. This works surprisingly well, but there is an even better, more "convergent" solution.
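For a feel of the trigger approach, here is a small sketch using Python's `sqlite3` module. SQLite is only a stand-in here, not the SQL Server setup described above, and the `contact` table, trigger names and `accepts` helper are made up for illustration. The triggers reject any value containing a character other than digits, brackets, spaces and dashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        phone      TEXT NOT NULL
    );

    -- Reject inserts containing anything but digits, brackets, spaces, dashes.
    CREATE TRIGGER contact_phone_insert
    BEFORE INSERT ON contact
    WHEN NEW.phone GLOB '*[^0-9() -]*'
    BEGIN
        SELECT RAISE(ABORT, 'invalid character in phone');
    END;

    -- Same rule for updates of the phone column.
    CREATE TRIGGER contact_phone_update
    BEFORE UPDATE OF phone ON contact
    WHEN NEW.phone GLOB '*[^0-9() -]*'
    BEGIN
        SELECT RAISE(ABORT, 'invalid character in phone');
    END;
""")


def accepts(phone):
    """Return True if the database stored the value, False if a trigger refused it."""
    try:
        conn.execute("INSERT INTO contact (phone) VALUES (?)", (phone,))
        return True
    except sqlite3.DatabaseError:
        return False


ok = accepts("(905) 555-1212")
bad = accepts("Rex the dog")
stored = [row[0] for row in conn.execute("SELECT phone FROM contact")]
print(ok, bad, stored)
```

Note that this sketch checks only which characters appear, not where they appear. A value such as ")))---" would still get through, and closing that gap is exactly where the trigger logic starts to grow complicated.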
Numbers are Numbers
Telephone numbers are just that, numbers with a certain formatting applied. North American telephone numbers look something like (412) 555-1212. This is really the number 4125551212 with a format applied. By separating the code or data (4125551212) from the content ((412) 555-1212) we get the number and the format. The format is consistent for ten digit numbers so we don't have to persist it. The number can still be rendered with a format applied as required to suit the user, but there is no need to store the formatted number. It contains no value or extra information. Formatting the number just makes it easier to read.
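A short sketch of formatting on the way out might look like this in Python. The function name `format_phone` and the three style names are invented for the example, and it only handles ten digit North American numbers.

```python
def format_phone(number: int, style: str = "national") -> str:
    """Render a stored ten digit number in the requested display style."""
    if number < 0 or number > 9_999_999_999:
        raise ValueError("expected a number of at most ten digits")
    digits = f"{number:010d}"  # pad to exactly ten digits
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    if style == "national":
        return f"({area}) {exchange}-{line}"
    if style == "dotted":
        return f"{area}.{exchange}.{line}"
    if style == "international":
        return f"+1 {area} {exchange} {line}"
    raise ValueError(f"unknown style: {style}")


stored = 4125551212  # what the database holds
print(format_phone(stored))                   # (412) 555-1212
print(format_phone(stored, "dotted"))         # 412.555.1212
print(format_phone(stored, "international"))  # +1 412 555 1212
```

Each target format is produced in one step from the same stored value, with nothing to strip away first.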
If we persist the telephone number in a numeric type field, all the valid character storage issues go away, since a numeric field can inherently only accept numbers within the range of the numeric datatype. If you really want to ensure valid data, you could check for a range of numbers (greater than zero, less than 9999999999) as a constraint on the telephone data field. Another big benefit is that the data storage requirement for a numeric type can be as little as one third the size of character-based storage. (For more detailed information, see my other article "Telephone Numbers in SQL Server 2005".)
The telephone number storage solution is converging. We have eliminated complicated data validation routines for each service and complex server-side parsing to accept data. We have reduced the data storage requirement to as little as a third of its original size. That is an elegant, simple solution.
In Malleable Software and Systems Design - Part 2 (coming soon!), I will continue discussing design principles.
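The range constraint could be sketched as follows, again using Python's `sqlite3` module as a stand-in rather than SQL Server. The `contact` table and the `accepts` helper are hypothetical. One difference worth flagging: SQLite is dynamically typed, so this sketch adds a `typeof` check that a strictly typed numeric column would not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        phone      INTEGER NOT NULL
                   CHECK (typeof(phone) = 'integer'
                          AND phone > 0 AND phone < 9999999999)
    )
""")


def accepts(value):
    """Return True if the database stored the value, False if it refused it."""
    try:
        conn.execute("INSERT INTO contact (phone) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "valid number": accepts(4125551212),
    "dog's name": accepts("Rex the dog"),
    "negative": accepts(-1),
    "too many digits": accepts(12345678901),
}
print(results)
```

Only the first value is stored. The whole validation rule is a single declarative line on the column, with no parsing code in any of the services that write to it.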
- 30th October, 2007: Initial post