This article is really about two things. It is mostly a checklist for carrying out due diligence when bringing a new technology system into your organization. I have also found the same checklist extremely useful when taking on a new role in an organization where I have to take responsibility for an existing legacy system. Running through it gives a lot of structured insight into what I am facing personally if I am taking ownership, or what the organization is facing if I am simply advising.
One of the standard 'big questions' in relation to technology requirements for an organization is whether to 'buy, build or license' the solution it needs. Each of these options has its pros and cons depending on the organization, what it needs and what's available. Regardless of which way one ends up going, there is normally a process of evaluating the best solution for the organization's requirement. As one gets closer to making a decision, especially for big ticket purchases, the evaluation moves beyond a superficial kicking of tyres, and a proper due diligence process is entered into. I have been involved in both sides of the company and technology acquisition and licensing process over the years, most often in relation to this technology due diligence process. I have touched on it again recently, so thought it might be interesting to share the checklist of things I expect to walk through. As always, these items are based on my experience; yours may differ. It is good to share knowledge and constantly improve, so if you see something missing and worth adding, please leave a comment below!
When a company is spending a lot of money on a technology, either to license the solution for a fixed period, or as part of a company acquisition, there is normally a requirement to evaluate the technology itself and the development, production and maintenance processes. This is to ensure that it is fit for purpose and will hopefully meet the needs of the purchasing organization. This should not be a confrontational process, rather one of positive engagement designed to ensure that both sides of the deal know what is being sold and what is being purchased. This is especially important when company founders are giving personal commitments to the purchasing organization in relation to product and service liability.
I have broken the checklist into a number of different sections, from high level legal things that need to be checked on a technical level, to low level code investigation, through production, maintenance, scalability and support.
A lot of things can fit into the catch-all general category. Normally, I treat this as a 'warm up' session, getting to know things from the top down before I start working again from the bottom up. It helps give a good perspective on things and also provides a foundation to raise questions as you start to dig deeper.
- Architecture overview
A useful way to look at the architecture is using the '4+1' architectural view model. Designed by Philippe Kruchten, this model takes a holistic view of the overall system, and aspects of the system, from different points of view:
- development - this is the programmer's implementation view of how things hang together
- logical - outlines the functionality that is presented to end users
- physical - a system engineering/physical infrastructure view of the architecture
- process - refers to the runtime implementation, communications, scalability, concurrency
- scenarios - a series of use cases that describe different uses of the system, putting the other four views into perspective
- Vendor platform specific requirements
This refers to any requirements imposed by the choice of platform the system operates on. If it's mobile, are Mac dev tools required? If it's cloud based, is it specific to Azure, AWS or Google, and does it use infrastructure specific to that vendor?
- End to end workflow
This is where we document the end to end workflow of the system in general (which depends on the complexity of the system of course). There is overlap here with the process view of the architecture overview but in this case it dives deeper. One of the critical things to try to get a clear picture of is any processes that involve manual intervention, and conversely, areas where processes have been automated.
This is where we investigate the nuts and bolts of the system. We are interested in how high or low the quality of the code and technology in the system is. In the case of systems that have large data assets, we should also be concerned with looking at the quality of the data itself. A badly built system can impose a heavy 'technical debt' on an organization and its engineering department, so it's important to pay special attention to this area. If ongoing maintenance and future development are hampered by poor code or data asset quality, this needs to be called out and taken into consideration both financially and operationally in the acquisition process.
- Code coverage
While not everyone agrees with this metric, it is at least one measurement we can look at, and measurement is critical to understanding a system and being able to evaluate its current state and future maintenance considerations. If the system in question has little or poor testing in place, you need to question the robustness of the system.
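As an illustration of turning coverage into something you can review, here is a minimal Python sketch that takes per-module line counts and flags the modules that fall below a threshold. The module names, the figures in `SAMPLE_REPORT` and the 60% threshold are all assumptions made up for the example; in practice the numbers would come from whatever coverage tool the system already uses.

```python
from dataclasses import dataclass


@dataclass
class ModuleCoverage:
    name: str
    covered_lines: int
    total_lines: int

    @property
    def percent(self) -> float:
        # An empty module has nothing left to cover.
        if self.total_lines == 0:
            return 100.0
        return 100.0 * self.covered_lines / self.total_lines


def overall_percent(report):
    """Line-weighted coverage across every module in the report."""
    total = sum(m.total_lines for m in report)
    covered = sum(m.covered_lines for m in report)
    return 100.0 * covered / total if total else 100.0


def modules_below(report, threshold_percent):
    """Modules under the threshold, worst first."""
    weak = [m for m in report if m.percent < threshold_percent]
    return sorted(weak, key=lambda m: m.percent)


# Hypothetical figures, for illustration only.
SAMPLE_REPORT = [
    ModuleCoverage("billing", 120, 400),
    ModuleCoverage("auth", 180, 200),
    ModuleCoverage("reports", 50, 100),
]
```

Calling `modules_below(SAMPLE_REPORT, 60.0)` lists the weakest modules first, which is usually a better starting point for questions than a single headline percentage.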
- Code quality / optimization
It's all very well having unit testing, but if the code itself is sub-optimal then this will also cause long-term issues. In this category we are not looking to shave milliseconds off processes; rather, we are looking for sensible coding practices and watching for badly constructed processes and calls.
- Documentation
How many times have we as developers taken on someone else's work, or even looked at our own work from months or years ago, and wondered what's going on here? Answering that question is the job of documentation. Look for inline code commenting, overall development reference documentation, notes about setup and maintenance of the overall physical/services architecture, specific notes if relevant relating to data (inputs and outputs), APIs provided and used, and finally, end user documentation.
Standard unit testing is of course expected, but not only on the backend. If this is a multi-layered system, look for testing processes on the frontend and data repository as well. There are also places that are difficult to test, such as the UI, so see what approaches have been used here and how regression testing is handled.
- Testing
- Defect management
A review of both historical and open defects in the system can give great insight not only into how robust the system is, but also into where problems generally occur, how severe they are, whether recurring types of issue have been identified, and whether things are improving over time.
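If the defect history can be exported, even a rough tally makes the hotspots visible. The following is a small Python sketch under assumed conditions: the field names (`component`, `severity`, `status`) and the records in `SAMPLE_DEFECTS` are hypothetical and would need mapping to whatever the real issue tracker exports.

```python
from collections import Counter


def summarize_defects(defects):
    """Return (all defects by severity, open defects by component)."""
    by_severity = Counter(d["severity"] for d in defects)
    open_by_component = Counter(
        d["component"] for d in defects if d["status"] == "open"
    )
    return by_severity, open_by_component


# Hypothetical export from an issue tracker, for illustration only.
SAMPLE_DEFECTS = [
    {"id": 1, "component": "import", "severity": "high", "status": "open"},
    {"id": 2, "component": "import", "severity": "low", "status": "closed"},
    {"id": 3, "component": "ui", "severity": "high", "status": "open"},
    {"id": 4, "component": "import", "severity": "medium", "status": "open"},
]
```

`open_by_component.most_common(1)` then points at the area of the system carrying the most unresolved problems, which is a natural place to dig deeper.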
- Error reporting
Waiting for customers to report problems from a live environment is a dangerous game. Best practice is to have a system of logging of internal system errors and ongoing monitoring and reporting on errors encountered. Examine how errors are handled in code, and what mechanism is in place for managing the logs.
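To make concrete what 'handling errors in code and managing the logs' can look like, here is a minimal Python sketch using the standard `logging` module. It is one possible arrangement, not a prescription: the logger name, the log format, the rotation limits and the `parse_amount` function are all assumptions invented for the example.

```python
import logging
from logging.handlers import RotatingFileHandler

LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s %(message)s"


def build_error_logger(name, handler):
    """Attach a formatted handler to a named logger that records warnings and above."""
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep these records out of the root logger
    logger.addHandler(handler)
    return logger


def rotating_file_handler(path):
    """Log to a file capped at roughly 1 MB, keeping five old files (assumed limits)."""
    return RotatingFileHandler(path, maxBytes=1_000_000, backupCount=5)


def parse_amount(raw, logger):
    """Hypothetical business function: log the failure with a traceback, do not hide it."""
    try:
        return float(raw)
    except ValueError:
        logger.exception("could not parse amount %r", raw)
        return None
```

In a review, the things to look for are the same as in the sketch: errors are caught deliberately, logged with enough context to diagnose, and written somewhere with a defined retention policy.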
I can think of very few systems these days that are not connected to the wider Internet in one way or another. I normally advise clients to think not about what will happen 'if' their system gets hacked, but 'when'. Depending on the type of system, security may be of lesser or greater importance. For example, in the medical and financial sectors data may be deemed more sensitive than in others. Any investigation of the security of the system should examine the system and its data in operation, in transit, and at rest. Particular attention should be given to any areas directly open to the outside world.
It is rare that a bit of technology is taken onboard for its current status; usually the acquirer wants a return on their investment and wants to grow their service using the technology they acquire. This being the case, in the due diligence process we generally need to concern ourselves with the scalability of the technology, how this will work, and any problems that might be encountered during growth. There are a number of headings in this area that we can look at to inform the process.
- Operating platform
Depending on the technology, we need to examine what the operating platform is and how it scales out. Even if we say 'oh, it's fine, it's mobile, we only have to worry about device performance', what about the backend? Does that need to scale? If the service is cloud based, has it been built using managed services such as DocumentDB on Azure or Elastic Beanstalk on AWS, or does it rely on a series of dev-ops managed virtual machines that have to be carefully monitored, orchestrated, maintained and patched manually?
Here we need to examine the system to see if it takes future growth into consideration. Are there any particular things that stick out as barriers to growth? For example, if the customer numbers were doubled in the morning, what kind of problems would this cause? How long does it take to onboard a new customer (technically)? Are there any on-boarding processes carried out for the customer that are not automated?
- Future growth
- Disaster recovery
No matter how well we plan for things, stuff always happens unexpectedly. When it does, from a diligence point of view, we want to know how the technology or system can be recovered. What plans are in place? What steps need to be taken for a full system rebuild? How about an urgent need to change deployment/production platform or cloud provider? If there is a major data loss, what are the key backup and critical data/service/configuration storage methods?
- Diagnostic monitoring
One of the things I constantly tell people is that 'what we don't know is important'. Too many people trap and log errors, but how often do we look at and examine these logs to see what we can learn and improve upon? Here I am looking for any processes in place for logging, and for analysis of the logs that are generated, to improve system quality and performance.
It's rare for a system not to have bottlenecks. Invariably, something creeps in that hasn't been thought through properly, or was never meant to be left in production for long, or was an ad-hoc feature that became important. These bottlenecks can be a killer for scaling. Look in particular at areas where the outside world touches the system. How are customers taken onboard? Is external data imported on a regular basis? If yes, is this automated, and how is it set up? What monitoring of these systems is there? What happens if data fails to come in, or if too much comes in? Is the system constrained by data volume in this manner? (etc!)
- Bottlenecks
- Network diagram
Although strictly part of the architecture, I like to dive deeper into how the network and communications are constructed, with a particular focus on any area that might cause a slowdown if things were to ramp up quickly.
- Future plans
Knowing what we have today is one thing, but what's planned for the system in the future? How will the implementation of these new plans affect the existing system, and will any area have to be reworked or revised as a consequence?
- Data repositories
Assuming the system handles data, how is this stored and managed? What is the current data load? What are the data storage limits? What is the current per day/month data growth rate, and how would the repository handle 10x or 100x growth? What kind of indexing schemes are used? Is data pre-prepared and denormalized for fast lookup if necessary? Are in-memory databases utilized? How are unique identifiers/primary keys generated, and will this cause a problem with fast-moving data?
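The growth questions above lend themselves to a quick back-of-envelope calculation. This Python sketch assumes steady compound monthly growth, which real systems rarely follow exactly, and the function names and example figures are my own. With growth rate r per month, size after n months is current * (1 + r) ** n, so the limit is reached once n >= log(limit / current) / log(1 + r).

```python
import math


def projected_size_gb(current_gb, monthly_growth_rate, months):
    """Size after `months` of compound growth (0.10 means 10% per month)."""
    return current_gb * (1 + monthly_growth_rate) ** months


def months_until_full(current_gb, limit_gb, monthly_growth_rate):
    """Whole months until the store reaches its limit.

    Returns 0 if already at or over the limit, and None if the data is not growing.
    """
    if current_gb >= limit_gb:
        return 0
    if monthly_growth_rate <= 0:
        return None
    months = math.log(limit_gb / current_gb) / math.log(1 + monthly_growth_rate)
    return math.ceil(months)
```

For example, a 100 GB repository growing at 10% per month against a 1,000 GB limit gives `months_until_full(100, 1000, 0.10)` of 25, in other words about two years of headroom if nothing else changes.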
- Load testing
Has the system been load tested under stress? If yes, what methodology was used and can it be replicated? If not, what would it take to scale up a test and can it be done? Load testing like this can quickly identify scaling issues and is well worth considering.
Once you have examined the core of how the system hangs together, you need to examine it live in production. What are the running system metrics like? Is the system under pressure? Is it well utilized? Are there any IO or network bandwidth problems? Careful examination of the system in a live production environment is extremely important. Sometimes what people think is happening is not, and running some performance monitors on a system while carrying out maintenance operations, for example, can give valuable insight.
Once code is complete, how is it pushed into production? Check if continuous integration and deployment are utilized, and carry out some minor changes or tests to see it in operation. If you had to create a completely new instance of the system or move it to another provider, how could this be done? Confirm the steps required.
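For a sense of what a repeatable load test involves, here is a deliberately small Python sketch that fires a number of concurrent calls and reports failures and latency figures. It is an illustration under stated assumptions, not a replacement for a proper load testing tool: `fake_request` is a stand-in for a real call to the system under test, and the request counts and concurrency level are arbitrary.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(operation):
    """Run one operation, returning (elapsed seconds, succeeded?)."""
    start = time.perf_counter()
    try:
        operation()
        ok = True
    except Exception:
        ok = False
    return time.perf_counter() - start, ok


def run_load_test(operation, total_requests, concurrency):
    """Call `operation` total_requests times using `concurrency` worker threads."""
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(operation), range(total_requests)))
    latencies = sorted(elapsed for elapsed, _ in results)
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "requests": total_requests,
        "failures": sum(1 for _, ok in results if not ok),
        "median_s": statistics.median(latencies),
        "p95_s": latencies[p95_index],
        "max_s": latencies[-1],
    }


def fake_request():
    """Stand-in for a real request; sleeps for 10 ms."""
    time.sleep(0.01)
```

Running `run_load_test(fake_request, 200, 20)` and then repeating it at higher concurrency shows how the median and 95th percentile move as load increases, which is the pattern to ask the vendor to demonstrate against the real system.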
- Deployment
- Third party dependencies
It is important to identify and document any third party dependencies in the production environment. You need to know not only what is used, and how, but also any licensing requirements that apply. In addition, you need to check any third party items for potential problems with scalability, support and maintenance.
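Where the system happens to be written in Python, a first-pass dependency inventory can be generated rather than compiled by hand. The sketch below uses the standard library's `importlib.metadata` to list installed packages with their declared license. The helper names are my own, the declared metadata is only as good as what each package author filled in, and other ecosystems need their own equivalent tooling.

```python
from importlib import metadata


def installed_dependencies():
    """List installed distributions with version and declared license."""
    rows = []
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name")
        if not name:
            continue  # skip entries with unreadable metadata
        classifiers = meta.get_all("Classifier") or []
        license_classifiers = [c for c in classifiers if c.startswith("License ::")]
        declared = meta.get("License") or "; ".join(license_classifiers) or "UNKNOWN"
        rows.append({
            "name": name,
            "version": meta.get("Version") or "unknown",
            "license": declared,
        })
    return sorted(rows, key=lambda row: row["name"].lower())


def needing_review(rows):
    """Dependencies with no declared license, which need manual follow-up."""
    return [row for row in rows if row["license"] == "UNKNOWN"]
```

Anything returned by `needing_review(installed_dependencies())` is a candidate for a direct question to the vendor, and the full list is a useful input to the legal checks later in this checklist.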
- Infrastructure health monitoring
We know we need to look at error trapping, logging and reporting on a code level, but equally important is what is done in relation to this on a system production level. Enquire as to what OS/VM/Container/Infrastructure health monitoring is carried out, how frequently logs are checked and what kind of automated reporting on more severe log items is carried out.
- Data archiving/purging lifespan
Determine what processes are in place for data archiving and, where necessary, purging. Check whether archiving requires any systems to be taken offline or can be carried out live. In addition, confirm the procedure for data and overall system restore tests, and look for logs that show these tests have actually been carried out.
It is rare that a system runs itself without issue. In this section we examine what structures are in place for development, maintenance and end user support.
- Existing development support
When new development resources are brought onto the team, is there a formal technical induction process? If yes, investigate what it is. Are there any contracts in place with third party suppliers or vendors to assist the development team? If yes, get sight of these and evaluate terms, conditions and costs.
- End user support
Unless the system you are evaluating is purely technical-facing, you will have some kind of end user support requirement. One would expect that the developer of the technology you are evaluating has their own support systems in place, perhaps a database of customer issues, bugs, frequently asked questions etc. Getting sight of user support history can greatly assist in evaluating future support requirements within your own organization, how these might (or might not!) scale, the costs involved etc. User support also tends to follow a curve, with more hand holding up front as a user on-boards to a system and as you introduce new features, but leveling off thereafter. It is important to get an appreciation of how user support fits into the overall picture, as heavy user support can kill or dramatically slow down an integration project.
The legal requirements of due diligence can be broken into two parts. On one hand there is the big legal picture that talks about contracts, financials, etc., but from a technical due diligence point of view we are more interested in legal aspects of the technology itself; who owns it, are there third party licenses involved, are there limitations on use etc.
- Code ownership
On the face of it, who owns the code is not as clear cut as one might think. The Internet provides an open repository of code we can cut and paste from, and such code may or may not have certain conditions attached to it. Some open source code allows full commercial use, others restrict the use to non commercial or other open source projects, and of course a whole range of options in between. There is a useful list of open source licenses at opensource.org. The important thing to check here is what is the origin of the code, is it used under license, and if so, who owns the license and is its use permissible.
- License infringements
It is prudent to specifically ask for and inspect any indemnity insurance carried against third party licensing infringements. You want to ensure where possible that you don't transfer or spread liability for something that goes wrong, or if you do, that you mitigate the risk.
- Compliance and regulation
Depending on the country and industry in which you operate, it may be the case that your technology needs to adhere to certain compliance or regulatory laws or rules. This can range from privacy laws in relation to data, to standards and quality regulation in the case of medical and aviation technology. Talk to the business and business legal team to see if there is anything that needs to be covered off here on a technical level.
- Recurring licensing
Many products use third party licenses that are upgraded each year. Sometimes they are bought for one purpose and never upgraded, other times they come with an annual recurring license fee that is required for continuing use. Regardless of what type of licensing is there, any third party licenses need to be identified, checked and verified to see if there is any ongoing liability.
- License transfer
Some forms of technology license specifically forbid license transfer. This is done for a variety of reasons, and the upshot is normally that a new license agreement needs to be put in place, usually with an associated cost. Most contracts will have a clause regarding ownership and transfer of ownership. If it's not clear, pass it over to the legal bods for clarification.
Fit for purpose
This category is a moving target depending on the technology itself, where it is to fit into the organization, and the sector the organization is in etc. The bottom line is you want to know if the technology solution you are taking onboard is fit for the purpose that your organization intends to use it for, and ultimately, is it a good match or not.
- Gap analysis
Put together a table that lists out the expectations that your organization has of the technology you wish to onboard. Rate each expectation as 'must have', 'must be capable of having', or 'nice to have'. 'Must haves' are items that are in general show stoppers, 'capables' are features that can be added on with little effort, and 'nice to haves' are an added bonus. Don't fall into the trap of making everything a 'must have'. In reality it's difficult to find a perfect match and you should expect to have to change things once you acquire a new technology. Once you have your list, you can simply check things off as you go along and will very quickly see if you have a match or not (regardless of how everything else comes together in your overall evaluation).
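The table described above is simple enough to keep in a spreadsheet, but for illustration here is how it might be modelled in Python. The three ratings come straight from the text; the class names, the example expectations in `SAMPLE_EXPECTATIONS` and the rule that any unmet 'must have' rules out a match are assumptions made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    MUST_HAVE = "must have"
    CAPABLE = "must be capable of having"
    NICE_TO_HAVE = "nice to have"


@dataclass
class Expectation:
    description: str
    priority: Priority
    met: bool


def unmet(expectations, priority):
    """Descriptions of expectations at the given priority that are not met."""
    return [e.description for e in expectations if e.priority is priority and not e.met]


def is_candidate_match(expectations):
    """Assumed rule: any unmet 'must have' is a show stopper."""
    return not unmet(expectations, Priority.MUST_HAVE)


# Hypothetical expectations, for illustration only.
SAMPLE_EXPECTATIONS = [
    Expectation("Exports data as CSV", Priority.MUST_HAVE, True),
    Expectation("Single sign-on", Priority.MUST_HAVE, False),
    Expectation("Multi-language UI", Priority.CAPABLE, False),
    Expectation("Dark mode", Priority.NICE_TO_HAVE, True),
]
```

With these sample values `is_candidate_match(SAMPLE_EXPECTATIONS)` is false because single sign-on is an unmet 'must have', while the unmet 'capable' item simply becomes a line in the post-acquisition work plan.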
- Technology match
Unless you are acquiring a development team along with the technology, be very careful about ensuring there is a technology match. You don't want to give the go-ahead for a system written using a Java or Python framework if your in-house team is entrenched in the .NET world (or vice versa) without making appropriate recommendations for maintenance and development resources, training etc.
- Skills match/gaps
In addition to looking for matches and gaps in the base technologies used, also consider any skill matches or gaps. For example, your in-house team may have experience in MS SQL but the incoming technology is built on a NoSQL MongoDB and Redis cluster.
To reiterate, this checklist is based on personal experience and won't suit everyone. In addition, I'm sure others out there will have gems of wisdom to add - please leave comments so we can make it a better resource for all!
22-April-2017 - Version 1