In this article I am going to be talking about my personal experiences and observations. Much of my experience has centered around internal enterprise software, not shrink-wrap software. I feel that it is important to tell you this up-front because I believe that while many of the observations are valid in both worlds, others may not be.
A few weeks ago, I posted a question in the lounge asking how others defined "high quality software". The responses I received were interesting and very helpful. (If you're interested, the link to this thread is here). Since I posted that message I have come to believe that "high quality software" is not the ultimate goal; what I want is high quality results. High quality software is an important component of this, but just one of the components.
Who wants low-quality results? Of course, nobody does, but what do high-quality results really mean and how can they be achieved by a standalone developer? I for one often find that the results of my labors fall short of the level of quality I strive for. This is not necessarily a terrible thing, though. I strive for high quality and I am constantly learning what that means and how to achieve it. Being ready to accept missed targets and to learn from those experiences is critical to long term success.
Since I posted the lounge message I have spent much time thinking about what the term "high quality results" really means and how it can be achieved. This article is my first attempt to share some of my basic beliefs on this subject and hopefully provide some much needed techniques for others.
My Perspective on Things
About one and a half years ago I started a new job working for a small market research firm. My boss and I came from a previous company where we tried to build a consulting business together. (Technically, it was his business but he and I worked together as partners and are close friends.) When we came onboard, we believed that we could best serve our new employer by opening up opportunities for sales through new and improved technology. To that end, we have developed numerous technological solutions (web based interviewing, reporting, portals, etc.) which can be used to make new sales. Over the course of the past year, we have gone from having no real clients for our new technologies to having more than we know what to do with. (I am the only actual software programmer, though my boss is an ex-programmer and provides excellent feedback, design concepts and support.)
Here is a brief overview of our software implementation. There is a total of 4 ATL COM Service applications running on 5 different servers. There are 4 additional COM DLLs located on some of these servers. There is a total of well over 50 COM interfaces. There are 6 ASP based web sites. We use SQL Server 7 as our back end database server. 2 of our web servers are load-balanced using Microsoft Network Load Balancing services. 3 of the 6 web sites are expected to be up and running 24/7 and are exposed to external users. One is not used much but must be available 24/7 when it is being used. The other 2 web sites are for internal use only. In addition to the software required by the web sites, there are 3 GUI applications and over 20 console applications associated with the total system.
One of my personal (and evolving) objectives has been to create a work environment which encouraged a good process, and resulted in software which could be managed by others. In this article, I am going to look at each of these 2 objectives and try and share my experiences and recommendations about each. Please keep in mind that I am no expert, just a working class programmer like most of you who just wants to go home at night, play with my kids and safely assume that everything is working well at the office (or at least someone else is dealing with the problems.)
A Good Development Process
Most (many, some???) large development firms have a well defined development process consisting of some basic reproducible set of procedures and behaviors. This is fine and good for large companies, but I suspect that this is highly unlikely for the standalone developers out there. For us there is no outside enforced process to the development cycle. Instead we often work with managers and clients who don't understand the complexities and subtleties of the process and don't care even a little bit about how we accomplish our jobs (though some may try and measure progress in idiotic ways.) For us the only process to our jobs is the one we self-impose.
If you're anything like me, you either currently do not or at least in the past you have not had a real defined development process. Who needs it anyway? Right??? That's basically the way I have felt about it for many years. Not anymore, and I'll tell you why. Remember earlier when I told you about my current job. Well, it is getting out of hand. The workload is increasing, the bug list (what's that?) is getting longer, the changes are coming much faster and the bodies are getting harder to hide (insert evil laugh).
So what simple, practical and time-conscious practices can we self-impose to make things easier? For me it is very important that the process I follow does not take a tremendous amount of time or red-tape. I have struggled with many of these issues and still do. What I have attempted to do is to take the advice from this site, many books and web sites and make those suggestions work for me. Here are my observations and suggestions.
Backing up your work is pretty much self-explanatory, but for some reason I suspect that there are many thousands of developers out there who do not back up their code, binaries, support files, databases, scripts, etc. I know that for a long time my backup practices were, how you say, non-existent. The reality is that the source code and supporting executables, libraries and files are the lifeblood of my employer.
Personally, I back up to a CD about every 2 weeks. I always create 2 copies of the backup CD. One I keep at the office in my file cabinet and the other goes home with me and is kept in a fire-proof box. I realize that backing up every day would be better than twice a month, but the reality is that I just don't have time to do this every day. Sure, I could set up an automated system for doing this, but I would need more than just the CD writer on my development PC. If you can do it every day efficiently and without significantly decreasing your "work time", by all means please do.
One thing I have learned is that backups are not for reverting back to previous versions of the code. Backups are for the "sky is falling" days which will eventually arrive. Another thing I can recommend is to back up not just the source code, files, etc, but also any databases, the scripts to create those databases and any critical configuration for the applications you write and maintain. I often back up screen shots of critical configuration screens where there is no easy "Export configuration" option.
I can't count the # of times that I have had to revert back to a previous version of code because I made a mistake or introduced an unexpected bug. Also, sometimes I need to test a particular implementation to determine how feasible it is. Source control is really the only good way to do this.
Personally, I use MS SourceSafe but there are many alternatives out there which are at least as good and many are better. I have struggled with finding the best way to use source control and have waffled between not using it at all, checking in every little change which compiles and only checking in changes on a schedule.
What I have found to work is to only check in changes when I perform a production build. When I have tested my changes and am ready to move the final binaries into our production environment, I check in all my files and then check all of them back out. Because I am the only software developer at my employer, I can safely keep all files checked out all the time. This works well for me but may not be appropriate if you are working with a team of developers.
One positive side effect of this is that it is relatively easy to label all the files in a project at the same time with a version number or build date.
Design then Code
I feel that it is important to have a philosophy for each major system I design. The philosophy should be simple and easy to follow and there may be more than one. An example of this can be found in a web based interviewing package I have developed. One of the philosophies of this system was that it should support the broadest range of browsers possible. Another was that I wanted to make changes to one portion of the system (the rendering engine) very easy and painless because I felt that this area of the system would be most susceptible to change. When I implemented the system, I designed it so that thread concurrency issues did not have to be dealt with in the rendering code and that all rendering functions/classes followed a well defined pattern.
Once I have a philosophy for a system, I usually find it easy to design an architecture that supports the philosophy. My philosophy on architectural design is that the architectural design should be as simple as reasonably possible, but no simpler. It does not need to define the names of every class or even the low level interaction between these classes, it needs to define the basic structure of the system (abstract from classes, code) and the desired behavior of the system.
Having a philosophy helps to make designing an architecture for a system much easier. More than that, though, having a philosophy helps to direct development and maintenance work to the project because the new work is done in keeping with the original design philosophy.
Plan for Changes
Change is an inevitable part of our jobs. Failing to recognize and plan for change is a common mistake and one which I have suffered from all too often. The problem with changes is that nobody appreciates the difficulty a "simple" change can entail. If a "simple" change requires altering key components of a system, it is no longer simple anymore.
When you are designing a system, coding a class or library, or whatever, expect and plan for changes. When possible, the overall design of a system should facilitate changes at as many levels as possible.
Some simplistic steps you can take to plan for change:
- Only use Boolean flags when you are 100% sure it will always be an ON/OFF situation. A common mistake is to use Boolean flags and not anticipate the addition of a third or fourth, etc condition. Instead, before using a Boolean condition, consider using a DWORD and using a bit mask.
- Encapsulate flag checking code into classes instead of trying to check flags in numerous places. When "exceptional" conditions occur, they can be dealt with in a single place instead of all over the code.
- Avoid duplicating computations in various parts of the code regardless of how simple they are.
- Develop common libraries which are designed to offer non-intrusive extensions. One simple, yet powerful example of this is a symbol substitution parser that I developed for our web interviewing software. I designed the parser so that it would recognize symbols like [SOMESYMBOL], but I later extended it by allowing for [COMMAND:SYMBOL], thus allowing old-style symbols to continue to work as usual while adding new capabilities. By leveraging this technique you can eliminate common "traps" in your situation.
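The first two suggestions in the list above can be sketched roughly as follows. This is only an illustration under my own assumptions: the flag names and the InterviewOptions class are hypothetical, and std::uint32_t stands in for the Windows DWORD so the sketch is not tied to any one platform.

```cpp
#include <cstdint>

// Hypothetical option bits for illustration. Each option gets its own bit,
// so a new state can be added later without changing function signatures.
// std::uint32_t stands in for the 32-bit Windows DWORD.
enum InterviewFlag : std::uint32_t {
    kFlagNone         = 0,
    kFlagAllowBack    = 1u << 0,
    kFlagShowProgress = 1u << 1,
    kFlagTestMode     = 1u << 2   // added later; existing callers are unaffected
};

// All flag checks live in one class, so an "exceptional" rule is handled
// here once rather than at every call site.
class InterviewOptions {
public:
    explicit InterviewOptions(std::uint32_t flags = kFlagNone) : flags_(flags) {}

    void set(InterviewFlag f)   { flags_ |= f; }
    void clear(InterviewFlag f) { flags_ &= ~static_cast<std::uint32_t>(f); }
    bool has(InterviewFlag f) const { return (flags_ & f) != 0; }

    // Example of an exceptional rule kept in a single place (an invented rule):
    // test mode never shows the progress bar, whatever the progress bit says.
    bool showProgress() const {
        return has(kFlagShowProgress) && !has(kFlagTestMode);
    }

private:
    std::uint32_t flags_;
};
```

Callers ask showProgress() instead of testing bits themselves, so if the rule changes again it changes in one place.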
Making sure that key components are as flexible as possible can reduce late nights, make you look better to manager/client/co-workers and improve your ability to schedule changes.
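To make the symbol substitution example from the list above more concrete, here is a minimal sketch of how such a parser might be extended without breaking old templates. It is not the parser from our interviewing software; the function name, the UPPER command and the rule that unknown tokens pass through untouched are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Expands [SYMBOL] and [COMMAND:SYMBOL] tokens in `text`.
// UPPER is a hypothetical command; unknown symbols or commands are copied
// through unchanged so that old templates keep working.
std::string substituteSymbols(const std::string& text,
                              const std::map<std::string, std::string>& symbols) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t open = text.find('[', pos);
        const std::size_t close =
            (open == std::string::npos) ? std::string::npos : text.find(']', open + 1);
        if (close == std::string::npos) {
            out.append(text, pos, std::string::npos);   // no complete token left
            break;
        }
        out.append(text, pos, open - pos);              // literal text before the token
        const std::string token = text.substr(open + 1, close - open - 1);

        // Old-style tokens have no colon; new-style tokens are COMMAND:SYMBOL.
        const std::size_t colon = token.find(':');
        const bool hasCommand = (colon != std::string::npos);
        const std::string command = hasCommand ? token.substr(0, colon) : std::string();
        const std::string name = hasCommand ? token.substr(colon + 1) : token;

        const auto it = symbols.find(name);
        if (it == symbols.end() || (hasCommand && command != "UPPER")) {
            out.append(text, open, close - open + 1);   // leave unknown tokens as-is
        } else {
            std::string value = it->second;
            if (hasCommand) {                           // only UPPER reaches this point
                std::transform(value.begin(), value.end(), value.begin(),
                               [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
            }
            out += value;
        }
        pos = close + 1;
    }
    return out;
}
```

With a table that maps NAME to "Ann", the text "Hi [NAME]!" becomes "Hi Ann!" and "Hi [UPPER:NAME]!" becomes "Hi ANN!". The old-style token goes through exactly the same code path it always did.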
"An ounce of prevention is worth a pound of cure."
Testing changes in a "real world simulation" is critical to ensuring changes will work in the "real world". It is not enough to just test what I think is affected by my changes; I must test the whole system after each change to be sure there are no unexpected side effects. It is not uncommon for a seemingly simple change to have very unexpected results.
In my situation, I have 4 custom Windows NT services, 4 ASP apps and several GUI and console apps. The NT services run on a dual processing system with lots of memory which can expect quite heavy loads. It is very important that I test my changes on a similar machine. Testing on a single processor machine under no-load is just not enough.
One thing I can say from experience is that there is no silver-bullet software package or toolkit which can make testing fully automatic. Even the best tools require significant configuration and must be thought out very well to ensure that the tests are adequate.
In my situation, I am working with a mission critical web-site which must work under a certain amount of load and simply cannot fail. I use the Microsoft Web Application Stress Testing Tool to simulate load, but I do not rely on it to do the test for me. Its responsibility is not to automate testing, but to assist in the process.
Expect changes to cause problems
Eventually, some seemingly innocuous change will cause an unexpected problem. No matter how well you test changes, eventually one will bite you. Having a good source control and backup procedure is critical here, but there is another step you can take to make life a little easier.
I suggest that you keep a "last known good" log on your "production" systems. At my work, we have 3 servers which host our internet sites. All of the ASP code, configuration files, service EXEs and DLLs are located under a "mysoftware" directory on one of the machines and the other machines access these files through shares. One of the sub-directories is called "LastKnownGood" and inside it is a set of directories, one for each change I make to the production system, each named after the date it was created (I use the "YYYYMMDD" format, but another date format may be appropriate for you). Whenever I make a change, I create a new directory under the "LastKnownGood" directory and copy everything to that new directory. (I choose to copy everything because it is not too many MB and it is easier to restore everything than to restore 1 or 2 things and be sure that I did it correctly.)
If a change I make causes the system to crash or have other problems I can relatively quickly bring the system down, restore the LastKnownGood configuration and restart the system.
This has saved my butt at least once and I expect it will do so again.
Maintain an "issue" list
I have come to the point where I simply cannot remember all the issues that I or others have come across. Maintaining a list of these issues is the only way to keep myself aware of what I need to know and give me quick access to that information.
I have tried several approaches to keeping an issue list. At first, I tried keeping a "bug" list, but I quickly discovered that this is not enough. I also tried having a highly defined issue list with lots of fields to describe the "issues". I never used this solution because it was just too cumbersome. I have tried using Excel, MS Word, Notepad, and a few issue tracking tools.
I am now using DevTrack which I like pretty well and would at least recommend that you take a look at it. It allows for plenty of customization and has pretty good reporting capabilities.
The one thing that I can tell you that is the most critical point is that it has to be usable by you and it can't interfere or be a pain in the butt. For me, that meant creating a minimum of entry fields. The fields I use are:
Type (Issue, Bug, New Feature)
Component (big components, not low level ones)
Priority (ASAP, High, Low)
It is not possible to remember all the issues that arise. Being able to recall issues, whether they be bugs, features, ideas, etc is critical to being able to cope with them and plan ahead.
Using an issue list allows for a ready-made documentation library. You can easily include (in the description or activity records) details on the issue as well as ongoing research and notes regarding it. This information can be easily accessed by you and others for later reference.
Having an issue list also means that when someone asks "what else needs to be done", you can refer them to the list and not have to try and recall everything from memory or be expected to.
Issue lists can be shared so that others besides yourself can enter issues.
I for one don't want to always be the one person who knows "what is going on" in the systems I develop. I do not want to be solely responsible for the day-to-day operations and I absolutely do not want to be the person who is called at 2am every time something happens.
Having someone else ready to take over a system after it is developed means that you can move onto the next project relatively unfettered.
Involving someone else in our work is almost always beneficial (if it's the right person). Not only can that person help with common issues, provide feedback and test things, they can share (or take over) the responsibility of maintaining or operating the final product.
I am not talking about other software developers here, though. I am talking about whoever can and should be responsible for the final product. They will have to use it and they can provide valuable feedback as to how it needs to work and allow for them to interface with it.
Having someone involved early in the process means that you do not have to be the only person who "knows" the system.
Seek advice and review
I find that talking to someone else about a particular issue can help to illuminate issues which I did not originally think of and help to eliminate the potential problems my short-sightedness may have caused. Also, by expecting my work to be reviewed I tend to think more about it and try to make it the best it can be.
When possible, I try to consult with my boss on important technical, architectural, user interface and philosophical decisions. (My boss is a tech guy and ex-programmer so it helps, but being a tech person is not a mandatory requirement.) If you do not have a suitable person in the office, it may be possible to ask some questions here on CP and newsgroups. At the very least you may find confirmation of your ideas.
One very important thing here is to not be overly sensitive to criticism. The goal here is not to build your ego, but improve yourself and the system you are working on.
There are many other processes which you can apply to your development work, but these are some of the ones which I have found most helpful. Please tell me about other processes you follow which you find helpful. Also, please let me know if you disagree with any of my processes.
Software Someone else can Manage
If you have read the article up to this point, you are probably thinking that I am being redundant with this issue. In fact, I have been redundant with regards to this because I have found it to be of significant importance. The bottom line for me is that I do not want to always be the person called at 2 AM when something goes wrong, nor do I want to be the person with all the answers. What I want is to delegate the responsibilities of the final product to someone else so that I can move onto the next new code or system.
I just can't stress this enough. The software I/we develop is usually not intended for just our use. Other people who will probably not be software developers need to be able to use the software, manage and operate it in an efficient and effective manner.
One of the best ways to accomplish this goal is to work with the people who will receive the responsibilities early in the development process. By involving them in the design and implementation of the system, not only will we receive useful feedback, but we will have a ready made tester and we will be training this person to answer the questions which will inevitably be asked.
There are numerous steps we can take to ensure that our final product can be managed by others. Most of these are quite simple while others may require some special attention to get them right.
User Interface Design
Entire books have been written on this subject so I won't go into great detail, but I will say that since the goal here is to make sure someone other than yourself can manage the final software, UI design is very important.
I have already written a couple of simple articles on UI design and posted them to CP. You can find them at the following links.
Some other web-sites I would suggest are:
Reliable programmatic feedback and diagnostics
One thing that constantly frustrates me about software development is the issues involved with debugging production (release) systems. The fact is that there will be problems and that those problems must be dealt with rapidly. Most of the time, it is not reasonable to bring down a production web-site to run a program in "debug mode". Not only is it unreasonable to "debug" on a production web server, I/you don't want to have to do it anyway. I want someone else to locate problems and correct them. The only time I need to be involved is when a code change is required to correct the problem.
One of the key design goals of my software implementation has been to provide excellent run-time diagnostics and statistics to myself and others. Upon examining the requirements I decided that what was needed was 2 different forms of information delivery.
Basic run-time client-centric counts/amounts/times, etc are updated in a database. A web site exists for internal use which can query this database for various information. This web-site is targeted at internal company staff who are not technology savvy such as customer service reps, sales people, CEOs, etc. The information delivered through this site is (for the most part) not technical in nature.
In addition to this client-centric feedback, I decided that every application, service, COM DLL, console app, etc needed to produce a detailed event log. This event log outputs much more technical information that is targeted primarily at myself and the operator of the system, not end users. I decided that because one of the more important messages the system could and would produce related to database failures that the event log needed to be file based instead of database based.
At first the event log contained mostly just error events, but I quickly found out that this was just not enough to diagnose most problems. There is too much going on in the system to just log errors. I have since moved to logging much more information including technical diagnostics about each application such as memory usage, CPU utilization, various internal average execution times, bytes transferred, received, etc.
What information is important in your event log? Well, I think that it depends on your specific needs, but I would shoot for more instead of less. When writing code, look for scenarios that could be problematic, be performance concerns, exceptions, etc. As many of these situations as possible should be logged to the event log when they occur in production. There are numerous logging mechanisms already posted on this web site which are quite good. I would strongly advise that you check these out before writing your own because others may have thought of issues that you did not.
If you decide to write your own logging mechanism (or when selecting an existing one), let me make some suggestions as to what information to include with log entries and some basics of how it should be implemented.
Even if your target application is a single-threaded one, I would strongly recommend that you implement your logging class to be thread safe. You do not want to have to re-implement this logging mechanism if you do not have to and this would certainly be an issue if not thought of early in the design/implementation of the logging mechanism.
Use message codes
I strongly recommend that you plan to create numeric error/message codes and that these message codes be maintained in a database table somewhere. Error codes are great things because the more common ones are easily remembered, and the ones that are not can be well documented in the aforementioned table.
Use text messages
Even though the event log will contain the message #'s, I strongly recommend that you include a detailed message in the log about the specific message. Even if this is just the text found in the DB, that is still better than no message at all. There are many reasons why I think this is important, not the least of which is that it makes it much easier for someone other than yourself to look at the log and basically figure out what it is saying.
Log the file name and line # where the log entry came from
This can make debugging a problem much easier because you can quickly look at the log and see that the event was generated in SomeFile.cpp at line #256.
Use one log format for all programs
It is much easier to train someone to read a specific type of file than to read 10 different ones. Also, see the next note for additional reasons why the logs should all be the same.
Make sure that the log can be programmatically read, not just written
This is a mistake I made early on in my choice of logging implementations which I have had to go back and correct. The problem with log files which cannot be programmatically read is that they are limited to direct human consumption. Let me explain a little more. In my situation, I am generating as many as 30 different logs across all the different applications, services, web sites, etc. Each of these logs is stored on a single server. For someone to read these logs, they must have access to this server which means remote access is an issue.
What I have chosen to do (actually, I am just now in the process of doing this) is to keep my event logs in a file, but I am writing a log reader service (yes, yet another service) which will read all of the different log files and filter the contents placing desirable messages in a database where it can be viewed through a web-site or GUI application. Also, this log reader has the capability of automatically sending out emails, pages, etc when certain events occur. This way, I can further empower a user to not only check this log but receive automatic notifications for specific events.
In addition, my log reader service can read an IIS log and filter those messages placing important ones in the DB as well.
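The core of such a reader is small once every program writes the same format. The sketch below assumes a tab-separated line of timestamp, code, file, line and text; that layout, the LogEntry struct and the function names are my assumptions for illustration, not the format my services actually use. Lines that do not parse are skipped rather than treated as fatal.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct LogEntry {
    std::string timestamp;
    int code = 0;
    std::string file;
    int line = 0;
    std::string text;
};

// Parses one line of the assumed format:
//   timestamp <TAB> code <TAB> file <TAB> line <TAB> text
// Returns false for lines that do not match, so a reader can skip them.
bool parseLogLine(const std::string& raw, LogEntry& entry) {
    std::istringstream in(raw);
    std::string code, line;
    if (!std::getline(in, entry.timestamp, '\t') ||
        !std::getline(in, code, '\t') ||
        !std::getline(in, entry.file, '\t') ||
        !std::getline(in, line, '\t') ||
        !std::getline(in, entry.text)) {
        return false;
    }
    try {
        entry.code = std::stoi(code);
        entry.line = std::stoi(line);
    } catch (const std::exception&) {
        return false;   // non-numeric code or line number
    }
    return true;
}

// Reads a whole log and keeps only the entries whose code is in `wanted`.
// These are the entries a reader service might store in a database or e-mail.
std::vector<LogEntry> readMatching(std::istream& log, const std::set<int>& wanted) {
    std::vector<LogEntry> matches;
    std::string raw;
    while (std::getline(log, raw)) {
        LogEntry entry;
        if (parseLogLine(raw, entry) && wanted.count(entry.code) != 0) {
            matches.push_back(entry);
        }
    }
    return matches;
}
```

Because the filter works on message codes, deciding which events deserve a notification becomes a matter of editing a list of numbers rather than changing code.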
Events should be flushed to disk immediately
This is very important. Anytime an event is written to the log, that entry must be immediately written to the physical file. It is useless to write an entry to the log if it never makes it to the disk before the application crashes.
Some other things that I would recommend you consider are:
Service type applications should have a pulse
I firmly believe that every service type application I write must write a pulse message to its event log at least every 10-15 minutes. Currently, I choose to write this entry every 5 minutes, but regardless of what interval you choose, this is very important.
The reason this is important is that if you have a log reader (or plan to write one later), the log reader can detect when pulses stop. So not only can the log reader detect problems while the system is up and running, it can detect downtime quite easily.
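Detecting downtime from pulses comes down to comparing neighbouring timestamps. Here is a minimal sketch, assuming the log reader has already collected the pulse times for one service in ascending order; the function names and the idea of passing the allowed gap in seconds are hypothetical.

```cpp
#include <cstddef>
#include <ctime>
#include <utility>
#include <vector>

// Given pulse times in ascending order, returns each (last pulse, next pulse)
// pair that is further apart than `maxGapSeconds`. Each pair brackets a period
// in which the service was probably down or hung.
std::vector<std::pair<std::time_t, std::time_t>>
findPulseGaps(const std::vector<std::time_t>& pulses, double maxGapSeconds) {
    std::vector<std::pair<std::time_t, std::time_t>> gaps;
    for (std::size_t i = 1; i < pulses.size(); ++i) {
        if (std::difftime(pulses[i], pulses[i - 1]) > maxGapSeconds) {
            gaps.emplace_back(pulses[i - 1], pulses[i]);
        }
    }
    return gaps;
}

// True when the most recent pulse is older than the allowed gap, which means
// the service may be down right now and someone should be notified.
bool isSilent(const std::vector<std::time_t>& pulses, std::time_t now, double maxGapSeconds) {
    return pulses.empty() || std::difftime(now, pulses.back()) > maxGapSeconds;
}
```

With a 5 minute pulse, an allowed gap of 600 seconds tolerates one missed pulse before the reader reports a problem. Pick whatever margin suits your interval.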
Let the Windows performance monitor run all the time
Often when a problem occurs in a production environment, the cause can be easily discerned by simple performance statistics. Network load, memory usage, CPU usage, hard disk space, etc. All of these values can be easily monitored through the Windows NT performance monitor application.
Documentation is an area where I still am not very good. What I can tell you, though, is that it is a very important aspect, especially when turning over responsibilities to someone else. I am not talking about code comments here, I am talking about documentation about the system the person will be responsible for. Configuration options, desired performance characteristics, known issues, things to watch for, what to do when X happens, etc all need to be documented to eliminate the need for constant interaction with this user.
One way documentation can be made simpler is by using your issue tracking database (you do have one of those, don't you?) to keep track of issues as they arise. As you solve these issues and implement new features, you can easily update the database with the details of what you did and why you did it. This way, the new person can look here before approaching you.
I am sure that there is a good way to handle documentation as a total process, but I have not found a method that integrates well with my current work environment and work load. Hopefully, I will discover the way to do this and share that knowledge.
Defined processes for changes, maintenance, crash recovery, etc
When we train someone to take over the responsibilities of a new software application or system, it is often not reasonable to expect them to automatically bring good processes to the day-to-day operations of that software. This is not because the trainee is stupid or uninterested, but because they likely do not understand the complexities of the system to the degree that you do and they do not understand the ramifications of improper procedures.
At my current employer there are many procedures which are very important to the overall day-to-day operations of the system. These processes are not well documented yet, but I am working on them. Some of the key areas to define are:
A process for dealing with changes
As I've said before, change is an inevitable part of everything we do. Many changes, though, can be dealt with by others and should be. Before someone else can properly make changes, though, they need to understand that there is a process for making changes, testing them and placing them into the production environment.
The exact steps will vary depending on your specific needs, but whatever the steps are, they need to be documented and defined.
A schedule/plan for maintenance
As with changes, there is always a need for maintenance. When working with web sites, it is likely that there will be a need for checking disk usage, memory usage, CPU usage. Installing service packs and security patches, rebooting the servers and routers are also often important.
Again, the exact steps will vary depending on your specific needs, but whatever the steps are, they need to be documented and defined.
What to do when/if the system crashes (or becomes unavailable)
We would all like to live in a perfect world where everything works all the time and never crashes, but sooner or later reality will bite you and something will go horribly wrong. When working with web-sites there is a myriad of problems which can arise that will cause the system to be unavailable: router problems, problems with the service provider, software issues, disk space, memory leaks, DoS attacks and hacker attempts, just to name a few.
There has to be a plan for how to diagnose the cause of a crash or system unavailability and how to deal with the various problems that arise. It isn't possible to cover every possible scenario, but as many as possible need to be covered and procedures defined for how to restart the system when they happen.
One thing that I strongly suggest you do is to maintain a down-time log. This log should include entries for all unscheduled system crashes or problems as well as entries for scheduled maintenance. Currently, I have created a project within DevTrack just for down-time logging. In this log, I record when the problem occurred, why it happened (ISP problem, hardware failure, software failure, etc), what I did to restore the system if applicable and any other information I see fit.
When to call me/you
While I want the person responsible for the software to feel that I am accessible, I do not want them to lean on me for things that they can do themselves. The preceding list of procedures and practices is intended to empower them to do it themselves. The only time they should need me is when they have exhausted their knowledge and the documentation I have provided.
High quality results are an achievable reality even for the standalone developer. To reach these results, though, we must have a good development process and we must have a mindset that drives us to empower others to use and manage the products we develop. Once we understand and implement these things we can begin to see their benefits. Hopefully we will become better rounded in our abilities and will be better software developers to boot.
As I have said before, these are my observations and suggestions based on my experience and work situation. I am quite interested in hearing from others about how well/poorly my solutions work for them and other solutions that they have found. So, please feel free to comment on this article and the subject matter I am trying to cover.