Agile Programming
By peterchen
What it is, why you need it, and how to sell it to your boss
I want to provide thorough information for the everyday coder - without the "I want to sell you something so I have to look extra smart" obfuscation layer. These are my personal views, acquired by analyzing my own development "challenges", browsing the web, discussing it at CP and elsewhere, and trying it myself. It won't be a "brief introduction", so here's an overview in case you want to skip something:
We all know this: Your project has a neat design you're really proud of. You took care of all the eventualities that came up in the early design studies, the schedule is approved, there's even an extra week of "padding for the unexpected" - and you are happy you can finally start coding. Two weeks into it, the first change requests arrive. Nothing special, just the usual "can we do this, too?" - "Yes, no big problem, we just need to plug a Carbunkulator into the Arglebargle".
Halfway through, things look less shiny. A few more functionality tweaks, a few bugs, your best coder one week in the hospital - the schedule lags behind big time. Your boss returns from a talk with a client, after they played around with the first beta. It turns out they never really needed an Arglebargle; it is just in the spec because their old system had a big one that was very expensive. What they really need is a big gonkulator, and it must be fast - much faster than now. Oh, and the one feature that gave you headaches while designing - you can scrap that: the one guy who insisted on this feature (although no one understood why) moved on to greener pastures.
Whatever the reasons - the application ends up different from what was envisioned. Chances are, it's a mess of kludges and shortcuts across a baroque, utterly inefficient infrastructure. You might even become afraid of touching it - 'cause a little change here breaks something there. Every time you try to fix some nasty behavior, you have to wade through tons of interdependent code, and every function, every class you see screams "rewrite me". Far from what you wanted.
Interestingly, you can arrive at the same place by leaving out the
formal design process altogether: You have an idea and a rough plan of how to
make it, and you start coding. It starts well, but after some time, it gets tricky:
an important library refuses to do what you expect, some things don't work out
as you thought, and you're forced to hold much more distributed state information
than you can juggle in your head.
The whole thing turns out a bit fragile,
and although it mostly does what you want it to, it's a pain to use. As much as
it's brittle to the user, the code feels brittle to you. Probably no one will be
able or willing to continue working on it, and you're reluctant to change anything
yourself because, once you start to weed out the kludges, you wish you had the
strength to start over again.
What went wrong? In the first scenario, the design (likely perfect for the initial requirements) did not live up to the changes that are inevitable in the course of a project. In the second, a reasonable design failed to evolve.
The solutions I discuss here are aimed at the course of the project, to help you avoid situations like this. Once you are stuck with a huge unmaintainable code base, it's much harder to stay on the success track (or get back on it again). At least, even when you feel you're stuck, many of the techniques here can help you not to give up on the way - neither economically nor stress-wise.
AP is a collection of principles and techniques that try to overcome the inflexibility of the strictly-design-based development cycle. Three things make AP very powerful:
Here is what I understand as the core rules:
I consider this the very heart of the AP approach - and the one with the biggest potential to change the development process.
Instead of planning ahead for all nooks and crannies, make sure "Version 0.1" works out well. Concentrate on your next task, and pick the simplest design that makes it possible. This does not mean forget about design! Design remains an important part of the entire process, and classic good/bad rules still apply. The additional rule is: make your next step happen, not the 10th. Don't go far out of your way for something you think you'll need later. When you really need it, new possibilities will have opened, and priorities will surely have changed.
To keep the design evolving with the project, you always need to pay
attention to the code base. With only some primitive techniques and trust in
your instincts, you can get along very well most of the time - so you're
less burdened when you have to face the real challenges. Exercising the part
you're working on means: over time the "hot spots" of your application get the most
attention automatically.
Although individual things, like renaming
variables, might appear silly by themselves, the cumulative effect is impressive.
It's wonderful when the feeling of understanding your code base kicks in - don't
miss it!
The initial design will have a great impact on your project as well (although you typically end up more flexible than with a strict design based approach). But don't worry too much: Different designs can support the same product, a simple one will give you something to work with, and refactoring will make sure your design grows with the application.
If the analogy is allowed: Agile Modeling replaces the intention of a "perfect creation" with an evolutionary process: Although seven neck vertebrae can't be the perfect design for both the mouse and the giraffe, they do their job very well in both cases.
A good designer/developer can achieve the same with a "less agile" approach. But chances are, a wizard will get very close to the agile approach himself, if you let him do as he pleases. As for us non-wizards: we're all susceptible to the stress and strains of development, and forget to follow idolized "good practice" in those dreaded all-nighters.
Incremental changes are the key to happiness, and the core idea of refactoring. However, I want to separate the principle from the techniques; that's why it gets its own paragraph. To repeat the two rules:
Take the smallest step possible in the direction you want to go. Then compile and do a basic test that it's still working. And always do only one step - don't try to sneak in a feature while you refactor - tempting as it may be.
For me, these rules still require some discipline, and a conscious effort. Sometimes it just seems easiest to scrap a class, and write it anew. Yet, when I get interrupted, it's much easier to say "five minutes" - and finish the search&replace at hand; or jot down a quick note about what I was doing. When I return to my desk, I can continue without looking back and forth for where I left off, and without the fear of forgetting something.
Advantage: You always have working code you can deliver. Don't take this literally and skip QA - but in case of emergency (e.g. a bug at a customer site) you're much more ready to leave your current task in a working condition. In-house testing can get a new version anytime. You are quicker to react to new requirements: No more "I need to finish the Gonkulator rewrite before I can add this graphic feature that everybody suddenly seems to need urgently."
Also, your code passes much more often through the compiler, and a basic "does it work?" test - especially so if you do Automated Unit Testing. This gives a bit more confidence in complex code, and can be a real life saver.
There's one human reason behind this rule: Only a limited amount of state information is present in your mind (the often-mentioned "seven things"). Consciously splitting the work into steps with the least state information tries to save you from a "short term memory overflow", which makes you forget things you wanted to do, and feel overwhelmed by the complexity of the code. And there is a Murphy reason: Every step you take will be a little bit more complex, and have a few more dependencies and side effects, than you expected. E.g. when rewriting two classes into one, a problem with header inclusion order can sidetrack you so far that you just forget to initialize an important variable again.
Besides writing code, many things belong to the development process: Design, Documentation, QA...
The first question should be: Why do I do that? The importance of these artifacts is well known, and so is the rich number of techniques and methods for them, which all too often claim, or at least suggest, to be exclusive. But, to take an example: why do you actually document your code? Do you still want to understand your code in six months? Should a 3rd party be able to write plugins based on your API? Is it to inform co-workers of changes in the interface or implementation specifics? Is it because you plan to retire to the Bahamas, so the code base needs to be passed on to a still-to-be-hired guy? These are quite different goals, and for each of them, different techniques are appropriate.
In the example, documentation comes in many flavors. UML charts, formal code comments that can be extracted by a parser, inline comments, a separate Word file describing your intentions, Source Control change logs, etc. You are much better off when you understand and use more than one tool. Look out for new tools, and don't forget about unused features of the tools you have.
Advantage: The time spent on non-coding tasks is used more effectively, and doesn't feel wasted. Again: It's just to make you happy!
Refactoring is nothing magic, refactoring is a fancy word for cleaning up the code. A more formal definition would be:
Refactoring means continuously improving the design and appearance of your code base in small steps confined to surveyable areas.
All techniques are allowed that:
Refactoring has two major uses: first, to keep an application well designed, to enable the "design as you go" principle. Second, instead of rewriting a larger module or class, you can refactor it into something much better. This requires much more discipline (controlling one's enthusiasm to make it better) than a rewrite, but is often the less risky yet more rewarding route.
I split the discussion into two parts - a formal list of basic techniques, and a real life example that contains suggestions for less automatable ones.
Notice a theme in c-e: We introduce a base class only when it seems necessary. Early design decisions are often intentionally immutable: because the design guru said so, because it was the result of heated discussions, etc. In this course, the technical reason for a decision often gets lost, and with it: simplicity.
Each of these steps takes around 5 to 10 minutes - usually including "compile and test". You can take them anytime: when you're bored, while you're waiting for another project to compile, when you don't want to leave shortly before your boss. Whatever. Even taking one step will make your code base a little bit better, and you will have working code. They are easily undone (assuming you know how to use your tools: editor and source control).
While the decision what to do requires that you understand the structure of the code you're working on, executing it does not: they are simple search-and-replace or copy-and-paste tasks, and under VS.NET there are nifty tools available that can automate them safely.
Also, AP does not tell you how to design, only when. You still need to know what makes a good design, and find it yourself.
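To make "one small step, then compile and test" concrete, here is a minimal sketch of a single mechanical step: duplicated trimming logic is moved into one named helper, and a basic check confirms the behavior did not change. The example and all names in it (TrimSpaces, IsBlank, VisibleLength, CheckTrimStep) are hypothetical and not taken from the project discussed below.

```cpp
#include <cassert>
#include <string>

// Before this step, the same trimming logic was pasted into both callers.
// The step: move it into one named helper. Nothing else changes.
std::string TrimSpaces(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();               // empty, or spaces only
    const std::string::size_type last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

bool IsBlank(const std::string& line)
{
    return TrimSpaces(line).empty();
}

std::string::size_type VisibleLength(const std::string& line)
{
    return TrimSpaces(line).size();
}

// The "basic test" after the step: callers must behave as before.
void CheckTrimStep()
{
    assert(IsBlank("   "));
    assert(!IsBlank("  x "));
    assert(VisibleLength("  ab  ") == 2);
}
```

The step is small enough to undo with a single revert, and no feature was sneaked in along the way.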
When I plan to refactor a complex class or module, I start with the things mentioned above. This has two purposes: First, the code gets easier to read, more compact, and unnecessary arabesques are removed in these steps, so I have a much easier time later on. Second, I get fairly accustomed to the class again, refreshing my memory. I find out which members are hot spots, discover old comments telling me what I wanted to do, etc.
Only when I'm through with the basics, I begin the actual changes. Again, I try to take the smallest step that takes me closer to my goal and keeps the code working. Here you need to be more creative - the techniques are not that straightforward anymore, and you need to plan ahead. That's why I'll take a real example, to illustrate some possibilities.
Recently, I refactored a class implementation that simulated a map<int, struct> by two arrays: a data array holding the values, and a key array, holding the key for each value at the same index as the data array. To speed things up, I tried to store the values at their "native" position: e.g. the value for key 17 I would first try to insert into index 17. To look up a value, I checked the "native position", then I had to search the key array for the index where the key was stored, then retrieve the value from the same index in the data array. The whole thing looked like this:

    if (keyArray.size() > key && keyArray[key] == key)  // look up "native" position
        return dataArray[key];
    else {
        int index = FindKeyInKeyArray(key);  // linear search! (ugh)
        if (index >= 0)
            return dataArray[index];
    }

(This atrocity to common sense grew from a quick side hack into a generic datakeeper class. I'm really ashamed of this - well, no more)
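For readers who want to see the whole scheme in one piece, the following is a hypothetical reconstruction, not the original class. It assumes non-negative keys, a marker value for unused slots (kEmpty), pre-sized arrays, and "append when the native slot is taken" as the fallback. The names TwoArrayMap, Set, Get, ClaimSlot and the payload member are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ValueType { int payload; };   // placeholder; the real type is not shown

class TwoArrayMap
{
public:
    // Pre-size both arrays so small keys can land on their "native" index.
    explicit TwoArrayMap(std::size_t nativeSlots)
        : keyArray(nativeSlots, int(kEmpty)), dataArray(nativeSlots) {}

    void Set(int key, const ValueType& value)
    {
        assert(key >= 0);                 // assumption of this sketch
        int index = Find(key);
        if (index < 0)
            index = ClaimSlot(key);
        keyArray[index] = key;
        dataArray[index] = value;
    }

    // Returns NULL when the key is absent.
    const ValueType* Get(int key) const
    {
        const int index = Find(key);
        if (index >= 0)
            return &dataArray[index];
        return NULL;
    }

private:
    enum { kEmpty = -1 };                 // marks an unused slot

    bool InRange(int key) const
    {
        return key >= 0 && static_cast<std::size_t>(key) < keyArray.size();
    }

    int Find(int key) const
    {
        if (InRange(key) && keyArray[key] == key)   // "native" position
            return key;
        return FindKeyInKeyArray(key);              // linear search
    }

    int FindKeyInKeyArray(int key) const
    {
        for (std::size_t i = 0; i < keyArray.size(); ++i)
            if (keyArray[i] == key)
                return static_cast<int>(i);
        return -1;
    }

    int ClaimSlot(int key)
    {
        if (InRange(key) && keyArray[key] == kEmpty)
            return key;                             // native slot is free
        keyArray.push_back(kEmpty);                 // otherwise append
        dataArray.push_back(ValueType());
        return static_cast<int>(keyArray.size()) - 1;
    }

    std::vector<int> keyArray;            // key stored at each index
    std::vector<ValueType> dataArray;     // value at the same index
};
```

With 16 native slots, key 20 is appended at index 16; a later key 16 then finds its native slot taken and can only be found through the linear search, which is the kind of bookkeeping that made the original hard to change.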
The first step was to rename the arrays (originally named data and map) to the ones above, so I wouldn't get a name clash later on - neither in the code, nor in my mind.
Ultimately, I would have to remove the keyArray index
lookups completely, and replace the dataArray lookups. So I did a
"Find in Files" for "keyArray[" and "dataArray[", just to see how often they
were used. I was shocked - over 20 times each. I needed to break this down a bit
further, before I "injected" the map<>.
So I moved some rarely used extra functionality that affected most functions to a derived class - due to the prior usage this wouldn't break any client code. While this moved no "hot spots" out of the class, the complexity of the hot spots themselves was greatly reduced. Compile and run - still working. (Later I found I had introduced a bug in this step, one that even escaped my quickly written unit test. But thanks to the new, cleaner code structure, it was found quickly.)
The remaining lookup complexity, especially when inserting/changing values, was dominated by the "native position" handling - it probably didn't help much, and made everything ugly. I decided to remove this altogether. While the code would still work, performance might take a hit - this was a small risk I had to take. The worst thing that could happen would be rolling back to before this step (so I made a check-in at this point).
After removing the extra lookup, most of the hot spot functions did something similar to this:

    int index = MapID(key);   // look up the key
    if (index >= 0) {         // when found...
        // do something to dataArray[index]
    }
    else {                    // when not found...
        // do something else
    }

I figured, to replace this with a map, it wouldn't be a simple m_map[key] -
the dataArray[index] was often used multiple times but I wanted the map lookup
to happen only once, and I didn't need the operator[]'s feature to insert a new
element silently. So I wrote a helper function, that contained all the
functionality that I intended to change:

    ValueType * GetValPtr(int key)
    {
        int index = MapID(key);         // look up the key
        if (index >= 0)                 // when found...
            return &dataArray[index];   // pointer to the stored value
        else                            // when not found...
            return NULL;
    }

And I started replacing the lookups with:

    ValueType * pVal = GetValPtr(key);
    if (pVal) {   // when found...
        // do something to *pVal
    }
    else {        // when not found...
        // do something else
    }

Again, very simple replacements, especially since I had made sure beforehand
that local variable and parameter names were consistent. I renamed dataArray
and MapID() in the class declaration and the GetValPtr
implementation, so the compiler caught all occurrences where I was still
relying on them. I picked "pVal" as name for the new local variable, since this
name was used nowhere in the class.
After this step, I had a sleek implementation of a horrible idea. Quite an
improvement.
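A minimal sketch of what this intermediate state could look like, still backed by the two arrays but with every lookup funneled through GetValPtr. The class name DataKeeper, the CountHit and Hits members, the hits field and the trailing-underscore names are assumptions for illustration. The underscores stand in for the renaming trick: any call site still using the old names would no longer compile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ValueType { int hits; };   // placeholder value type

class DataKeeper
{
public:
    // A call site in the new shape: one lookup, then work on *pVal.
    void CountHit(int key)
    {
        ValueType* pVal = GetValPtr(key);
        if (pVal) {                     // when found...
            ++pVal->hits;
        } else {                        // when not found...
            ValueType fresh = { 1 };
            keyArray.push_back(key);
            dataArray_.push_back(fresh);
            assert(keyArray.size() == dataArray_.size());
        }
    }

    int Hits(int key)
    {
        ValueType* pVal = GetValPtr(key);
        return pVal ? pVal->hits : 0;
    }

private:
    // The only function that still knows how values are stored.
    ValueType* GetValPtr(int key)
    {
        const int index = MapID_(key);
        if (index >= 0)
            return &dataArray_[index];
        return NULL;
    }

    // Linear search over the key array; -1 means "not found".
    int MapID_(int key) const
    {
        for (std::size_t i = 0; i < keyArray.size(); ++i)
            if (keyArray[i] == key)
                return static_cast<int>(i);
        return -1;
    }

    std::vector<int> keyArray;
    std::vector<ValueType> dataArray_;   // renamed: old uses fail to compile
};
```

Because the storage details are confined to GetValPtr and the "not found" branch, swapping the container later touches very few lines.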
Everything worked fine, so I took the last step: introducing a
std::map<int, ValueType> member into the class, commenting
out the dataArray and keyArray declarations, and
replacing the GetValPtr implementation with a std::map::find
call:

    ValueType * GetValPtr(int key)
    {
        std::map<int, ValueType>::iterator it = m_map.find(key);
        if (it == m_map.end())
            return NULL;
        else
            return &(it->second);
    }

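As a self-contained sketch of the end state, the class below uses a std::map behind the same GetValPtr helper. It also shows why find() was preferred over operator[]: operator[] inserts a default-constructed element when the key is missing, while find() leaves the map untouched. DataKeeper, CountHit, Hits, Size and SizeAfterBracketLookup are hypothetical names; only GetValPtr and m_map come from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

struct ValueType { int hits; };   // placeholder value type

class DataKeeper
{
public:
    void CountHit(int key)
    {
        ValueType* pVal = GetValPtr(key);
        if (pVal) {                     // when found...
            ++pVal->hits;
        } else {                        // when not found...
            ValueType fresh = { 1 };
            const bool inserted =
                m_map.insert(std::make_pair(key, fresh)).second;
            assert(inserted);           // find() just reported the key absent
            (void)inserted;             // avoid a warning when NDEBUG is set
        }
    }

    int Hits(int key)
    {
        ValueType* pVal = GetValPtr(key);
        return pVal ? pVal->hits : 0;
    }

    std::size_t Size() const { return m_map.size(); }

private:
    ValueType* GetValPtr(int key)
    {
        std::map<int, ValueType>::iterator it = m_map.find(key);
        if (it == m_map.end())
            return NULL;
        return &(it->second);
    }

    std::map<int, ValueType> m_map;
};

// For contrast: a lookup through operator[] silently grows the map.
std::size_t SizeAfterBracketLookup()
{
    std::map<int, ValueType> probe;
    probe[42];                          // inserts a default-constructed value
    return probe.size();
}
```

Asking for the hit count of an unknown key leaves Size() unchanged here, whereas the operator[] probe ends up with one element it never asked for.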
Of course, replacing the two arrays with a map had some other side effects,
temporarily breaking the storage functions (that needed to iterate over all
values), and turning the array allocation/cleanup functions into syntax errors.
This was a single big step; I had no idea how to break it down further (and
maybe started to get a little bit impatient). But due to all the preparation, it
took no more than 40 minutes to do the change, replace the keyArray
iteration with a map iterator, and get the code to compile and run
again. The thing is working fine now, I feel very happy, and I sleep much
better.
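The storage functions are not shown in the article, so the following is only a sketch of how an index loop over keyArray might turn into a map iteration. Store, StoreToString, CheckStoreIsKeyOrdered, the payload member and the "key=payload;" output format are all invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

struct ValueType { int payload; };   // placeholder value type

// Writes "key=payload;" for every entry, visiting each entry once.
void Store(const std::map<int, ValueType>& m, std::ostream& out)
{
    for (std::map<int, ValueType>::const_iterator it = m.begin();
         it != m.end(); ++it)
    {
        out << it->first << '=' << it->second.payload << ';';
    }
}

std::string StoreToString(const std::map<int, ValueType>& m)
{
    std::ostringstream out;
    Store(m, out);
    return out.str();
}

// A std::map iterates in ascending key order, not insertion order.
void CheckStoreIsKeyOrdered()
{
    std::map<int, ValueType> m;
    ValueType five = { 5 };
    ValueType nine = { 9 };
    m[17] = five;                    // inserted first
    m[3] = nine;                     // inserted second, stored first
    assert(StoreToString(m) == "3=9;17=5;");
}
```

One thing to watch for with such a swap: a map visits entries in key order, so any stored format or client code that relied on the old array order would notice the difference.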
While scrubbing the code, I marked commented-out sequences with a special comment tag, so I could search for these places. Thus, removing all the dead code (that I left in initially for reference and rollback), was a matter of a minute or two.
Of course, a few things still could be done. There's still a naming
inconsistency in the "insert new item" implementation, and the GetValPtr
function could be removed altogether, replacing the ValueType *
with a map::iterator. But the task at hand was done, and a
new task was waiting for the next day, so I left it at that.
A short overview of the things I used:
OK, since you haven't fallen asleep yet, you're probably pondering one question: How do you convince your boss that renaming variables is worth your pay?
AP is not the holy grail either. There are some requirements that must be met to make it work:
*) One strain of Agile Techniques (Extreme Programming, the one that gets all the press) strongly emphasizes knowledge propagation. Definitely worth looking at when you have a diversely skilled team.
**) In my (still small) experience, data-centric designs with well separated layers tend to go along best with Agile Techniques - but that might be due to the fact that I personally prefer them.
Although there are different explanations, the one that feels most natural to me is this one: Refactoring stems from the mathematician's "desire" to reorganize a term like
F = xyz + 2xy - 7xz + 3yz - 14x + 6y - 21z - 42
into its factors:
F = (x+3)*(y-7)*(z+2)
While both are absolutely identical, the second one exposes its inner structure and important information at a glance. Also, there are parallels between the processes.
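For anyone who wants to check that the two forms really are identical, the product can be expanded in two small steps:

```latex
\begin{align*}
(x+3)(y-7) &= xy - 7x + 3y - 21 \\
(xy - 7x + 3y - 21)(z+2) &= xyz + 2xy - 7xz - 14x + 3yz + 6y - 21z - 42
\end{align*}
```

Reordering the terms of the last line gives exactly the long form of F above. Each step is small and easy to verify, much like the refactoring steps themselves.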
Last Updated: 3 Jan 2003. Editor: Nishant Sivakumar
Copyright 2003 by peterchen. Everything else Copyright © CodeProject, 1999-2010