Introduction

I want to provide thorough information for the everyday coder - without the "I want to sell you something so I have to look extra smart" obfuscation layer. These are my personal views, acquired by analyzing my own development "challenges", browsing the web, discussing it at CP and elsewhere, and trying it myself. It won't be a "brief introduction", so here's an overview in case you want to skip something:
Where do we come from

We all know this: Your project has a neat design you're really proud of. You took care of all eventualities that came up in the early design studies, the schedule is approved, there's even an extra week "padding for the unexpected" - and you are happy you can finally start coding.

Two weeks into it, the first change requests arrive. Nothing special, just the usual "can we do this, too?" - "Yes, no big problem, we just need to plug a Carbunkulator into the Arglebargle".

Halfway through, things look less shiny. A few more functionality tweaks, a few bugs, your best coder one week in the hospital - the schedule lags behind big time.

Your boss returns from a talk with a client, after they played around with the first beta. It turns out they never really needed an Arglebargle; it is just in the spec because their old system had a big one that was very expensive. What they really need is a big gonkulator, and it must be fast - much faster than now. Oh, and the one feature that gave you headaches while designing - you can scrap that: the one guy who insisted on this feature (although no one understood why) moved on to greener pastures.

Whatever the reasons - the application ends up different from what was envisioned. Chances are, it's a mess of crutches and shortcuts across a baroque, utterly inefficient infrastructure. You might even become afraid of touching it, because a little change here breaks something there. Every time you try to fix some nasty behavior, you have to wade through tons of interdependent code, and every function, every class you see screams "rewrite me". Far from what you wanted.

Interestingly, you can arrive at the same place by leaving out the formal design process altogether: You have an idea, a rough plan how you can make it, and start coding. It starts well, but after some time, it gets tricky: an important library refuses to do what you expect, some things didn't work out as you thought, and you're forced to hold much more distributed state information than you can juggle in your head.

What went wrong? In the first scenario, the design (likely perfect for the initial requirements) did not live up to the changes that are inevitable in the course of a project. In the second, a reasonable design failed to evolve.

The solutions I discuss here are aimed at the course of the project, to help you avoid situations like this. Once you are stuck with a huge unmaintainable code base, it's much harder to stay on the success track (or get back on it again). Still, even when you feel you're stuck, many of the techniques here can help you not to give up on the way - neither economically nor stress-wise.

Agile Programming

AP is a collection of principles and techniques that try to overcome the inflexibility of the strictly design-based development cycle. Three things make AP very powerful:
Here is what I understand as the core rules:
Simple Design and Design as you go

I consider this the very heart of the AP approach - and the one with the biggest potential to change the development process. Instead of planning ahead for all nooks and crannies, make sure "Version 0.1" works out well. Concentrate on your next task, and pick the simplest design that makes it possible.

This does not mean forget about design! Design remains an important part of the entire process, and classic good/bad rules still apply. The additional rule is: make your next step happen, not the 10th. Don't go far out of your way for something you think you need later. When you really need it, new possibilities will have opened, and priorities will surely have changed.

To keep the design evolving with the project, you always need to pay attention to the code base. With only some primitive techniques and trust in your instincts, you can get along very well most of the time - so you're less burdened when you have to face the real challenges. Exercising the part you're working on means: over time the "hot spots" of your application get the most attention automatically.

The initial design will have a great impact on your project as well (although you typically end up more flexible than with a strict design-based approach). But don't worry too much: Different designs can support the same product, a simple one will give you something to work with, and refactoring will make sure your design grows with the application.

If the analogy is allowed: Agile Modeling replaces the intention of a "perfect creation" with an evolutionary process: Although 7 neck vertebrae can't be the perfect design for both the mouse and the giraffe, they do their job very well in both cases.

Advantages
A good designer/developer can achieve the same with a "less agile" approach. But chances are, a wizard will get very close to the agile approach himself, if you let him do as he pleases. As for us non-wizards: we all succumb to the stress and strain of development, and forget to follow idolized "good practice" in those dreaded all-nighters.

Incremental, independent Changes

Incremental changes are the key to happiness, and the core idea of refactoring. However, I want to separate the principle from the techniques, which is why it gets its own paragraph. To repeat the two rules:

Take the smallest step possible in the direction you want to go. Then compile and do a basic test that it's still working.

And always do only one step - don't try to sneak in a feature while you refactor, tempting as it may be.

For me, these rules still require some discipline, and a conscious effort. Sometimes it just seems easiest to scrap a class, and write it anew. Yet, when I get interrupted, it's much easier to say "five minutes" - and finish the search&replace at hand, or jot down a quick note on what I was doing. When I return to my desk, I can continue without looking back and forth for where I left off, and without the fear that I forget something.

Advantage: You always have working code you can deliver. Don't take this literally and skip QA - but in case of emergency (e.g. a bug at a customer site) you're much more ready to leave your current task in a working condition. In-house testing can get a new version anytime. You are quicker to react to new requirements: No more "I need to finish the Gonkulator rewrite before I can add this graphic feature that everybody suddenly seems to need urgently." Also, your code passes much more often through the compiler, and a basic "does it work?" test - especially so if you do Automated Unit Testing. This gives a bit more confidence in complex code, and can be a real life saver.
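To make "one small step, then compile and test" concrete, here is a minimal sketch. Everything in it is hypothetical and chosen only for illustration (SumValidBefore, SumValid, IsValid, SmokeTest and the 0..100 range are not from any real project). The single change is extracting the range check into a named helper; the basic check then confirms that behavior did not change.

```cpp
#include <cassert>
#include <vector>

// Hypothetical code before the step: range check and summing are tangled.
int SumValidBefore(const std::vector<int>& values) {
    int sum = 0;
    for (int x : values)
        if (x >= 0 && x <= 100) sum += x;
    return sum;
}

// The one and only change in this step: the range check gets a name.
bool IsValid(int x) { return x >= 0 && x <= 100; }

int SumValid(const std::vector<int>& values) {
    int sum = 0;
    for (int x : values)
        if (IsValid(x)) sum += x;
    return sum;
}

// Basic "does it still work?" check, cheap enough to run after every step.
bool SmokeTest() {
    const std::vector<int> sample = {5, -1, 100, 101, 42};
    return SumValid(sample) == SumValidBefore(sample)  // same result as before
        && SumValid(sample) == 147                     // 5 + 100 + 42
        && SumValid({}) == 0;                          // empty input
}
```

Once the check passes, the old function can be deleted in a separate step; a new feature would be yet another step.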
There's one human reason behind this rule: Only a limited amount of state information is present in your mind (the often-mentioned "seven things"). Consciously splitting the work into steps with the least state information tries to save you from a "short term memory overflow", which makes you forget things you wanted to do, and feel overwhelmed by the complexity of the code.

And there is a Murphy reason: Every step you take will be a little bit more complex, and have a few more dependencies and side effects, than you expected. E.g. when rewriting two classes into one, a problem with header inclusion order can sidetrack you so far that you just forget to initialize an important variable again.

Know your Tools, and know your reasons

Besides writing code, many things belong to the development process: Design, Documentation, QA... The first question should be: Why do I do that? The importance of these artifacts is well known, and so is the rich set of techniques and methods for them, which all too often claim, or at least suggest, to be exclusive.

But, to take an example: why do you actually document your code? Do you still want to understand your code in 6 months? Should a 3rd party be able to write plugins based on your API? Is it to inform co-workers of changes in the interface or implementation specifics? Is it because you plan to retire to the Bahamas, so the code base needs to be passed on to a still-to-be-hired guy?

These are quite different goals, and for each of them, different techniques are appropriate. In the example, documentation comes in many flavors: UML charts, formal code comments that can be extracted by a parser, inline comments, a separate Word file describing your intentions, Source Control change logs, etc.

You are much better off when you understand and use more than one tool. Look out for new tools, and don't forget about unused features of the tools you have.

Advantage: The time spent on non-coding tasks is used more effectively, and doesn't feel wasted.
Again: It's just to make you happy!

Refactoring Techniques

Refactoring is nothing magic; refactoring is a fancy word for cleaning up the code. A more formal definition would be:
All techniques are allowed that:
Refactoring has two major uses: first, to keep an application well designed, to enable the "design as you go" principle. Second, instead of rewriting a larger module or class, you can refactor it into something much better. This requires much more discipline (controlling one's enthusiasm to make it better) than a rewrite, but is often the less risky yet more rewarding route.

I split the discussion into two parts - a formal list of basic techniques, and a real life example that contains suggestions for less automatable ones.

Basic Refactoring Techniques:
Notice a theme in c-e: We introduce a base class only when it seems necessary. Early design decisions are often intentionally immutable: because the design guru said so, because they were the result of heated discussions, etc. Along the way, the technical reason for a decision often gets lost, and with it: simplicity.

All these steps will take around 5..10 minutes - usually including "compile and test". You can take them anytime: when you're bored, while you're waiting for another project to compile, when you don't want to leave shortly before your boss. Whatever. Even taking one step will make your code base a little bit better, and you will have working code. They are easily undone (assuming you know how to use your tools: editor and source control).

While the decision what to do requires that you understand the structure of the code you're working on, executing it does not: these are simple search-and-replace or copy-and-paste tasks, and under VS.NET there are nifty tools available that can automate them safely.

Also, AP does not tell you how to design, only when. You still need to know what makes a good design, and find it yourself.

Other Refactoring techniques - A real life example

When I plan to refactor a complex class or module, I start with the things mentioned above. This has two purposes: First, the code gets easier to read and more compact, and unnecessary arabesques are removed in these steps, so I have a much easier time later on. Second, I get fairly accustomed to the class again, refreshing my memory. I find out which members are hot spots, discover old comments telling me what I wanted to do, etc.

Only when I'm through with the basics do I begin the actual changes. Again, I try to take the smallest step that takes me closer to my goal and keeps the code working. Here you need to be more creative - the techniques are not that straightforward anymore, and you need to plan ahead. That's why I'll take a real example, to illustrate some possibilities.
Recently, I refactored a class implementation that simulated a map<int, struct> by two arrays: a data array holding the values, and a key array, holding the key for each value at the same index as the data array. To speed things up, I tried to store the values at their "native" position: e.g. the value for key 17 I would first try to insert at index 17. To look up a value, I checked the "native position"; if that failed, I had to search the key array for the index where the key was stored, then retrieve the value from the same index in the data array. The whole thing looked like this:

```cpp
if (keyArray.size() > key && keyArray[key] == key)  // look up "native" position
    return dataArray[key];
else {
    int index = FindKeyInKeyArray(key);  // linear search! (ugh)
    if (index >= 0)
        return dataArray[index];
}
```

(This atrocity to common sense grew from a quick side hack into a generic datakeeper class. I'm really ashamed of this - well, no more.)

The first step was to rename the arrays (originally named data and map) to the ones above, so I wouldn't get a name clash later on - neither in the code, nor in my mind.

Ultimately, I would have to remove the [...]

So I moved some rarely used extra functionality that affected most functions to a derived class - due to the prior usage this wouldn't break any client code. While this moved no "hot spots" out of the class, the complexity of the hot spots themselves was greatly reduced. Compile and run - still working. (Later I found I had introduced a bug in this step that even escaped my quickly written unit test. But due to the new, cleaner code structure, it was found quickly.)

The remaining lookup complexity, especially when inserting/changing values, was dominated by the "native position" handling - it probably didn't help much, and made everything ugly. I decided to remove this altogether. While the code would still work, performance might take a hit - this was a small risk I had to take.
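To make the starting point concrete, here is a minimal, self-contained sketch of such a two-array lookup. It comes with the kind of quick check against std::map that can pin down behavior before each refactoring step. Only the names keyArray and dataArray come from the code above. The class name TwoArrayMap, its methods, the kNoKey sentinel, the kNativeLimit growth policy and the std::string value type are assumptions made for illustration, not the original implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

constexpr int kNoKey = -1;                 // assumed marker for an unused slot
constexpr std::size_t kNativeLimit = 64;   // assumed: keys below this may use index == key

// Hypothetical reconstruction of the two-array "map" (non-negative keys only).
class TwoArrayMap {
public:
    // Insert or overwrite the value stored for a key.
    void Set(int key, const std::string& value) {
        assert(key >= 0);
        int index = Find(key);
        if (index < 0) index = Allocate(key);
        keyArray[index] = key;
        dataArray[index] = value;
    }

    // Pointer to the stored value, or nullptr when the key is absent.
    const std::string* Get(int key) const {
        const int index = Find(key);
        return index >= 0 ? &dataArray[index] : nullptr;
    }

private:
    int Find(int key) const {
        if (key < 0) return -1;
        const std::size_t k = static_cast<std::size_t>(key);
        if (k < keyArray.size() && keyArray[k] == key)
            return key;                                    // hit at the "native" position
        for (std::size_t i = 0; i < keyArray.size(); ++i)  // linear search fallback
            if (keyArray[i] == key) return static_cast<int>(i);
        return -1;
    }

    int Allocate(int key) {
        const std::size_t k = static_cast<std::size_t>(key);
        if (k < kNativeLimit) {
            if (k >= keyArray.size()) {             // grow so the native slot exists
                keyArray.resize(k + 1, kNoKey);
                dataArray.resize(k + 1);
            }
            if (keyArray[k] == kNoKey) return key;  // native slot is free
        }
        keyArray.push_back(kNoKey);                 // otherwise append a new slot
        dataArray.emplace_back();
        return static_cast<int>(keyArray.size() - 1);
    }

    std::vector<int> keyArray;           // key stored at each index, or kNoKey
    std::vector<std::string> dataArray;  // value at the same index as its key
};

// Characterization check: every lookup must agree with a std::map reference.
bool AgreesWithStdMap() {
    TwoArrayMap subject;
    std::map<int, std::string> reference;
    const int keys[] = {2, 100, 3, 17, 3, 1000, 0, 100};
    int step = 0;
    for (int key : keys) {
        const std::string value = "v" + std::to_string(step++);
        subject.Set(key, value);
        reference[key] = value;
    }
    for (int key = -1; key <= 1001; ++key) {
        const std::string* found = subject.Get(key);
        const auto it = reference.find(key);
        if (it == reference.end()) {
            if (found != nullptr) return false;   // reported a key that was never set
        } else if (found == nullptr || *found != it->second) {
            return false;                         // missing or stale value
        }
    }
    return true;
}
```

A check like AgreesWithStdMap() stays valid while the internals change, so it can be rerun unchanged after every step of the cleanup.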
The worst thing that could happen would be rolling back to before this step (so I made a check-in at this point).

After removing the extra lookup, most of the hot spot functions did something similar to this:

```cpp
int index = MapID(key);   // lookup the key
if (index >= 0) {         // when found...
    // do something to dataArray[index]
}
else {                    // when not found...
    // do something else
}
```

I figured, to replace this with a map, it wouldn't be a simple [...]

```cpp
ValueType * GetValPtr(int key)
{
    int index = MapID(key);        // lookup the key
    if (index >= 0)                // when found...
        return &dataArray[index];  // pointer to the stored value
    else
        return NULL;
}
```

And started replacing the lookups by

```cpp
ValueType * pVal = GetValPtr(key);
if (pVal) {   // when found...
    // do something to *pVal
}
else {        // when not found...
    // do something else
}
```

Again, very simple replacements, especially since I had made sure before that local variable and parameter names were consistent. I renamed [...]

After this step, I had a sleek implementation of a horrible idea. Quite an improvement.

```cpp
ValueType * GetValPtr(int key)
{
    std::map<int, ValueType>::iterator it = m_map.find(key);
    if (it == m_map.end())
        return NULL;
    else
        return &(it->second);
}
```

Of course, replacing the two arrays with a map had some other side effects, temporarily breaking the storage functions (that needed to iterate over all values), and turning the array allocation/cleanup functions into syntax errors. This was a single big step; I had no idea how to break it down further (and maybe started to get a little bit impatient). But due to all the preparation, it took no more than 40 minutes to do the change, replace the [...]

While scrubbing the code, I marked commented-out sequences with a special comment tag, so I could search for these places. Thus, removing all the dead code (that I left in initially for reference and rollback) was a matter of a minute or two.

Of course, a few things still could be done. There's still a naming inconsistency in the "insert new item" implementation, and the [...]

Refactoring techniques used in the example

A short overview of the things I used:
Selling to your boss

OK, since you didn't fall asleep yet, you're probably pondering one question: How do you convince your boss that renaming variables is worth your pay?
Limits of the Agile Process

AP is not the holy grail either. There are some requirements that must be met to make it work:
*) One strain of Agile Techniques (Extreme Programming, the one that gets all the press) strongly emphasizes knowledge propagation. Definitely worth looking at when you have a diversely skilled team.

**) In my (still small) experience, data-centric designs with well separated layers tend to go along best with Agile Techniques - but that might be due to the fact that I personally prefer them.

Appendix

Links
Why is Refactoring called Refactoring?

Although there are different explanations, the one that feels most natural to me is this one: Refactoring stems from the mathematician's "desire" to reorganize a term like
into its factors:
While both are absolutely identical, the second one exposes its inner structure and important information at one look. Also, there are parallels between the processes.
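As an illustration of the idea (an example chosen here, not necessarily the one the analogy was originally built on), take a cubic polynomial and its factored form:

```latex
x^3 - 6x^2 + 11x - 6 \;=\; (x - 1)(x - 2)(x - 3)
```

Multiplying out the right-hand side gives $(x^2 - 3x + 2)(x - 3) = x^3 - 6x^2 + 11x - 6$, so both sides describe the same function. The factored form shows the roots 1, 2 and 3 at one look, just as refactored code shows its structure without changing what it does.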