
You duplicating? You wrong…

By Inaki Ayucar, 29 Mar 2010
 

This is not the best example for the issue, but here we go: imagine you are going to sweep your house, and that you have one of those modern brooms that can be disassembled to fit into small broom cupboards. When you start, let’s say in the kitchen, you take the broom from the cupboard, assemble it, and start sweeping. When you are done with the kitchen, you don't disassemble it and place it into the cupboard, because you have other rooms left, do you? You would feel stupid assembling the broom for each room, wouldn't you?

Then why do you do that when you write software?

Introduction

This post is about some very basic ideas, about engineering habits students tend to forget… It might sound obvious to experienced people, so if you are one of them, better go get a beer with friends instead of reading it. So, back to what students forget. One of the most important things to keep in mind, one that will always guide you along the right path, is:

DUPLICATION IS THE SOURCE OF ALL EVIL IN THE WORLD

And this applies not only to software development, but also to content management, organization and engineering in general. Damn! It even applies when you tidy up a wardrobe or clean the kitchen, as we saw in the preface.

Why is this misbehavior so common in software?

Easy. Assembling a broom and putting it back in the cupboard takes time, so it is trivial for any dumb-ass to see that it’s a waste of time. In software, duplicating things often takes less time than working out how to avoid the duplication. Duplicating is just a matter of Copy-Paste, whereas avoiding duplication can take weeks of work. That’s why duplication appears all over bad software.

The student's counterargument

This is the point at which students tend to say:

- <<Weeks! Then it is indeed better to duplicate code than to lose weeks>>

And then you play the role of the old, wise grandpa, telling them:

- <<Believe me, it’s better this way. You will be thankful for it when you grow older…>>

Example 1: Duplication of Work - Duplication of Decisions

Chronicles of a bad developer

  • Company “A”
  • Project “B”
  • You are the “Bad developer #1”
  • A simple operation: obtain the concatenated name of a user, composed as: “Name” + “Second Name” + “Last Name”

It’s just one line of code, so you conclude that it’s not worth thinking much about. So, you leave it like that and copy-paste the line whenever you need it.

Some months later, project “B” has grown, becoming “big”, with let’s say 150 sub-projects. Your famous name composition operation is scattered all around, in more than 1500 places. Now imagine that the worst happens (the worst always happens), and that your boss demands a change in the way you are concatenating names. Let’s say he wants them in the form “Last Name, First Name” instead of the previous one. You are screwed, easy as that.

As you are a bad, bad developer, the first thing you say is:

- <<Boss, are you sure we need that change? I like names in the way they are right now>>

- <<Our clients don't>> He replies…

No prob, you are a bad, but clever developer, so you think:

- <<Visual Studio will fix this, I heard somewhere that there’s a thing for these cases called Refactoring>>

Then you try to Refactor a whole statement, and of course, it doesn’t work.

- <<Damn Visual Studio! Damn Microsoft! Things like this are why I prefer Linux! Surely Linux knows how to Refactor a whole statement>>

And then you end up making massive text replacements, spending a week trying to find all the instances of the operation and fixing the resulting compilation errors. Of course, you forget to change 3 or 4 instances of it. That’s when you go to your boss again and say:

- <<Hey, not that bad. 3 out of 1500, that’s a 99% success rate!>>

- <<You are 99% idiot>>… He replies…

Diary of the good developer

<<Monday, 36 of October of year 2124

Today, I slept well. I had breakfast and went to the office. My job for today was to change the way names are concatenated in our software. As I was clever enough a couple of months ago to isolate this task inside a method, all my work for today was done in 10 seconds. I spent the rest of the day at the beach, getting some sun… No phone calls from my boss. No phone calls from clients… No news is good news. Life is wonderful…>>

Have you learned the lesson?

What lesson?

Pay attention, Padawan, to the following line of code:

FullName = Name + SecondName + LastName;

It doesn’t only compose a name. It also decides how this concatenation will be done, and this is the key point. That’s what you need to understand and learn. Because duplicating work is bad, but duplicating decisions is one of the worst things you can ever do in software. I'll put that in big so you remember it:

Remember: Duplicating work is bad, but duplicating decisions is one of the worst things you can ever do in software.
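To make the good developer's diary concrete, here is a minimal sketch of what "isolating the task inside a method" could look like. The class and method names (NameFormatter, ComposeFullName) and the space separator are assumptions made for illustration; they do not come from any real project.

```csharp
using System;

// Hypothetical helper: the single place that decides how a full name is composed.
public static class NameFormatter
{
    public static string ComposeFullName(string name, string secondName, string lastName)
    {
        // Today's decision: "Name SecondName LastName" (the separator is an assumption).
        // If the boss asks for "LastName, Name" tomorrow, only this line changes.
        return name + " " + secondName + " " + lastName;
    }
}
```

Every caller writes NameFormatter.ComposeFullName(...) instead of repeating the concatenation, so the decision lives in one place no matter how many of the 1500 call sites exist.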

Example 2: Duplication of Contents

As we mentioned, duplication is not only a software development issue.

Copy-pasting files is slower than copy-pasting code, but for sure it’s still faster than assembling and disassembling the broom ;)

That’s why many dumb-asses still don't get it, which makes duplication of content pretty frequent too. It’s the same stuff. You make a texture, you use it in hundreds of 3D models, and then one day you need to change it. You are screwed again, man!

How to avoid duplication in these cases?

You should think about having global repositories for contents that are shared between many entities: texts, textures, sounds, XML files, whatever. Remember, duplication is the source of all evil in the world. If you find yourself doing a Copy-Paste, ask yourself if it is really necessary or if there’s a way to avoid it.
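As a rough sketch of the "global repository" idea, assuming a game where models refer to textures: each texture is registered once under a key, and models store only the key. TextureRepository, Model and the file paths are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry: each shared texture lives in one place, under one key.
public sealed class TextureRepository
{
    private readonly Dictionary<string, string> _pathsByKey = new Dictionary<string, string>();

    // Registering an existing key again replaces the texture for every user of that key.
    public void Register(string key, string path)
    {
        _pathsByKey[key] = path;
    }

    public string Resolve(string key)
    {
        return _pathsByKey[key];
    }
}

// A model stores the key, never its own copy of the texture file.
public sealed class Model
{
    public string Name { get; private set; }
    public string TextureKey { get; private set; }

    public Model(string name, string textureKey)
    {
        Name = name;
        TextureKey = textureKey;
    }
}
```

With this layout, changing the texture means one call to Register; the hundreds of models that hold the key pick up the change the next time they resolve it.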

Example 3: Other, Not So Obvious, Forms of Duplication

Imagine we have a game with the class Player, and with a property which holds his status. You know, things like “Dead”, “Alive”, “Running”, etc. At design time, we need to choose how to store that status:

  • The core C++ kings, who want their player to be dead and alive at the same time (maybe a zombie game?), will want the status to be a hexadecimal flag
  • Those who want their game to be fast and “cross-platform with their toaster” will want the status to be an Int, you know: 0, 1, 2, etc.
  • A Visual Basic developer will probably want it to be a String, actually writing the value “Dead”, “Alive”, etc.

Is any of them right? Nope (barring special circumstances). Let’s debate...

Speed vs Comprehensibility

- <<Who said to use strings? Oh yes, the Visual Basic people… Come on man, do you know how slow it is to compare two strings? Player is a core class that is used all over the game, even in performance-critical parts. You cannot be serious… strings… The solution is to use an Int, or a hex flag.>>

- <<Who said to use an Int or a hex flag? Of course! The native people. Why don’t we all just go back to binary, you prehistoric monkeys? Will we need to look up what “State = 0” means every time we use it? Man, you cannot be serious… ints… The solution is to use strings…>>

As you have probably noticed, one can go on like that forever, when there’s a much better solution… Yes, you clever boy, we will use Enumerations. But do you know why? And, more importantly, are you aware of all the benefits (and cons) of an Enumeration? We will talk more about this later; for now, let's dig deeper into the cons of the other alternatives.

Did anyone say duplication again?

We have seen duplication of work (bad), and duplication of decisions (very bad). What none of the contenders in the previous discussion has seen is that both solutions imply a new, hidden form of duplication. It’s what I call duplication of abilities.

Duplication of Abilities (also very bad)

When you choose to store the status of your player as an Int, or as a String, with values like “Dead” or “Alive”, you are delegating to all your developers the responsibility of writing those values correctly. In other words, you are duplicating the need for the ability to write them correctly in many, many different places, and across many different people, instead of centralizing it in a single point.

It’s quite likely that, from time to time, someone will make a mistake. If anyone misspells a string value (“Deaz” instead of “Dead”), or simply mixes up an int, writing “1” instead of “0”, the compiler won't say a thing, and the application will seem to run just fine. But... what are the effects of setting the status of a player incorrectly in a single, deep point of your code? Absolutely unpredictable. It can manifest as any kind of error in the application, which makes it very difficult to debug, trace and fix.

You must try to avoid this kind of duplication too, centralizing decisions, and reducing the number of places where a mistake can be made. 
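The following sketch shows the “Deaz” mistake in action. StringStatusPlayer and TypoDemo are hypothetical names made up for this illustration; the point is only that the typo compiles cleanly and the error surfaces somewhere else.

```csharp
using System;

// Hypothetical Player that stores its status as a free-form string.
public class StringStatusPlayer
{
    public string Status = "Alive";

    public bool IsDead()
    {
        return Status == "Dead";
    }
}

public static class TypoDemo
{
    public static bool KillWithTypo()
    {
        var player = new StringStatusPlayer();
        player.Status = "Deaz";   // typo: the compiler accepts any string
        return player.IsDead();   // false: the rest of the game never sees a dead player
    }
}
```

Nothing fails at the line with the typo. The symptom shows up wherever IsDead() is consulted, which may be far away from the mistake.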

Strong-type, baby

We said it before, but let’s just analyze another alternative for storing the “State” of the player in our game. What about Enums?

  • Enums are as fast at runtime as an Int or an hex flag
  • Enums are as comprehensible as Strings at debug time
  • Enums are convertible to and from Int, which makes cross-platform work easy
  • Enums are convertible to and from String, which makes printing information easy
  • Enums are much more comfortable to use: thanks to IntelliSense, you don't need to remember the values or type them, you just select them from the list
  • Enums are strongly typed, which will:
    • Minimize the so-called Duplication of Abilities. Since you select values from a list (thanks to IntelliSense) instead of typing them, you centralize the responsibility for making no mistakes in the enumeration
    • Even if someone ignores IntelliSense and types the values manually, “hard to find” errors are also minimized, as a miswritten Enum value will be detected at compile time

So, YES! Enums ROCK! And thanks to the static helper methods found in the .NET Framework (Enum.Parse, Enum.GetNames, Enum.GetUnderlyingType, Enum.GetValues, Enum.IsDefined, etc.), they are very versatile too.
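A small sketch of those conversions, assuming a PlayerStatus enumeration with the values used earlier in the article. The enum, its numeric values and the PlayerStatusDemo helper are assumptions for illustration; Enum.IsDefined and Enum.Parse are the .NET helpers mentioned above.

```csharp
using System;

// Hypothetical status enumeration for the Player class discussed above.
public enum PlayerStatus
{
    Alive = 0,
    Running = 1,
    Dead = 2
}

public static class PlayerStatusDemo
{
    // Enum -> int, e.g. for saving or sending over the wire.
    public static int ToInt(PlayerStatus status)
    {
        return (int)status;
    }

    // int -> enum, rejecting numbers that do not map to a named value.
    public static PlayerStatus FromInt(int value)
    {
        if (!Enum.IsDefined(typeof(PlayerStatus), value))
            throw new ArgumentOutOfRangeException("value");
        return (PlayerStatus)value;
    }

    // string -> enum; a misspelled name such as "Deaz" throws ArgumentException.
    public static PlayerStatus FromName(string name)
    {
        return (PlayerStatus)Enum.Parse(typeof(PlayerStatus), name);
    }
}
```

Writing PlayerStatus.Deaz directly in code does not compile at all, and PlayerStatus.Dead.ToString() gives back the readable name for logs and debugging.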

So… use them!!!

An Additional Security Shield

Even if you are already strong-typing, there’s another thing you should always do, especially in big projects that cannot be fully tested very often… exactly, that’s correct: UNIT TESTING.

Again, designing good unit tests can take weeks, and again, students will say: <<Weeks! Then it’s better not to use unit tests>>, and again you will have to tell them no, no, no…

Unit tests are the definitive way of giving big projects the robustness they need, going into places where the compiler can’t go, and testing deeper than just syntax. A unit test can detect a wrong player State, but of course, it has to be well designed. If you want to know more about unit testing, please refer to the following article by Roy Osherove:

Write Maintainable Unit Tests That Will Save You Time And Tears
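As a rough idea of a test that catches a wrong player State, here is a framework-free sketch. The Player class, its TakeDamage rule and the test name are all hypothetical; in a real project you would probably write the same check with a unit testing framework.

```csharp
using System;

public enum PlayerState { Alive, Running, Dead }

// Hypothetical Player with one rule worth testing: lethal damage must set the state to Dead.
public class Player
{
    private int _health = 100;
    private PlayerState _state = PlayerState.Alive;

    public int Health { get { return _health; } }
    public PlayerState State { get { return _state; } }

    public void TakeDamage(int amount)
    {
        _health = Math.Max(0, _health - amount);
        if (_health == 0)
            _state = PlayerState.Dead;
    }
}

public static class PlayerTests
{
    // Throws if the player ends up in the wrong state after lethal damage.
    public static void LethalDamage_SetsStateToDead()
    {
        var player = new Player();
        player.TakeDamage(150);
        if (player.State != PlayerState.Dead)
            throw new Exception("Expected Dead, got " + player.State);
    }
}
```

The compiler is happy with any valid PlayerState; only a test like this notices when the logic assigns the wrong one.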

Conclusion

Every time you find yourself doing a Copy-Paste, ask yourself:

  • Why are you doing it?
  • Is it really necessary to duplicate what you are duplicating?
  • Can it lead to problems if something changes in the future?
  • Is there any way of avoiding that duplication?

Try to find implicit or hidden duplications that can be problematic in the future. Try to reduce the number of places where mistakes can be made. Try to reduce the number of people that can make mistakes (and I mean something deeper than firing dumb-asses!). Strong-type and Unit-Test when possible!

Finally, apply some common sense to all of this. Sometimes, you should do just the opposite of what I'm telling you… ha ha… ;)

Take care!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Inaki Ayucar
Software Developer (Senior)
Spain
Inaki Ayucar is a Microsoft MVP in DirectX/XNA, and a software engineer involved in development since his first Spectrum 48k, in the year 1987. He is the founder and chief developer of The Simax Project (www.simaxvirt.com) and is very interested in DirectX/XNA, physics, game development, simulation, C++ and C#.
 
His blog is: http://graphicdna.blogspot.com
 
To contact Inaki: iayucar@simax.es


Comments and Discussions

 
My vote of 5 (egenis, 22 Jul '12 - 18:29)
Nice article!
My vote of 5 (nikhil _singh, 8 Jul '12 - 19:38)
Nice one. I think every developer should read this and understand the evil effects of COPY-PASTE.
My vote of 4 (zenwalker1985, 19 Oct '11 - 18:28)
Not that the content is bad, but being an experienced guy I pretty much knew this information already. And yes, as you said, as I have grown older these things have settled firmly in my brain.
Well Crafted - I LOVE It! (KentuckyEnglishman, 1 Apr '10 - 2:11)
As an educator and senior software engineer, I can definitely relate and say - Spot on! I always try to guide and encourage my students and co-ops to think for themselves - which sadly is becoming harder to do as each year passes by. How many times I've come across complacency in people who choose to rehash (if even that much) what they easily find, copy and paste from someone else's thinking and work - without evaluating a) its validity, b) its efficiency or applicability, and c) whether it fits the problem or not in the "right way".
 
"But what about all that 'code-reuse' everyone hypes about?" Code reuse is still good - when used in an appropriate context, provided the original serves the purpose in a well-thought, meaningful way. Otherwise, it only expands and accumulates unnecessary clutter of the original problems that were introduced with that code (or idea) from the beginning. Not a good thing! (At least, that's my personal opinion...)
 
You got my 5 for this - and if nothing else, it will also make an excellent tool for me to pass along to the runts who refuse to want to think for themselves any. :)
Ky Englishman
"Be Nobody But Yourself - A Let The World Learn How Wonderful YOU Are!"

Small update (Inaki Ayucar, 29 Mar '10 - 21:51)
Thanks to everyone for your kind comments.
 
I made a small update. Thanks to M. Barnett, who pointed out that "mounting a broom" was more like "riding a broom", like in Halloween, than assembling it... he he... :) It seems it was late when I wrote the post.
 
Cheers!
Inaki Ayucar
http://graphicdna.blogspot.com

a 5 from me (u-li, 29 Mar '10 - 21:45)
you got my 5 and my respect for the very good presentation
 
That is what separates the craftsman from the DIY hobbyist.
 
Success, Uli
Uli Merkel
www.uli-merkel.de

Determining what's really "the same" (supercat9, 29 Mar '10 - 5:48)
Things should often be duplicated, rather than used by reference, if they may in some cases be the same but in other cases not. For example, it may be a good thing to have multiple identically-behaving functions which take different parts of a person's name and concatenate them, if the concatenated names will be used for different purposes. Otherwise, if it is necessary to e.g. change the way people's names appear on some reports but not the way they appear on the screen, it may be difficult to change one without inadvertently changing the other. To use your texture-mapping game example, if one sometimes uses a metallic texture for wall surfaces inside an enemy base, and uses a pixel-identical texture for part of the "skin" of the robots that inhabit a different base, making them the "same" texture would mean that changing the skin of the robots may also change some of the walls in the enemy base. While it may be useful to have the world-building "compiler" detect things that were coincidentally a perfect match (just including one copy), one should keep the files distinct in any editable version of the map.
Re: Determining what's really "the same" (Inaki Ayucar, 29 Mar '10 - 22:27)
You see? There's always someone who still thinks that duplicating is OK... he he... Thanks a lot for your comment, but I'm sorry, I don't agree. In general, you should ALWAYS try to avoid duplication (except in very, very rare situations).

In the example you mention, you should realize (as you did) that you have two conceptually different operations: let's say ConcatenateUserNameForScreen and ConcatenateUserNameForReports. As we mentioned, they are different because they have different purposes (you expect that in the future they might do different things). So they are not duplicated, and they both should exist, no matter if today they do the very same thing. That's good. That's planning and anticipating the needs of the future.

However, if today they do exactly the same thing, then you have duplication inside each of them, and you should avoid it. How? Both functions should call a third one that actually performs the name concatenation, based on other parameters. To avoid the mistakes you mention, you should also properly identify exactly what each operation does and how, and name it accordingly, to reflect as precisely as possible what it does (and how). So, each of the previous functions should call a generic, private concatenation function called, for instance, ConcatenateUserNameByJoiningParts().

I'm afraid you are mixing things up a little bit. What we mean by "duplication" is a full duplication: same behavior, same decisions, same purpose, same usage. So, if you have two identical textures, but with different purposes and different usages (one for the robot and one for the base), then they are not duplicated, and both should exist. However, even in that case, a purist would tell you that you should do something to handle that. Maybe storing a third, base or reference texture in your working repository. Because what happens in just the opposite case (hundreds of those identical textures, which have to be updated one by one)?

One way or the other, as I mentioned in the post, this is an issue where you should apply some common sense. However, always, always keep duplication in mind. It's the way of the Jedi... ;)
 
Thanks a lot for your comment.
Inaki Ayucar
http://graphicdna.blogspot.com

Re: Determining what's really "the same" (Rozis, 30 Mar '10 - 3:30)
In my opinion Supercat9 is right. He addresses the fact that although the code is the same, it is not a duplicate. Because the usage of the code is different, the two pieces of code are not 'isomorphic'. There's a false sense of duplication.

Actually, the essence of our work is to find out which parts of a system are isomorphic and which are not. As an experienced programmer, Supercat9 knows that there might be behavioural aspects that qualify some code as not duplicated.

Your comment on that is too easy. So let's stretch your reasoning to the absurd: should MS disable the possibility of copying files and replace it with links? We all know that copying files may potentially lead to all the things mentioned in your article. So let's get rid of it! Or maybe there is a reason why copying a file is permitted. Maybe a copy is not always a duplication.

The process of avoiding duplication is actually using the concept of abstraction. What inexperienced people don't know is that there's a limit to abstraction. There's a point where an abstract model loses its connection with the concrete situations it should solve. The problem is that the skill of using abstraction in the right way is not so easy to gain, and in my opinion some will never learn. There are 2 types of inexperienced people: the stupid ones that copy and paste by default, and the smart ones that abstract everything they can. Both are wrong if they don't associate their work with the problem area they are in.

Of course, in a general sense I do agree with you. But it takes experience to know the price of copy and paste and the price of overdoing the avoidance of duplication. The only way I see is to put them in a position where they will experience all the consequences of their copy/paste or abstraction. Or teach them philosophy...
Re: Determining what's really "the same" (Inaki Ayucar, 30 Mar '10 - 3:53)
Yes, I agree with you.

However, this post was written for inexperienced people, as stated at the beginning. The comment "someone always thinks duplication is OK" was just for fun. Just kidding, no offense... ;)

But I agree with you. Of course, I also think that abstraction has a limit. That's why I said to apply some common sense to avoid taking things to the absurd. If I didn't agree with you, no common sense would be necessary, and I'd have said "always avoid any form of duplication, regardless of the effort and by any means necessary".

The point I wanted to get across in this article is that inexperienced people tend to judge whether or not to avoid duplication depending on the complexity of the code in question. And inexperienced people usually can't see that a line of code can be more than that, and that duplicating it can get you into trouble in the future.

Of course, if someone is sure they have enough experience, they can decide when to avoid duplication and when not to. But for me it's important to get across the idea that, if you are not experienced enough and you have to choose, you will win more often by avoiding duplication than by doing the opposite.
 
Thanks a lot for your comment.
Inaki Ayucar
http://graphicdna.blogspot.com

Last Updated 30 Mar 2010
Article Copyright 2010 by Inaki Ayucar