Click here to Skip to main content
13,148,418 members (54,936 online)
Click here to Skip to main content
Add your own
alternative version

Stats

19.8K views
57 bookmarked
Posted 14 Dec 2014

Modernizing Legacy C++ Code

, 14 Dec 2014
Rate this:
Please Sign up or sign in to vote.
Experiences and recommendations from modernizing legacy C++ code using C++11/14

This article presents experiences and recommendations from modernizing legacy C++ code using C++11/14.

I've been working on modernizing several C++ projects of various sizes, some smaller, some larger, some started 20 years ago with a code base that had about 4MLOC, more than 90% of it in C++ (the other in C#). All these projects were built with MFC and some with ATL, used MFC containers (some exclusively) and programming styles from the 90s.

The purpose of refactoring these projects were to make them:

  • simpler: the simpler the code gets the easier to read and understand it which boosts maintainability
  • more maintainable: use of MFC containers and especially CPtrArray creates big issues with inspecting objects during debugging. An important part of refactoring the code was to get rid of pointers to void and unfriendly containers and make it much easier to debug.
  • more robust: replacing raw pointers and dynamically allocated memory with smart pointers helps avoiding memory leaks; similarly using the RAII idiom helps avoiding resource leaks. Also by giving up on pointers and using references or values the code is less prone to problems such as access violation due to use of null pointers.
  • more performant: the performance gain was not the primary driver, but any gain in performance is welcomed.

Standard C++ Containers

One of the primary problems with the legacy code was its heavy use of C-like arrays and MFC containers. When a container was needed CArray (or one of the non-template variations such as CDWordArray or CStringArray) was the primary choice. Sometimes CList or CMap were used. But the most heavily used container was CPtrArray. CPtrArray stores pointers to void. The most annoying consequence is that during debugging you cannot inspect it’s content. The only way to do that is per element: find the value of the pointer stored at some index, and then explicitly add it in the watch window and cast it to the appropriate pointer type.

Here is an example. Let’s consider the following code:

struct foo
{
   int      a;
   double   b;
   CString  c;

   foo(int a, double b, CString const & c):
      a(a), b(b), c(c)
   {}
};

CPtrArray arr;

arr.Add(new foo(1, 1.0, L"one"));
arr.Add(new foo(2, 2.0, L"two"));
arr.Add(new foo(3, 3.0, L"three"));

When you watch this in the debugger, you can’t directly see the elements of arr, you need to check its size and then add arr.m_pData,3 (you can’t say for instance arr.m_pData, arr.m_nSize) and then pick each pointer and cast it explicitly to foo*. This may work with several elements, but what if you have tens or hundreds?

It also needs mentioning that when using a CPtrArray you need to explicitly iterate though its elements and delete each object pointed by the element when the container goes out of scope and needs to be destroyed.

for(INT_PTR i = 0; i < arr.GetSize(); ++i)
{
  foo* temp = (foo*)arr.GetAt(i);
  delete temp;
}

By replacing the CPtrArray with a std::vector<foo*> you can immediately solve this problem with the debugger and you can easily inspect the content of the vector.

std::vector<foo*> vec;
vec.push_back(new foo(1, 1.0, L"one"));
vec.push_back(new foo(2, 2.0, L"two"));
vec.push_back(new foo(3, 3.0, L"three"));

Of course by using a std::vector<foo*> you still have to iterate through the elements of the vector and delete the objects when the vector goes out of scope. But using a std::vector instead of CPtrArray is just a first step. Depending on the particular context of your code you can use values instead of pointers (so std::vector<foo>) or smart pointers (std::vector<std::shared_ptr<foo>>).

std::vector<std::shared_ptr<foo>> vec;
vec.push_back(std::make_shared<foo>(1, 1.0, L"one"));
vec.push_back(std::make_shared<foo>(2, 2.0, L"two"));
vec.push_back(std::make_shared<foo>(3, 3.0, L"three"));

Here is another example where standard C++ containers can improve the code. In this version a C-like array of wide chars is used to retrieve the name of the Windows user.

wchar_t buffer[50];
DWORD size = 50;
GetUserNameW(buffer, &size);

What if the buffer is not enough? The following is supposed to be an improved version.

wchar_t* buffer = NULL;
DWORD size = 0;
GetUserNameW(buffer, &size);

if(size > 0)
{
  buffer = new wchar_t[size];
  GetUserNameW(buffer, &size);
}

// do something with buffer

delete [] buffer;

The problem here is that the memory is explicitly allocated and released. If an exception occurs after allocating the memory for buffer and before deleting it, it will not be properly deleted and will leak.

The better alternative is to use a container, such as std::vector<wchar_t> (that guarantees contiguous memory) or in this particular case even std::wstring. These containers makes the allocation and release of memory transparent to the user and if an exception occurs the container object is destroyed during stack unwinding and when the container is destroyed it automatically releases memory. So the below code is exception safe.

std::vector<wchar_t> buffer {};
DWORD size {0};
auto ret = GetUserNameW(nullptr, &size);

if(ret == 0 && ERROR_INSUFFICIENT_BUFFER == GetLastError() && size > 0)
{
  buffer.resize(size);
  GetUserNameW(buffer.data(), &size);
}

// do something with buffer

In general the following standard containers should be a replacement for:

  • std::vector for C-like arrays (when the size is known at runtime), CArray, CDWordArray, CStringArray, CPtrArray, etc. and even CList, depending on how the list is used
  • std::array for C-like arrays when the size is known at compile time
  • std::list for CList
  • std::map for CMap

As for the other standard containers use them based on your particular needs.

References and values over pointers

I have encountered many cases when pointers were overused, whether as local variables or as function arguments.

Here is a simple example (just as a stub for simplicity):

void checkString(CString* pstrText)
{
   char ch = 0; // some character to find
   int pos = pstrText->Find(ch);
   if(pos != -1)
      *pstrText = "recompose the text";
}

So this function takes a pointer to a CString because in some case it has to update the string. But on the other hand it is not checking the value of the pointer. You could supply a null pointer and then the application will crash. This can be easily changed so that it takes a reference.

void checkString(CString& text)
{
   char ch = 0; // some character to find
   int pos = text.Find(ch);
   if(pos != -1)
      text = "recompose the text";
}

I have encountered lots of functions that took pointers for no valid reason. Most of them can be changed to use references or values (move semantics are now available). But there are cases when you have to check whether the argument received is “valid” or not. That is easy with a pointer, because you check it against null. But that’s not possible with a reference. In the following example we have a LogError function that logs a message to a specified target. But if no target is supplied a default one is used. Supplying the target as a pointer makes it easier to check if is valid or not.

void LogError(ILogTarget* target, std::string const & text)
{
   ILogTarget* t = target != nullptr ? target : &_defaultTarget;

   t->Log(text);
}

Even though this may look like a good candidate for using pointers, in many cases it could be refactor with two overloads: one that takes a reference to a target, and one that takes no target at all and uses the default one.

void LogError(ILogTarget& target, std::string const & text)
{
   target.Log(text);
}

void LogError(std::string const & text)
{
   LogError(_defaultTarget, text);
}

Sometimes pointers to heap objects are used to avoid copying objects. Move semantics were not available prior to C++11 so values could only be copied. Having the object on the heap and passing around pointers to it improved performance by avoiding unnecessary copies. However, with move semantics in place this is no longer the case. Heavy objects can be moved instead of copied. You should follow the Rule of Five that says that when a type needs to implement one of destructor, copy constructor, copy assignment operator, move constructor and move assignment operator it should implement all of them.

Smart pointers

There are situations when pointers are still necessary. In those cases you should use smart pointers, not raw pointers. They help you automatically managing the lifetime of objects and avoid creating memory leaks.

There are several smart pointers provided by the standard C++ library:

  • std::shared_ptr: for objects whose ownership must be shared, destroys the pointed object when the last shared pointer object that points to the object is destroyed.
  • std::unique_ptr: for objects that do not need shared ownership, destroys the pointed object when the smart pointer is destroyed (since it retains sole ownership of the object)
  • std::weak_ptr: holds a non-owning reference to an object managed by a std::shared_ptr

The standard library also provides two functions, std::make_shared and std::make_unique for creating a std::shared_ptr and std::unique_ptr in an exception safe manner.

No pointers to void

This point has been discussed already, but I am mentioning it again as a stand alone issue to stress its importance. A pointer to void represents raw memory that can be anything and requires explicit casting to an appropriate type. It is rarely useful and unless you write something like malloc you shouldn’t use it. The compiler has no information about what type of objects it may actually point to and cannot prevent you from casting to the incorrect type. That also makes it impossible for the debugger to show you the content of the pointed object, unless you explicitly cast to the correct type (as we’ve already seen above).

Range-based for loops

Many times when you iterate through a range you don’t care about the index and you may not even care about the type of the object. However, most loops look like this:

void foo(CStringArray& arr)
{
   for(INT_PTR i = 0; i < arr.GetSize(); ++i)
   {
      CString str = arr.GetAt(i);
      
      // do something with str
   }
}

or

void foo(std::vector<CString>& arr)
{
   for(std::vector<CString>::iterator i = arr.begin(); i < arr.end(); ++i)
   {
      CString& str = *i;

      // do something with str
   }
}

Many loops like this can be re-written in a simpler and more readable way. For instance the second example is equivalent to:

void bar(std::vector<CString>& arr)
{
   for(auto &str : arr)
   {
      // do something with str
   }
}

This is syntactic sugar. The compiler transforms this into a loop similar to the one shown above. But it requires that a begin() and end() function exist for the object/container that is iterated. So if you attempt to do the same with the first example using CStringArray you get errors:

void foo(CStringArray& arr)
{
   for(auto std : arr)
   {
      // do something with str
   }
}

1>error C3312: no callable 'begin' function found for type 'CStringArray'
1>error C3312: no callable 'end' function found for type 'CStringArray'

This can be solved by defining such functions for the MFC containers. The following example is for CStringArray. You can similarly define for the many other MFC containers, but notice that you should define both const and non-const iterator classes.

template <typename A, typename T>
class CTypeArrayIterator
{
public:
   CTypeArrayIterator(A& collection, INT_PTR const index):
      m_index(index),
      m_collection(collection)
   {
   }

   bool operator!= (CTypeArrayIterator const & other) const
   {
      return m_index != other.m_index;
   }

   T& operator* () const
   {
      return m_collection[m_index];
   }

   CTypeArrayIterator const & operator++ ()
   {
      ++m_index;
      return *this;
   }

private:
   INT_PTR  m_index;
   A&       m_collection;
};

inline CTypeArrayIterator<CStringArray, CString> begin(CStringArray& collection)
{
   return CTypeArrayIterator<CStringArray, CString>(collection, 0);
}

inline CTypeArrayIterator<CStringArray, CString> end(CStringArray& collection)
{
   return CTypeArrayIterator<CStringArray, CString>(collection, collection.GetCount());
}

I have created an open source project called MFC Collection Utilities that enables MFC developers to use MFC containers (arrays, lists, maps) with range-based for loops. The library requires Visual Studio 2012 or a newer version and has support for all MFC collections (both template and non-template). A NuGet package is also available here.

Virtual specifiers

C++ does not enforce you to specify the virtual keyword on derived classes in a hierarchy to indicate that a function is overriding a base class implementation. Having virtual in the class where the function is first declared is enough. Many developers tend to ignore the virtual keyword on derived classes and that makes it hard to figure, especially on large code bases or large hierarchies which function is virtual and is actually overriding a base implementation.

class foo
{
protected:
  virtual void f();
};
 
class bar : public foo
{
protected:
  void f();
};

When you work with deep hierarchies (let’s say more than 5 levels) and many classes on each level and each class having lots of functions it’s hard to figure what is virtual and what not just by reading the code if virtual is not specified. You need to go one level up, check there, and if you don’t find the answer go one more level and so on until you get to the root of the hierarchy. I had to do that countless times and I always wished virtual was mandatory.

C++11 has introduced two new keyword specifiers for virtual functions: override, to indicate that a virtual function overrides another implementation and final, to indicate that a virtual function can no longer be overridden.

These new virtual specifiers have a double value. First, they allow the compiler to immediately flag errors. For override the compiler can detect that the signature does not match a base class signature of a virtual function and issue an error. For final it can detect that a derived class is re-implementing a virtual function that should not be overridden any more. Second, they provide a better self documentation of the code, that I find equally important.

This is how I believe the above code should always be written:

class foo
{
protected:
  virtual void f();
};
 
class bar : public foo
{
protected:
  virtual void f() override;
};

Even though you are not able to go through large legacy code bases and make the changes to use the virtual, override and final keywords in derived classes you should always do that for new code that you write, or on code that you work to maintain.

Const correctness

The const keyword should be used on all variables that do not change their value and all member functions that do not alter the state of the object. This helps not only better documenting your code, but also allow the compiler to immediately flag incorrect use of immutable variables or functions and also give it a chance to better optimize your code. It should be used on all variables or function parameters that are not suppose to be changed, as well as on non-static member functions to indicate they do not and should not change state (of the this object).

Besides const you should also use constexpr, a C++11 specifier that indicates that the value of a function or variable can be evaluated at compile time, and therefore used in places owhere only compile time constance expressions are allowed (such as the size of an array).

constexpr mat_size(int const rows, int const cols) 
{
   return rows * cols;
}

int matrix[mat_size(10, 5)] = {0};

Notice that constexpr implies const on objects and class member functions, and inline on all functions.

Conclusions

Modernizing legacy C++ code with C++11 features (and style) can improve performance, robustness, readability and maintainability. If you work on large projects you won’t be able and probably don’t want to change everything, but you can adopt a strategy to refactor and modernize key parts (if time and budget allows), and then when you work to maintain pieces of code you refactor them as you go. As for the new code I strongly recommend you write it only with C++11/14/17.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Marius Bancila
Architect Visma Software
Romania Romania
Marius Bancila is the author of Modern C++ Programming Cookbook. He was a Microsoft MVP for VC++ and later Visual Studio and Development Technologies for 11 years. He works as a system architect for Visma, a Norwegian-based company. He is mainly focused on building desktop applications with VC++ and VC#. He keeps a blog at http://www.mariusbancila.ro/blog, focused on Windows programming. He is the co-founder of codexpert.ro, a community for Romanian C++ programmers.

You may also be interested in...

Pro

Comments and Discussions

 
Question5/5 Pin
loxa17-Sep-15 0:22
memberloxa17-Sep-15 0:22 
GeneralInspecting objects in MFC containers while debugging Pin
SoMad20-Jan-15 19:15
protectorSoMad20-Jan-15 19:15 
GeneralRe: Inspecting objects in MFC containers while debugging Pin
Marius Bancila16-Mar-15 12:42
memberMarius Bancila16-Mar-15 12:42 
GeneralMy vote of 5 Pin
Raul Iloc11-Jan-15 1:17
mvpRaul Iloc11-Jan-15 1:17 
GeneralMy vote of 3 Pin
filippov.anton21-Dec-14 19:40
memberfilippov.anton21-Dec-14 19:40 
GeneralRe: My vote of 3 Pin
loxa17-Sep-15 0:20
memberloxa17-Sep-15 0:20 
QuestionGood article Pin
LWessels17-Dec-14 1:43
memberLWessels17-Dec-14 1:43 
Questionpointers vs references Pin
Member 255500616-Dec-14 7:51
memberMember 255500616-Dec-14 7:51 
AnswerRe: pointers vs references Pin
GKarRacer16-Dec-14 9:10
memberGKarRacer16-Dec-14 9:10 
GeneralRe: pointers vs references Pin
Member 255500616-Dec-14 10:01
memberMember 255500616-Dec-14 10:01 
GeneralRe: pointers vs references Pin
GKarRacer16-Dec-14 10:19
memberGKarRacer16-Dec-14 10:19 
QuestionMy vote of 5 Pin
kanalbrummer16-Dec-14 5:51
memberkanalbrummer16-Dec-14 5:51 
GeneralMy vote of 5 Pin
Caerwyn16-Dec-14 2:41
memberCaerwyn16-Dec-14 2:41 
QuestionReally good Pin
DanielRomax16-Dec-14 2:37
memberDanielRomax16-Dec-14 2:37 
QuestionThank you! Pin
PCC75115-Dec-14 20:35
memberPCC75115-Dec-14 20:35 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.170924.2 | Last Updated 15 Dec 2014
Article Copyright 2014 by Marius Bancila
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid