1. Introduction

Most large-scale projects start small and gradually grow into larger ones. Issues that a large project faces may not be prominent while it is still small, so projects that start small often do not handle those issues properly as they grow. One such problem in large-scale C++ projects is physical dependencies, also known as compilation dependencies. If they are not managed properly, compilation dependencies can increase a project's compilation time unnecessarily.

Design patterns [2] are usually used to discuss the logical design of a project, but they are also helpful for managing its physical design. Although prototype hierarchy [1] was the first design pattern to address compilation dependencies, some techniques and idioms [3] already dealt with this issue. The PImpl idiom [4] (pointer to implementation), which can be regarded as a variant of the Handle/Body idiom [3], is one of them. Changes are inevitable in a large project, so here we introduce some techniques that help minimize compilation time during development.
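The article does not show PImpl itself, so here is a minimal sketch of the idea, assuming a hypothetical class Widget. The public class exposes only a pointer to a forward-declared implementation struct, so its private data members (and any headers they would need) can change without forcing clients to recompile. The file boundaries are marked with comments so that the sketch compiles as a single file.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- Widget.h (hypothetical): the only part clients include ----
class Widget
{
public:
    explicit Widget(std::string name);
    ~Widget();                        // defined below, where Impl is complete
    std::string Describe() const;

private:
    struct Impl;                      // forward declaration only
    std::unique_ptr<Impl> pImpl;      // pointer to implementation
};

// ---- Widget.cpp: private data lives here, hidden from clients ----
struct Widget::Impl
{
    std::string name;
    int revision = 1;                 // members can be added here without editing Widget.h
};

Widget::Widget(std::string name) : pImpl(new Impl)
{
    pImpl->name = std::move(name);
}

Widget::~Widget() = default;          // unique_ptr<Impl> is destroyed where Impl is complete

std::string Widget::Describe() const
{
    return pImpl->name + " (revision " + std::to_string(pImpl->revision) + ")";
}
```

In a real project only Widget.cpp would be recompiled when Impl changes; clients that include Widget.h see nothing but the pointer.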

2. Separate Compilation

C++ programmers commonly split their code into multiple implementation files (usually with extensions such as .c, .cxx or .cpp) and definition files (usually with extensions such as .h, .hxx or .hpp). Before compilation, the preprocessor makes the contents of all required definition files available in the implementation file that includes them.

We do this to reduce compilation time during development and to reuse the code written in different files. For example, suppose a project has 10,000 lines of code in a single file. If we change any single line during or after development, the compiler has to recompile all 10,000 lines. On today's computers this might not be a big problem, but it eventually becomes a nightmare as projects grow larger and larger. If instead we split the project into, say, 10 files of roughly 1,000 lines each, then ideally a change in one file should not affect the others. Large-scale projects also commonly contain general-purpose classes that are useful in other projects too, and the natural way to reuse those classes is to put them in separate files.

However, if we don't design the program carefully, it is sometimes impossible to simply include a class's two files (its definition file and implementation file) in another project and use them. One of the most common problems is that we must also include other definition files that we might not need. Those files may in turn need other files, so in the end we may have to include a bunch of files just to use one single class.

From the compiler's perspective, an implementation file with all preprocessor directives expanded is called a translation unit. In other words, a translation unit is an implementation file with all its definition files included and its macros expanded. If we change anything in a definition file, then every translation unit that includes it, directly or through another definition file, has to be recompiled.

If one definition file is included in another definition file, then a change to the first one affects every file that includes either the first or the second file. The situation becomes even worse when a definition file includes another definition file, which includes yet another definition file, and so on. A change in one file may then trigger not just one recompilation but a rebuild of the whole project. This diagram shows this concept clearly.

It doesn't matter that our camera class does not include Point.H or ViewPort.H directly; both are still part of the camera translation unit. A change to the point header file forces recompilation not only of the camera translation unit, but of all translation units in this example.
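The cascade can be made concrete with a small sketch that walks an include graph in reverse and lists the translation units that must be recompiled after a change. The graph is hypothetical and only loosely modelled on the camera example, since the original diagram is not reproduced here: Camera.cpp includes Camera.H, which includes ViewPort.H, which in turn includes Point.H. The function names are illustrative, not part of any build tool.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical include graph: file -> files it #includes directly.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

IncludeGraph SampleGraph()
{
    return {
        {"Point.cpp",    {"Point.H"}},
        {"ViewPort.H",   {"Point.H"}},
        {"ViewPort.cpp", {"ViewPort.H"}},
        {"Camera.H",     {"ViewPort.H"}},
        {"Camera.cpp",   {"Camera.H"}},
    };
}

// Returns every file that includes 'changed', directly or transitively.
std::set<std::string> AffectedFiles(const IncludeGraph& includes, const std::string& changed)
{
    // Reverse the edges: header -> files that include it.
    std::map<std::string, std::vector<std::string>> includedBy;
    for (const auto& entry : includes)
        for (const auto& header : entry.second)
            includedBy[header].push_back(entry.first);

    // Breadth-first walk over the reversed edges.
    std::set<std::string> affected;
    std::queue<std::string> pending;
    pending.push(changed);
    while (!pending.empty())
    {
        std::string current = pending.front();
        pending.pop();
        auto it = includedBy.find(current);
        if (it == includedBy.end())
            continue;
        for (const auto& includer : it->second)
            if (affected.insert(includer).second)   // visit each file only once
                pending.push(includer);
    }
    return affected;
}

bool IsImplementationFile(const std::string& file)
{
    const std::string suffix = ".cpp";
    return file.size() >= suffix.size() &&
           file.compare(file.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Translation units to recompile: the affected implementation files.
std::set<std::string> UnitsToRecompile(const IncludeGraph& includes, const std::string& changed)
{
    std::set<std::string> units;
    if (IsImplementationFile(changed))
        units.insert(changed);                      // an edited .cpp is always recompiled
    for (const auto& file : AffectedFiles(includes, changed))
        if (IsImplementationFile(file))
            units.insert(file);
    return units;
}
```

With this graph, UnitsToRecompile(SampleGraph(), "Point.H") yields Camera.cpp, Point.cpp and ViewPort.cpp, even though Camera.cpp never names Point.H, while a change to Camera.H affects only Camera.cpp.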

3. Applying Patterns to Minimize Compilation Dependencies

The above dependencies can be reduced with the help of forward declarations [4]. Sometimes, however, it is impossible to use a class with only a forward declaration. Let's take an example to understand this better. It is not unusual for a program to communicate with several databases, such as Oracle, Sybase and SQL Server, and to switch between them at run time. For maximum speed, we can use the native APIs of these databases. To give the client a uniform, polymorphic interface, we create an abstract base class called Database, which declares pure virtual functions for the whole required interface, and derive all the database-specific classes from it. We can also keep these classes in separate components if necessary. To keep things simple, here is our database class.

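#include <string>

// __declspec(dllexport) is Microsoft-specific: it exports the class from a DLL.
class __declspec(dllexport) Database
{
public:
	Database();
	virtual ~Database();   // virtual, so deleting through a Database* is safe
	// Pure virtual interface implemented by every database-specific class
	virtual bool OpenConnection(std::string connectionString) = 0;
	virtual void CloseConnection() = 0;
	virtual void ExecuteCommand(std::string command) = 0;
};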

We derive three classes from it for the Oracle, SQL Server and Sybase implementations. Here is the code of the Oracle class; the others are very similar.

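// Oracle-specific implementation of the Database interface
class __declspec(dllexport) Oracle :
	public Database
{
public:
	Oracle();
	~Oracle();
	// Overrides of the pure virtual functions (override requires C++11)
	bool OpenConnection(std::string connectionString) override;
	void CloseConnection() override;
	void ExecuteCommand(std::string command) override;
};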

In the implementation of these methods, I simply print a message saying which function was called. Here is our implementation:

#include <iostream>
#include <string>

// Stub implementation: each method only reports that it was called.
bool Oracle::OpenConnection(std::string connectionString)
{
	std::cout << "Oracle::OpenConnection" << std::endl;
	return false;   // no real connection is opened in this demo
}

void Oracle::CloseConnection()
{
	std::cout << "Oracle::CloseConnection" << std::endl;
}

void Oracle::ExecuteCommand(std::string command)
{
	std::cout << "Oracle::ExecuteCommand" << std::endl;
}

This is a class diagram of our classes:

In this design, the client program has to include the definition file of a derived class, because without it the client cannot create an object of that class [5]. If the client does not know in advance which database it will communicate with, or wants to give the user that choice, it has to include the definition files of all the derived classes. Here is simple client code that demonstrates this:

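// The client must include the definition files of Oracle, SQLServer and
// Sybase to create them; choice is the database selected by the user.
Database* pDataBase = NULL;

switch (choice)
{
case 1:
	pDataBase = new Oracle();
	break;

case 2:
	pDataBase = new SQLServer();
	break;

case 3:
	pDataBase = new Sybase();
	break;
}

if (pDataBase != NULL)
{
	pDataBase->OpenConnection("This is connection string");
	pDataBase->ExecuteCommand("This is command");
	pDataBase->CloseConnection();

	delete pDataBase;   // safe because ~Database() is virtual
}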

In addition, to support one more database we need to derive its class from Database and also include its definition file in the client, so the client has to be recompiled every time a database is added.

We can reduce the dependencies between these classes and their clients by introducing a level of indirection: a Factory method [2] creates the objects of the derived classes instead of the client. The client now only asks the factory method for an instance of the required class. We create a DatabaseFactory class with one static method, CreateObject, whose responsibility is to create an object of the appropriate class and return its address. Here is the code of our factory method (the CreateObject method of the DatabaseFactory class).

// Only this implementation file needs the definition files of the derived classes.
Database* DatabaseFactory::CreateObject(int databaseType)
{
	if (databaseType == 1)
		return new Oracle();
	else if (databaseType == 2)
		return new SQLServer();
	else if (databaseType == 3)
		return new Sybase();
	else
		return NULL;   // unknown database type
}
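The article does not show the declaration of DatabaseFactory. Here is a minimal single-file sketch, assuming its definition file holds only the class declaration: because CreateObject merely returns a pointer, a forward declaration of Database is enough there, so that header pulls in none of the database headers. The helper class PrintingDatabase, the client function RunSession and the file-boundary comments are hypothetical, added only so the sketch compiles on its own; it also omits __declspec and uses nullptr (C++11) instead of NULL.

```cpp
#include <iostream>
#include <string>
#include <utility>

// ---- DatabaseFactory.H (sketch): no database headers needed ----
class Database;   // forward declaration: enough for a pointer return type

class DatabaseFactory
{
public:
    // Declared only; the body lives in the implementation file.
    static Database* CreateObject(int databaseType);
};

// ---- Database.H: the abstract interface ----
class Database
{
public:
    virtual ~Database() {}
    virtual bool OpenConnection(std::string connectionString) = 0;
    virtual void CloseConnection() = 0;
    virtual void ExecuteCommand(std::string command) = 0;
};

// ---- Stand-ins for Oracle.H, SQLServer.H, Sybase.H ----
// Hypothetical helper: each function just prints "<name>::<function>".
class PrintingDatabase : public Database
{
public:
    explicit PrintingDatabase(std::string name) : name_(std::move(name)) {}
    bool OpenConnection(std::string) override { Print("OpenConnection"); return false; }
    void CloseConnection() override { Print("CloseConnection"); }
    void ExecuteCommand(std::string) override { Print("ExecuteCommand"); }

private:
    void Print(const char* function) const { std::cout << name_ << "::" << function << '\n'; }
    std::string name_;
};

class Oracle : public PrintingDatabase { public: Oracle() : PrintingDatabase("Oracle") {} };
class SQLServer : public PrintingDatabase { public: SQLServer() : PrintingDatabase("SQLServer") {} };
class Sybase : public PrintingDatabase { public: Sybase() : PrintingDatabase("Sybase") {} };

// ---- DatabaseFactory.cpp: the only file that sees the concrete classes ----
Database* DatabaseFactory::CreateObject(int databaseType)
{
    switch (databaseType)
    {
    case 1: return new Oracle();
    case 2: return new SQLServer();
    case 3: return new Sybase();
    default: return nullptr;   // unknown database type
    }
}

// ---- Client: needs only Database and DatabaseFactory ----
bool RunSession(int choice)
{
    Database* pDataBase = DatabaseFactory::CreateObject(choice);
    if (pDataBase == nullptr)
        return false;
    pDataBase->OpenConnection("This is connection string");
    pDataBase->ExecuteCommand("This is command");
    pDataBase->CloseConnection();
    delete pDataBase;
    return true;
}
```

A reasonable variation, not the article's design, is to return std::unique_ptr<Database> from CreateObject, which makes ownership explicit and removes the manual delete in the client.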

Here is a class diagram of this:

The client of the database classes now creates an instance of the appropriate database by calling the CreateObject method of the DatabaseFactory class, passing the required database type as a parameter. The advantage of this technique is that the client now needs the definition files of only two classes, Database and DatabaseFactory. Here is client code using the factory method:

// The client needs only the Database and DatabaseFactory definition files.
Database* pDataBase = DatabaseFactory::CreateObject(choice);

if (pDataBase != NULL)
{
	pDataBase->OpenConnection("This is connection string");
	pDataBase->ExecuteCommand("This is command");
	pDataBase->CloseConnection();

	delete pDataBase;
}

In the future, if we want to add support for another database, such as DB2 or MySQL, we don't need to include its definition file on the client side.

When support for a new database is added, the only thing we need to change is the implementation of the CreateObject function in the DatabaseFactory class. If this function is not inline, the change does not affect the clients of the database classes, so they need not be recompiled. For the same reason, it is better practice to write a function's body in the implementation file, even for a short function that could have been inline, to reduce physical dependencies [6]; only if performance is a real concern should the function be declared inline explicitly. When the body is in the implementation file, a change to it means the compiler recompiles only that translation unit. When the body is inline in the definition file, on the other hand, a change to it means recompiling every translation unit that includes that definition file.

4. Conclusion

Most compile-time dependencies can be removed with the proper use of design patterns. Design patterns are not only useful for improving the logical design of a project; they can also improve its physical design and so minimize its compilation time. “The Elements of Style” contains the rule “Omit needless words” [7]. We can apply a similar rule here: “Omit needless headers”.

The techniques discussed here reduce the compile-time dependencies of a project. This work can be extended to minimize link-time dependencies as well.

5. References

[1] John Lakos, Large-Scale C++ Software Design.
[2] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software.
[3] James O. Coplien, Advanced C++ Programming Styles and Idioms.
[4] Herb Sutter, Exceptional C++.
[5] Bjarne Stroustrup, The C++ Programming Language, 3rd edition.
[6] Zeeshan Amjad, Manage Physical Dependencies of a Project to Reduce Compilation.
    http://www.codeproject.com/KB/cpp/ZeeshanPhysical.aspx
    http://www.codeguru.com/Cpp/Cpp/cpp_mfc/files/article.php/c6859/
[7] William Strunk Jr., E. B. White, Roger Angell, The Elements of Style.

6. History

31st May, 2011: Initial version