Posted 7 Oct 2019



A Static Analysis Tool for C++

7 Apr 2021 | GPL3 | 21 min read
Automating Scott Meyers' recommendations, cleaning up #include directives, and analyzing code dependencies
This article is a user guide to a static analysis tool for C++ code. Among other things, the tool can clean up #include lists, highlight violations of C++ best practices, and analyze dependencies within the code base. It can also implement some of its suggestions by editing the code. The article also provides a high-level overview of the tool's implementation.


C++ is a large language—too large, some would argue. Because it's a superset of C, it's easy for developers with a C background to build a hybrid OO/non-OO system. C++ also kept the preprocessor, which is sometimes used in what can only be described as despicable ways. And rather than risk offending legacy systems, the C++ standards committee seems very reluctant to deprecate anything—but not at all reluctant to keep adding what seems like one pedantic feature after another, at least to those of us struggling to keep up.

As a result of all this, there are often many ways to do something in C++, and figuring out which way is best can be difficult. Without guidance, it can only be learned through torturous experience. It is therefore unsurprising that there are many books about C++ best practices, such as Scott Meyers' Effective C++. But it's easy to forget their recommendations when you're immersed in coding, especially when new to the language. Of course, some developers don't even bother to read such books, being of the "If it works, it's correct—so don't touch it!" school. Having a tool that could serve as an automated Scott Meyers code inspector would go a long way to addressing these issues.


When I started to develop the Robust Services Core (RSC), I had a reasonable knowledge of C++ but was far from proficient. The code grew very organically and was continually refactored. As I became more familiar with C++ and needed to revisit areas of the code that had lain dormant for a while, I kept finding things that I would now do differently. But there was always more code to develop and never enough time to do a tedious code inspection to find and "fix" all the things that could be improved.

Eventually I decided that, at the very least, it would be nice to clean up all the #include directives. Surely there was a publicly available tool for this. This was circa 2013, and the only thing I found was a Google initiative called "Include What You Use", which appeared to have been mothballed. [1] I therefore decided to write such a tool as a diversion from the main focus of RSC.

Some diversion! It soon became apparent that fixing #include lists, to add the directives that should be there and remove those that shouldn't, meant writing a parser. And not just a parser, but something closer to a compiler, because it would also have to do name resolution and other things. Another option was to take an open-source C++ compiler and either modify it or extract the necessary information from files that it might produce.

Rather than give up, I decided to try writing the tool from scratch. It would be a learning experience, even if the attempt ultimately had to be abandoned. This article describes the current state of the code that emerged.

Using the Code

Not only does the code clean up #include directives, it serves as an automated Scott Meyers code inspector that can implement some of its recommendations by suitably editing the source code. Its main drawback is that it only supports the subset of C++11 that RSC uses. Although this is a reasonable subset of the language, what's missing will hamper its usefulness to projects that use unsupported language features. Adding one of these missing language features can be anywhere from moderately easy to quite challenging. Nonetheless, feel free to request that a specific language feature be supported—or even volunteer to implement it! This will make the tool useful to a wider range of projects.

Unlike previous articles that I've written, this one focuses more on how to use the code, and not much on how it works. However, it will provide a high-level overview of the design as a roadmap for those who want to dig into the code.


Defining the Library

Before the tool can be used, the files that make up the code base must be defined. This can be done right after RSC starts by entering the command >read buildlib from the CLI. The ">" is RSC's CLI prompt and is not entered, but this article uses it to denote a CLI command. A dump of all CLI commands is available in help.cli [2]; scroll down to somewhere around line 1246, to "ct>help full", to see those in the ct directory, which is where the tool is implemented.

What >read buildlib does is execute the script buildlib, which contains a sequence of CLI commands. This results in the execution of the following commands, which are copied from the console transcript file that RSC generates, with commands not relevant to this article removed:

nb>read buildlib
ct>read lib.create
ct>import subs  subs
ct>import nbase nb
ct>import ntool nt
ct>import ctool ct
ct>import nwork nw
ct>import sbase sb
ct>import stool st
ct>import mbase mb
ct>import cbase cb
ct>import pbase pb
ct>import onode on
ct>import cnode cn
ct>import rnode rn
ct>import snode sn
ct>import anode an
ct>import diplo dip
ct>import rsc   rsc

The tool is in the ct directory, so the command >ct is used to access the CLI commands in that directory. The script lib.create is then read. It contains a series of >import commands that add, to the code library, all of the directories that are needed to compile the project (RSC, in this case). For example, the command

ct>import ctool ct

imports the code in the ct directory, which can subsequently be referred to as ctool in other CLI commands. The path to this directory is relative to the SourcePath configuration parameter. When RSC starts up, it obtains its configuration parameters from the file element.config. So to use the tools on your own code, you need to

  • Modify element.config by setting its SourcePath entry to a directory that subtends all of your project's code files.
  • Create a file similar to lib.create in the same directory as RSC's lib.create. Each of the >import commands in that file must specify a directory that is relative to your new setting for SourcePath.
  • Copy the subs directory from RSC into your own project, just below your SourcePath directory, and include the command >import subs "subs", as found in RSC's lib.create, in your version of lib.create.
  • Modify the buildlib script to >read your version of lib.create.

Each >import command ends up creating a CodeDir instance for its directory and a CodeFile instance for each code file [3] in that directory. There are currently two restrictions:

  • Each file name must be unique (i.e., the same name cannot be used in more than one directory).
  • All of the code files in a directory get imported (i.e., there is no way to exclude a code file).
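Before writing your own lib.create, it can help to confirm that your project satisfies the first restriction. The following is a minimal sketch, not part of RSC: the DirContents type and the FindDuplicateNames function are hypothetical names, and the directory contents are assumed to have been gathered by some other means.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a code library: directory name -> file names in it.
using DirContents = std::map<std::string, std::vector<std::string>>;

// Returns each file name that appears in more than one directory, mapped to
// the directories that contain it. An empty result means that the "unique
// file name" restriction is satisfied.
std::map<std::string, std::vector<std::string>> FindDuplicateNames(
   const DirContents& dirs)
{
   std::map<std::string, std::vector<std::string>> where;

   for(const auto& dir : dirs)
      for(const auto& file : dir.second)
         where[file].push_back(dir.first);

   // Discard names that were only seen once.
   for(auto it = where.begin(); it != where.end(); )
   {
      if(it->second.size() < 2)
         it = where.erase(it);
      else
         ++it;
   }

   return where;
}
```

Any name that this reports would have to be renamed in all but one directory before >import could handle the project.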

Parsing the Code

Once all of the source code directories have been imported, the entire code library can be parsed, which is a prerequisite to checking it with the static analysis tool. This is done with the command

>parse - win32 $files

in which

  • - specifies that no parser options are being used (the only options are ones that enable debug tools)
  • win32 specifies that the target is 32-bit Windows (currently, the only other target is win64)
  • $files is a built-in library variable that contains the set of all code files

If $files is replaced with f ctool, meaning all the code files in the ct directory, the result (again taken from the console transcript file) looks like this:

ct>parse - win32 f ctool
  std::bitset<unsigned char>
// [many lines deleted]
  std::iterator_t<const CodeTools::Cxx::Keyword>
Updating cross-reference...
  Total=181, failed=0

As each file is parsed, its name is displayed. Template instantiations are indented (and indented further, when one template causes the instantiation of another).

The first RSC file to be parsed is FunctionGuard.h. The files that precede it are either from the standard library or Windows. However, they are not the actual instances of those files. Rather, they are taken from the subs directory, which contains simplified versions of them. These versions avoid the need to

  • >import files that are external to the project from a wide range of directories
  • #define all the names that would be needed to correctly navigate all the #ifdefs in external files
  • support C++ language features used by external files but not by the project
  • parse lots of things that the project doesn't use

Consequently, before you can >parse your own project, you must ensure that the subs directory contains a stand-in for each external header that your project #includes, and that each stand-in declares the items that you use from it. Note that in the case of templates, subs headers do not need to provide function definitions.
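As an illustration of what such a stand-in might look like, here is a sketch for a made-up third-party header. The library name extlib, its Buffer template, and its Version function are all hypothetical; the point is only that the stand-in declares what the project uses and omits function definitions, even for a template.

```cpp
// subs/extlib.h: hypothetical stand-in for an external header <extlib.h>.
// It declares only the items that the project uses. No function bodies are
// provided, not even for the template's member functions.
#ifndef EXTLIB_H_INCLUDED
#define EXTLIB_H_INCLUDED

namespace extlib
{
   template<typename T> class Buffer
   {
   public:
      Buffer();
      ~Buffer();
      void push(const T& item);
      unsigned int size() const;
      const T& at(unsigned int index) const;
   };

   int Version();
}

#endif
```

A header like this is only meant to be read by >parse. The real build would continue to use the real external header.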

Performing a Code Inspection

Now that all of the code has been parsed, it can be checked for violations of design guidelines:

>check rsc $files

This produces the file rsc.check, which contains all of the warnings that were found. Basic documentation for each of the 130 or so warnings that >check can produce can be seen in the file cppcheck.

If >check is run on a subset of the code, it will first >parse any unparsed code that would be needed in a successful build. This avoids false positives, such as warnings that a function is not defined or is unused.

Before merging into the master branch, I usually run >check on all of the code and use the diff tool in VS2017's GitHub plug-in to see if any new warnings have arisen since the last merge.

At present, the only way to suppress a warning is to modify the function CodeWarning::Suppress.

Because headers in the subs directory do not provide function implementations for templates, >check can erroneously recommend things such as

  • removing an #include that is needed to make a destructor visible to a unique_ptr template instance
  • declaring a data member const even though it is inserted in a set and must therefore allow std::move
  • removing most of the things in Allocators.h (which is only invoked from the STL, not from within RSC)
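The second item is worth a closer look, because it shows why a "could be const" recommendation needs a human check. The sketch below uses two hypothetical structs to show one general way that a const data member gets in the way: it removes the type's move assignment, which library code may rely on. It illustrates the language rule only and is not taken from RSC.

```cpp
#include <string>
#include <type_traits>

// Hypothetical types that differ only in the constness of their data member.
struct MutableKey
{
   std::string name;
};

struct ConstKey
{
   const std::string name;
};

// A const member cannot be assigned to, so the compiler defines ConstKey's
// copy and move assignment operators as deleted.
static_assert(std::is_move_assignable<MutableKey>::value,
   "MutableKey can be move-assigned");
static_assert(!std::is_move_assignable<ConstKey>::value,
   "ConstKey cannot be move-assigned");
```

If the template code that performs the assignment is not visible to the tool, as is the case for headers in the subs directory, it has no way of knowing that the member must remain non-const.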

Applying the Recommendations

The >fix command is currently able to resolve about half of the warnings:

fix               : Interactively fixes warnings detected by >check.
  (0:133)         : warning number from Wnnn (0 = all warnings)
  (t|f)           : prompt before fixing?
  <str>           : a set of code files

For example, the following modifies all code files by deleting unnecessary #include directives, which is warning W018:

>fix 18 f $files

To select which occurrences of a warning to fix, ask to be prompted. For example,

>fix 53 t $files

will prompt before fixing each occurrence of warning W053, "Data could be const".

Warning: Before using >fix, be sure that you can recover the original version of the file if something goes wrong. It works on RSC's code, but that doesn't mean it's been thoroughly tested!

Exporting the Library

After the code has been parsed, the >export command can generate any combination of the following files:

  • A .lib file displays parsed code in a standard format and includes

    • the underlying type for each auto variable;

    • the number of times each item was

      • referenced,

      • initialized, read, or written (for data),

      • called (for functions); and

    • the file in which each item was defined (for data and functions).

  • A .trim file lists the external symbols used within each file, as well as the recommendations for which #include directives, using statements, and forward declarations the file should add or remove. Those recommendations also appear as warnings in the .check file.

  • An .xref file contains a global cross-reference (each symbol, followed by a list of the files that use it, along with the line numbers where the symbol appears).
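To make the idea of a global cross-reference concrete, here is a minimal sketch of a data structure that maps each symbol to the files and line numbers where it appears. The Xref type, the AddRef and Format functions, and the output layout are assumptions made for illustration; they are not RSC's implementation or the actual .xref format.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical layout: symbol -> file -> line numbers where it appears.
using Xref = std::map<std::string, std::map<std::string, std::set<int>>>;

// Records that SYMBOL appears on LINE of FILE.
void AddRef(Xref& xref, const std::string& symbol,
   const std::string& file, int line)
{
   xref[symbol][file].insert(line);
}

// Renders each symbol, followed by an indented line for each file that uses
// it, listing the line numbers in ascending order.
std::string Format(const Xref& xref)
{
   std::ostringstream out;

   for(const auto& sym : xref)
   {
      out << sym.first << '\n';

      for(const auto& file : sym.second)
      {
         out << "  " << file.first << ':';
         for(int line : file.second) out << ' ' << line;
         out << '\n';
      }
   }

   return out.str();
}
```

Using ordered containers keeps the output stable between runs, which matters if the generated file is compared against an earlier version with a diff tool.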

Analyzing Code Dependencies

Many of the CLI commands in the ct directory take an expression as their last parameter. So far, we've only mentioned $files, but an expression can contain both variables and operators. The user defines a variable with the >assign command, and the library also provides the following variables, which cannot be modified directly:

Variable  Contents
$dirs     directories that have been added to the library by >import
$files    all code files (headers and implementations) found in $dirs
$hdrs     headers in $files
$cpps     implementations (.c*) in $files
$subs     headers that declare items which are external to the code base
$exts     headers that appear in an #include directive but whose directories were not added to the library by >import (which will cause >parse to fail)
$vars     all variables (those above, and any that the user has defined)

An expression is evaluated left to right, but parentheses can be used to override this. A variable is a set of either directories or files. The following notation is used in the expressions that appear below:

Set  Contents
D    the name of a directory (as defined by >import) or a set of directories
F    the name of a specific file or a set of files
C    the name of a specific C++ code item or a set of such items
S    any of the above (D, F, or C)

Here are the operators that can be used as soon as >import commands have built the library. The Expression column specifies the type of parameter(s) that the operator expects. The Result column is what the operator returns, which can be used as the input to other operators or commands such as >assign and >list.

Operator Name     Expression  Result  Semantics
union             S1 | S2     S       set union of S1 and S2 (the '|' is optional)
intersection      S1 & S2     S       set intersection of S1 and S2
difference        S1 - S2     S       set difference between S1 and S2
files             f S         F       the files in S
directories       d S         D       the directories in S
filename          F fn <str>  F       files in F with the file name <str>*
filetype          F ft <str>  F       files in F with the file type *.<str>
matches           F ms <str>  F       files in F whose name partially matches <str>
in                F in D      F       files in F whose directory is in D
users             us F        F       files that #include any in F
used by           ub F        F       files that any in F #include
affecters         as F        F       ub F, transitively
affected by       ab F        F       us F, transitively
common affecters  ca F        F       (as f1) & (as f2) & … & (as fn), where f1…fn are the files in F
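The semantics of a few of these operators can be sketched in ordinary C++. The code below is an illustration under stated assumptions, not RSC's Interpreter or SetOperations code: FileSet, IncludeMap, and the function names are hypothetical, and the sketch assumes that "as F" does not include the files in F unless they are reached through an #include.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>

using FileSet = std::set<std::string>;

// Hypothetical input: each file mapped to the files that it #includes directly.
using IncludeMap = std::map<std::string, FileSet>;

// S1 & S2
FileSet Intersection(const FileSet& a, const FileSet& b)
{
   FileSet result;
   std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
      std::inserter(result, result.end()));
   return result;
}

// S1 - S2
FileSet Difference(const FileSet& a, const FileSet& b)
{
   FileSet result;
   std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
      std::inserter(result, result.end()));
   return result;
}

// ub F: the files that any file in F #includes directly.
FileSet UsedBy(const IncludeMap& incls, const FileSet& f)
{
   FileSet result;

   for(const auto& file : f)
   {
      auto it = incls.find(file);
      if(it != incls.end())
         result.insert(it->second.begin(), it->second.end());
   }

   return result;
}

// as F: ub F, applied repeatedly until no new files are found.
FileSet Affecters(const IncludeMap& incls, const FileSet& f)
{
   FileSet result;
   FileSet frontier = UsedBy(incls, f);

   while(!frontier.empty())
   {
      FileSet added;

      for(const auto& file : frontier)
         if(result.insert(file).second) added.insert(file);

      // Only files seen for the first time are expanded, so cycles terminate.
      frontier = UsedBy(incls, added);
   }

   return result;
}
```

With these pieces, "ca F" amounts to intersecting the Affecters result for each file in F, and "us F" and "ab F" are the same computations performed on the reversed #include relationships.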

After the >parse command has run, additional operators become available on the compiled code:

Operator Name  Expression  Result  Semantics
implements     im F        F       for each item declared (defined) in F, add the file that defines (declares) it
needers        ns F        F       files that also need F in a build (im ab F, transitively)
needed by      nb F        F       files that F also needs in a build (im as F, transitively)
declared by    db S        C       code items declared within S
declarers      ds C        C       code items that declare the items in C
definitions    df C        C       distinct definitions of the items in C
referenced by  rb S        C       code items referenced within S
referencers    rs C        C       code items that reference those in C

The main purpose of these operators is to analyze dependencies among code files and C++ items. Here are some simple examples to serve as an introduction.

In the first image, the first command lists the users of Thread.h. If you searched all of RSC's code files for #include "Thread.h", these are the files that you would find. Next, the .cpp files in the sbase directory are assigned to the variable sbim, and the files that could be affected by changing Thread.h are assigned to thrab. The intersection of sbim and thrab is assigned to sbthr and the result is displayed. These are the .cpp files in the sbase directory that could be affected by changing Thread.h. Finally, the files that implement Duration.h are listed. What's Thread.cpp doing in there?! Well, if you looked at the code, you would find that a number of constants declared in Duration.h, and used even before main is entered, are initialized at the bottom of Thread.cpp to avoid the "static initialization order fiasco" for which C++ is so infamous.

Image 1

We continue by assigning Thread to thr. But Thread can refer to many things: the Thread class, its constructor, or one of its forward declarations. So we have to indicate that we want the constructor. If we had entered Thread::Thread instead, it would have been unambiguous. When we list thr, we see that it's a function in Thread.h. If we list its definition, we see that its implementation begins on line 1062 of Thread.cpp. Finally, listing the items referenced by thr reveals a forward declaration of Daemon and the enumerator Faction. These types are used to specify the parameters to the Thread constructor.

Image 2

If we list the items referenced by the definition of thr (that is, by the constructor's implementation), the output fills the screen:

Image 3

Finally, listing the references to thr reveals the constructors that make a base class constructor call to Thread:

Image 4

The operators db, ds, df, rb, and rs are recent additions to the static analysis tool. They allow dependencies between C++ code items, not just files, to be analyzed. This can assist the architect who wants to layer a monolithic code base by partitioning it into libraries so that the software not required by a given product can be excluded from its build. Although these operators return sets of C++ code items, the files where those items reside can easily be found by prefixing the f operator, and the db and rb operators can also be used on files or even directories. For example:

>list f ds C         displays the files that declare the items in C
>list f rs C         displays the files that reference the items in C
>list db F1 & rb F2  displays the items declared in files in F1 and referenced by files in F2

Additional Details

What to #include

Interactions exist among the warnings for adding and removing #include directives, using statements, and forward declarations. CodeFile::Trim generates these warnings. Its basic rules are

  • Always #include something if nothing guarantees that it will be visible transitively.
  • Don't #include something that will definitely be visible transitively. It is necessary to #include a base class, as well as a class that is used directly. However, it is not necessary to #include their base classes, even when using something declared in one of those transitive base classes. Similarly, it is not necessary for a .cpp to #include anything that its header will #include.
  • If a class is only used indirectly (i.e., as a pointer or reference type), don't #include it. Use a forward declaration instead. If there is no guarantee that one will be visible transitively, add one to this file.
  • A header should not contain a using directive or declaration. It is therefore told to remove it, and any .cpp that relies on it is told to add it.
  • If an #include, forward declaration, or using statement is not needed to resolve a symbol, remove it.

All of these warnings can be resolved by >fix, which will, for example, insert a forward declaration in the correct namespace and fully qualify symbols from another namespace when removing a using statement.
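The rule about direct versus indirect use can be illustrated with a small header. This is a generic sketch with hypothetical names (Widget, Logger), not code from RSC: the header uses std::string directly, so it must #include <string>, but it only uses Logger through a pointer, so a forward declaration is enough.

```cpp
// Widget.h (hypothetical)
#include <string>   // std::string is used directly (as a data member)

class Logger;       // only used as a pointer type: no #include required

class Widget
{
public:
   Widget(const std::string& name, Logger* log) : name_(name), log_(log) { }
   const std::string& Name() const { return name_; }
   bool HasLogger() const { return log_ != nullptr; }
private:
   std::string name_;
   Logger* log_;
};
```

A .cpp that actually invokes a Logger function would need to #include the header that defines Logger, because that use is direct.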

High-Level Design

The parser is implemented using recursive descent, which makes its code easy to read and modify. The advent of unique_ptr was a godsend to these types of parsers, which were previously cursed by the need to delete objects when backing up. Placing each of these objects in a unique_ptr allows the parser to back up without having to write any code to delete them.
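The benefit of unique_ptr when backing up can be seen in a toy example. The following sketch is not RSC's Parser: MiniParser, its grammar (sums of non-negative integers), and the Expr classes are all hypothetical. When an alternative fails partway through, the function simply returns nullptr, and any items that it had already created are deleted automatically.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

struct Expr
{
   virtual ~Expr() = default;
   virtual int Eval() const = 0;
};

struct Num : Expr
{
   explicit Num(int v) : value(v) { }
   int Eval() const override { return value; }
   int value;
};

struct Add : Expr
{
   Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) :
      lhs(std::move(l)), rhs(std::move(r)) { }
   int Eval() const override { return lhs->Eval() + rhs->Eval(); }
   std::unique_ptr<Expr> lhs;
   std::unique_ptr<Expr> rhs;
};

class MiniParser
{
public:
   explicit MiniParser(const std::string& text) : text_(text), pos_(0) { }

   // sum := number '+' sum | number
   std::unique_ptr<Expr> ParseSum()
   {
      auto start = pos_;
      if(auto sum = ParseAddition()) return sum;
      pos_ = start;   // back up: nothing needs to be deleted by hand
      return ParseNumber();
   }

   bool AtEnd() const { return pos_ == text_.size(); }
private:
   std::unique_ptr<Expr> ParseAddition()
   {
      auto lhs = ParseNumber();
      if(!lhs || !Match('+')) return nullptr;   // lhs is freed automatically
      auto rhs = ParseSum();
      if(!rhs) return nullptr;                  // lhs is freed automatically
      return std::unique_ptr<Expr>(new Add(std::move(lhs), std::move(rhs)));
   }

   std::unique_ptr<Expr> ParseNumber()
   {
      auto begin = pos_;
      int value = 0;

      while((pos_ < text_.size()) &&
         std::isdigit(static_cast<unsigned char>(text_[pos_])))
      {
         value = (value * 10) + (text_[pos_++] - '0');
      }

      if(pos_ == begin) return nullptr;
      return std::unique_ptr<Expr>(new Num(value));
   }

   bool Match(char c)
   {
      if((pos_ < text_.size()) && (text_[pos_] == c))
      {
         ++pos_;
         return true;
      }
      return false;
   }

   std::string text_;
   std::size_t pos_;
};
```

Given the input "7+", the addition alternative fails after the 7 has been parsed. The parser backs up to the start and succeeds with the plain number alternative, leaving the "+" unconsumed. The sketch uses new rather than std::make_unique because the latter was only added in C++14.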

The parser does not check everything in the same way that a full parser must. It assumes that the code correctly compiles and links, so it only contains enough checks to produce a correct parse. Its grammar, which is informally documented in the relevant functions, is also far simpler than a complete C++ grammar.

As each code file is read in during >import, #include relationships are noted. This allows a global compile order to be calculated. The only other preprocessing that occurs before parsing is to erase, within C++ code, any macro name that is defined as an empty string. Currently, the only such name is NO_OP, which RSC uses before a bare semicolon when a for statement is missing a parenthesized statement.
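One common way to derive a compile order from #include relationships is a depth-first traversal that emits a file only after everything it #includes. The sketch below shows that general technique; it is an assumption about how such an order could be calculated, not a description of RSC's code, and IncludeGraph, Visit, and CompileOrder are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: each file mapped to the files that it #includes directly.
using IncludeGraph = std::map<std::string, std::set<std::string>>;

// Appends FILE to ORDER after first appending everything that it #includes.
// DONE records the files already visited, which also stops the recursion if
// the #include relationships contain a cycle.
void Visit(const std::string& file, const IncludeGraph& graph,
   std::set<std::string>& done, std::vector<std::string>& order)
{
   if(!done.insert(file).second) return;

   auto it = graph.find(file);

   if(it != graph.end())
   {
      for(const auto& incl : it->second)
         Visit(incl, graph, done, order);
   }

   order.push_back(file);
}

// Returns the files in an order such that each file follows those that it
// #includes (assuming there are no #include cycles).
std::vector<std::string> CompileOrder(const IncludeGraph& graph)
{
   std::set<std::string> done;
   std::vector<std::string> order;

   for(const auto& entry : graph)
      Visit(entry.first, graph, done, order);

   return order;
}
```

For example, if App.cpp #includes App.h and Log.h, and App.h #includes Base.h, then Base.h is placed ahead of App.h, and App.cpp comes last.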

Once this simple preprocessing is complete, all of the code is parsed together, in a single pass. After an item is parsed, it is added to the scope (namespace, class, function, or code block) in which it appears, and its virtual EnterScope function is invoked. It is then compiled by invoking its virtual EnterBlock function. An item's EnterScope or EnterBlock function also invokes the same function on each of its constituent parts.

Some of the warnings generated by >check are detected during >import, some are detected during >parse, and some are detected during >check itself, through the virtual function Check. CodeFile::Trim, mentioned in the previous section, uses the virtual function GetUsages to obtain, from all of its file's C++ items, the symbols that are used (a) as base classes, (b) directly, and (c) indirectly; those that were resolved by (d) forward declarations, (e) friend declarations, and (f) using statements; and those that were (g) inherited.

References are tracked by the virtual functions AddToXref and AddReference. This allows >export to create its cross-reference. The C++ items declared by another are obtained from the virtual function GetDecls, which supports the db operator.


My system, a Dell XPS15 with a 2.6 GHz Intel i7, takes 3½ minutes to >read buildlib, >parse, >check, and >export all of RSC's code. That's using RSC's release build, which disables various optimizations so that it can be debugged (it's about 3½ times as fast as a debug build, but only half as fast as a fully optimized release build). As a comparison, Microsoft's C++ compiler takes just under 7 minutes to build RSC. Of course, this isn't a true apples-to-apples comparison because >parse doesn't lay out memory, emit object code, or generate files. But it does gather information that a regular compiler doesn't, and that 3½ minutes also includes the time needed to inspect the code (>check) and generate several large files (>export).

RSC consists of about 800 source code files and 92K lines of code, excluding those that are blank or that only contain braces or comments. When its release build initializes using its default configuration file under Win32, it uses 61MB, which could be significantly reduced by changing various configuration parameters. After executing >read buildlib, >parse, >check, and >export, it has grown by another 195MB.

The tools don't generate any intermediate or scratch files; everything is kept in memory. Using files would be a significant change, so the amount of available memory ultimately limits the size of the code library that the tools can accommodate. But my guess is that anyone with that much code could also provide a machine with enough memory—or simply purchase a commercial equivalent of the tool.

List of Code Files

The ct directory contains all of the code. If you want to dive into it, here's a summary of the files in that directory:

File           Description
CodeCoverage   code coverage tool (not discussed in this article)
CodeDir        a directory that contains source code
CodeDirSet     a set of code directories
CodeFile       a file that contains source code
CodeFileSet    a set of code files
CodeItemSet    a set of C++ code items
CodeSet        base class for CodeDirSet, CodeFileSet, and CodeItemSet
CodeTypes      types for parsing and static analysis
CtIncrement    CLI commands applicable to the ct directory
CtModule       initialization of ct directory
Cxx            types for C++
CxxArea        namespaces, classes, and class template instances
CxxCharLiteral character literals
CxxDirective   preprocessor directives
CxxExecute     tracks code during compilation
CxxFwd         forward declarations
CxxLocation    tracks an item's location in its source file
CxxNamed       low-level named C++ items
CxxRoot        global namespace and built-in terminals
CxxScope       code blocks, data items, and functions
CxxScoped      arguments, base classes, enums, enumerators, forwards, friends, terminals, typedefs, usings
CxxStatement   statements used in functions
CxxStrLiteral  string literals
CxxString      string utilities
CxxSymbols     parser symbol tables
CxxToken       low-level unnamed C++ items
Editor         source code editor for >fix command
Interpreter    interprets expressions (in CLI commands) that manipulate instances of LibrarySet subclasses
Lexer          lexical analysis for Parser and Editor
Library        code files, code directories, and CLI symbols
LibraryErrSet  generates an error message when a CLI command does not apply to a set
LibraryItem    base class for CodeDir, CodeFile, LibrarySet, and CxxToken
LibrarySet     base class for CodeSet, LibraryErrSet, and LibraryVarSet (sets of items to which CLI commands can be applied)
LibraryTypes   types for code library
LibraryVarSet  built-in or user-defined library variables
Parser         parser for C++ source code
SetOperations  difference, intersection, and union operators for instances of LibrarySet


[1] While preparing this article, I checked to see if anything had changed. Google's project eventually gained traction and is now on GitHub. They took the approach of building on Clang, and they say that they're currently "alpha" quality and that changes to Clang sometimes break them.

[2] The file help.cli has a .txt extension, which is omitted from file names in this article.

[3] A code file is assumed to be any file with no extension (e.g., <string>) or a .h, .c, .hpp, .cpp, .hxx, .xxx, .hh, .cc, .h++, or .c++ extension. This is hard-coded in CxxString::IsCodeFile.


  • 4th October, 2019: Initial version


This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)


About the Author

Greg Utas
Canada
Author of Robust Services Core (GitHub) and Robust Communications Software (Wiley, 2005). Formerly Chief Software Architect of the core network servers that handle the calls in AT&T's wireless network.
