What Every Computer Programmer Should Know About Windows API, CRT, and the Standard C++ Library

Alex Blekhman, 22 Aug 2008
The article explains relationships and dependencies between the Windows API, the CRT, and the Standard C++ Library.

1. The Purpose

The purpose of this article is to clarify the essential points about the Windows API, the C Runtime Library (CRT), and the Standard C++ Library (often called the STL). Even experienced developers are often confused about how these parts relate to each other, and hold on to misconceptions. If you have ever wondered what is implemented on top of what and never had the time to figure it out, then keep reading.

2. Basics

The following diagram represents the relationship between WinAPI, CRT, and STL.

Diagram #1: The relationship between Windows API, CRT, and the C++ Standard Library

Adjacent blocks can communicate with each other. What does that mean? Let's go from the bottom to the top.

2.2 Hardware

Each hardware part exposes its own set of commands that enables the Operating System to control and communicate with it. The number and complexity of the commands vary from part to part. Often, different vendors of the same part provide additional commands beyond the requirements of a common standard. Communicating directly with countless hardware devices, each with its own variety of commands, would be an enormous toil for software writers. Here, the Operating System comes to the rescue.

2.3 Operating System

The purpose of the OS is to encapsulate all the intricacies of the underlying hardware and provide a unified access interface to the computer's parts. No application can access the hardware directly. Only the OS can access the hardware. The part of the OS that accesses the hardware is said to run in kernel mode.

Older OSs like MS-DOS, for example, allowed programs to access hardware resources directly. Though it enabled software writers to make certain performance gains, in the long run, this technique often made the software very brittle, and incompatible with newer hardware parts.

2.4 Application Programming Interface

The OS exposes the underlying machine resources by means of an Application Programming Interface (API). An API is a uniform set of functions that enables software developers to abstract from hardware peculiarities and focus on their own goals. An application cannot bypass the OS and access hardware resources directly; it is commonly said that applications run in user mode. MS Windows provides its API as a set of C functions. The C language was chosen as the lowest common denominator for software development on the Windows platform.

2.4.1 Platform Software Development Kit

Microsoft distributes a free Platform Software Development Kit (Platform SDK or PSDK), which enables software developers to write Windows programs. The PSDK contains:

  1. Header files with API function declarations
  2. Import lib files to link with (where calls to API functions are redirected to the relevant DLLs)
  3. Documentation
  4. Various binary helper tools

For example, to open or create a file, we call the CreateFile function, which is declared in the "WinBase.h" header file (normally pulled in by including "Windows.h") and requires linking with the "Kernel32.lib" import library.

The names of Windows API functions use upper camel case (for example, CreateFile or GetProcAddress), which usually makes them easy to recognize. Names of macros and constants are conventionally in uppercase. The documentation page of each function has a "Requirements" section where the necessary headers, import libraries, and supported OS versions are specified.

A Windows application can call any API function, provided the application follows the function's signature and links with the appropriate import library (or gets the function's address directly from the implementing DLL with the GetProcAddress call).
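The second option, getting the address from the implementing DLL, can be sketched as follows. This is only an illustration, not the article's own code: the helper name QueryUptimeMs is made up, GetTickCount64 is picked merely as an example of a kernel32.dll function that older systems lack, and the non-Windows branch is a stub added so that the fragment compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>

// Pointer type matching the documented signature:
// ULONGLONG WINAPI GetTickCount64(void).
typedef ULONGLONG (WINAPI *GetTickCount64Fn)(void);

// Hypothetical helper: looks the function up at run time instead of
// linking to it through the Kernel32.lib import library.
bool QueryUptimeMs(unsigned long long& uptimeMs)
{
    // kernel32.dll is normally already loaded into the process.
    HMODULE kernel32 = GetModuleHandleW(L"kernel32.dll");
    if (kernel32 == NULL)
        return false;

    GetTickCount64Fn fn = reinterpret_cast<GetTickCount64Fn>(
        GetProcAddress(kernel32, "GetTickCount64"));
    if (fn == NULL)
        return false; // not exported on systems older than Windows Vista

    uptimeMs = fn();
    return true;
}
#else
// Stub (an assumption of this sketch) so that it compiles on other platforms.
bool QueryUptimeMs(unsigned long long&) { return false; }
#endif
```

The run-time lookup lets a program start on a system where the function is missing and fall back to something else, whereas a reference through the import library would prevent the program from loading at all.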

2.5 C Runtime Library

On top of the OS API functions, software vendors implement the C Runtime Library (CRT). The CRT is a set of header files and C functions that implement common tasks such as string operations, math functions, basic input/output, etc. Usually, the same vendor that makes the C compiler also provides the CRT implementation. The International Organization for Standardization (ISO) is responsible for the C language standard, which also specifies the standard part of the runtime library.

2.5.1 Standards and Extensions

Theoretically, by using only standard C functions, the developer can ensure that the same code can be built and run on any platform where a decent C compiler and CRT implementation exist. In practice, however, software vendors include many useful extensions to the standard library functions, which make developers' lives easier, but at the price of portability.

The names of CRT functions are in lower case. The names of macros and constants are in uppercase. The names of extensions begin with the underscore character; for example, the _mkdir function. Each function always has a "Requirements" section on its documentation page where its header is specified.

2.6 Unicode Awareness

2.6.1 Platform SDK is Already Unicode Aware

Actually, the Win32 API names mentioned above are not the real names. They are mere macros that are defined in the PSDK header files. So, when the PSDK documentation mentions a function, for example CreateFile, a developer should be aware that CreateFile is a macro. The true names of the CreateFile function are CreateFileA and CreateFileW. Yes, there are two versions, rather than one, of many Win32 API functions (typically those that take or return strings). The version that ends with 'A' accepts ANSI character strings, i.e., strings of regular chars. The other version ends with 'W' (the so-called "wide" version) and accepts Unicode character strings, i.e., strings of wchar_t elements. Both versions of CreateFile are implemented within the kernel32.dll module. The CreateFile macro expands into the CreateFileW name if the UNICODE symbol is defined for a project, and into the CreateFileA name otherwise.
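The macro mechanism itself can be imitated in a few lines of portable C++. The following is a sketch only: CountCharsA, CountCharsW, CountChars, and MY_TEXT are hypothetical names invented for this illustration and are not part of the Windows API; only the naming pattern and the UNICODE symbol follow the PSDK convention.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical pair of functions named after the 'A'/'W' convention.
std::size_t CountCharsA(const char* text)    { return std::strlen(text); }
std::size_t CountCharsW(const wchar_t* text) { return std::wcslen(text); }

// The "documented" name is only a macro that selects one of the two.
#ifdef UNICODE
#define CountChars CountCharsW
#define MY_TEXT(s) L##s   // pastes the L prefix onto the literal
#else
#define CountChars CountCharsA
#define MY_TEXT(s) s
#endif

std::size_t CountHello()
{
    // Expands to CountCharsW(L"Hello") or CountCharsA("Hello"),
    // depending on whether UNICODE is defined for the build.
    return CountChars(MY_TEXT("Hello"));
}
```

Because the selection happens in the preprocessor, the generic name never reaches the linker; only the 'A' or 'W' name does.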

There are three families of Windows OS: MS-DOS/9x-based, Windows NT, and Windows CE.

  1. The MS-DOS/9x-based family, which includes Windows 1.0-3.11, 95, 98, and Windows ME, is based on the MS-DOS OS. Earlier versions of Windows, 1.0-2.0, are true 16-bit OSs. Newer versions, 3.0, 95, 98, and ME, are the so-called hybrid 16/32-bit OSs. They are 16-bit at the low level, but capable of running 32-bit programs with certain limitations. One of these limitations is that, with few exceptions, only the ANSI versions of the Win32 API functions exist on this platform. Currently, the MS-DOS/9x-based family is extinct and unsupported by Microsoft.
  2. The Windows NT family started with Windows NT 3.1 in the early 1990s and includes Windows NT 4, Windows 2000, Windows XP, Windows Vista, and the Server flavors of these OSs. The Windows NT family is true 32-bit (and, in later editions, 64-bit). It supports both ANSI and Unicode versions of the Win32 API. The Windows NT family operates with Unicode strings internally. The ANSI version of a Win32 API function is a mere wrapper around the real worker, which is the Unicode version of the function.
  3. The Windows CE family is intended for mobile and embedded devices. It is true 32-bit. Windows CE supports only the Unicode version of the Win32 API.

2.6.2 PSDK Solution: TCHARs

In order to avoid multiple PSDKs for different Windows families, Microsoft implemented generic text characters, or TCHARs. TCHAR and the related definitions are located in the WinNT.h header file. The main idea is that the developer never uses the char or wchar_t types explicitly, but uses the TCHAR type name instead. TCHAR resolves to the appropriate character type depending on whether the UNICODE symbol is defined for a build. In the same manner, instead of calling the 'A' or 'W' version of a Win32 API function, the developer calls the generic macro version, which resolves to the function for the actual character type at compile time.

// Generic code
//
LPCTSTR psz = TEXT("Hello World!");
TCHAR szDir[MAX_PATH] = { 0 };
GetCurrentDirectory(MAX_PATH, szDir);

// What actually happens if UNICODE symbol is NOT defined for a build
//
const char* psz = "Hello World!";
char szDir[MAX_PATH] = { 0 };
GetCurrentDirectoryA(MAX_PATH, szDir);

// GetCurrentDirectoryA is a wrapper. It does the following:
// 1. Allocates temporary wchar_t buffer of given size.
// 2. Calls real worker: GetCurrentDirectoryW.
// 3. Calls WideCharToMultiByte in order to convert wchar_t string into
//    char string according to the active code page for a calling thread.
//    If some character cannot be converted, then it will be replaced with the '?' symbol.

// What actually happens if UNICODE symbol is defined for a build
//
const wchar_t* psz = L"Hello World!";
wchar_t szDir[MAX_PATH] = { 0 };
GetCurrentDirectoryW(MAX_PATH, szDir);
// direct call to real worker; no wrappers in the middle

Using TCHARs allows a developer to maintain a single code base for both ANSI and Unicode builds. Nowadays, if you do not intend to target the old Windows 9x/Me platforms, you can safely forget about TCHARs, use Unicode strings everywhere, and make Unicode-only builds. As an added bonus, Unicode applications can forget about the code page hassle and use the same logic for all strings.

The easy way to remember PSDK string declarations is to read them aloud:

            L P C T STR = const TCHAR* 
            ^ ^ ^ ^ ^ 
            | | | | | 
Long -------+ | | | | 
Pointer to ---+ | | | 
Constant -------+ | | 
TCHAR ------------+ | 
STRing -------------+

Sometimes the L ("Long") is omitted, since the distinction between long and short pointers is obsolete on the Win32 platform. So, a typedef can look like PTSTR = "pointer to TCHAR string", which is just TCHAR*.

Here are two screenshots of the same program. The first screenshot is taken when the program is built as ANSI. The second screenshot demonstrates the Unicode build of the program.

Naive ANSI program from the 20th century. All non-English characters are converted into illegible '?' symbols.

A modern Unicode program is aware of other languages.

2.6.3 CRT Solution: _TCHARs

Following the Platform SDK logic, Microsoft introduced generic text mapping into its C runtime library. The CRT uses an additional header file, "tchar.h", to define the generic character macros. In order to comply with the requirements of the C language standard, all non-standard names start with an underscore. Also, the CRT uses the shorter _T() macro for string literals instead of the longer TEXT() macro, which is defined in "WinNT.h". The CRT authors decided to take the generic text notion even further, and as a result, the CRT now distinguishes three modes for text characters:

  • SBCS - The Single Byte Character Set. The classic char is used for strings. Each character fits within one char element. No symbol has to be defined for a project. This is the traditional C language approach that has survived from the 1970s to the present day. ASCII characters (including the English alphabet) are represented with values 0x00 - 0x7F; other characters are represented with values 0x80 - 0xFF, whose actual meaning depends on the currently active code page.
  • _MBCS - The Multi-Byte Character Set. The classic char is used for strings. One multi-byte character may require one or two char elements. The _MBCS symbol has to be defined for a project. _MBCS is backward compatible with the SBCS mode, and was the default choice for new projects in MS Visual C++ until version 8.0 (2005). _MBCS was commonly used for East Asian languages like Japanese, Korean, and Chinese, and it was the only feasible option for handling these languages on Windows 9x/Me platforms. Now, _MBCS has mostly been displaced by Unicode.
  • _UNICODE - The Unicode Character Set. The wchar_t type is used for strings. A wchar_t element is 16 bits wide on the Windows platform and can hold 65,536 different values. Most characters in common use occupy one wchar_t element; characters outside the Basic Multilingual Plane are encoded in UTF-16 as a pair of elements (a surrogate pair). This is the default mode for new projects starting from version 8.0 (2005) of MS Visual C++.

The CRT checks whether the _MBCS or _UNICODE symbol is defined in order to distinguish between multi-byte and Unicode builds.
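A 16-bit element cannot hold every Unicode character on its own: in UTF-16, characters beyond U+FFFF take two elements. The sketch below counts characters (code points) rather than elements. It is an illustration under stated assumptions: the function name CountCodePoints is made up, and it uses the C++11 char16_t type instead of wchar_t so that it behaves the same on platforms where wchar_t is 32 bits wide.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-16 string.
// A high surrogate (0xD800-0xDBFF) followed by a low surrogate
// (0xDC00-0xDFFF) forms a single code point above U+FFFF.
std::size_t CountCodePoints(const std::u16string& text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const char16_t unit = text[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        const bool nextIsLow = i + 1 < text.size()
            && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF;
        if (isHigh && nextIsLow)
            ++i; // skip the second half of the surrogate pair
        ++count;
    }
    return count;
}
```

For a string that holds only the musical G clef symbol (U+1D11E), size() reports two elements while CountCodePoints reports one character.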

Diagram #2: The Generic Text Mapping in CRT

Generic-text data type or name   SBCS (_UNICODE, _MBCS not defined)   _MBCS defined       _UNICODE defined
------------------------------   ----------------------------------   -----------------   -----------------
_TCHAR                           char                                 char                wchar_t
_T("Hello, World!")              "Hello, World!"                      "Hello, World!"     L"Hello, World!"
Function name prefix: _tcs       str, _str                            _mbs                wcs, _wcs
Example: _tcscat, _tcsicmp       strcat, _stricmp                     _mbscat, _mbsicmp   wcscat, _wcsicmp

// Generic code; names are not standard, hence the leading underscore.
//
_TCHAR message[128] = _T("The time is: ");
_TCHAR* now = _tasctime(&tm);
_tcscat(message, now);
_putts(message);

// What happens if no symbol is defined at all (SBCS).
//
char message[128] = "The time is: ";
char* now = asctime(&tm);
strcat(message, now);
puts(message);

// What happens if _MBCS symbol is defined (Multi-byte Character Set);
// non-standard names are with the leading underscore.
//
char message[128] = "The time is: ";
char* now = asctime(&tm);
_mbscat(message, now);
puts(message);

// What happens if _UNICODE symbol is defined (Unicode Character Set);
// non-standard names are with the leading underscore.
//
wchar_t message[128] = L"The time is: ";
wchar_t* now = _wasctime(&tm);
wcscat(message, now);
_putws(message);

2.7 C++ Standard Library

The C++ programming language defines its own standard library. The C++ Standard Library specifies a set of classes and functions that facilitate common programming tasks.

Often, the C++ Standard Library is referred to as the STL. This abbreviation dates back to pre-standard times and stands for Standard Template Library. With the adoption of the C++ standard, the STL became a subset of the C++ Standard Library. However, the term STL is still ubiquitous and is used as a synonym for the C++ Standard Library.

The International Organization for Standardization (ISO) is responsible for the C++ language standard and its library.

2.7.1 Contents of the C++ Standard Library

The C++ Standard Library may be divided into the following major parts:

  1. Containers, where common data structures are defined, such as vector, set, list, map etc.
  2. Iterators, which provide a uniform way to operate over standard containers.
  3. Algorithms, which implement common useful algorithms. Algorithms use iterators instead of working directly with containers. That's why the same implementation of an algorithm can be used with different standard containers.
  4. Allocators, which handle memory storage allocation/deallocation for elements in containers.
  5. Function Objects and Utilities, which are helpers to algorithms and containers.
  6. Streams, which provide a uniform object oriented way of input/output.
  7. C Runtime Library. For backward compatibility of C++ with the C language, the standard part of the CRT is incorporated into the Standard C++ Library.
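The cooperation of containers, iterators, and algorithms from the list above can be shown with a short sketch. The helper names SumOf and Contains are invented for this illustration; std::accumulate and std::find are the standard algorithms doing the work.

```cpp
#include <algorithm>
#include <list>
#include <numeric>
#include <set>
#include <vector>

// The algorithm sees only a pair of iterators, never the container type,
// so the same code serves vector, list, set, and others.
template <typename Container>
int SumOf(const Container& values)
{
    return std::accumulate(values.begin(), values.end(), 0);
}

template <typename Container>
bool Contains(const Container& values, int wanted)
{
    return std::find(values.begin(), values.end(), wanted) != values.end();
}
```

Calling SumOf on a std::vector<int>, a std::list<int>, or a std::set<int> that hold the same numbers gives the same result, although the three containers store their elements in very different ways.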

2.8 Cross-platform Development

Sometimes a software program is required to run on several computer platforms. The developer may choose to develop as many separate code bases as there are target platforms. However, this approach is tedious and error prone. It also wastes development resources, since the same functionality must be implemented and maintained over and over again.

The common approach is to develop a single code base for all platforms and restrict the usage of platform-dependent API functions and vendor-specific standard library extensions. It makes development harder; however, in the long run, all platforms benefit from new features and bug fixes.
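One common way to restrict platform-dependent calls is to hide them behind a single function, so that the rest of the code base never mentions them. The sketch below is an assumption-laden illustration, not a recommendation from the article: the wrapper name CurrentDirectory and the fixed 4096-byte buffer are choices made for brevity. It relies on _getcwd, a Microsoft CRT extension, and on getcwd from POSIX.

```cpp
#include <string>

#ifdef _WIN32
#include <direct.h>   // _getcwd: Microsoft CRT extension
#else
#include <unistd.h>   // getcwd: POSIX
#endif

// Returns the current working directory, or an empty string on failure.
// Only this function knows which platform it is compiled for.
std::string CurrentDirectory()
{
    char buffer[4096];
#ifdef _WIN32
    const char* result = _getcwd(buffer, static_cast<int>(sizeof(buffer)));
#else
    const char* result = getcwd(buffer, sizeof(buffer));
#endif
    return result ? std::string(result) : std::string();
}
```

Callers use CurrentDirectory() everywhere, and porting to another platform means touching only this one function.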

3. Code Reuse

There are two ways to incorporate the CRT and/or the C++ Library code into a program: static linking and dynamic linking. In the following discussion, I will use only the term CRT for brevity; however, these concepts are relevant to both the CRT and the C++ Standard Library.

3.1 Linking Statically

When the CRT/C++ Library is linked statically, the library code that the program uses is embedded into the resulting executable image. This technique has both advantages and disadvantages.

Advantages:

  1. Simple deployment. It is enough to copy a program to the destination computer to make it run. No need to worry about complicated scenarios of CRT/C++ Library deployment.
  2. No additional files. It can be very convenient for small utility applications to consist of just one executable file. Such self-contained applications can be easily downloaded and redistributed without the risk of breaking their integrity.

Disadvantages:

  1. Not serviceable. New versions of a library and fixes to old versions do not reach statically linked programs until they are rebuilt.
  2. Domino Effect of static linking. In the modern world, a program can rarely do everything by itself. Nowadays, software programs are complex, and rely heavily on third party components and libraries. Also, a software program itself is often divided into several loosely coupled modules. Linking the CRT statically into one of them greatly reduces interoperability between modules and forces developers to fall back on the lowest common denominator, i.e., a C interface with explicit functions for the acquisition and release of resources. The following section discusses the issue in more detail.

3.1.1 CRT as a Black Box

The problem is that internal CRT objects cannot be shared with other CRT instances. Memory allocated in one instance of the CRT must be freed in the same instance, a file opened in one instance of the CRT must be operated on and closed by functions from the same instance, etc. This is because the CRT tracks the acquired resources internally. Any attempt to free a memory chunk or read from a file via a FILE* that came from another CRT instance will corrupt the internal CRT state and most likely lead to a crash.

That's why linking the CRT statically obliges the developer of a module to provide additional functions that release allocated resources, and the user of the module to remember to call these functions in order to prevent resource leaks. No STL containers or C++ objects that use allocations internally can be shared across modules that link to the CRT statically. The following diagram illustrates the usage of a memory buffer allocated via a call to malloc.

Diagram #3: Using memory allocated by malloc from different modules

In the above diagram, Module 1 is linked to the CRT statically, while Modules 2 and 3 are linked to the CRT dynamically. Modules 2 and 3 can pass CRT owned objects between them freely. For example, a memory chunk allocated with malloc in Module 3 can be freed in Module 2 with free. This is because both the malloc and free calls end up in the same instance of the CRT.

On the other hand, Module 1 cannot let other modules free its resources. Everything allocated in Module 1 must be freed by Module 1. This is because only Module 1 has access to its statically linked instance of the CRT. In the above sample, Module 2 must remember to call a function from Module 1 in order to properly release the acquired memory.
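The pairing of an allocating function with a releasing function can be sketched as follows. All names here (Module1_DuplicateString, Module1_FreeString, Module1String, MakeGreeting) are hypothetical. For the sake of a self-contained example, both "modules" live in one source file; in a real program, the first two functions would be exported from the DLL that links the CRT statically. The smart pointer on the consumer side is an optional convenience and assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// --- Exported by "Module 1" ------------------------------------------------
// Allocates with Module 1's CRT. The caller must hand the buffer back to
// Module1_FreeString and must never call free() on it directly.
extern "C" char* Module1_DuplicateString(const char* text)
{
    const std::size_t size = std::strlen(text) + 1;
    char* copy = static_cast<char*>(std::malloc(size));
    if (copy != nullptr)
        std::memcpy(copy, text, size);
    return copy;
}

extern "C" void Module1_FreeString(char* text)
{
    std::free(text); // runs inside Module 1, so the matching CRT is used
}

// --- Used by "Module 2" ----------------------------------------------------
// A custom deleter makes it hard to forget the release call.
struct Module1StringDeleter
{
    void operator()(char* text) const { Module1_FreeString(text); }
};
typedef std::unique_ptr<char, Module1StringDeleter> Module1String;

Module1String MakeGreeting()
{
    return Module1String(Module1_DuplicateString("Hello from Module 1"));
}
```

When a Module1String goes out of scope, the buffer travels back to Module 1 for release, so Module 2 never touches it with its own CRT.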

3.2 Linking Dynamically

When the CRT/C++ Library is linked dynamically, only small import libraries are linked with the resulting executable image. Import libraries contain instructions for where to find the actual implementation of the CRT/C++ Library functions. On a program's start, the system loader reads these instructions and loads the appropriate DLLs into the process' address space.

Advantages:

  1. Improved Modularity. As described in previous sections, the overall modularity of a program can benefit from dynamic linking. A program can be divided into several modules while being able to pass relatively high-level objects between them.
  2. Faster start and lower memory use. CRT DLLs are often already loaded by other running processes. Then, when a program needs a CRT module, it does not have to be read from disk again, and its code pages are shared between processes. This enables the system to save physical memory and reduce page swapping.

Disadvantages:

  1. Complicated deployment. CRT libraries must be redistributed and properly installed in order for a program to work. This requires writing an additional setup program and thinking through a deployment strategy.

4. Summary

The article described the relationships and dependencies between the Windows API, the C Runtime Library, and the Standard C++ Library. The Windows API is the lowest operational level for user mode programs. On top of the Windows API, there is the C Runtime Library, which encapsulates and hides the Operating System differences. The Standard C++ Library provides much more functionality and also includes the CRT as an integral part. Using only standardized functions and classes makes it possible to write cross-platform applications. Such applications only need to be rebuilt in order to run on a new platform; no code change is required.

Both the C Runtime Library and the Standard C++ Library can be linked to statically or dynamically, depending on the application's needs. Each method has its own advantages and drawbacks.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Alex Blekhman
Software Developer
Australia
More than ten years of C++ native development, and counting.
 
Last Updated 22 Aug 2008
Article Copyright 2008 by Alex Blekhman