License: The Code Project Open License (CPOL)
What Every Computer Programmer Should Know About Windows API, CRT and Standard C++ Library
By Alex Blekhman
The article explains relationships and dependencies between Windows API, CRT and Standard C++ Library.
C++, C++/CLI, C, Windows, Win32, Dev
The purpose of this article is to clarify the essential points about the Windows API, the C Runtime Library (CRT) and the Standard C++ Library (STL). Even experienced developers are often confused or hold misconceptions about the relationship between these parts. If you have ever wondered what is implemented on top of what and never had the time to figure it out, then keep reading.
The following diagram represents the relationship between WinAPI, CRT and STL.
| Layer | Mode |
|---|---|
| Application | User Mode |
| Standard C++ Library | User Mode |
| C Runtime Library | User Mode |
| Operating System | Kernel Mode |
| Hardware | |
Adjacent blocks can communicate with each other. What does it mean? Let's go from the bottom to the top.
Each hardware part exposes its own set of commands that enables the operating system to control and communicate with it. The number and complexity of these commands vary from part to part. Different vendors of the same kind of part often provide additional commands beyond the requirements of a common standard. Communicating directly with countless hardware devices, each with its own variety of commands, would be an enormous toil for software writers. Here the operating system comes to the rescue.
The purpose of the OS is to encapsulate all the intricacies of the underlying hardware and provide a unified interface to the computer's parts. No application can access hardware directly; only the OS can. The part of the OS that accesses hardware is said to run in kernel mode.
Older OSes, such as MS-DOS, allowed programs to access hardware resources directly. Though this enabled software writers to make certain performance gains, in the long run the technique often made software very brittle and incompatible with newer hardware parts.
The OS exposes the underlying machine resources by means of an Application Programming Interface (API). An API is a uniform set of functions that lets software developers abstract away hardware peculiarities and focus on their own goals. An application cannot bypass the OS and access hardware resources directly. It is commonly said that applications run in user mode. MS Windows provides its API as a set of C functions. The C language was chosen as the lowest common denominator for software development on the Windows platform.
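Microsoft distributes the Platform Software Development Kit (Platform SDK or PSDK) for free; it enables software developers to write Windows programs. Among other things, the PSDK contains header files that declare the API functions, import libraries to link with, and documentation.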
For example, to open or create a file one calls the CreateFile function, which is declared in the "WinBase.h" header file and requires linking with the "Kernel32.lib" import library.
Names of Windows API functions follow the camel case naming convention with a leading capital letter (as in CreateFile) and are usually easy to recognize by this. Names of macros and constants are conventionally in uppercase. The documentation page of each function has a "Requirements" section, where the necessary headers, import libraries and supported OS versions are specified.
A Windows application can call any API function, provided that the application follows the function's signature and links with the appropriate import library (or gets the function's address directly from the implementing DLL with a GetProcAddress call).
On top of the OS API functions, software vendors implement the C Runtime Library (CRT). The CRT is a standardized set of header files and C functions that implement common tasks such as string operations, some math functions, basic input/output etc. Usually the same vendor that makes a C compiler also provides the CRT implementation. The International Organization for Standardization (ISO) is responsible for the C language standard and its runtime library.
Theoretically, by using only standard C functions a developer can ensure that the same code builds and runs on any platform where a decent C compiler and CRT implementation exist. In practice, however, software vendors include many useful extensions to the standard library functions, which make developers' lives easier but at the price of portability.
Names of CRT functions are in lower case. Names of macros and constants are in uppercase. Names of extensions begin with an underscore character, for example the _mkdir function. The documentation page of each function has a "Requirements" section, where its header is specified.
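To make the layering concrete, here is a minimal sketch that performs the same task (writing a short text file) at two levels: through standard CRT functions, and directly through the Windows API. The function names write_with_crt, write_with_winapi and read_with_crt are made up for this illustration. The Windows-only part is guarded with _WIN32, and it calls CreateFileA explicitly so that a plain char path can be used.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Writes text to a file using only standard CRT functions (portable).
bool write_with_crt(const char* path, const char* text) {
    std::FILE* file = std::fopen(path, "wb");
    if (file == NULL) return false;
    const std::size_t length = std::strlen(text);
    const bool written = std::fwrite(text, 1, length, file) == length;
    const bool closed = std::fclose(file) == 0;
    return written && closed;
}

#ifdef _WIN32
// The same task through the Windows API directly (compiles on Windows only).
bool write_with_winapi(const char* path, const char* text) {
    HANDLE file = CreateFileA(path, GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return false;
    const DWORD length = static_cast<DWORD>(std::strlen(text));
    DWORD written = 0;
    const BOOL ok = WriteFile(file, text, length, &written, NULL);
    CloseHandle(file);
    return ok && written == length;
}
#endif

// Reads a whole file back with CRT functions, to check the result.
std::string read_with_crt(const char* path) {
    std::string content;
    std::FILE* file = std::fopen(path, "rb");
    if (file == NULL) return content;
    char buffer[256];
    std::size_t count = 0;
    while ((count = std::fread(buffer, 1, sizeof(buffer), file)) > 0)
        content.append(buffer, count);
    std::fclose(file);
    return content;
}
```

The CRT version builds unchanged on any platform with a conforming compiler, while the Windows API version is tied to the PSDK headers and import libraries.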
In fact, the Win32 API names mentioned above are not the real names. They are mere macros that are defined in the PSDK header files. So when the PSDK documentation mentions a function, for example CreateFile, a developer should be aware that CreateFile is a macro. The true names of the CreateFile function are CreateFileA and CreateFileW. Yes, there are two versions, rather than one, of many Win32 API functions. The version that ends with 'A' accepts ANSI character strings, i.e. strings of regular chars. The other version ends with 'W' (the so-called "wide" version) and accepts Unicode character strings, i.e. strings of wchar_t's. Both versions are implemented within the kernel32.dll module. The CreateFile macro expands into the CreateFileW name if the UNICODE symbol is defined for a project and into the CreateFileA name otherwise.
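The name selection described above can be sketched in a few lines. DemoLengthA, DemoLengthW and DemoLength are hypothetical names invented for this illustration; they are not part of the Windows API, but the macro follows the same pattern as CreateFile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical 'A' and 'W' variants of one function (not real Windows API names).
std::size_t DemoLengthA(const char* text)    { return std::strlen(text); }
std::size_t DemoLengthW(const wchar_t* text) { return std::wcslen(text); }

// The generic name is only a macro; the preprocessor replaces it with
// one of the two real names before the compiler sees the call.
#ifdef UNICODE
#define DemoLength DemoLengthW
#else
#define DemoLength DemoLengthA
#endif
```

In a build without the UNICODE symbol, a call such as DemoLength("Hello World!") compiles into a call to DemoLengthA. With UNICODE defined, the same call no longer compiles, because DemoLengthW expects a wchar_t string.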
There are three families of Windows OS: MS-DOS/9x-based, Windows CE and Windows NT.
- The MS-DOS/9x-based family, which includes Windows 1.0-3.11, 95, 98 and Windows ME, is based on the MS-DOS OS. The earlier versions of Windows, 1.0-2.0, are true 16-bit OSes. The newer versions, 3.0, 95, 98 and ME, are so-called hybrid 16/32-bit OSes. They are 16-bit at the low level, but capable of running 32-bit programs with certain limitations. One of these limitations is that only the ANSI versions of Win32 API functions exist on this platform. Currently, the MS-DOS/9x-based family is extinct and unsupported by Microsoft.
- The Windows NT family started with Windows NT 3.1 in the early 90's and includes Windows NT 4, Windows 2000, Windows XP, Windows Vista and the Server flavors of these OSes. The Windows NT family is a true 32-bit OS. It supports both the ANSI and the Unicode versions of the Win32 API. The Windows NT family operates with Unicode strings internally. The ANSI version of a Win32 API function is a mere wrapper around the real worker, which is the Unicode version of the function.
- The Windows CE family is intended for mobile and embedded devices. It is a true 32-bit OS. Windows CE supports only the Unicode version of the Win32 API.
To avoid multiple PSDKs for different Windows families, Microsoft introduced generic text characters, or TCHARs. TCHAR and other relevant macros are defined in the WinNT.h header file. The main idea is that a developer never uses the char or wchar_t types explicitly, but uses the TCHAR macro instead. The TCHAR macro expands into the appropriate character type depending on whether the UNICODE symbol is defined for a build. In the same manner, instead of calling the 'A' or 'W' version of a Win32 API function, a developer calls the generic macro version, which adapts to the actual character type at compile time.
// Generic code
//
LPCTSTR psz = TEXT("Hello World!");
TCHAR szDir[MAX_PATH] = { 0 };
GetCurrentDirectory(MAX_PATH, szDir);
// What actually happens if UNICODE symbol is NOT defined for a build
//
const char* psz = "Hello World!";
char szDir[MAX_PATH] = { 0 };
GetCurrentDirectoryA(MAX_PATH, szDir);
// GetCurrentDirectoryA is a wrapper. It does the following:
// 1. Allocates temporary wchar_t buffer of given size.
// 2. Calls real worker: GetCurrentDirectoryW.
// 3. Calls WideCharToMultiByte in order to convert wchar_t string into
// char string according to the active code page for a calling thread.
// If some character cannot be converted, then it will be replaced with the '?' symbol.
// What actually happens if UNICODE symbol is defined for a build
//
const wchar_t* psz = L"Hello World!";
wchar_t szDir[MAX_PATH] = { 0 };
GetCurrentDirectoryW(MAX_PATH, szDir); // direct call to real worker; no wrappers in the middle
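The three wrapper steps listed in the comments above can be sketched with portable code. DemoGetTextW and DemoGetTextA are hypothetical stand-ins for a 'W' worker and its 'A' wrapper. The conversion here is deliberately simplified: it assumes that only 7-bit ASCII characters survive, whereas the real wrapper converts according to the active code page with WideCharToMultiByte.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical "real worker": copies a wide string into the buffer and returns
// its length without the terminator, or 0 if the buffer is too small.
std::size_t DemoGetTextW(std::size_t size, wchar_t* buffer) {
    // Two adjacent literals are concatenated; the last character is Cyrillic.
    static const wchar_t text[] = L"C:\\Data\\" L"\u0416";
    const std::size_t length = std::wcslen(text);
    if (size <= length) return 0;
    std::wmemcpy(buffer, text, length + 1);
    return length;
}

// Hypothetical narrow wrapper that follows the three steps.
std::size_t DemoGetTextA(std::size_t size, char* buffer) {
    if (size == 0) return 0;
    std::vector<wchar_t> wide(size);                 // 1. temporary wide buffer (zero-filled)
    const std::size_t length =
        DemoGetTextW(size, &wide[0]);                // 2. call the real worker
    for (std::size_t i = 0; i <= length; ++i) {      // 3. convert, terminator included
        // Simplified conversion: anything outside 7-bit ASCII becomes '?'.
        const unsigned long code = static_cast<unsigned long>(wide[i]);
        buffer[i] = (code < 0x80) ? static_cast<char>(code) : '?';
    }
    return length;
}
```

Under these assumptions the narrow caller receives the path with its last character replaced by '?', which is the kind of data loss the ANSI build suffers.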
Using TCHARs allows a developer to maintain a single code base for both ANSI and Unicode builds. Nowadays, if you do not intend to target the old Windows 9x/Me platforms, you can safely forget about TCHARs, use Unicode strings everywhere and make Unicode-only builds. As an added bonus, a Unicode application can forget about the code page hassle and use the same logic for all strings.
The easy way to remember PSDK string declarations is to say them out loud. For example, LPCTSTR = const TCHAR*:
- L: Long
- P: Pointer to
- C: Constant
- T: TCHAR
- STR: STRing
Sometimes the leading L ("Long") is omitted, since long and short pointers are obsolete on the Win32 platform. So a typedef can look like PTSTR = "pointer to TCHAR string", which is just TCHAR*.
Here are two screenshots of the same program. The first screenshot is taken when the program is built as ANSI. The second screenshot demonstrates the Unicode build of the program.
Naive ANSI program from the 20th century. All non-English characters are converted into illegible '?' symbols.
Modern Unicode program is aware of other languages.
Following the Platform SDK logic, Microsoft introduced generic text mapping into its C runtime library. The CRT uses an additional header file, "tchar.h", to define the generic character macros. To comply with the requirements of the C language standard, all non-standard names start with an underscore. Also, the CRT uses the shorter _T() macro for literal strings instead of the longer TEXT() macro, which is defined in "WinNT.h". The CRT authors decided to take the generic text notion even further, and as a result the CRT now distinguishes three modes for text characters:
- Single-byte character set (SBCS): char is used for strings. One ASCII character fits within one char element. No symbol has to be defined for a project. This is the traditional C language approach that has survived from the 1970's to our days. English characters are represented with values 0x00 - 0x7F; non-English characters are represented with values 0x80 - 0xFF. The actual meaning of non-English characters is interpreted according to the currently active code page.
- Multi-byte character set (MBCS): char is used for strings. One multi-byte symbol may require one or two char elements. The _MBCS symbol has to be defined for a project. _MBCS is backward compatible with the SBCS mode and was the default choice for new projects in MS Visual C++ until version 8.0 (2005). _MBCS was commonly used for East Asian languages like Japanese, Korean and Chinese. Now _MBCS is mostly being ousted by Unicode characters. Using _MBCS was the only feasible option to handle East Asian languages on the Windows 9x/Me platform.
- Unicode: the wchar_t type is used for strings. One Unicode symbol occupies one wchar_t element, which is 16-bit on the Windows platform and can represent up to 65536 different values. This is the default mode for new projects starting from version 8.0 (2005) of MS Visual C++.
The CRT uses the definitions of the _MBCS and _UNICODE symbols to distinguish between multi-byte and Unicode builds.
| Generic-text data type or name | SBCS (_UNICODE, _MBCS not defined) | _MBCS defined | _UNICODE defined |
|---|---|---|---|
| _TCHAR | char | char | wchar_t |
| _T("Hello, World!") | "Hello, World!" | "Hello, World!" | L"Hello, World!" |
| Function name prefix and examples: _tcs (_tcscat, _tcsicmp) | str, _str (strcat, _stricmp) | _mbs (_mbscat, _mbsicmp) | wcs, _wcs (wcscat, _wcsicmp) |
// Generic code; names are not standard, hence the leading underscore.
//
_TCHAR message[128] = _T("The time is: ");
_TCHAR* now = _tasctime(&tm);
_tcscat(message, now);
_putts(message);
// What happens if no symbol is defined at all (SBCS).
//
char message[128] = "The time is: ";
char* now = asctime(&tm);
strcat(message, now);
puts(message);
// What happens if _MBCS symbol is defined (Multi-byte Character Set);
// non-standard names are with the leading underscore.
//
char message[128] = "The time is: ";
char* now = asctime(&tm);
_mbscat(message, now);
puts(message);
// What happens if _UNICODE symbol is defined (Unicode Character Set);
// non-standard names are with the leading underscore.
//
wchar_t message[128] = L"The time is: ";
wchar_t* now = _wasctime(&tm);
wcscat(message, now);
_putws(message);
The C++ programming language defines its own standard library. The C++ Standard Library specifies a set of classes and functions that facilitate common programming tasks.
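As a small illustration of such common tasks, the following sketch counts words with std::map and sorts numbers with std::sort. The helper names count_words and sorted_copy are invented for this example; everything they call is part of the C++ Standard Library.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Counts how many times each whitespace-separated word occurs in the text.
std::map<std::string, int> count_words(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream input(text);
    std::string word;
    while (input >> word)
        ++counts[word];   // operator[] creates a zero entry on first use
    return counts;
}

// Returns a sorted copy of the given numbers; the argument is left untouched.
std::vector<int> sorted_copy(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}
```

Neither function touches the Windows API or any CRT extension, so the same source can be built with any conforming C++ compiler.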
The C++ Standard Library is often referred to as the STL. This abbreviation dates from pre-standard times and stands for Standard Template Library. Since the latest revision of the C++ standard, the STL has been a subset of the C++ Standard Library. However, the term STL is still ubiquitous and is used as a synonym for the C++ Standard Library.
The International Organization for Standardization (ISO) is responsible for the C++ language standard and its library.
The C++ Standard Library may be divided into the following major parts:
- Containers: vector, set, list, map etc.
Sometimes a software program is required to run on several computer platforms. A developer may choose to maintain as many separate code bases as there are target platforms. However, this approach is tedious and error prone. It also wastes development resources, since the same functionality must be implemented and maintained over and over again.
The common approach is to develop a single code base for all platforms and to restrict the use of platform-dependent API functions and vendor-specific extensions to the standard libraries. This makes development harder; however, in the long run all platforms benefit from new features and bug fixes.
There are two ways to incorporate CRT and/or C++ Library code into a program: 1) static linking and 2) dynamic linking. In the following discussion I will use only the term CRT to save typing; however, these concepts are relevant both to the CRT and to the C++ Standard Library.
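With Microsoft Visual C++ the choice is made with a compiler option: /MT links the CRT statically and /MD links it dynamically. The /MD option also defines the _DLL preprocessor symbol, so a program can report which model it was built with. The function name crt_linkage_model below is made up for this sketch, and on other compilers it simply reports "unknown".

```cpp
#include <cassert>
#include <string>

// Reports how the CRT was linked, as far as the preprocessor can tell.
std::string crt_linkage_model() {
#if defined(_MSC_VER) && defined(_DLL)
    return "dynamic";   // built with /MD or /MDd
#elif defined(_MSC_VER)
    return "static";    // built with /MT or /MTd
#else
    return "unknown";   // not a Microsoft compiler; no assumption is made
#endif
}
```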
When the CRT/C++ Library is linked statically, all its code is embedded into the resulting executable image. This technique has both advantages and disadvantages.
Advantages:
Disadvantages:
The problem is that internal CRT objects cannot be shared with other CRT instances. Memory allocated in one instance of the CRT must be freed in the same instance, a file opened in one instance of the CRT must be operated on and closed by functions from the same instance, etc. This is because the CRT tracks acquired resources internally. Any attempt to free a memory chunk or read from a file via a FILE* that came from another CRT instance will corrupt the internal CRT state and most likely lead to a crash.
That is why linking the CRT statically obliges the developer of a module to provide additional functions that release allocated resources, and the user of the module to remember to call these functions in order to prevent resource leaks. No STL containers or C++ objects that use allocations internally can be shared across modules that link to the CRT statically. The following diagram illustrates the usage of a memory buffer allocated via a call to malloc.
malloc from different modules
In the above diagram Module 1 is linked to CRT statically, while Modules 2 and 3 are linked to CRT dynamically. Modules 2 and 3 can pass CRT owned objects between them freely. For example, a memory chunk allocated with malloc in Module 3 can be freed in Module 2 with free. It is because both malloc and free calls will end up in the same instance of CRT.
On the other hand, Module 1 cannot let other modules free its resources. Everything allocated in Module 1 must be freed by Module 1, because only Module 1 has access to its statically linked instance of the CRT. In the above sample, Module 2 must remember to call a function from Module 1 in order to properly release the acquired memory.
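A minimal sketch of this pattern is shown below. The names Module1_CreateGreeting and Module1_FreeGreeting are hypothetical; in a real project they would be exported from the DLL that plays the role of Module 1, while here both live in one file only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical interface of "Module 1". Whatever the module allocates
// must be handed back to the module's own release function.
extern "C" char* Module1_CreateGreeting(const char* name);
extern "C" void  Module1_FreeGreeting(char* greeting);

// Allocates the result with the malloc of Module 1's own CRT instance.
char* Module1_CreateGreeting(const char* name) {
    static const char prefix[] = "Hello, ";
    const std::size_t prefix_length = sizeof(prefix) - 1;
    const std::size_t name_length = std::strlen(name);
    char* result = static_cast<char*>(std::malloc(prefix_length + name_length + 1));
    if (result == NULL) return NULL;
    std::memcpy(result, prefix, prefix_length);
    std::memcpy(result + prefix_length, name, name_length + 1);  // copies the terminator too
    return result;
}

// Releases the buffer with the free of the same CRT instance that allocated it.
void Module1_FreeGreeting(char* greeting) {
    std::free(greeting);
}
```

A caller in another module uses the returned string and then passes it to Module1_FreeGreeting instead of calling free itself, so the allocation and the release always end up in the same CRT instance.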
When the CRT/C++ Library is linked dynamically, only small import libraries are linked into the resulting executable image. The import libraries contain instructions on where to find the actual implementation of the CRT/C++ Library functions. When the program starts, the system loader reads these instructions and loads the appropriate DLLs into the process' address space.
Advantages:
Disadvantages:
The article described the relationships and dependencies between the Windows API, the C Runtime Library and the Standard C++ Library. The Windows API is the lowest operational level for user mode programs. On top of the Windows API there is the C Runtime Library, which encapsulates and hides operating system differences. The Standard C++ Library provides much more functionality and also includes the CRT as an integral part. Using only standardized functions and classes makes it possible to write cross-platform applications. Such applications only need to be rebuilt in order to run on a new platform; no code change is required.
Both C Runtime Library and Standard C++ Library can be linked to statically or dynamically, depending on application's needs. Each method has its own advantages and drawbacks.
Last Updated: 22 Aug 2008
Copyright 2008 by Alex Blekhman. Everything else Copyright © CodeProject, 1999-2009.