Using the Visual C/C++ Compiler (1)

vl106, 13 Mar 2012
This is the first article in a series describing the internals of the Visual C/C++ compiler.

Introduction

In this article I would like to look behind the scenes of the Visual C/C++ compiler and show how individual compiler switches influence the generated code.

Background

I assume the reader is familiar with the basics of the Visual C/C++ compiler and is not afraid of using the compiler from the command line.

Enable Read-Only String Pooling

The command line help describes the /GF compiler switch as "enable read-only string pooling". What exactly does this mean?

In a nutshell, it means that identical string literals occurring in several places in your source code are translated to a single data item in the binary image. Thus the /GF option helps you optimize for size, as it produces smaller binaries. Let us look at the following code snippet:

char* str1 = "Bart Simpson";
char* str2 = "Milhouse van Houten";

void foo() {
   static char* s1 = "Bart Simpson";
   static char* s2 = "Milhouse van Houten";
   //...
}

int main() {
   //...
}

Without the /GF option, the Visual C/C++ compiler generates code like this:

_DATA   SEGMENT
$SG855  DB    'Bart Simpson', 00H
        ORG $+3
str1    DQ    FLAT:$SG855
$SG857  DB    'Milhouse van Houten', 00H
        ORG $+4
str2    DQ    FLAT:$SG857
$SG862  DB    'Bart Simpson', 00H
        ORG $+3
?s1@?1??foo@@9@9 DQ FLAT:$SG862
$SG865  DB    'Milhouse van Houten', 00H
        ORG $+4
?s2@?1??foo@@9@9 DQ FLAT:$SG865
_DATA   ENDS

As you can see, each string literal in the C++ source is placed into the binary image separately, even when some literals are identical. The dump of the .data section shows the duplicates:

SECTION HEADER #3
   .data name
    21A0 virtual size
    9000 virtual address (0000000140009000 to 000000014000B19F)
    1000 size of raw data
    7800 file pointer to raw data (00007800 to 000087FF)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
C0000040 flags
         Initialized Data
         Read Write
RAW DATA #3
  0000000140009000: 42 61 72 74 20 53 69 6D 70 73 6F 6E 00 00 00 00  Bart Simpson....
  0000000140009010: 00 90 00 40 01 00 00 00 4D 69 6C 68 6F 75 73 65  ...@....Milhouse
  0000000140009020: 20 76 61 6E 20 48 6F 75 74 65 6E 00 00 00 00 00   van Houten.....
  0000000140009030: 18 90 00 40 01 00 00 00 42 61 72 74 20 53 69 6D  ...@....Bart Sim
  0000000140009040: 70 73 6F 6E 00 00 00 00 38 90 00 40 01 00 00 00  pson....8..@....
  0000000140009050: 4D 69 6C 68 6F 75 73 65 20 76 61 6E 20 48 6F 75  Milhouse van Hou
  0000000140009060: 74 65 6E 00

Now let us rebuild the code with the /GF compiler switch. As the assembly listing shows, the compiler now generates quite different code:

CONST    SEGMENT
   ??_C@_0BE@BMDGJIMK@Milhouse?5van?5Houten?$AA@ DB 'Milhouse van Houten', 00H
CONST    ENDS

CONST    SEGMENT
   ??_C@_0N@MPADFJH@Bart?5Simpson?$AA@ DB 'Bart Simpson', 00H
CONST    ENDS

_DATA    SEGMENT
   str1    DQ    FLAT:??_C@_0N@MPADFJH@Bart?5Simpson?$AA@
   str2    DQ    FLAT:??_C@_0BE@BMDGJIMK@Milhouse?5van?5Houten?$AA@
   ?s1@?1??foo@@9@9 DQ FLAT:??_C@_0N@MPADFJH@Bart?5Simpson?$AA@ 
   ?s2@?1??foo@@9@9 DQ FLAT:??_C@_0BE@BMDGJIMK@Milhouse?5van?5Houten?$AA@
_DATA    ENDS

We can make two observations:

  • identical string literals are translated into a single data item
  • the string data is placed in a read-only data segment

The dump of the .rdata section of the linked image confirms both points:

SECTION HEADER #2
  .rdata name
    2560 virtual size
    6000 virtual address (0000000140006000 to 000000014000855F)
    2600 size of raw data
    5200 file pointer to raw data (00005200 to 000077FF)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40000040 flags
         Initialized Data
         Read Only

RAW DATA #2
  0000000140006000: 48 81 00 00 00 00 00 00 5A 81 00 00 00 00 00 00  H.......Z.......
  ...
  0000000140006210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  0000000140006220: 4D 69 6C 68 6F 75 73 65 20 76 61 6E 20 48 6F 75  Milhouse van Hou
  0000000140006230: 74 65 6E 00 00 00 00 00 42 61 72 74 20 53 69 6D  ten.....Bart Sim
  0000000140006240: 70 73 6F 6E 00

Points of Interest

Strings in the C/C++ programming language confuse even advanced developers, and one can find countless questions about them. Here are just a few notes regarding the code snippet in this article.

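According to the Annotated C++ Reference Manual, a string literal has type char[] ("array of char"), and an attempt to modify a string literal results in undefined behavior. Standard C++ has since typed string literals as const char[], but for compatibility with classic C it long permitted a deprecated conversion from a string literal to char*, which is why a declaration such as char* str1 = "Bart Simpson"; compiles. C++11 removed that conversion from the language, although many compilers continue to accept it in their non-strict modes.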
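
As a point of comparison with the ARM wording, the type a standard C++ compiler assigns to a string literal can be checked at compile time. This is a small sketch that assumes a C++11 (or later) compiler; the function name bart_size is hypothetical.

```cpp
#include <cstddef>
#include <type_traits>

// "Bart Simpson" has 12 characters plus the terminating NUL: 13 elements.
// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same<decltype("Bart Simpson"), const char (&)[13]>::value,
              "string literal is an array of 13 const char");

// sizeof sees the whole array, not a pointer to its first element.
constexpr std::size_t bart_size() { return sizeof("Bart Simpson"); }

static_assert(bart_size() == 13, "12 characters + NUL");
```

The element count of 13 is the same length the compiler encodes in the decorated name of the pooled literal in the /GF listing.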

So let us see what the compiler does when we declare the strings as char[] instead of char*.

char str1[] = "Bart Simpson";
char str2[] = "Milhouse van Houten";

void foo() {
   static char s1[] = "Bart Simpson";
   static char s2[] = "Milhouse van Houten";
   //...
}

int main() {
   //...
}

In this situation the /GF option has no effect: multiple data items are placed in the data segment of the binary image. This may look surprising at first, but it is not an issue with the compiler. Each array is a separate, writable object that is initialized with its own copy of the literal, so the copies cannot be merged into a single read-only data item.
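
One way to see why the arrays have to remain separate data items is to modify one of them and check that the other is unaffected. The sketch below uses a hypothetical function name, arrays_are_independent, and mirrors the static char[] declarations from the snippet above.

```cpp
#include <cstring>

// Hypothetical demo function, not part of the original code.
bool arrays_are_independent() {
    // Each array owns a writable copy of the literal's characters.
    static char s1[] = "Bart Simpson";
    static char s2[] = "Bart Simpson";

    s1[0] = 'b';  // well-defined: s1 is a modifiable array, not a literal

    return &s1[0] != &s2[0]                          // distinct storage
        && std::strcmp(s1, "bart Simpson") == 0      // s1 was changed
        && std::strcmp(s2, "Bart Simpson") == 0;     // s2 is untouched
}
```

If both arrays shared one pooled data item, the write to s1 would also change s2, which the language does not allow.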

History

  • March, 2012 - Article first published.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

vl106


Last Updated 13 Mar 2012
Article Copyright 2012 by vl106