Hi :)
I'm developing a C-like compiler and I want to know how the include system works. I mean, how does the compiler handle the system includes?
 
Does the compiler read the entire code, store all the includes it finds in a list, and parse the included files only after it has finished reading the current code?
 
// file main.c
#include <stdio.h> // store in a list

// continue parsing ...
int main()
{
    return 0;
}
// now, read the includes
// after parsing the includes, generate code for the sources

// just a sample
// file stdio.h
#include <types.h> // store in the list
#include <bios.h>  // store in the list

int printf(const char *format, ...)
{
    return 0; // stub
}

int scanf(const char *format, ...)
{
    return 0; // stub
}
 
By the way, I have developed a system (only a test) that handles includes by stopping the parse, reading the included file, and then continuing. (It's ugly code, but it works.)
Link to the sample:
https://gist.github.com/4399601
So, what is the best way to read includes and work with include files?
Posted 28-Dec-12 6:51am

Solution 1

Most compilers use a preprocessor to read the included headers, expand macros and process the other directives, producing a single stream of source text. That preprocessed output is then fed into the compiler proper for translation to object code. There are quite a few free tools available on the internet to help with building compilers, and Google should help you find them.

Solution 2

If you want to parse and process any language, you should probably read some textbooks first,
e.g. The C Programming Language or the C11 language draft.
 
E.g. the latter specifies the #include semantics as follows:
[...]
6.10.2 Source file inclusion
  Constraints
  1 A #include directive shall identify a header or source file that can be processed by the
    implementation.
  Semantics
  2 A preprocessing directive of the form
    # include <h-char-sequence> new-line
    searches a sequence of implementation-defined places for a header identified uniquely by
    the specified sequence between the < and > delimiters, and causes the replacement of that
    directive by the entire contents of the header. How the places are specified or the header
    identified is implementation-defined.
  3 A preprocessing directive of the form
    # include "q-char-sequence" new-line
    causes the replacement of that directive by the entire contents of the source file identified
    by the specified sequence between the " delimiters. The named source file is searched for in an
    implementation-defined manner. If this search is not supported, or if the search
    fails, the directive is reprocessed as if it read
    # include <h-char-sequence> new-line
    with the identical contained sequence (including > characters, if any) from the original
    directive.
  4 [...]

 
Furthermore, the standard bounds how deep inclusion has to go: an implementation is only required to support 15 nesting levels of #included files (C11 5.2.4.1), so a compiler may stop with an error on deeper nesting, for example on an infinite inclusion loop.
 
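Usually, #include <...> searches the directories given with -I... and then the system header directories, while #include "..." first searches the directory of the including file and then falls back to the <...> search (this is what GCC does). But as stated above, the details are implementation-defined. Check your target compiler for the semantics it implements.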
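A minimal sketch of such a lookup in C, for illustration only. The function resolve_include and its parameters are hypothetical; the order ("..." looks next to the including file, then the -I directories, then the system directories) is a simplified GCC-like order, and a real compiler also deals with things like -iquote, absolute paths and caching.

```c
#include <assert.h>
#include <stdio.h>

/* Returns nonzero if `path` can be opened for reading. */
static int file_exists(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Tries "dir/name"; on success the joined path is left in `out`. */
static int try_dir(const char *dir, const char *name, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s/%s", dir, name);
    return n > 0 && (size_t)n < out_size && file_exists(out);
}

/*
 * Hypothetical helper: resolves the name from an #include directive.
 * quoted != 0 means #include "name", otherwise #include <name>.
 * Returns 1 and writes the path to `out` if found, 0 otherwise.
 */
int resolve_include(const char *name, int quoted, const char *including_dir,
                    const char *const *user_dirs, size_t n_user,
                    const char *const *sys_dirs, size_t n_sys,
                    char *out, size_t out_size)
{
    assert(name != NULL && out != NULL && out_size > 0);

    /* "name": look next to the including file first. */
    if (quoted && try_dir(including_dir, name, out, out_size))
        return 1;

    /* <name>, or the fallback for "name" (C11 6.10.2p3):
       the -I directories first, then the system directories. */
    for (size_t i = 0; i < n_user; i++)
        if (try_dir(user_dirs[i], name, out, out_size))
            return 1;
    for (size_t i = 0; i < n_sys; i++)
        if (try_dir(sys_dirs[i], name, out, out_size))
            return 1;
    return 0;
}
```

Note that the fallback for "..." simply reuses the <...> search, which is exactly what the quoted rule in 6.10.2p3 describes.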
 
Please note that #include does not add the files to some "magic" list: it simply replaces the #include line by the content of the referenced file (as if you had written the content of the included file at the location of the #include line), and that inserted content is then preprocessed in turn.
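To make the literal replacement concrete, here is a rough sketch of a recursive expander in C. It rests on heavy simplifying assumptions: the hypothetical struct vfile table stands in for real files, only lines of exactly the form #include "name" are recognized (no <...>, no extra whitespace, no macros), and MAX_DEPTH = 15 just mirrors the minimum nesting level from C11 5.2.4.1. Each included file is expanded right where its directive was, recursively, instead of being collected in a list for later.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory "file system" standing in for real header files. */
struct vfile { const char *name; const char *text; };

/* Hypothetical output buffer collecting the preprocessed text. */
struct outbuf { char data[4096]; size_t len; };

#define MAX_DEPTH 15 /* C11 5.2.4.1 guarantees 15 nesting levels */

static const char *vfs_lookup(const struct vfile *fs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(fs[i].name, name) == 0)
            return fs[i].text;
    return NULL;
}

static int emit(struct outbuf *o, const char *s, size_t n)
{
    if (o->len + n >= sizeof o->data)
        return -1;                      /* buffer full */
    memcpy(o->data + o->len, s, n);
    o->len += n;
    o->data[o->len] = '\0';
    return 0;
}

/*
 * Copies `text` to `o`, replacing every line of exactly the form
 *     #include "name"
 * by the recursively expanded contents of `name`. Returns 0 on success,
 * -1 if a file is missing, the buffer overflows or MAX_DEPTH is exceeded
 * (e.g. a header that includes itself).
 */
int expand(const struct vfile *fs, size_t n, const char *text, int depth,
           struct outbuf *o)
{
    static const char prefix[] = "#include \"";
    const size_t plen = sizeof prefix - 1;

    assert(text != NULL && o != NULL);
    if (depth > MAX_DEPTH)
        return -1;

    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);

        if (len > plen && strncmp(text, prefix, plen) == 0 && text[len - 1] == '"') {
            /* The directive line is replaced by the included text. */
            char name[256];
            size_t nlen = len - plen - 1;
            const char *body;
            if (nlen >= sizeof name)
                return -1;
            memcpy(name, text + plen, nlen);
            name[nlen] = '\0';
            body = vfs_lookup(fs, n, name);
            if (body == NULL || expand(fs, n, body, depth + 1, o) != 0)
                return -1;
        } else {
            /* Any other line is copied unchanged, including its new-line. */
            if (emit(o, text, len + (eol ? 1 : 0)) != 0)
                return -1;
        }
        text = eol ? eol + 1 : text + len;
    }
    return 0;
}
```

Because the included text lands exactly where the directive was, everything a header declares is visible to the lines after the #include, which is what the main.c in the question relies on.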
 
Cheers
Andi
 
PS: if you run gcc with the -E option (add -C to keep the comments), you can see the effect of the inclusion and of the other preprocessing steps.
Comments
Alexandre Bencz at 28-Dec-12 15:42pm
   
Oh :)
Thanks :)
By the way, at what point do I parse the includes?
Andreas Gieriet at 28-Dec-12 16:10pm
   
Read the C11 draft. Section 5.1.1.2 specifies exactly what has to be done in which order, i.e.:
[...]
5.1.1.2 Translation phases
1 The precedence among the syntax rules of translation is specified by the following
phases.
1. Physical source file multibyte characters are mapped, in an implementation-defined
manner, to the source character set (introducing new-line characters for
end-of-line indicators) if necessary. Trigraph sequences are replaced by
corresponding single-character internal representations.
2. Each instance of a backslash character (\) immediately followed by a new-line
character is deleted, splicing physical source lines to form logical source lines.
Only the last backslash on any physical source line shall be eligible for being part
of such a splice. A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character before any such
splicing takes place.
3. The source file is decomposed into preprocessing tokens and sequences of
white-space characters (including comments). A source file shall not end in a
partial preprocessing token or in a partial comment. Each comment is replaced by
one space character. New-line characters are retained. Whether each nonempty
sequence of white-space characters other than new-line is retained or replaced by
one space character is implementation-defined.
4. Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. If a character sequence that
matches the syntax of a universal character name is produced by token
concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing
directive causes the named header or source file to be processed from phase 1
through phase 4, recursively. All preprocessing directives are then deleted.
5. Each source character set member and escape sequence in character constants and
string literals is converted to the corresponding member of the execution character
set; if there is no corresponding member, it is converted to an implementation-defined
member other than the null (wide) character.
6. Adjacent string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting tokens are
syntactically and semantically analyzed and translated as a translation unit.
8. All external object and function references are resolved. Library components are
linked to satisfy external references to functions and objects not defined in the
current translation. All such translator output is collected into a program image
which contains information needed for execution in its execution environment.
[...]

Cheers
Andi
nv3 at 28-Dec-12 17:26pm
   
My 5. (Looks like some compiler class has started somewhere in the world :-)
Andreas Gieriet at 28-Dec-12 17:37pm
   
Thanks again for your 5!
Yeah, looks like there is something going on ;-)
Cheers
Andi

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 28 Dec 2012
Copyright © CodeProject, 1999-2014. All Rights Reserved.