
TABLE OF CONTENTS

  1. Introduction
  2. Prerequisites
  3. Brief introduction to the ELF format
  4. ELF loading
  5. Building up our lab...
  6. PLT: a practical example
  7. Conclusions
  8. References

In the last few weeks I've found myself playing around with Linux for various reasons, so my first thought was to apply some well-known Win32 methods to this operating system, most of them from a reverser's point of view.
This is the first part of a two-part article dealing with code injection under Linux. The technique presented here resembles a well-known Win32 technique: inject a DLL (a shared object on Linux and UNIX-like systems) into a running process, which makes it possible to hook some of the functions the process imports. There are two ways to achieve code injection: 1) using the LD_PRELOAD method (which requires restarting the process into which we are injecting our shared object); 2) injecting a stub into the target process which loads the required library. As you may have guessed, the second way requires "libdl" to be present in the address space of the target process, but that is a reasonable requirement: most modern software for Linux and UNIX systems is quite complex, and most programs will have libdl mapped into their address space. We will use the second method, since our goal is to inject a shared object into a running process.
As for the hooking technique, we will use PLT redirection, a well-known technique under Linux/UNIX for hooking functions imported from other libraries in a simple and elegant way.

This first part will teach you the basics of the ELF format (the standard object format on most Linux and UNIX-like operating systems) with some code examples. It serves as an introduction to the next part, which will show the actual technique, but it is required reading, since it builds up the necessary concepts.

Prerequisites

Before moving further, it's best that you take a look at the ELF manuals. They are required reading to understand most of what I'm writing. I'll give you just a brief introduction to the ELF format, because there is so much to write about it that we might lose sight of our goal (this IS NOT an article about the ELF format; it is about shared object injection). In various parts of the text I will refer to the manuals, so let's see how you should read them. You have to download two manuals: the first is the general one, the second is the x86-specific one. If you take a look at the general one, you will see that it is missing some parts, which you will find in the specific one (there is one additional ELF manual for each platform which supports the ELF format). Missing parts start with a marker, so you will notice them.

You should have a good knowledge of the C programming language and you should know how to use gcc and gdb. A minimal knowledge of x86 assembly language is required. This is the reference system:

Brief introduction to the ELF format

In this chapter the principal aspects of the ELF format will be covered from a programmer's point of view. This chapter is not a raw copy of the ELF manuals; it just introduces you to some aspects of the format, so you ARE REQUIRED to read the manuals. It deals with the various structures of the ELF format and how to read them. Dynamic linking is deferred to the next chapter.

Historical notes

The ELF format (which is an acronym for Executable and Linking Format) is one of the many formats that describe the structure of an object file (a file which contains compiled code), from a logical and/or physical point of view. Object file formats similar to ELF include the PE format (Portable Executable, used mainly on Microsoft platforms), Mach-O (used on OSX), COFF and a.out. From a historical point of view, COFF and a.out were very influential in the development of the newer formats, such as PE (which derives from COFF) and ELF (which takes some of its aspects from the a.out format). As you may guess from the name, the ELF format allows both executables and libraries to be described.

The ELF format was introduced to replace the a.out and COFF formats on UNIX systems, becoming a standard as of 1999. Today most non-Microsoft platforms use the ELF format, since it adapts well to most platforms; moreover, the "ld" linker can be instructed (through custom ld scripts) to build custom ELF files, which can meet almost every requirement you may have. Examples of non-UNIX platforms using the ELF format include PSP, PS2, PS3, Wii, Dreamcast, BeOS, Haiku, AmigaOS, MorphOS and SymbianOS (which actually uses a format derived from ELF). This flexibility comes from the fact that most of the main ELF structures are not bound to a specific platform (e.g. the layout of the relocation structure, i.e. its fields and their sizes, is independent of the platform used, but its contents are highly platform-dependent).

ELF structure

The ELF structure is similar to other image formats ("image" being a synonym for object file): it has a header, followed by sections which describe the content of the various segments of memory. In the following paragraphs we will use the "readelf" utility on a test executable, namely "/bin/ls" (which you should have on your system...).

The ELF header

When working with a file format, the first thing that comes to mind is its header, which is the most important part of a file. In the ELF format, the header contains the architecture for which the file was built, its endianness (the byte order), which in most cases is bound to the architecture (there are some exceptions: modern ARMs and PPCs can be set to either big-endian or little-endian), the number of sections contained in the file, the number of segments (a single segment contains one or more sections), etc. Using C structure syntax (in the same way as the manual), let's take a look at the header:

#define EI_NIDENT (16)

typedef struct
{
  unsigned char e_ident[EI_NIDENT];    /* Magic number and other info */
  Elf32_Half    e_type;            /* Object file type */
  Elf32_Half    e_machine;        /* Architecture */
  Elf32_Word    e_version;        /* Object file version */
  Elf32_Addr    e_entry;        /* Entry point virtual address */
  Elf32_Off     e_phoff;        /* Program header table file offset */
  Elf32_Off     e_shoff;        /* Section header table file offset */
  Elf32_Word    e_flags;        /* Processor-specific flags */
  Elf32_Half    e_ehsize;        /* ELF header size in bytes */
  Elf32_Half    e_phentsize;        /* Program header table entry size */
  Elf32_Half    e_phnum;        /* Program header table entry count */
  Elf32_Half    e_shentsize;        /* Section header table entry size */
  Elf32_Half    e_shnum;        /* Section header table entry count */
  Elf32_Half    e_shstrndx;        /* Section header string table index */
} Elf32_Ehdr;
The comments next to each field summarize its meaning. As you may have seen, header fields are defined using custom data types (e.g. Elf32_Word, Elf32_Half), which identify the size of a field unambiguously. Let's clarify this further: the size of an ELF word is ALWAYS 32 bits (on both 32-bit and 64-bit architectures), so Elf32_Word is an unsigned 32-bit integer, and the same goes for Elf64_Word. This is true for data words, but not for addresses (which of course are 32 bits wide on a 32-bit architecture and 64 bits wide on a 64-bit one), so there are dedicated address types: Elf32_Addr and Elf32_Off are unsigned 32-bit integers, while on a 64-bit architecture Elf64_Addr and Elf64_Off are unsigned 64-bit integers. Elf32_Half and Elf64_Half are always half the size of an ELF word, that is, unsigned 16-bit integers. Here are the typedefs:
/* Type for a 16-bit quantity.  */
typedef uint16_t Elf32_Half;
typedef uint16_t Elf64_Half;

/* Types for signed and unsigned 32-bit quantities.  */
typedef uint32_t Elf32_Word;
typedef    int32_t  Elf32_Sword;
typedef uint32_t Elf64_Word;
typedef    int32_t  Elf64_Sword;

/* Types for signed and unsigned 64-bit quantities.  */
typedef uint64_t Elf32_Xword;
typedef    int64_t  Elf32_Sxword;
typedef uint64_t Elf64_Xword;
typedef    int64_t  Elf64_Sxword;

/* Type of addresses.  */
typedef uint32_t Elf32_Addr;
typedef uint64_t Elf64_Addr;

/* Type of file offsets.  */
typedef uint32_t Elf32_Off;
typedef uint64_t Elf64_Off;

/* Type for section indices, which are 16-bit quantities.  */
typedef uint16_t Elf32_Section;
typedef uint16_t Elf64_Section;

/* Type for version symbol information.  */
typedef Elf32_Half Elf32_Versym;
typedef Elf64_Half Elf64_Versym;
Let's take a look at the output of "readelf" for our binary:
quake2@quake2-desktop:~$ readelf -h /bin/ls
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8049b20
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95096 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         9
  Size of section headers:           40 (bytes)
  Number of section headers:         28
  Section header string table index: 27
From this output it is clear that "/bin/ls" is a 32-bit executable for the x86 platform.

Program header and segments

The program header is a fundamental structure of the ELF format. A program header describes a segment, which holds one or more sections, and tells the loader how to map that segment of the file into memory. Some program headers have a special meaning, which will be discussed later. It is important to note that an ELF image without program headers CANNOT be loaded into memory by the system loader (an intermediate object file, usually with a .o extension, does not need program headers, since it is never loaded into memory), so a loadable ELF image must have at least one program header of type PT_LOAD. Once an ELF image is fully loaded into memory, the program headers are the only structures that contain reliable information; section headers lose their meaning (they might be present in the memory image, but you must not rely on them). Here's the structure of the program header:
typedef struct
{
  Elf32_Word    p_type;            /* Segment type */
  Elf32_Off    p_offset;        /* Segment file offset */
  Elf32_Addr    p_vaddr;        /* Segment virtual address */
  Elf32_Addr    p_paddr;        /* Segment physical address */
  Elf32_Word    p_filesz;        /* Segment size in file */
  Elf32_Word    p_memsz;        /* Segment size in memory */
  Elf32_Word    p_flags;        /* Segment flags */
  Elf32_Word    p_align;        /* Segment alignment */
} Elf32_Phdr;
Let's stress that, within an image file, the program headers hold the data the loader needs in order to load the image; section headers are just helpers and might not be present in an image file at all.
Let's take a look at the output from "readelf":
quake2@quake2-desktop:~$ readelf -l /bin/ls

Elf file type is EXEC (Executable file)
Entry point 0x8049b20
There are 9 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00120 0x00120 R E 0x4
  INTERP         0x000154 0x08048154 0x08048154 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x16eb4 0x16eb4 R E 0x1000
  LOAD           0x016ef0 0x0805fef0 0x0805fef0 0x003a0 0x0081c RW  0x1000
  DYNAMIC        0x016f04 0x0805ff04 0x0805ff04 0x000e8 0x000e8 RW  0x4
  NOTE           0x000168 0x08048168 0x08048168 0x00020 0x00020 R   0x4
  GNU_EH_FRAME   0x016dec 0x0805edec 0x0805edec 0x0002c 0x0002c R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
  GNU_RELRO      0x016ef0 0x0805fef0 0x0805fef0 0x00110 0x00110 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn 
          .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame 
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss 
   04     .dynamic 
   05     .note.ABI-tag 
   06     .eh_frame_hdr 
   07     
   08     .ctors .dtors .jcr .dynamic .got 
From this output it is evident that a single program header describes a single segment, which usually holds one or more sections (e.g. program headers 2, 3 and 8); moreover, a section may appear in multiple program headers (e.g. the .dynamic section appears in both 8 and 4). Note that PT_LOAD segments (which contain the code and data sections) are aligned on a page-boundary basis.

The section header

The section header holds information which describes a portion of the on-disk structure of the file. A section might be as small as a few bytes (e.g. the .interp section) or as big as the .text section, which holds almost all the executable code. Sections are a way to logically subdivide an ELF file into smaller portions. If you're familiar with Microsoft's PE, note that an ELF section has nothing to do with a PE section: a PE section is rather the counterpart of an ELF segment. Here's the structure:
typedef struct
{
  Elf32_Word    sh_name;        /* Section name (string tbl index) */
  Elf32_Word    sh_type;        /* Section type */
  Elf32_Word    sh_flags;        /* Section flags */
  Elf32_Addr    sh_addr;        /* Section virtual addr at execution */
  Elf32_Off    sh_offset;        /* Section file offset */
  Elf32_Word    sh_size;        /* Section size in bytes */
  Elf32_Word    sh_link;        /* Link to another section */
  Elf32_Word    sh_info;        /* Additional section information */
  Elf32_Word    sh_addralign;        /* Section alignment */
  Elf32_Word    sh_entsize;        /* Entry size if section holds table */
} Elf32_Shdr;
So let's see the usual output from "readelf":
quake2@quake2-desktop:~$ readelf -S /bin/ls
There are 28 section headers, starting at offset 0x17378:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048154 000154 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048168 000168 000020 00   A  0   0  4
  [ 3] .hash             HASH            08048188 000188 000330 04   A  5   0  4
  [ 4] .gnu.hash         GNU_HASH        080484b8 0004b8 00005c 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          08048514 000514 000690 10   A  6   1  4
  [ 6] .dynstr           STRTAB          08048ba4 000ba4 0004af 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          08049054 001054 0000d2 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         08049128 001128 0000d0 00   A  6   3  4
  [ 9] .rel.dyn          REL             080491f8 0011f8 000028 08   A  5   0  4
  [10] .rel.plt          REL             08049220 001220 0002e8 08   A  5  12  4
  [11] .init             PROGBITS        08049508 001508 000030 00  AX  0   0  4
  [12] .plt              PROGBITS        08049538 001538 0005e0 04  AX  0   0  4
  [13] .text             PROGBITS        08049b20 001b20 01145c 00  AX  0   0 16
  [14] .fini             PROGBITS        0805af7c 012f7c 00001c 00  AX  0   0  4
  [15] .rodata           PROGBITS        0805afa0 012fa0 003e4c 00   A  0   0 32
  [16] .eh_frame_hdr     PROGBITS        0805edec 016dec 00002c 00   A  0   0  4
  [17] .eh_frame         PROGBITS        0805ee18 016e18 00009c 00   A  0   0  4
  [18] .ctors            PROGBITS        0805fef0 016ef0 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        0805fef8 016ef8 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        0805ff00 016f00 000004 00  WA  0   0  4
  [21] .dynamic          DYNAMIC         0805ff04 016f04 0000e8 08  WA  6   0  4
  [22] .got              PROGBITS        0805ffec 016fec 000008 04  WA  0   0  4
  [23] .got.plt          PROGBITS        0805fff4 016ff4 000180 04  WA  0   0  4
  [24] .data             PROGBITS        08060180 017180 000110 00  WA  0   0 32
  [25] .bss              NOBITS          080602a0 017290 00046c 00  WA  0   0 32
  [26] .gnu_debuglink    PROGBITS        00000000 017290 000008 00      0   0  1
  [27] .shstrtab         STRTAB          00000000 017298 0000df 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
As you can see from the output, the section table starts with a null entry. Some sections have an address field equal to 0: these sections will not be mapped into memory (take a look at the previous program header table, you won't find them).
From the output you can also see that there are two string tables. Take a look at their addresses: one is non-zero, the other is zero. This means that one string table is discarded when the file is mapped into memory; the one which is not discarded (.dynstr) holds the names of the dynamic symbols (.dynsym), which the dynamic linker uses to resolve runtime relocations. Take note of the index of the last section: it is the same index that you found in the e_shstrndx field of the ELF header, because the .shstrtab section holds the names of the ELF sections.

String table

The string table, strictly speaking, is not a structure, but rather an unordered collection of null-terminated strings. There's not much to say about string tables, just keep in mind that when you see a name field inside an ELF structure, it will always be an index (a byte offset) into a string table. Which string table? It depends on the context, but whenever a string table is referenced there is always a way to know which one it is.
Let's take a look at a hex dump of the last section:
quake2@quake2-desktop:~$ readelf -x 27 /bin/ls

Hex dump of section '.shstrtab':
  0x00000000 002e7368 73747274 6162002e 696e7465 ..shstrtab..inte
  0x00000010 7270002e 6e6f7465 2e414249 2d746167 rp..note.ABI-tag
  0x00000020 002e676e 752e6861 7368002e 64796e73 ..gnu.hash..dyns
  0x00000030 796d002e 64796e73 7472002e 676e752e ym..dynstr..gnu.
  0x00000040 76657273 696f6e00 2e676e75 2e766572 version..gnu.ver
  0x00000050 73696f6e 5f72002e 72656c2e 64796e00 sion_r..rel.dyn.
  0x00000060 2e72656c 2e706c74 002e696e 6974002e .rel.plt..init..
  0x00000070 74657874 002e6669 6e69002e 726f6461 text..fini..roda
  0x00000080 7461002e 65685f66 72616d65 5f686472 ta..eh_frame_hdr
  0x00000090 002e6568 5f667261 6d65002e 63746f72 ..eh_frame..ctor
  0x000000a0 73002e64 746f7273 002e6a63 72002e64 s..dtors..jcr..d
  0x000000b0 796e616d 6963002e 676f7400 2e676f74 ynamic..got..got
  0x000000c0 2e706c74 002e6461 7461002e 62737300 .plt..data..bss.
  0x000000d0 2e676e75 5f646562 75676c69 6e6b00   .gnu_debuglink.
it's just a collection of strings (the value on the left-most column is an offset), plain and simple.

Let's write some code...

We've reached the point where we can start writing some code, which will show the ELF header, and will perform sanity checks on it. The full source can be found in the attachment.
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>     /* read(), close() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <elf.h>

int elf_is_valid(Elf32_Ehdr *elf_hdr)
{
    if( (elf_hdr->e_ident[EI_MAG0] != 0x7F) || 
        (elf_hdr->e_ident[EI_MAG1] != 'E') ||
        (elf_hdr->e_ident[EI_MAG2] != 'L') ||
        (elf_hdr->e_ident[EI_MAG3] != 'F') )
    {
         return 0;
    }

    if(elf_hdr->e_ident[EI_CLASS] != ELFCLASS32)
        return 0;

    if(elf_hdr->e_ident[EI_DATA] != ELFDATA2LSB)
        return 0;

    return 1;
}

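static char *elf_types[] = {
    "ET_NONE",
    "ET_REL",
    "ET_EXEC",
    "ET_DYN",
    "ET_CORE"
};

char *get_elf_type(Elf32_Ehdr *elf_hdr)
{
    /* ET_NUM is the number of defined types, not a type itself,
       so only values 0..4 have a name */
    if(elf_hdr->e_type >= sizeof(elf_types) / sizeof(elf_types[0]))
        return NULL;

    return elf_types[elf_hdr->e_type];
}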

int print_elf_header(Elf32_Ehdr *elf_hdr)
{
    char *sz_elf_type = NULL;

    if(!elf_hdr)
        return 0;

    printf("ELF header information\n");

    sz_elf_type = get_elf_type(elf_hdr);
    if(sz_elf_type)
        printf("- Type: %s\n", sz_elf_type);
    else
        printf("- Type: %04x\n", elf_hdr->e_type);

    printf("- Version: %d\n", elf_hdr->e_version);
    printf("- Entrypoint: 0x%08x\n", elf_hdr->e_entry);
    printf("- Program header table offset: 0x%08x\n", elf_hdr->e_phoff);
    printf("- Section header table offset: 0x%08x\n", elf_hdr->e_shoff);
    printf("- Flags: 0x%08x\n", elf_hdr->e_flags);
    printf("- ELF header size: %d\n", elf_hdr->e_ehsize);
    printf("- Program header size: %d\n", elf_hdr->e_phentsize);
    printf("- Program header entries: %d\n", elf_hdr->e_phnum);
    printf("- Section header size: %d\n", elf_hdr->e_shentsize);
    printf("- Section header entries: %d\n", elf_hdr->e_shnum);
    printf("- Section string table index: %d\n", elf_hdr->e_shstrndx);

    return 1;
}

int main(int argc, char *argv[])
{
    int fd_elf = -1;
    u_char *p_base = NULL;
    struct stat elf_stat;
    Elf32_Ehdr *p_ehdr = NULL;

    if(argc < 2)
    {
        printf("Usage: %s </path/to/file>\n", argv[0]);
        return 1;
    }

    fd_elf = open(argv[1], O_RDONLY);
    if(fd_elf == -1)
    {
        fprintf(stderr, "Could not open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }

    if(fstat(fd_elf, &elf_stat) == -1)
    {
        fprintf(stderr, "Could not stat %s: %s\n", argv[1], strerror(errno));
        close(fd_elf);
        return 1;
    }

    p_base = (u_char *)calloc(sizeof(u_char), elf_stat.st_size);
    if(!p_base)
    {
        fprintf(stderr, "Not enough memory\n");
        close(fd_elf);
        return 1;
    }

    if(read(fd_elf, p_base, elf_stat.st_size) != elf_stat.st_size)
    {
        fprintf(stderr, "Error while reading file: %s\n", strerror(errno));
        free(p_base);
        close(fd_elf);
        return 1;
    }
    
    close(fd_elf);

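    p_ehdr = (Elf32_Ehdr *)p_base;
    /* make sure the file is large enough to contain an ELF header at all */
    if((size_t)elf_stat.st_size >= sizeof(Elf32_Ehdr) && elf_is_valid(p_ehdr))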
        print_elf_header(p_ehdr);
    else
        fprintf(stderr, "Invalid ELF file\n");

    free(p_base);
    return 0;
}
The code is quite simple: since the ELF header is at the beginning of the file, we just point p_ehdr at the start of the allocated buffer (which contains the file). Take a look at the elf_is_valid function, which performs the sanity checks.

Let's see another listing, which this time also shows the section header table and the program header table (it reuses the includes, elf_is_valid and print_elf_header from the previous listing):

static char *ptypes[] = {
        "PT_NULL",
        "PT_LOAD",
        "PT_DYNAMIC",
        "PT_INTERP",
        "PT_NOTE",
        "PT_SHLIB",
        "PT_PHDR"
};

int print_program_header(Elf32_Phdr *phdr, unsigned int index)
{
    if(!phdr)
        return 0;

    printf("Program header %u\n", index);
    //only the first seven segment types have a name in ptypes[]
    if(phdr->p_type <= 6)
        printf("- Type: %s\n", ptypes[phdr->p_type]);
    else
        printf("- Type: %08x\n", phdr->p_type);

    printf("- Offset: %08x\n", phdr->p_offset);
    printf("- Virtual Address: %08x\n", phdr->p_vaddr);
    printf("- Physical Address: %08x\n", phdr->p_paddr);
    printf("- File size: %u\n", phdr->p_filesz);
    printf("- Memory size: %u\n", phdr->p_memsz);
    printf("- Flags: %08x\n", phdr->p_flags);
    printf("- Alignment: %08x\n", phdr->p_align);

    return 1;
}

static char *stypes[] = {
        "SHT_NULL",
        "SHT_PROGBITS",
        "SHT_SYMTAB",
        "SHT_STRTAB",
        "SHT_RELA",
        "SHT_HASH",
        "SHT_DYNAMIC",
        "SHT_NOTE",
        "SHT_NOBITS",
        "SHT_REL",
        "SHT_SHLIB",
        "SHT_DYNSYM"
};

int print_section_header(Elf32_Shdr *shdr, unsigned int index, char *strtable)
{
    if(!shdr)
        return 0;

    printf("Section header: %u\n", index);
    printf("- Name index: %u\n", shdr->sh_name);
    
    //as you can see, we're using sh_name as an index into the string table
    printf("- Name: %s\n", strtable + shdr->sh_name);
    //only the first twelve section types have a name in stypes[]
    if(shdr->sh_type <= 11)
        printf("- Type: %s\n", stypes[shdr->sh_type]);
    else
        printf("- Type: %04x\n", shdr->sh_type);
    printf("- Flags: %08x\n", shdr->sh_flags);
    printf("- Address: %08x\n", shdr->sh_addr);
    printf("- Offset: %08x\n", shdr->sh_offset);
    printf("- Size: %08x\n", shdr->sh_size);
    printf("- Link: %08x\n", shdr->sh_link);
    printf("- Info: %08x\n", shdr->sh_info);
    printf("- Address alignment: %08x\n", shdr->sh_addralign);
    printf("- Entry size: %08x\n", shdr->sh_entsize);

    return 1;
}

int main(int argc, char *argv[])
{
    int fd_elf = -1;
    u_char *p_base = NULL;
    char *p_strtable = NULL;
    struct stat elf_stat;
    Elf32_Ehdr *p_ehdr = NULL;
    Elf32_Phdr *p_phdr = NULL;
    Elf32_Shdr *p_shdr = NULL;
    int i;

    if(argc < 2)
    {
        printf("Usage: %s </path/to/file>\n", argv[0]);
        return 1;
    }

    fd_elf = open(argv[1], O_RDONLY);
    if(fd_elf == -1)
    {
        fprintf(stderr, "Could not open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }

    if(fstat(fd_elf, &elf_stat) == -1)
    {
        fprintf(stderr, "Could not stat %s: %s\n", argv[1], strerror(errno));
        close(fd_elf);
        return 1;
    }

    p_base = (u_char *)calloc(sizeof(u_char), elf_stat.st_size);
    if(!p_base)
    {
        fprintf(stderr, "Not enough memory\n");
        close(fd_elf);
        return 1;
    }

    if(read(fd_elf, p_base, elf_stat.st_size) != elf_stat.st_size)
    {
        fprintf(stderr, "Error while reading file: %s\n", strerror(errno));
        free(p_base);
        close(fd_elf);
        return 1;
    }
    
    close(fd_elf);

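    p_ehdr = (Elf32_Ehdr *)p_base;
    //make sure the file is large enough to contain an ELF header at all
    if((size_t)elf_stat.st_size >= sizeof(Elf32_Ehdr) && elf_is_valid(p_ehdr))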
    {
        print_elf_header(p_ehdr);

        printf("\n");
        
        //to reach the section header table and the program header table
        //we simply add the offsets of these tables to the base address
        p_phdr = (Elf32_Phdr *)(p_base + p_ehdr->e_phoff);
        p_shdr = (Elf32_Shdr *)(p_base + p_ehdr->e_shoff);
        
        //this is the first example of string table usage: the e_shstrndx field
        //holds an index into the section header table, which is addressed by p_shdr.
        //The sh_offset field of that section holds the file offset of the string table;
        //to get the actual pointer we just have to add it to the base address
        p_strtable = (char *)(p_base + p_shdr[p_ehdr->e_shstrndx].sh_offset);

        for(i = 0; i < p_ehdr->e_phnum; i++)
        {
            print_program_header(&p_phdr[i], i);
        }

        for(i = 0; i < p_ehdr->e_shnum; i++)
        {
            print_section_header(&p_shdr[i], i, p_strtable);
        }
    }
    else
        printf("Invalid ELF file\n");

    free(p_base);
    return 0;
}
the interesting parts have been commented.

Symbol table

The symbol table is a fundamental structure of the ELF format. As the name says, it's a table which contains an array of symbols. The ELF manual provides an elegant definition: "An object file’s symbol table holds information needed to locate and relocate a program’s symbolic definitions and references." Let's look at an example, anticipating the dynamic linking process. When an executable needs to use a function defined elsewhere (usually inside a shared object), the linker creates an entry in the dynamic symbol table with its value equal to 0 (the symbol's value is not the same thing as the symbol's name) and its name equal to the name of the external function. Together with the entry in the symbol table, an additional entry is created in the relocation table, which holds the information on how to resolve the symbol (this will be explained later). At run time the dynamic linker looks through the loaded libraries, searching for one that defines a symbol with the same name as the symbol used by the executable; the value of the symbol found in one of the loaded libraries is the actual function's entry point. This way the dynamic linker "resolves" a symbol which refers to an external one.
The symbol table structure:
typedef struct
{
  Elf32_Word    st_name;        /* Symbol name (string tbl index) */
  Elf32_Addr    st_value;        /* Symbol value */
  Elf32_Word    st_size;        /* Symbol size */
  unsigned char    st_info;        /* Symbol type and binding */
  unsigned char    st_other;        /* Symbol visibility */
  Elf32_Section    st_shndx;        /* Section index */
} Elf32_Sym;
the symbol table is quite a complex structure, so it's better if you take a look at the ELF manual.
Let's see the output from "readelf":
quake2@quake2-desktop:~$ readelf -W -s /bin/ls

Symbol table '.dynsym' contains 105 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
[...]
    23: 00000000   198 FUNC    GLOBAL DEFAULT  UND strncpy@GLIBC_2.0 (2)
    24: 00000000    35 FUNC    GLOBAL DEFAULT  UND freecon
    25: 00000000    88 FUNC    GLOBAL DEFAULT  UND memset@GLIBC_2.0 (2)
    26: 00000000   441 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (2)
    27: 00000000    68 FUNC    GLOBAL DEFAULT  UND mempcpy@GLIBC_2.1 (5)
    28: 00000000    80 FUNC    GLOBAL DEFAULT  UND __memcpy_chk@GLIBC_2.3.4 (6)
    29: 00000000   186 FUNC    GLOBAL DEFAULT  UND _obstack_begin@GLIBC_2.0 (2)
    30: 00000000    19 FUNC    GLOBAL DEFAULT  UND _exit@GLIBC_2.0 (2)
    31: 00000000   441 FUNC    GLOBAL DEFAULT  UND strrchr@GLIBC_2.0 (2)
    32: 00000000   336 FUNC    GLOBAL DEFAULT  UND __assert_fail@GLIBC_2.0 (2)
    33: 00000000    29 FUNC    GLOBAL DEFAULT  UND bindtextdomain@GLIBC_2.0 (2)
    34: 00000000   597 FUNC    GLOBAL DEFAULT  UND mbrtowc@GLIBC_2.0 (2)
    35: 00000000    62 FUNC    GLOBAL DEFAULT  UND gettimeofday@GLIBC_2.0 (2)
    36: 00000000    64 FUNC    GLOBAL DEFAULT  UND __ctype_toupper_loc@GLIBC_2.3 (7)
    37: 00000000    69 FUNC    GLOBAL DEFAULT  UND __lxstat64@GLIBC_2.2 (3)
    38: 00000000   446 FUNC    GLOBAL DEFAULT  UND _obstack_newchunk@GLIBC_2.0 (2)
    39: 00000000   102 FUNC    GLOBAL DEFAULT  UND __overflow@GLIBC_2.0 (2)
    40: 00000000    73 FUNC    GLOBAL DEFAULT  UND dcgettext@GLIBC_2.0 (2)
    41: 00000000   100 FUNC    GLOBAL DEFAULT  UND sigaction@GLIBC_2.0 (2)
    42: 00000000   351 FUNC    GLOBAL DEFAULT  UND strverscmp@GLIBC_2.1 (5)
    43: 00000000   152 FUNC    GLOBAL DEFAULT  UND opendir@GLIBC_2.0 (2)
    44: 00000000    71 FUNC    GLOBAL DEFAULT  UND getopt_long@GLIBC_2.0 (2)
    45: 00000000    64 FUNC    GLOBAL DEFAULT  UND ioctl@GLIBC_2.0 (2)
    46: 00000000    64 FUNC    GLOBAL DEFAULT  UND __ctype_b_loc@GLIBC_2.3 (7)
    47: 00000000   226 FUNC    GLOBAL DEFAULT  UND iswcntrl@GLIBC_2.0 (2)
    48: 00000000    50 FUNC    GLOBAL DEFAULT  UND isatty@GLIBC_2.0 (2)
    49: 00000000   539 FUNC    GLOBAL DEFAULT  UND fclose@GLIBC_2.1 (5)
    50: 00000000    25 FUNC    GLOBAL DEFAULT  UND mbsinit@GLIBC_2.0 (2)
    51: 00000000    54 FUNC    GLOBAL DEFAULT  UND _setjmp@GLIBC_2.0 (2)
    52: 00000000    56 FUNC    GLOBAL DEFAULT  UND tcgetpgrp@GLIBC_2.0 (2)
    53: 00000000    60 FUNC    GLOBAL DEFAULT  UND mktime@GLIBC_2.0 (2)
    54: 00000000   222 FUNC    GLOBAL DEFAULT  UND readdir64@GLIBC_2.2 (3)
    55: 00000000    70 FUNC    GLOBAL DEFAULT  UND memcpy@GLIBC_2.0 (2)
    56: 00000000    76 FUNC    GLOBAL DEFAULT  UND strtoul@GLIBC_2.0 (2)
    57: 00000000   175 FUNC    GLOBAL DEFAULT  UND strlen@GLIBC_2.0 (2)
    58: 00000000   299 FUNC    GLOBAL DEFAULT  UND getpwuid@GLIBC_2.0 (2)
    59: 00000000   186 FUNC    GLOBAL DEFAULT  UND acl_extended_file@ACL_1.0 (8)
    60: 00000000  1931 FUNC    GLOBAL DEFAULT  UND setlocale@GLIBC_2.0 (2)
    61: 00000000    37 FUNC    GLOBAL DEFAULT  UND strcpy@GLIBC_2.0 (2)
    62: 00000000   148 FUNC    GLOBAL DEFAULT  UND raise@GLIBC_2.0 (2)
    63: 00000000   178 FUNC    GLOBAL DEFAULT  UND fwrite_unlocked@GLIBC_2.1 (5)
    64: 00000000   293 FUNC    GLOBAL DEFAULT  UND clock_gettime@GLIBC_2.2 (9)
    65: 00000000   123 FUNC    GLOBAL DEFAULT  UND getfilecon
    66: 00000000    98 FUNC    GLOBAL DEFAULT  UND closedir@GLIBC_2.0 (2)
    67: 00000000   403 FUNC    GLOBAL DEFAULT  UND fwrite@GLIBC_2.0 (2)
    68: 00000000   174 FUNC    GLOBAL DEFAULT  UND sigprocmask@GLIBC_2.0 (2)
    69: 00000000    32 FUNC    GLOBAL DEFAULT  UND __stack_chk_fail@GLIBC_2.4 (10)
    70: 00000000    42 FUNC    GLOBAL DEFAULT  UND __fpending@GLIBC_2.2 (3)
    71: 00000000   123 FUNC    GLOBAL DEFAULT  UND lgetfilecon
    72: 00000000   223 FUNC    GLOBAL DEFAULT  UND error@GLIBC_2.0 (2)
    73: 00000000   299 FUNC    GLOBAL DEFAULT  UND getgrgid@GLIBC_2.0 (2)
    74: 00000000    75 FUNC    GLOBAL DEFAULT  UND __strtoull_internal@GLIBC_2.0 (2)
    75: 00000000   115 FUNC    GLOBAL DEFAULT  UND sigaddset@GLIBC_2.0 (2)
[...]
as you can see, the executable "/bin/ls" has only dynamic symbols, i.e. symbols that are resolved by the dynamic linker. You may have noticed that some symbols have "@GLIBC_2.2" or a similar string appended to their names: this string is not part of the name; it is appended by "readelf", which parses GNU's versioning information. If you want to know how to read the versioning information, take a look at the source of "readelf", which can be found in the "binutils" package.

Relocation table

The relocation table holds an array of entries, each of which describes how to relocate code/data sections. Let's try to understand the relocation process better: various instructions within the code section make reference to objects (variables and/or functions) which reside somewhere else (within the code section or in another place). To access these objects an absolute address is needed: this is usually not a problem for executables, since (unless they are built as position-independent) they're always loaded at the same base address, but libraries are not, so we need a way to locate these objects by "manipulating" the absolute address which references them. This is exactly what relocations do: they instruct the dynamic linker (or the loader) on how to manipulate the address they're referencing, since a relocation always references an address to be manipulated in some way. There are various kinds of relocations, but for this article the important ones are those for the x86 architecture. There are two big categories of relocations: RELA and REL. A RELA entry carries an explicit addend field, while a REL entry keeps the addend in the location being relocated. On the x86 architecture there are only relocations of type REL (whilst on x64 there are only relocations of type RELA).
The REL relocation structure:
typedef struct
{
  Elf32_Addr    r_offset;        /* Address */
  Elf32_Word    r_info;            /* Relocation type and symbol index */
} Elf32_Rel;
the interesting type of relocation for this article will be R_386_JMP_SLOT, which will be described in the last paragraph of the last chapter.
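The r_info field packs two values together: the symbol table index in the upper 24 bits and the relocation type in the low 8 bits. Here is a minimal sketch of how they can be split, assuming the ELF32_R_SYM and ELF32_R_TYPE macros from the system's <elf.h>; the helper names rel_symbol_index and rel_type are made up for this illustration:

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: symbol table index, stored in the upper 24 bits of r_info. */
uint32_t rel_symbol_index(const Elf32_Rel *rel)
{
    return ELF32_R_SYM(rel->r_info);
}

/* Hypothetical helper: relocation type, stored in the low 8 bits of r_info. */
uint32_t rel_type(const Elf32_Rel *rel)
{
    return ELF32_R_TYPE(rel->r_info);
}
```

For instance, an entry whose r_info is 0x00000507 (an invented value) refers to symbol number 5 and has type 7, which is R_386_JMP_SLOT.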
It's time to work through an example which deals with relocations. The shared object used is "libhook.so", which will be built in the next part of the article, but this is just an example, so don't worry if you can't reproduce it (you should try to reproduce it using another library, as an exercise). Let's take a look at the first few relocations from "libhook.so":
quake2@quake2-desktop:~/elfinj$ readelf -r libhook.so

Relocation section '.rel.dyn' at offset 0x3b4 contains 49 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000679  00000008 R_386_RELATIVE   
0000069d  00000008 R_386_RELATIVE   
000006b8  00000008 R_386_RELATIVE   
000006c5  00000008 R_386_RELATIVE   
000006ca  00000008 R_386_RELATIVE
from this output it is clear that the relocation being used is R_386_RELATIVE, so from the ELF manual:
R_386_RELATIVE The link editor creates this relocation type for dynamic link-
               ing. Its offset member gives a location within a shared object
               that contains a value representing a relative address. The
               dynamic linker computes the corresponding virtual address
               by adding the virtual address at which the shared object was
               loaded to the relative address. Relocation entries for this type
               must specify 0 for the symbol table index.
so the r_offset field holds the address to which the relocation must be applied. Let's choose the relocation with offset 0x000006b8; here are the contents at this address:
 6b5:    c7 04 24 81 0a 00 00     movl   $0xa81,(%esp)
 6bc:    e8 fc ff ff ff           call   6bd 
the offset referenced by the relocation is the actual argument of the "movl" instruction, which rewritten in Intel syntax reads "mov dword ptr [esp], 0x00000A81": the instruction starts at 0x6B5 and its opcode and addressing bytes (c7 04 24) take three bytes, so the 32-bit immediate value sits at offset 0x6B8, which is exactly where the relocation is to be applied. From the description, it is clear that the value referenced must be added to the image base (that is, the address at which the image of the library resides in memory) to form a valid absolute address. This kind of relocation does not have a symbol associated with it. The relocation process is performed by the dynamic linker while it's preparing the memory image of the ELF file. Have a look at the ELF manual for a complete list of valid relocation types for the x86 architecture.
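What the dynamic linker does for an R_386_RELATIVE entry can be sketched in a few lines of C. This is only an illustration, not the code of any real loader: the function name apply_relative is made up, the buffer "image" is assumed to hold the library image starting at its virtual address 0 (so r_offset can be used directly as a buffer offset), and no bounds checking is done.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: apply one R_386_RELATIVE entry.
 * image     - buffer holding the library image (assumed to start at vaddr 0)
 * load_base - virtual address at which the image is (or will be) loaded
 * Returns the patched 32-bit value, or 0 if the entry has another type.
 */
uint32_t apply_relative(uint8_t *image, uint32_t load_base, const Elf32_Rel *rel)
{
    uint32_t value;

    if (ELF32_R_TYPE(rel->r_info) != R_386_RELATIVE)
        return 0;

    /* memcpy is used because r_offset is not guaranteed to be 4-byte aligned */
    memcpy(&value, image + rel->r_offset, sizeof value);
    value += load_base;   /* relative address + load address */
    memcpy(image + rel->r_offset, &value, sizeof value);
    return value;
}
```

With the relocation at offset 0x6b8 seen above and a load address of, say, 0xb7fd0000 (an invented value), the immediate 0x00000a81 would become 0xb7fd0a81.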

Dynamic section

The last structure which will be described is the dynamic section. This section holds information for the dynamic linker, in particular how to retrieve dynamic symbols once the image has been loaded, the libraries required to load the image, the dynamic relocations, etc. Usually the dynamic section is contained in its own section of type SHT_DYNAMIC; once the ELF has been loaded into memory, the dynamic section can be retrieved from the program header of type PT_DYNAMIC. As usual, the dynamic section is a table made up of various entries. The table ends with a NULL entry (one whose d_tag is DT_NULL).
Let's see the structure:
typedef struct
{
  Elf32_Sword    d_tag;            /* Dynamic entry type */
  union
    {
      Elf32_Word d_val;            /* Integer value */
      Elf32_Addr d_ptr;            /* Address value */
    } d_un;
} Elf32_Dyn;
Here's the dynamic section of "/bin/ls":
quake2@quake2-desktop:~$ readelf -d /bin/ls

Dynamic section at offset 0x16f04 contains 24 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [librt.so.1]
 0x00000001 (NEEDED)                     Shared library: [libselinux.so.1]
 0x00000001 (NEEDED)                     Shared library: [libacl.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000000c (INIT)                       0x8049508
 0x0000000d (FINI)                       0x805af7c
 0x00000004 (HASH)                       0x8048188
 0x6ffffef5 (GNU_HASH)                   0x80484b8
 0x00000005 (STRTAB)                     0x8048ba4
 0x00000006 (SYMTAB)                     0x8048514
 0x0000000a (STRSZ)                      1199 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000015 (DEBUG)                      0x0
 0x00000003 (PLTGOT)                     0x805fff4
 0x00000002 (PLTRELSZ)                   744 (bytes)
 0x00000014 (PLTREL)                     REL
 0x00000017 (JMPREL)                     0x8049220
 0x00000011 (REL)                        0x80491f8
 0x00000012 (RELSZ)                      40 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffe (VERNEED)                    0x8049128
 0x6fffffff (VERNEEDNUM)                 3
 0x6ffffff0 (VERSYM)                     0x8049054
 0x00000000 (NULL)                       0x0
of course the dynamic section does not hold any section index: once the image is loaded into memory, sections lose their meaning, so there are only (relative) virtual addresses.
From the dynamic section of "/bin/ls" it can be seen that this executable requires four libraries to run, namely: "librt.so.1", "libselinux.so.1", "libacl.so.1" and "libc.so.6".
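Since the table is terminated by a DT_NULL entry, looking up a single tag is just a linear scan. The following is a small sketch of that scan (the name find_dyn is invented for this example, and the types come from <elf.h>); it is not a full dump of the dynamic section:

```c
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the first entry whose d_tag equals tag,
 * or NULL if the terminating DT_NULL entry is reached first.
 */
const Elf32_Dyn *find_dyn(const Elf32_Dyn *dyn, Elf32_Sword tag)
{
    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            return dyn;
    return NULL;
}
```

As a usage example, looking up DT_PLTRELSZ in the table of "/bin/ls" shown above would yield 744; dividing it by the size of an Elf32_Rel entry (8 bytes) gives 93 PLT relocations.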

Hash tables

If you take a closer look at the section list, you will find a section of type SHT_HASH, which means that the section holds a hash table. Hash tables are used for fast symbol lookup, but they're not used in this article, so there's nothing to care about, except one thing: each hash table has a field called nchain (take a look at Fig. 5-11 on page 94 of the general manual), which is equal to the number of entries in the symbol table the hash table refers to. So this field gives the total number of symbols, and it will be used in the second part to perform symbol lookup (to know when to stop searching for a particular symbol).
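The classic SHT_HASH table is an array of 32-bit words: nbucket, then nchain, then the bucket and chain arrays. A rough sketch of how the two counters could be read follows; the helper names are invented, and the pointer is assumed to point at the start of the hash table contents:

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: the first word of the hash table is the bucket count. */
uint32_t hash_nbucket(const Elf32_Word *hash)
{
    return hash[0];
}

/* Hypothetical helper: the second word is nchain, i.e. the number of symbols. */
uint32_t hash_nchain(const Elf32_Word *hash)
{
    return hash[1];
}
```

Note that this layout applies to the SHT_HASH table only; the GNU_HASH table listed in the dynamic section above uses a different format.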

Let's write more code...

The brief description of the ELF format has ended, so it's time to see more snippets of code, here they are:
static char *btypes[] = {
        "STB_LOCAL",
        "STB_GLOBAL",
        "STB_WEAK"
};

static char *symtypes[] = {
        "STT_NOTYPE",
        "STT_OBJECT",
        "STT_FUNC",
        "STT_SECTION",
        "STT_FILE"
};

void print_bind_type(u_char info)
{
    u_char bind = ELF32_ST_BIND(info);
    if(bind <= 2)
        printf("- Bind type: %s\n", btypes[bind]);
    else
        printf("- Bind type: %d\n", bind);
}

void print_sym_type(u_char info)
{
    u_char type = ELF32_ST_TYPE(info);

    if(type <= 4)
        printf("- Symbol type: %s\n", symtypes[type]);
    else
        printf("- Symbol type: %d\n", type);
}

int print_sym_table(u_char *filebase, Elf32_Shdr *section, char *strtable)
{
    Elf32_Sym *symbols;
    size_t sym_size = section->sh_entsize;
    size_t cur_size = 0;

    if(section->sh_type == SHT_SYMTAB)
        printf("Symbol table\n");
    else
        printf("Dynamic symbol table\n");

    if(sym_size != sizeof(Elf32_Sym))
    {
        printf("There's something evil with symbol table...\n");
        return 0;
    }

    symbols = (Elf32_Sym *)(filebase + section->sh_offset);
    //entry 0 is the reserved null symbol, so skip it
    symbols++;
    cur_size += sym_size;
    //a plain while loop, so that nothing is read past the end
    //when the table holds only the null symbol
    while(cur_size < section->sh_size)
    {
        printf("- Name index: %u\n", symbols->st_name);
        printf("- Name: %s\n", strtable + symbols->st_name);
        printf("- Value: 0x%08x\n", symbols->st_value);
        printf("- Size: 0x%08x\n", symbols->st_size);

        print_bind_type(symbols->st_info);
        print_sym_type(symbols->st_info);

        printf("- Section index: %d\n", symbols->st_shndx);
        cur_size += sym_size;
        symbols++;
    }

    return 1;
}

int main(int argc, char *argv[])
{
    int fd_elf = -1;
    u_char *p_base = NULL;
    char *p_strtable = NULL;
    struct stat elf_stat;
    Elf32_Ehdr *p_ehdr = NULL;
    Elf32_Phdr *p_phdr = NULL;
    Elf32_Shdr *p_shdr = NULL;
    int i;

    if(argc < 2)
    {
        printf("Usage: %s \n", argv[0]);
        return 1;
    }

    fd_elf = open(argv[1], O_RDONLY);
    if(fd_elf == -1)
    {
        fprintf(stderr, "Could not open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }

    if(fstat(fd_elf, &elf_stat) == -1)
    {
        fprintf(stderr, "Could not stat %s: %s\n", argv[1], strerror(errno));
        close(fd_elf);
        return 1;
    }

    p_base = (u_char *)calloc(sizeof(u_char), elf_stat.st_size);
    if(!p_base)
    {
        fprintf(stderr, "Not enough memory\n");
        close(fd_elf);
        return 1;
    }

    if(read(fd_elf, p_base, elf_stat.st_size) != elf_stat.st_size)
    {
        fprintf(stderr, "Error while reading file: %s\n", strerror(errno));
        free(p_base);
        close(fd_elf);
        return 1;
    }
    
    close(fd_elf);

    p_ehdr = (Elf32_Ehdr *)p_base;
    if(elf_is_valid(p_ehdr))
    {
        print_elf_header(p_ehdr);

        printf("\n");
        p_phdr = (Elf32_Phdr *)(p_base + p_ehdr->e_phoff);
        p_shdr = (Elf32_Shdr *)(p_base + p_ehdr->e_shoff);
        p_strtable = (char *)(p_base + p_shdr[p_ehdr->e_shstrndx].sh_offset);

        for(i = 0; i < p_ehdr->e_phnum; i++)
        {
            print_program_header(&p_phdr[i], i);
        }

        for(i = 0; i < p_ehdr->e_shnum; i++)
        {
            print_section_header(&p_shdr[i], i, p_strtable);
            if(p_shdr[i].sh_type == SHT_SYMTAB || p_shdr[i].sh_type == SHT_DYNSYM)
            {
                printf("This section holds a symbol table...\n");

                //being a symbol table, the field sh_link of the section header
                //will hold an index into the section table which gives the
                //section containing the string table
                print_sym_table(p_base, &p_shdr[i], (char *)(p_base + p_shdr[p_shdr[i].sh_link].sh_offset));
            }
        }
    }
    else
        printf("Invalid ELF file\n");

    free(p_base);
    return 0;
}
as an exercise you should write the code which prints out the dynamic section (by now you should know how to do it).

ELF loading

This chapter is a brief description of the loading process of an ELF image. It will mainly be about the PLT, so much more attention will be given to that topic, but it will also present a general overview of the loading process. For detailed information you should read the manual (which gives a very good description of the loading process, with various examples of memory configurations).

Program headers

Program headers describe how to map one or more sections of the file into memory (e.g. the PT_LOAD type) and can hold information that might be useful at runtime after the loading process has ended. Some program headers have a special role:

Dynamic section

The dynamic section holds all the information needed to dynamically link the executable/library. Let's spend a few words on dynamic linking: it's the process that applies relocations to the ELF image and, if lazy binding is used, resolves at runtime any external symbol not already resolved. The dynamic section also holds information not strictly needed for dynamic linking, such as the entry points of the initialization/finalization functions. The next paragraph will be entirely about the PLT, which is the mechanism through which dynamically linked functions get resolved at runtime. Here's a list of important dynamic types:

The PLT

The Procedure Linkage Table is a table in which the various entries are made up of code blocks. It's the principal component which allows the dynamic linker to resolve external functions. Here's an example: suppose that you write a program which references a function defined inside a library:
int main()
{
    int res = external_function(3,4);
    return 0;
}
of course to invoke the function you need to know its absolute address, since it is an external one. The absolute address cannot be hardcoded by the linker, because libraries are usually loaded at different base addresses, so absolute addresses have no meaning for them. This problem is overcome by the use of the PLT, so when you call an external function, the following code is generated:
push 0x04
push 0x03
call external_function@plt
add esp, 8

[...]

; this is a PLT entry
external_function@plt (address 0xXXXXXX00):
  external_function@plt+0x00: jmp dword ptr [reloc_address] ; reloc_address is just a memory location
  external_function@plt+0x06: push reloc_offset ; reloc_offset is a byte offset (not an index) into the relocation table
  external_function@plt+0x0B: jmp resolve_function ; resolve_function is a function that will resolve the external symbol
  
; this is what you find at reloc_address, data is displayed using dwords
reloc_address: XXXXXX06 ........
what's happening here is that when the program reaches the "call", it transfers execution to a PLT entry. The first instruction executed is then a "jmp", which transfers execution to the address stored in the location "reloc_address", which, as you can see, is the address of the instruction following the first "jmp" in the PLT entry. So, back in the PLT, a byte offset is pushed onto the stack and then execution is transferred to a function which, taking the last pushed value off the stack, resolves the external symbol. By now, you might think that this procedure is painfully slow, with all those cache-killing jumps. But let's go one step further and look at what happens after the external symbol has been resolved:
push 0x04
push 0x03
call external_function@plt
add esp, 8

[...]

external_function@plt:
    external_function@plt+0x00:  jmp dword ptr [reloc_address]
    external_function@plt+0x06:  push reloc_offset
    external_function@plt+0x0B:  jmp  resolve_function
    
reloc_address: BFF31337 ........
something has changed, hasn't it? After the external symbol has been resolved, the memory location addressed by "reloc_address" no longer contains the address of the instruction following the first jmp, but the actual entry point of the external function, so all the jmp-craziness is done only the first time. If you did not understand something, then read the manual carefully. The PLT is the most important thing in this two-part article, and the next part will deal much more with it, so be sure to know how it works. Anyway, by the end of this part, there will be a practical example using gdb on how the PLT works.
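If assembly is not your cup of tea, the same idea can be mimicked in plain C with a function pointer. This is only an analogy, not what the dynamic linker really executes: got_slot plays the role of the memory word at "reloc_address", lazy_stub plays the role of the "push / jmp resolve_function" path, and every name below is invented for the sketch.

```c
#include <stddef.h>

typedef int (*binop_fn)(int, int);

/* Counts how many times the slow "resolve" path has been taken. */
static int resolve_count = 0;

/* Stand-in for the function that lives in the shared library. */
static int external_function(int a, int b)
{
    return a + b;
}

static int lazy_stub(int a, int b);

/* Plays the role of the word at reloc_address: it starts by pointing
 * back to the stub, just like the unresolved PLT entry. */
static binop_fn got_slot = lazy_stub;

/* Plays the role of the resolver: patch the slot, then call the target. */
static int lazy_stub(int a, int b)
{
    resolve_count++;
    got_slot = external_function;
    return got_slot(a, b);
}

/* Plays the role of external_function@plt: an indirect jump through the slot. */
int call_via_plt(int a, int b)
{
    return got_slot(a, b);
}

int plt_resolve_count(void)
{
    return resolve_count;
}
```

Calling call_via_plt(3, 4) twice returns 7 both times, but the stub runs only on the first call; from then on the slot points straight at the target function.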

Building up our lab...

After the brief introduction to the ELF format, it's time to start working on preparing our "laboratory", that is, building a bunch of sample ELFs that we will use in the next part.
We will build three ELFs, one executable and two libraries, one of which will be the library that will get injected into the executable and will hook the function inside the other library (which is used by the executable directly). Let's start building the first two image files that we will use: the executable and the library used by it. Here's the source of the library, only the .c file is shown:
#include "libdummy.h"

int dummy_add(int a, int b)
{
	return a+b;
}
compile and link it:
gcc -fPIC -c libdummy.c
ld -shared -soname libdummy.so.1 -o libdummy.so.1.0 -lc libdummy.o
now let's update the cache with "ldconfig" (if you move the library to /usr/lib or any other system path, you can omit the "-n ." parameter):
ldconfig -v -n .
and create the symbolic link needed by the linker, so we can link with "-ldummy":
ln -sf libdummy.so.1 libdummy.so
Here's the executable which makes use of the library just built:
#include <stdio.h>
#include "libdummy.h"

int main()
{
	int a,b;
	int res = 0;

	printf("Enter the first number: ");
	scanf("%d", &a);
	printf("Enter the second number: ");
	scanf("%d", &b);
	res = dummy_add(a,b);
	printf("Result is: %d\n", res);
	return 0;
}
compile and link it:
gcc -o dummyelf dummyelf.c -L. -ldummy
Don't forget to try it to see if everything works (if you did not move libdummy.so.1.0 to /usr/lib you should set LD_LIBRARY_PATH: "export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH").

That's all for now, in the next chapter we will explore the PLT. We will build the second library in the next part of this article.

PLT: a practical example

The PLT has been discussed much in detail, but there's nothing better than a real example, so let's play around with the executable just built.
First of all, let's debug the executable with gdb, setting a breakpoint on the "dummy_add" call (keep in mind that your addresses can be different):
quake2@quake2-desktop:~/elfinj$ gdb dummyelf
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) disassemble main
Dump of assembler code for function main:
[...]
0x08048507 <main+99>:	call   0x80483cc <dummy_add@plt>
[...]
End of assembler dump.
(gdb) break *0x08048507
Breakpoint 1 at 0x8048507
(gdb) 
start the program and step until it breaks, then step until you enter the call:
(gdb) display/i $pc
(gdb) run
Starting program: /home/quake2/elfinj/dummyelf 
Enter the first number: 23
Enter the second number: 23

Breakpoint 1, 0x08048507 in main ()
1: x/i $pc
0x8048507 <main+99>:	call   0x80483cc <dummy_add@plt>
Current language:  auto; currently asm
(gdb) stepi
0x080483cc in dummy_add@plt ()
1: x/i $pc
0x80483cc <dummy_add@plt>:	jmp    *0x804a00c
(gdb)
so, as expected, the first instruction is a jmp to the value addressed by 0x804a00c, let's see what's contained there:
(gdb) print /x *0x804a00c
$1 = 0x80483d2
(gdb)
it holds 0x80483d2 which is the address of the instruction following the first jmp, this can be checked by disassembling the instructions around eip:
(gdb) disassemble
Dump of assembler code for function dummy_add@plt:
0x080483cc <dummy_add@plt+0>:	jmp    *0x804a00c
0x080483d2 <dummy_add@plt+6>:	push   $0x18
0x080483d7 <dummy_add@plt+11>:	jmp    0x804838c <_init+48>
End of assembler dump.
(gdb) 
as I told you, the instruction following the jmp pushes an offset onto the stack, which is a byte offset into the relocation table. Let's check if this is true: take the base address of the relocation table (the value of the DT_JMPREL dynamic entry, 0x8048334 here) and add the offset to it:
(gdb) print /x *(0x8048334+0x18)
$4 = 0x804a00c
(gdb)
the value read is 0x804a00c, the same location used by the first jmp: if you check the Elf32_Rel structure, you will see that the first field has Elf32_Addr as its type, which is a 32-bit unsigned integer and holds the address to which the relocation must be applied. The dynamic linker will replace the value at 0x804a00c with the actual address of the dummy_add function:
0x80483d7 <dummy_add@plt+11>:	jmp    0x804838c <_init+48>
(gdb) step
Single stepping until exit from function dummy_add@plt, 
which has no line number information.
0xb8095168 in dummy_add () from ./libdummy.so.1
1: x/i $pc
0xb8095168 <dummy_add>:	push   %ebp
(gdb) print /x *0x804a00c
$1 = 0xb8095168
(gdb)
the memory location 0x804a00c now holds the virtual address of dummy_add, that is 0xb8095168.
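The arithmetic done by hand in gdb above can be written down as two tiny helpers. This is just a sketch of the computation: the function names are invented, and sizeof(Elf32_Rel) (8 bytes) comes from <elf.h>.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: address of the Elf32_Rel entry selected by the
 * byte offset that the PLT entry pushes onto the stack. */
uint32_t plt_rel_address(uint32_t jmprel, uint32_t pushed_offset)
{
    return jmprel + pushed_offset;
}

/* Hypothetical helper: the same entry expressed as an index into the table. */
uint32_t plt_rel_index(uint32_t pushed_offset)
{
    return (uint32_t)(pushed_offset / sizeof(Elf32_Rel));
}
```

With the values of this session, plt_rel_address(0x8048334, 0x18) gives 0x804834c, which is entry number 3 of the table; its r_offset field is the 0x804a00c read back in gdb.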

Conclusions

This tutorial should have introduced the reader to the basics of the ELF format and how the system (in this case Linux) loads it. This part is by no means intended to replace the ELF manuals, which are required reading for the next part, where things will get far more complicated. I've decided to split the article so that, if you're already familiar with the ELF format, you can just skip to the second part. Also, since there's a lot of code and other space-consuming sections, a single big article would have made a REALLY long html file, which is quite unpleasant. Things should be simple and straight.

The next part will deal with the actual injection and hooking. The topics covered will be: the ptrace interface with examples, code injection (general view), shared object injection and limitations of the technique, PLT hooking (which will use all the concepts learned so far). The next part is expected to come out in the next few weeks (I hope not more than 2 weeks), depending on my time schedule (I'm a university student with a full-time job, so it's quite difficult to find free time :) ). I hope you enjoyed this first part... see you in the next one!

References

Here you will find all the tools/books used during this article:

The GNU Compiler Collection (GCC)
The GNU Debugger (GDB)
The Netwide Assembler (NASM)
GNU's binutils
The ELF format specifications (for every platform)
The ELF format specifications for ELF64 (for every platform)
The ELF format specifications for the x86 architecture
The ELF format specifications for the x64 architecture
Brief description of the AT&T syntax, with NASM version of most statements

History

10/11/2008: first version

Last Updated 10 Dec 2008