The C-ompilation Process

Behind the scenes

Luis David Escobedo Velasquez

--

A compiler converts a C program into an executable. This is a multi-stage process, and we are going to analyze it with a powerful tool called GCC, the “GNU Compiler Collection”. GCC is an integrated distribution of compilers for several major programming languages. These languages currently include C, C++, Objective-C, Objective-C++, Fortran, Ada, D, Go, and BRIG.

What goes on inside the compilation process?

There are four phases for a C program to become an executable:

  1. Pre-processing
  2. Compilation
  3. Assembly
  4. Linking

Let us explain what is happening behind the scenes every time we run the gcc command on a .c file.

First, we are going to create a C file using a text editor such as Vim or Emacs and save it as hello_world.c:

#include <stdio.h>

int main(void)
{
    printf("Hello, world\n");
    return (0);
}

In order to obtain all intermediate files, we’ll use the command below:

$ gcc -Wall -save-temps hello_world.c -o hello_world

Four new files are then generated: hello_world.i, hello_world.s, hello_world.o, and the executable hello_world.

Each file is the result of one stage in the compilation process. Let’s analyze them!

PREPROCESSING

This is the first phase through which the source code passes. This phase includes:

  • Removal of Comments
  • Expansion of Macros
  • Expansion of included files
  • Conditional compilation

The preprocessed output is stored in hello_world.i. Let’s see what’s inside it using $ emacs hello_world.i

We can see that the preprocessor produces the contents of the stdio.h header file joined with the contents of our hello_world.c file, with any comments stripped.

COMPILATION

The second step is to compile the hello_world.i file to produce another intermediate output called hello_world.s. In this stage the preprocessed code is translated into assembly instructions. We can still understand the result because assembly is an intermediate, human-readable language.

This is what we can see in hello_world.s using a text editor such as Emacs.

The text editor shows that the file is in assembly language, which the assembler can understand.

ASSEMBLY

This is the third stage, in which an assembler translates the assembly instructions into object code. The output consists of actual instructions to be run by the target processor; at this phase, the code is converted into machine language. Let’s view this file using $ emacs hello_world.o

As we can see, this is totally unreadable for us.

LINKING

This is the last stage in the compilation process. Here every function call is linked with its definition. The linker will arrange the pieces of object code so that functions in some pieces can successfully call functions in other ones. It will also add pieces containing the instructions for library functions used by the program. In the case of the “Hello, world” program, the linker will add the object code for the printf function (or, when linking dynamically, a reference to the shared C library that provides it).

The result of this stage (and of the other three stages) is an executable program. In this case the final file is named hello_world. When gcc is run without the -o option, the file is named a.out by default, so consider going through the manual before using the gcc command.
