GCC Compilation Phases
When you run gcc main.c -o myprogram, the gcc driver carries out four distinct phases to turn your source code into an executable. Understanding each phase helps with debugging, optimization, and setting up build systems.
The four phases are:
- Preprocessing — text substitution before compilation
- Compilation — translating C to assembly
- Assembly — translating assembly to machine code (object files)
- Linking — combining object files into an executable
Phase 1: Preprocessing
The preprocessor (cpp) handles directives that begin with #. It runs before any actual compilation. Its job is purely textual:
- #include: replaces the directive with the contents of the specified file
- #define: defines macros; every later occurrence of the macro name is replaced with its expansion
- #ifdef/#ifndef/#endif: conditional compilation; sections of code are included or excluded depending on whether a macro is defined
- Comments: removed (each comment is replaced by a single space)
Invoking the Preprocessor
You can run just the preprocessor step with:
cpp main.c -o main.i
# or equivalently:
gcc -E main.c -o main.i
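The output .i file contains the fully preprocessed source: all #include files inlined, all macros expanded, and no comments. It also contains linemarkers (lines such as # 1 "main.c") that record which file and line each part of the code came from.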
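To see these rules in action, here is a sketch that assumes a Unix-like shell with gcc on the PATH. The files main.c and answer.h and the GREETING macro are hypothetical names chosen for illustration. -P is a standard GCC option that leaves out the linemarkers, which makes the result easier to read.

```shell
# Hypothetical example files; all names are illustrative.
cat > answer.h <<'EOF'
int answer(void);   /* declaration pulled in by #include */
EOF

cat > main.c <<'EOF'
#include "answer.h"
#define GREETING "hello"   /* this comment will not survive */
#ifndef QUIET
const char *msg = GREETING;
#endif
EOF

# Run only the preprocessor; -P suppresses the '# 1 "main.c"' linemarkers.
gcc -E -P main.c -o main.i
cat main.i
```

Afterwards, main.i should contain only the declaration copied from answer.h and the line const char *msg = "hello"; because the #define, the comment, and the #ifndef/#endif lines have been consumed. Adding -DQUIET to the gcc -E command drops the msg line as well.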
Conditional Compilation
A common use case is enabling debug output only in debug builds:
#ifdef DEBUG_MODE
printf("x = %d\n", x);
#endif
To enable this at compile time:
gcc -DDEBUG_MODE main.c -o myprogram
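The -D flag defines a macro from the command line. -DDEBUG_MODE gives the macro the value 1, so it is equivalent to writing #define DEBUG_MODE 1 at the top of the file; use -DNAME=value to set a specific value.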
Viewing Macro Expansions
The -E output lets you see exactly what the compiler receives after preprocessing — useful for debugging complex macros.
Phase 2: Compilation
The compiler proper takes the preprocessed source (.i file) and translates it into assembly language (.s file). This is where:
- Syntax checking happens
- Type checking occurs
- Optimizations are applied (if -O flags are set)
- High-level C constructs are mapped to the instruction set of the target CPU
Stopping After Compilation
gcc -S main.c -o main.s
The -S flag tells GCC to stop after the compilation step, producing an assembly .s file.
Reading the Assembly
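The assembly output contains human-readable (though dense) instructions. For example, int main(void) { int x = 5; return x; } compiled with -O0 on x86-64 produces the following (GCC's default AT&T syntax, assembler directives omitted):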
main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $5, -4(%rbp)
    movl    -4(%rbp), %eax
    popq    %rbp
    ret
Reading this output is useful for performance analysis and for checking what the optimizer actually did with your code.
Phase 3: Assembly
The assembler (as) translates assembly language into machine code — binary instructions the CPU can execute — and packages the result into an object file (.o).
Object files use a standard format (ELF on Linux, Mach-O on macOS, COFF on Windows). They contain:
- The compiled machine code for the translation unit
- A symbol table listing all functions and variables defined or referenced
- Relocation information (placeholders for addresses that will be filled in during linking)
Producing Object Files
gcc -c main.c -o main.o
# or, from assembly:
as main.s -o main.o
Inspecting Object Files
The objdump tool lets you inspect the contents of an object file:
objdump -d main.o # disassemble
objdump -t main.o # show symbol table
objdump -r main.o # show relocation entries
Relocation entries are important — they mark the locations in the object file where the linker must fill in real addresses for external symbols (functions and variables defined in other translation units).
Phase 4: Linking
The linker (ld, usually invoked via gcc) takes one or more object files and combines them into a final executable. Its jobs include:
- Symbol resolution: match each reference to an external symbol with its definition (e.g., a call to printf is matched with the definition in the C library)
- Relocation: fill in the addresses that were left as placeholders during assembly
- Library linking: incorporate code from static libraries (.a files) or record references to dynamic libraries (.so files) for runtime linking
Linking Multiple Object Files
gcc main.o utils.o -o myprogram
Linking with Libraries
gcc main.o -lm -o myprogram # link with libm (math library)
gcc main.o -L. -lmylib -o prog # link with a library in the current directory
Static vs Dynamic Linking
- Static linking (-static): library code is copied into the executable. The binary is self-contained but larger.
- Dynamic linking (default): the executable records which shared libraries it needs, and the dynamic loader loads them at runtime. Multiple programs can share one copy of a library in memory.
Viewing Link Dependencies
ldd myprogram # list dynamic library dependencies
nm myprogram # list symbols (defined and undefined)
Putting It All Together
You can perform all four phases manually:
# 1. Preprocess
gcc -E main.c -o main.i
# 2. Compile to assembly
gcc -S main.i -o main.s
# 3. Assemble to object file
as main.s -o main.o
# 4. Link
gcc main.o -o myprogram
Or let GCC handle all phases in one command (the most common approach):
gcc main.c -o myprogram
Understanding these phases makes it much easier to troubleshoot build errors, apply targeted optimizations, and reason about binary size and dependencies.