πŸ–₯️ Week 3 β€’ +100 XP

CPU Architecture & Memory

Understand how computers actually execute instructions and manage memory. This knowledge is crucial for writing your compiler's code generator.

🧠 How CPUs Work

β–Ά

A CPU is surprisingly simple at its core. It just does three things in an endless loop:

1
Fetch
Read the next instruction from memory (pointed to by RIP register)
2
Decode
Figure out what the instruction means (add? subtract? jump?)
3
Execute
Do the operation, update registers/memory, move to next instruction

This Fetch-Decode-Execute cycle happens billions of times per second!

The CPU cycle:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                                      β”‚
β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                       β”‚
β”‚    β”‚ Fetch   β”‚  Read instruction at address RIP     β”‚
β”‚    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜                                       β”‚
β”‚         ↓                                            β”‚
β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                       β”‚
β”‚    β”‚ Decode  β”‚  What does this instruction do?      β”‚
β”‚    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜                                       β”‚
β”‚         ↓                                            β”‚
β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                       β”‚
β”‚    β”‚Execute  β”‚  Do the operation                    β”‚
β”‚    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜                                       β”‚
β”‚         ↓                                            β”‚
β”‚    RIP += instruction_size  (move to next)          β”‚
β”‚         β”‚                                            β”‚
β”‚         └───────────────────────────────────────→ (repeat)
β”‚                                                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“¦ Registers: Super-Fast Storage

β–Ά

Registers are tiny, ultra-fast storage locations inside the CPU. They're ~100x faster than RAM!

x86-64 General Purpose Registers

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              x86-64 Registers (64-bit)                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   RAX    β”‚ Accumulator / Return values                  β”‚
β”‚   RBX    β”‚ Base register (callee-saved)                 β”‚
β”‚   RCX    β”‚ Counter / 4th function argument              β”‚
β”‚   RDX    β”‚ Data / 3rd function argument                 β”‚
β”‚   RSI    β”‚ Source index / 2nd function argument         β”‚
β”‚   RDI    β”‚ Destination index / 1st function argument    β”‚
β”‚   RSP    β”‚ Stack pointer (top of stack)       β˜… SPECIAL β”‚
β”‚   RBP    β”‚ Base pointer (stack frame base)    β˜… SPECIAL β”‚
β”‚   R8-R15 β”‚ Extra general purpose registers              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   RIP    β”‚ Instruction pointer (next instruction addr) β”‚
β”‚  RFLAGS  β”‚ Status flags (zero, sign, carry, etc.)      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Register Size Variants

Each register can be accessed at different sizes:

64-bit: RAX  ←────────────────────────────────────────┐
32-bit: EAX  ←─────────────────────────────┐          β”‚
16-bit: AX   ←───────────────────┐         β”‚          β”‚
8-bit:  AL   ←─────┐             β”‚         β”‚          β”‚
              β”‚    β”‚             β”‚         β”‚          β”‚
              β”œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
              β”‚ 7-0 β”‚ 15-8 β”‚ 31-16 β”‚    63-32         β”‚
              β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Example:
mov rax, 0x123456789ABCDEF0  ; RAX = full 64-bit value
mov eax, 0x12345678          ; EAX = lower 32 bits (zeros upper 32!)
mov ax,  0x1234              ; AX  = lower 16 bits
mov al,  0x12                ; AL  = lowest 8 bits
πŸ› οΈ FOR YOUR COMPILER:
  • RAX - Always holds function return value
  • RDI, RSI, RDX, RCX, R8, R9 - First 6 function arguments
  • RSP - Stack pointer (don't mess this up!)
  • RBP - Base pointer for accessing local variables

πŸ’Ύ Memory Layout

β–Ά

When your program runs, memory is organized into distinct regions:

High Address (0x7FFF...)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Stack                           β”‚  ← Local variables, function calls
β”‚                      ↓                             β”‚    (grows DOWNWARD)
β”‚                                                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ gap ────────────────────────────
β”‚                                                    β”‚
β”‚                      ↑                             β”‚    (grows UPWARD)
β”‚                    Heap                            β”‚  ← malloc() allocations
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚            Uninitialized Data (BSS)                β”‚  ← Zero-initialized globals
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚              Initialized Data                      β”‚  ← Initialized globals, string literals
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                    Code                            β”‚  ← Your program's machine instructions
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Low Address (0x0000...)

Stack vs Heap Comparison

Stack Heap
Allocation Automatic (just move RSP) Manual (malloc)
Deallocation Automatic (function return) Manual (free)
Speed Very fast Slower
Size Limited (~8MB typical) Large (limited by RAM)
Organization LIFO (Last In First Out) Random access
Use For Local variables, function calls Dynamic data, large allocations

Stack Frame Structure

When you call: main() calls add(5, 3)

Stack (growing downward):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” High addresses
β”‚     main's locals      β”‚
β”‚     ──────────────     β”‚
β”‚   return address to    β”‚ ← Where to go back after add() returns
β”‚   whoever called main  β”‚
β”‚     saved RBP          β”‚ ← main's frame base
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚     add's locals       β”‚
β”‚     ──────────────     β”‚
β”‚   return address       β”‚ ← Where to return (back to main)
β”‚     saved RBP          β”‚ ← RBP points here during add()
β”‚     argument: 5        β”‚ ← [RBP - 8]
β”‚     argument: 3        β”‚ ← [RBP - 16]
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ← RSP (stack top)
                           Low addresses

πŸ”§ Dynamic Memory: malloc & free

β–Ά

🐍 YOU KNOW (Python): Automatic memory management

my_list = [1, 2, 3]  # Python allocates memory for you
my_list.append(4)    # Python resizes automatically
# Python garbage collector frees memory when no longer used

βš™οΈ IN C: Manual memory management

#include <stdlib.h>

// Allocate memory on heap
int *arr = malloc(10 * sizeof(int));  // 40 bytes
if (arr == NULL) {
    // Handle allocation failure!
    return -1;
}

// Use it
arr[0] = 42;
arr[9] = 100;

// Free when done
free(arr);
arr = NULL;  // Good practice - prevent use after free

malloc/free Rules

βœ… DO:
  • Check if malloc returns NULL (allocation can fail!)
  • Free every malloc'd pointer exactly once
  • Set pointer to NULL after freeing
  • Match types: int *p = malloc(sizeof(int))
❌ DON'T:
  • Use pointer before checking if malloc succeeded
  • Free the same pointer twice (double free)
  • Use pointer after freeing (use after free)
  • Lose track of malloc'd pointer (memory leak)

calloc and realloc

// calloc - allocate AND zero-initialize
int *arr = calloc(10, sizeof(int));  // All elements = 0

// realloc - resize allocation
int *bigger = realloc(arr, 20 * sizeof(int));
if (bigger != NULL) {
    arr = bigger;  // arr now has room for 20 ints
}
πŸ› οΈ IN YOUR COMPILER - Dynamic Arrays:
typedef struct {
    Token *items;
    long count;
    long capacity;
} Token_Array;

void da_append(Token_Array *arr, Token tok) {
    if (arr->count >= arr->capacity) {
        arr->capacity = (arr->capacity == 0) ? 16 : arr->capacity * 2;
        arr->items = realloc(arr->items, arr->capacity * sizeof(Token));
    }
    arr->items[arr->count++] = tok;
}

⚠️ Common Memory Pitfalls

β–Ά

Pitfall 1: Dangling Pointer

// ❌ WRONG: Returning pointer to local variable
int *get_value() {
    int x = 42;
    return &x;  // x dies when function returns!
}

int *p = get_value();
printf("%d\n", *p);  // UNDEFINED BEHAVIOR!

// βœ… FIX: Return value, or allocate on heap
int get_value() { return 42; }  // Return by value

int *get_value_heap() {
    int *x = malloc(sizeof(int));
    *x = 42;
    return x;  // Heap memory persists
}

Pitfall 2: Memory Leak

// ❌ WRONG: Lost reference to allocated memory
void leak() {
    int *p = malloc(100 * sizeof(int));
    // ... use p ...
    return;  // Never called free! Memory leaked forever
}

// βœ… FIX: Free before returning
void no_leak() {
    int *p = malloc(100 * sizeof(int));
    // ... use p ...
    free(p);  // Always free what you malloc
}

Pitfall 3: Double Free

// ❌ WRONG: Freeing same memory twice
int *p = malloc(sizeof(int));
free(p);
free(p);  // CRASH! Already freed!

// βœ… FIX: Set to NULL after freeing
int *p = malloc(sizeof(int));
free(p);
p = NULL;   // Now safe
free(p);    // No-op: free(NULL) is safe

Pitfall 4: Use After Free

// ❌ WRONG: Using memory after freeing
int *p = malloc(sizeof(int));
*p = 42;
free(p);
printf("%d\n", *p);  // UNDEFINED! Memory might be reused

// βœ… FIX: Use before freeing
int *p = malloc(sizeof(int));
*p = 42;
printf("%d\n", *p);  // Use it first
free(p);             // Then free

Pitfall 5: Uninitialized Pointer

// ❌ WRONG: Pointer not initialized
int *p;        // p contains garbage address!
*p = 42;       // Writing to random memory location - CRASH!

// βœ… FIX: Initialize to NULL or valid address
int *p = NULL;
// ... later ...
p = malloc(sizeof(int));
if (p != NULL) {
    *p = 42;  // Safe
}

πŸ”§ Why This Matters for Your Compiler

β–Ά

Your compiler will generate assembly that directly manipulates registers, stack, and memory:

Function Prologue/Epilogue (Stack Setup)

; Every function starts with "prologue":
my_function:
    push rbp          ; Save caller's base pointer
    mov rbp, rsp      ; Set new base pointer
    sub rsp, 32       ; Allocate space for local variables

    ; Function body here...

    ; Every function ends with "epilogue":
    mov rsp, rbp      ; Deallocate locals
    pop rbp           ; Restore caller's base pointer
    ret               ; Return to caller

Accessing Local Variables

; For: let x: int = 5; let y: int = 10; return x + y;
fn main() -> int {
    let x: int = 5;    ; x is at [rbp - 8]
    let y: int = 10;   ; y is at [rbp - 16]
    return x + y;
}

; Generated assembly:
main:
    push rbp
    mov rbp, rsp
    sub rsp, 16           ; Room for 2 local ints

    mov qword [rbp-8], 5  ; x = 5
    mov qword [rbp-16], 10 ; y = 10

    mov rax, [rbp-8]      ; rax = x
    add rax, [rbp-16]     ; rax = x + y

    mov rsp, rbp
    pop rbp
    ret                   ; Return value in rax

Function Arguments (System V ABI)

; Calling convention: first 6 args in registers
; Arg 1: RDI
; Arg 2: RSI
; Arg 3: RDX
; Arg 4: RCX
; Arg 5: R8
; Arg 6: R9
; More args: pushed on stack
; Return value: RAX

; Example: add(5, 3)
mov rdi, 5        ; First argument
mov rsi, 3        ; Second argument
call add          ; Call function
; Result is now in rax
πŸ“Œ KEY TAKEAWAYS FOR CODEGEN:
  1. RAX = Return value (always put result here before ret)
  2. RDI, RSI, RDX, RCX, R8, R9 = Function arguments
  3. RSP = Stack top (subtract to allocate, add to deallocate)
  4. RBP = Frame base (local vars at [rbp-8], [rbp-16], etc.)
  5. push/pop = Use stack for temporary values during expressions

πŸ“Š Dynamic Arrays & Complexity

β–Ά

Your compiler uses growable arrays for tokens, AST nodes, and more. In C, we must manage this manually.

Python vs C: List Append

# Python: grows automatically
arr = []
arr.append(1)  # Magic!
arr.append(2)  # Still works!
// C: Must manage capacity manually
DynamicArray = {
    items: pointer to array
    count: how many elements used
    capacity: how many elements allocated
}

Append Algorithm

FUNCTION da_append(arr, value):
    // Need to grow?
    IF arr.count >= arr.capacity:
        // Double capacity
        arr.capacity = arr.capacity * 2 (or 16 if 0)
        arr.items = realloc(arr.items, new_size)
    
    // Add element
    arr.items[arr.count] = value
    arr.count = arr.count + 1

Why Double Capacity?

Operation Old Cap New Cap Copies
1st append 0 16 0
17th append 16 32 16
33rd append 32 64 32

Total copies for n appends: ~2n β†’ Amortized O(1) per append!

Compiler Data Structures Complexity

Structure Use In Compiler Lookup Insert
Dynamic Array Token storage O(1) O(1)*
Linked List AST children O(n) O(1)
Hash Table Symbol table O(1)* O(1)*
Tree (AST) Program structure O(h) O(h)

* = amortized or average case. h = tree height.

🧠 Check Your Understanding

β–Ά
Which register holds the return value of a function?
RAX
RDI
RSP
RBP
In which direction does the stack grow on x86-64?
Upward (toward higher addresses)
Downward (toward lower addresses)
It doesn't grow, it's fixed size
Both directions
Where does malloc() allocate memory?
Stack
Registers
Heap
Data segment
What is a "dangling pointer"?
A pointer set to NULL
A pointer that's never used
A pointer to a malloc'd block
A pointer to memory that's been freed or is out of scope

πŸ“ Practice Problems

β–Ά

Problem 1: Find the Bug

char *duplicate_string(const char *s) {
    char copy[100];
    strcpy(copy, s);
    return copy;  // What's wrong?
}

int main() {
    char *s = duplicate_string("hello");
    printf("%s\n", s);  // Probably garbage
}
Click for solution

Problem: copy is a local array on the stack, destroyed when function returns.

// Fix: Allocate on heap
char *duplicate_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) strcpy(copy, s);
    return copy;  // Heap persists!
}
// Caller must free!

Problem 2: Memory Leak Hunt

void process_data(const char *filename) {
    FILE *f = fopen(filename, "r");
    if (f == NULL) return;

    char *buffer = malloc(1000);
    if (buffer == NULL) {
        fclose(f);
        return;
    }

    fread(buffer, 1, 1000, f);

    if (some_error_condition()) {
        return;  // Is there a leak?
    }

    free(buffer);
    fclose(f);
}
Click for solution

Leak! If some_error_condition() is true, we return without freeing buffer or closing f!

// Fix: Free resources before ALL returns
if (some_error_condition()) {
    free(buffer);   // βœ…
    fclose(f);      // βœ…
    return;
}

Problem 3: Stack vs Heap

For each allocation, identify where it lives (Stack or Heap):

void example() {
    int a = 5;              // 1. Where?
    int *b = malloc(4);     // 2. Where is b? Where is *b?
    char str[] = "hello";   // 3. Where?
    char *ptr = "world";    // 4. Where is ptr? Where is "world"?
}
Click for solution
  1. a β†’ Stack (local variable)
  2. b β†’ Stack (pointer), *b β†’ Heap (malloc'd memory)
  3. str β†’ Stack (local array, contents copied from literal)
  4. ptr β†’ Stack (pointer), "world" β†’ Data segment (string literal)
← Week 2 Next: x86-64 Assembly β†’