πŸ“Š Foundation β€’ +100 XP

Data Structures

The building blocks your compiler uses: dynamic arrays, linked lists, hash tables, trees, and graphs.

⏱️ 90 min🎯 6 sections🧠 3 quizzes

πŸ“¦ Dynamic Arrays

β–Ά

In Python, lists grow automatically. In C, you manage this manually:

typedef struct {
    Token *items;     // Pointer to array
    long count;       // Elements used
    long capacity;    // Elements allocated
} Token_Array;

// Append with automatic growth
void da_append(Token_Array *arr, Token tok) {
    if (arr->count >= arr->capacity) {
        // Double capacity when full!
        arr->capacity = arr->capacity == 0 ? 16 : arr->capacity * 2;
        arr->items = realloc(arr->items, arr->capacity * sizeof(Token));
    }
    arr->items[arr->count++] = tok;
}

Why Double?

Appends Old Cap New Cap Copies
17th 16 32 16
33rd 32 64 32

Amortized O(1) per append. If we grew by +1 each time, it would be O(n)!

πŸ”— Linked Lists

β–Ά

AST nodes use doubly-linked lists for siblings:

typedef struct AST_Node {
    AST_Kind kind;
    struct AST_Node *prev;  // Previous sibling
    struct AST_Node *next;  // Next sibling
    // ... data fields
} AST_Node;
Visualization:
first β†’ [A] ⇄ [B] ⇄ [C] ← last
         └─prev/nextβ”€β”˜

Traversal

// Forward
for (AST_Node *n = list->first; n != NULL; n = n->next) {
    process(n);
}

// Backward
for (AST_Node *n = list->last; n != NULL; n = n->prev) {
    process(n);
}

Complexity

Append O(1)
Insert/Delete O(1)
Random Access O(n)

#️⃣ Hash Tables

β–Ά

Symbol tables use hash tables for O(1) variable lookup:

Symbol_Table (size = 8):
slots[0] β†’ NULL
slots[1] β†’ Symbol("x", 42) β†’ Symbol("foo", 10) β†’ NULL
slots[2] β†’ NULL
slots[3] β†’ Symbol("y", 5) β†’ NULL
...

// Collision: "x" and "foo" both hash to slot 1
// Solution: External chaining (linked list per slot)

FNV-1a Hash Function

long hash_string(String s) {
    long hash = 2166136261u;  // FNV offset
    for (long i = 0; i < s.count; i++) {
        hash ^= s.data[i];
        hash *= 16777619;     // FNV prime
    }
    return hash;
}

Load Factor

Load factor = entries / slots

When load factor > 0.75, double table size and rehash. This keeps chains short!

🌳 Trees & Traversals

β–Ά

Your AST is a tree! Understanding traversal order is crucial:

Expression: 2 + 3 * 4

        +
       / \
      2   *
         / \
        3   4
Pre-order
Root, Left, Right
+ 2 * 3 4
In-order
Left, Root, Right
2 + 3 * 4
Post-order
Left, Right, Root
2 3 4 * +

Post-order = Stack machine order! That's why code gen uses post-order traversal.

πŸ”€ Graphs & DFS/BFS

β–Ά

Control flow forms a graph. DFS and BFS help analyze it:

if (x > 0) {     Control Flow Graph:
    y = 1;       
} else {              [entry]
    y = 2;              β”‚
}                      ↓
                   [x > 0?]
                   /      \
              [y=1]      [y=2]
                   \      /
                    [exit]

Representations

Adjacency List
0: [1, 2]
1: [3]
2: [3]
3: []
Adjacency Matrix
  0 1 2 3
0[0 1 1 0]
1[0 0 0 1]
2[0 0 0 1]

DFS: Go deep first (use stack). BFS: Go wide first (use queue).

πŸ“ˆ Big-O Quick Reference

β–Ά
Structure Access Insert Delete
Array O(1) O(n) O(n)
Dynamic Array O(1) O(1)* O(n)
Linked List O(n) O(1) O(1)
Hash Table O(1) O(1) O(1)
BST O(log n) O(log n) O(log n)

* Amortized O(1) for dynamic array append

🧠 Check Your Understanding

β–Ά
Which data structure is best for O(1) variable lookup?
Array
Linked List
Binary Tree
Hash Table
Which traversal produces stack machine order for expressions?
Pre-order
In-order
Post-order
Level-order
← Memory & Pointers Next: CPU Architecture β†’