πŸ”§ Week 2 β€’ +100 XP

Introduction to C

C fundamentals through the lens of Python/Java. Learn how C differs and why those differences matter for building compilers.

πŸ“¦ Types & Variables


The big difference: C is statically typed. Once a variable has a type, it can never change.

🐍 YOU KNOW (Python): Dynamic Typing

x = 42          # x is int
x = "hello"     # Now x is str - perfectly fine!
x = [1, 2, 3]   # Now x is list - still fine!

βš™οΈ IN C: Static Typing

int x = 42;        // x is ALWAYS an int
x = "hello";       // ❌ ERROR! Type mismatch
x = 100;           // ✓ OK - still an int
WHY DIFFERENT?
  • Python discovers types at runtime (slow but flexible)
  • C knows types at compile time (fast but rigid)
  • No type checking overhead at runtime in C

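C Type Sizes (Reference Table, typical 64-bit Linux/macOS)

Type     Bytes   Range                       Use Case
char     1       -128 to 127 (if signed)     Single character
int      4       -2³¹ to 2³¹-1               Most integers
long     8       -2⁶³ to 2⁶³-1               Large integers
float    4       ~7 significant digits       Rarely used (prefer double)
double   8       ~15 significant digits      Floating point
bool     1       0 or 1                      Boolean (needs stdbool.h before C23)
void*    8       N/A                         Generic pointer

Sizes vary by platform: the C standard only guarantees minimums (long is 4 bytes on 64-bit Windows, for example), and whether plain char is signed is implementation-defined. Use sizeof when the exact size matters.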

⚠️ Declaration vs Initialization

int x;           // DECLARATION only: a local x holds a GARBAGE value!
printf("%d", x); // Undefined behavior (could print anything)

int y = 0;       // ✅ ALWAYS INITIALIZE - Safe!
🚨 CRITICAL: Uninitialized local variables in C contain leftover garbage! This causes bugs that are extremely hard to track down. (Global and static variables are the exception: they start at zero.)

Constants

🐍 Python
MAX_SIZE = 100
# Convention only - you CAN change it!
⚙️ C
const int MAX_SIZE = 100;
MAX_SIZE = 200; // ❌ ERROR! Can't assign to a const

// Alternative: a preprocessor macro (text substitution, no type)
#define MAX_SIZE 100

πŸ”§ Functions


🐍 YOU KNOW (Python)

def add(a, b):
    return a + b

result = add(5, 3)  # result = 8

βš™οΈ IN C: Must Declare Types

int add(int a, int b) {
    return a + b;
}

int result = add(5, 3);  // result = 8

Function Anatomy

int           add    (int a, int b)     {  return a + b;  }
│             │      │                  │  │
return type   name   parameters         │  function body
                                        └─ opening brace

Forward Declarations

C requires functions to be declared before they're used:

// ❌ ERROR: add not declared yet (older compilers may only warn)
int main() {
    int x = add(5, 3);  // Compiler doesn't know about add!
    return 0;
}

int add(int a, int b) {
    return a + b;
}

FIX: Forward declaration

// ✅ WORKS - declare signature first
int add(int a, int b);  // Forward declaration

int main() {
    int x = add(5, 3);  // OK - compiler knows about add
    return 0;
}

int add(int a, int b) {  // Definition can come later
    return a + b;
}
πŸ› οΈ IN YOUR COMPILER:
// In parser.c - forward declarations at top
static AST_Node *parse_additive(Parser *parser);
static AST_Node *parse_multiplicative(Parser *parser);
static AST_Node *parse_primary(Parser *parser);

// Now these functions can call each other in any order!

The main() Function

int main(int argc, char *argv[]) {
    // argc = argument count
    // argv = argument values (array of strings)
    
    printf("Hello\n");
    return 0;  // 0 = success, non-zero = error
}

// Simple version:
int main() {
    printf("Hello\n");
    return 0;
}

printf - Formatted Output


🐍 YOU KNOW (Python): f-strings

x = 42
name = "Alice"
print(f"{name} has {x} points")  # Python 3.6+

βš™οΈ IN C: printf with format specifiers

int x = 42;
char *name = "Alice";
printf("%s has %d points\n", name, x);

Format Specifiers Reference

Specifier   Type                   Example
%d          int                    printf("%d", 42)            → 42
%ld         long                   printf("%ld", 123L)         → 123
%x          unsigned int, in hex   printf("%x", 255)           → ff
%p          pointer (void *)       printf("%p", (void *)ptr)   → 0x7fff...
%s          string (char *)        printf("%s", "hi")          → hi
%c          character              printf("%c", 'A')           → A
%.2f        double (2 decimals)    printf("%.2f", 3.14159)     → 3.14
%%          literal %              printf("%%")                → %
⚠️ CRITICAL: Type MUST match specifier!
int x = 42;
printf("%s", x);   // ❌ DANGER! Expects string, got int
printf("%d", "hi"); // ❌ DANGER! Expects int, got string
// Both cause undefined behavior (crashes, garbage output); compile with -Wall and GCC/Clang will warn about both
πŸ› οΈ IN YOUR COMPILER:

Your String struct has no null terminator, so use:

// Print exactly N characters with %.*s
printf("%.*s\n", (int)s.count, s.data);

// Or use the PRINT_STRING macro:
#define PRINT_STRING(s) (int)((s).count), (s).data
printf("Token: %.*s\n", PRINT_STRING(tok.source));

Structs


Structs are like classes with only data, no methods. They're used everywhere in your compiler!

☕ YOU KNOW (Java): Classes

class Point {
    int x;
    int y;
    // In Java, you'd add methods here
}

Point p = new Point();
p.x = 10;
p.y = 20;

βš™οΈ IN C: Struct (data only)

struct Point {
    int x;
    int y;
};

struct Point p;  // Note: need "struct" keyword
p.x = 10;
p.y = 20;

typedef - Creating Type Aliases

Writing struct Point everywhere is tedious. Use typedef:

// Define struct AND create type alias in one go:
typedef struct {
    int x;
    int y;
} Point;

// Now can use just "Point" instead of "struct Point"
Point p;
p.x = 10;

Struct Initialization Methods

Method 1: Field by field
Point p;
p.x = 10;
p.y = 20;
Method 2: Positional
Point p = {10, 20};
// x=10, y=20
✅ Method 3: Named fields (designated initializers, C99) - best!
Point p = {
    .x = 10,
    .y = 20
};
Method 4: Zero init
Point p = {0};
// All fields = 0

Accessing Struct Fields

With struct variable: use .
Point p;
p.x = 10;     // dot operator
With pointer: use ->
Point *ptr = &p;
ptr->x = 10;  // arrow operator: shorthand for (*ptr).x = 10
πŸ› οΈ IN YOUR COMPILER - Nested Structs:
typedef struct {
    Token_Kind kind;
    String source;    // Nested struct!
    Loc loc;          // Nested struct!
    long long_value;
} Token;

// Access nested fields:
tok.source.data = "42";
tok.source.count = 2;   // String has no '\0', so the length must be set too
tok.loc.line = 1;

🏷️ Enums & Unions


Enums: Named Constants

Named constants instead of magic numbers:

🐍 YOU KNOW (Python)

# Option 1: Strings (error-prone)
token_kind = "KEYWORD"
if token_kind == "KEYWROD":  # Typo - no error!
    ...

# Option 2: Class attributes
class TokenKind:
    KEYWORD = 0
    INTEGER = 1

βš™οΈ IN C: Enum (type-safe!)

typedef enum {
    TOKEN_KEYWORD,      // = 0 (auto-numbered)
    TOKEN_INTEGER,      // = 1
    TOKEN_IDENTIFIER,   // = 2
    TOKEN_PLUS,         // = 3
    // ...
} Token_Kind;

Token_Kind kind = TOKEN_KEYWORD;  // A misspelled name won't compile
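Why enums are better than strings:
  • ✅ Compiler catches typos (a misspelled name is an undeclared identifier)
  • ✅ Autocomplete in editors
  • ✅ No string comparison overhead
  • ✅ Clear documentation of valid values
  • ⚠️ Caveat: a C enum is an int underneath, so the compiler won't stop you from storing an arbitrary int in a Token_Kind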

Unions: Same Memory, Different Types

A union allows multiple types to share the same memory location:

union Value {
    long int_value;
    double float_value;
    char *str_value;
};

union Value v;
v.int_value = 42;        // Store as int
v.float_value = 3.14;    // OVERWRITES int_value!

Memory Layout

Union (size = size of its LARGEST member, plus any padding; 8 bytes on a 64-bit system):
┌────────────────────────────┐
│  int_value   (8 bytes)     │
│  float_value (8 bytes)     │  } All three share
│  str_value   (8 bytes)     │  } the SAME memory!
└────────────────────────────┘

Tagged Unions - The Safe Pattern

Problem: How do you know which union member is valid?

Solution: Add a "tag" field (enum) to track it!

typedef struct {
    Token_Kind kind;  // The "TAG" - tells us which member is valid
    union {                  // anonymous union (C11): members are used directly, e.g. tok.long_value
        long long_value;     // Valid if kind == TOKEN_INTEGER
        Keyword keyword;     // Valid if kind == TOKEN_KEYWORD
        Type type;           // Valid if kind == TOKEN_TYPE
    };
} Token;
πŸ› οΈ IN YOUR COMPILER:
Token tok = next_token(&scanner);

// Always check the tag before accessing union members!
if (tok.kind == TOKEN_INTEGER) {
    long value = tok.long_value;  // ✅ Safe - we checked!
    printf("Got number: %ld\n", value);
}
else if (tok.kind == TOKEN_KEYWORD) {
    Keyword kw = tok.keyword;     // ✅ Safe
    if (kw == KEYWORD_RETURN) {
        // Handle return keyword
    }
}

πŸ‘‰ Pointers: The Core Concept


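Pointers store memory addresses. They're the most important concept in C, and Python/Java have no direct equivalent: their object references work like hidden pointers, but you can't take an address or do arithmetic on them.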

Memory Model Visualization

Memory Address    Value       Variable
┌──────────────┬──────────┬───────────┐
│   0x1000     │    42    │   x       │  ← int x = 42;
├──────────────┼──────────┼───────────┤
│   0x1008     │  0x1000  │   p       │  ← int *p = &x;
└──────────────┴──────────┴───────────┘
                    │
                    └───── p contains the ADDRESS of x

Two Key Operations

&x
Address-of operator
Gets the memory address of variable x
*p
Dereference operator
Gets the value at address stored in p

Complete Example

int x = 42;       // x holds value 42
int *p = &x;      // p holds ADDRESS of x

printf("%d\n", x);   // Prints: 42 (value of x)
printf("%p\n", (void *)p);   // Prints: 0x7fff... (address of x; %p expects a void *)
printf("%d\n", *p);  // Prints: 42 (value AT address p)

*p = 100;            // Set value at address p to 100
printf("%d\n", x);   // Prints: 100 (x changed!)

Why Pointers Matter

✅ Modify variables in other functions
void increment(int *x) {
    (*x)++;  // Modifies original!
}

int n = 5;
increment(&n);  // n is now 6
✅ Avoid copying large structs
// Instead of copying 1000 bytes:
void process(BigStruct *ptr) {
    // Work with ptr->field
    // Only copies an 8-byte pointer (on 64-bit systems)!
}
πŸ› οΈ IN YOUR COMPILER:
// Parser functions take pointer to avoid copying:
static AST_Node *parse_expression(Parser *parser) {
    Token tok = parser->current;  // Access via arrow
    advance(parser);              // Pass pointer to advance
    // ...
}

Common Pointer Mistakes

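// ❌ Uninitialized pointer
int *p1;
*p1 = 42;  // UNDEFINED BEHAVIOR! p1 points to an unknown address (often a crash)

// ❌ Dangling pointer (use after free)
int *p2 = malloc(sizeof(int));
free(p2);
*p2 = 42;  // UNDEFINED BEHAVIOR! Memory already freed

// ❌ NULL dereference
int *p3 = NULL;
*p3 = 42;  // UNDEFINED BEHAVIOR! Usually a crash (segfault)

// ✅ Check for NULL before dereferencing (e.g. after malloc)
int *p4 = malloc(sizeof(int));
if (p4 != NULL) {
    *p4 = 42;   // Safe
    free(p4);
    p4 = NULL;  // Don't leave a dangling pointer behind
}
// Note: a NULL check can't detect an uninitialized or dangling pointer.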

🧠 Check Your Understanding

How do you access a struct field through a pointer?
ptr.field
ptr->field
*ptr.field
ptr::field
What does &x give you?
The value of x
A copy of x
The memory address of x
A null pointer
What's wrong with: int x = 42; printf("%s", x);
Nothing, it works fine
Missing newline
x needs to be initialized
%s expects a string, not an int
In a tagged union, what's the purpose of the "tag"?
To store extra data
To indicate which union member is valid
To name the union
To set the union size

Practice Problems


Problem 1: Fix the Swap

void swap(int a, int b) {
    int temp = a;
    a = b;
    b = temp;
}

int main() {
    int x = 5, y = 10;
    swap(x, y);
    printf("%d %d\n", x, y);  // Prints: 5 10 (not swapped!)
}
Solution:

Problem: swap receives COPIES of x and y, not the originals.

// Fix: Use pointers
void swap(int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

int main() {
    int x = 5, y = 10;
    swap(&x, &y);  // Pass addresses
    printf("%d %d\n", x, y);  // Prints: 10 5 ✓
}

Problem 2: Struct by Value

typedef struct { int x; int y; } Point;

void move(Point p, int dx, int dy) {
    p.x += dx;
    p.y += dy;
}

int main() {
    Point p = {10, 20};
    move(p, 5, 5);
    printf("(%d, %d)\n", p.x, p.y);  // What prints?
}
Solution:

Prints: (10, 20) - NOT (15, 25)!

Why: move receives a COPY of the struct.

// Fix: Pass pointer
void move(Point *p, int dx, int dy) {
    p->x += dx;
    p->y += dy;
}
move(&p, 5, 5);  // Now p becomes (15, 25), and the printf shows it

Problem 3: Returning Local Array

char *create_greeting() {
    char msg[] = "Hello";
    return msg;  // Is this correct?
}

int main() {
    char *greeting = create_greeting();
    printf("%s\n", greeting);
}
Solution:

NO! msg is on the stack, destroyed when function returns.

Result: Undefined behavior (garbage or crash).

// Fix: Return string literal or use malloc
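const char *create_greeting(void) {  // caller stores it in a const char *
    return "Hello";  // String literal persists for the whole program (read-only!)
}
// OR
char *create_greeting(void) {
    char *msg = malloc(6);          // 5 chars + 1 for '\0' (needs <stdlib.h>)
    if (msg == NULL) return NULL;   // malloc can fail
    strcpy(msg, "Hello");           // needs <string.h>
    return msg;  // Caller must free!
}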