ALWAYS work on a git worktree and branch. NEVER modify files directly on main. main must always remain clean. Every piece of work (features, bug fixes, docs, even CLAUDE.md edits) must happen on a dedicated branch in a worktree:

    git worktree add .worktrees/<name> -b <branch-name>
    cd .worktrees/<name>
    # do work, commit, then open a PR

Agents can work autonomously end-to-end: create worktrees, make changes, push branches, create PRs, monitor CI, and merge when green. You have push access to feature branches and merge access to PRs.
- Create a worktree and branch
- Make changes, run `npm run verify:quick`, commit
- `git push origin <branch>`: push to remote
- `gh pr create`: open a PR
- `gh pr checks <number>`: monitor CI
- When CI is green: `gh pr merge <number> --squash --delete-branch` (merge to main)
- Clean up: `cd /Users/csmith/git/ChadScript && git worktree remove .worktrees/<name>`
- Pull main and continue with next task
Every PR must be seen through to completion — don't just open and walk away. Monitor CI, fix failures, merge when green, delete the remote branch, and remove the local worktree.
Never push to main directly. Always go through PRs.
PR descriptions should be user-centric — lead with how the change affects users (better error messages, fewer crashes, new capabilities), not just technical implementation details.
After completing each todo:
- Run unit tests
- If tests pass, commit the changes
- If tests fail, fix them before moving to the next todo
- Never move on to the next todo while tests are failing
Before considering any feature complete, run the full self-hosting chain:
- `npm run verify`: runs tests and self-hosting in parallel (preferred)
- `npm run verify:quick`: same but skips Stage 2 (day-to-day dev)

Or manually:
- `npm test`: all tests pass (auto-uses native compiler if `.build/chad` exists)
- `bash scripts/self-hosting.sh`: full 3-stage self-hosting
- `bash scripts/self-hosting.sh --quick`: skip Stage 2
New features have complex side effects that may not be caught by unit tests alone. A change that passes all tests can still break self-hosting. The Stage 2 test is the true verification — it proves the compiler's output is correct enough to compile itself.
Version is defined in one place: package.json. npm run build auto-generates src/version.ts via the prebuild script (scripts/gen-version.js). Both chad-node.ts and chad-native.ts import VERSION from there.
To bump a version:
- Edit `version` in `package.json`
- `npm run build` (regenerates `src/version.ts`)
- Merge to main
- `git tag v<version> && git push origin v<version>`: CI creates a GitHub Release with binaries
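As a rough illustration of the single-source-of-truth flow, the sketch below renders a version module from the text of a package.json. The function name `renderVersionModule` and the exact output format are assumptions made for this example; the real scripts/gen-version.js may work differently.

```typescript
// Hypothetical sketch: not the actual scripts/gen-version.js.
// Renders the contents of a generated version module from package.json text.
function renderVersionModule(packageJsonText: string): string {
  const pkg = JSON.parse(packageJsonText) as { version?: unknown };
  if (typeof pkg.version !== "string" || pkg.version.length === 0) {
    throw new Error("package.json has no version field");
  }
  // JSON.stringify quotes and escapes the version string safely.
  return `export const VERSION = ${JSON.stringify(pkg.version)};\n`;
}

const generated = renderVersionModule('{ "name": "example", "version": "1.2.3" }');
console.log(generated);
```

Because the module is derived from package.json on every build, there is no second place where the version could drift.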
After a rebase or merge that brings in new codegen features, .build/chad becomes stale: it was compiled
from the old source and doesn't know how to compile the new features. Rebuild it:

    rm -f .build/chad && node dist/chad-node.js build src/chad-native.ts -o .build/chad

Tests auto-detect .build/chad and use it over node dist/chad-node.js. A stale native compiler causes
mysterious failures in tests that pass fine with the node compiler.
Each worktree builds its own vendor/. Do not symlink it from another worktree or the main repo. Different branches may have different c_bridges/ sources, and a shared vendor dir causes races and silent corruption when multiple agents build concurrently.

    bash scripts/build-vendor.sh

npm test rebuilds dist/ only if src/ is newer, and builds .build/chad only if missing.
TypeScript-to-native compiler using LLVM IR. Compiles .ts/.js files to native binaries via: Parser → AST → Semantic Analysis → LLVM IR Codegen → clang (compile + link) → native binary.
| Dir | Purpose |
|---|---|
| `src/semantic/` | Semantic analysis passes run before codegen (closure mutation, union types) |
| `src/codegen/` | LLVM IR code generation (the core) |
| `src/codegen/expressions/method-calls.ts` | Central dispatcher for all object.method() calls |
| `src/codegen/types/collections/string/` | String method IR generators (manipulation.ts, search.ts, split.ts, etc.) |
| `src/codegen/types/collections/string.ts` | StringGenerator facade that delegates to sub-modules |
| `src/codegen/types/collections/array.ts` | ArrayGenerator facade that delegates to sub-modules |
| `src/codegen/types/collections/array/` | Array sub-modules (literal, mutators, search-predicate, iteration, combine, etc.) |
| `src/codegen/stdlib/` | Built-in module generators (console.ts, process.ts, fs.ts, math.ts, etc.) |
| `src/codegen/infrastructure/` | Core: generator-context.ts, symbol-table.ts, type-resolver.ts |
| `src/codegen/llvm-generator.ts` | Main orchestrator, delegates to sub-generators |
| `src/ast/types.ts` | AST node type definitions |
| `tests/compiler.test.ts` | Main test suite |
| `tests/test-discovery.ts` | Auto-discovers test fixtures via @test annotations |
| `tests/fixtures/` | Test fixture programs organized by category (auto-discovered) |
| `c_bridges/` | C bridge files for complex runtime helpers (regex, json, os, etc.) |
To add a new string method:
- IR Generation: Add function in `src/codegen/types/collections/string/manipulation.ts` (or search.ts, etc.)
- Facade: Add `doGenerateX()` in `src/codegen/types/collections/string.ts` (StringGenerator class)
- Dispatch: Add `if (method === 'x')` block in `src/codegen/expressions/method-calls.ts`
- Handler: Add `private handleX()` method in method-calls.ts
- Test: Add fixture in `tests/fixtures/strings/` (auto-discovered, no registry needed)
NOTE: Prefer direct field access (ctx.stringGen.doMethod()) over adding wrapper methods to IGeneratorContext. Concrete type propagation in loadFieldValue (member.ts) ensures chained access through interface fields works in the native compiler.
To add a new built-in:
- Check if existing generator handles it (e.g., `src/codegen/stdlib/process.ts`)
- Most built-ins are handled inline in `method-calls.ts` for performance
- For member access (not method calls), look at `src/codegen/expressions/member.ts`
- Test: add fixture in `tests/fixtures/builtins/` (auto-discovered, no registry needed)
| LLVM Type | JS Type |
|---|---|
| `%Array = type { double*, i32, i32 }` | `number[]` (data ptr, length, capacity) |
| `%StringArray = type { i8**, i32, i32 }` | `string[]` |
| `%ObjectArray = type { i8*, i32, i32 }` | `object[]` |
| `%Uint8Array = type { i8*, i32, i32 }` | `Uint8Array` (data ptr, length, capacity) |
| `i8*` | `string` (null-terminated C string) |
| `double` | `number` |
| `i1` | `boolean` |
Tests are auto-discovered from tests/fixtures/ via @test annotations in tests/test-discovery.ts.
No manual registry — just add a fixture file and it's picked up automatically.
Annotation format (in the first 10 lines of each fixture file):
- `// @test-exit-code: 12`: assert process exits with code 12
- `// @test-args: hello world`: pass CLI args to the compiled binary
- `// @test-description: ...`: custom test description
- `// @test-skip`: exclude from auto-discovery

Defaults (no annotation needed):
- `expectTestPassed: true`: asserts stdout contains `TEST_PASSED` and exit code 0
- Description auto-generated from filename: `string-split-length.ts` → `string split length`
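The annotation rules above can be sketched as a small parser. This is an illustrative approximation only: the `FixtureConfig` shape and the `parseFixture` function are hypothetical names, and the real logic in tests/test-discovery.ts may differ in its details.

```typescript
// Hypothetical sketch of annotation parsing; not the code in tests/test-discovery.ts.
interface FixtureConfig {
  description: string;
  exitCode: number | null; // null means "use the default TEST_PASSED check"
  args: string[];
  skip: boolean;
}

function parseFixture(filename: string, source: string): FixtureConfig {
  const config: FixtureConfig = {
    // string-split-length.ts becomes "string split length"
    description: filename.replace(/\.(ts|js)$/, "").split("-").join(" "),
    exitCode: null,
    args: [],
    skip: false,
  };
  // Only the first 10 lines are scanned for annotations.
  for (const line of source.split("\n").slice(0, 10)) {
    const match = /^\/\/\s*@test-([a-z-]+)(?::\s*(.*))?$/.exec(line.trim());
    if (!match) continue;
    const key = match[1] ?? "";
    const value = (match[2] ?? "").trim();
    if (key === "exit-code") config.exitCode = Number.parseInt(value, 10);
    else if (key === "args") config.args = value.length > 0 ? value.split(/\s+/) : [];
    else if (key === "description") config.description = value;
    else if (key === "skip") config.skip = true;
  }
  return config;
}

const parsed = parseFixture(
  "string-split-length.ts",
  "// @test-exit-code: 12\n// @test-args: hello world\nconsole.log('hi');\n",
);
console.log(parsed);
```

Here the fixture keeps its filename-derived description because no `@test-description` annotation overrides it.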
Run tests: npm test or npm run test:full (via node scripts/test.js)
Run tests + self-hosting: npm run verify (or npm run verify:quick to skip Stage 2)
Build: npm run build (TypeScript → dist/)
Tests auto-detect .build/chad and use it instead of node dist/chad-node.js (~10x faster per compile).
compiler.test.ts runs at concurrency 32; smoke.test.ts at concurrency 8.
- `ctx.nextTemp()`: get next SSA temp variable name (%1, %2, etc.)
- `ctx.nextLabel(prefix)`: get next unique label for control flow
- `ctx.emit(line)`: emit a line of LLVM IR
- `ctx.generateExpression(expr, params)`: recursively generate an expression
- `ctx.setVariableType(name, type)`: tell the type system what type a temp is
- `createStringConstant(ctx, value)`: create a global string constant, returns i8*
- `GC_malloc_atomic(size)`: allocate GC'd memory for non-pointer data (strings)
- `GC_malloc(size)`: allocate GC'd memory that may contain pointers
Prefer structured builder methods over raw ctx.emit() for all supported instructions:

- Memory & access: emitStore, emitLoad, emitGep, emitCall, emitCallVoid, emitBitcast
- Comparison: emitIcmp, emitFcmp
- Control flow: emitBr, emitBrCond, emitLabel, emitRet, emitRetVoid, emitUnreachable
- Arithmetic: emitAdd, emitSub, emitMul, emitFAdd, emitFSub, emitFMul, emitFDiv, emitSRem, emitFRem, emitFNeg
- Casts: emitZext, emitSext, emitTrunc, emitSitofp, emitFptosi, emitPtrtoint, emitInttoptr
- Bitwise: emitAnd, emitOr, emitXor, emitShl, emitAShr, emitLShr
- Other: emitPhi, emitSelect, emitAlloca

Keep ctx.emit() only for: inbounds GEP, !tbaa metadata, call void @llvm.memcpy, and other
exotic instructions without builders.
LLVM basic blocks must end with exactly one terminator instruction (ret, br, unreachable, switch).
Rather than parsing emitted strings to detect terminators, we use a parallel outputIsTerminator: boolean[]
that auto-classifies every instruction at emit() time. Use ctx.lastInstructionIsTerminator() to check.
Single source of truth: The classification logic lives in terminator-classifier.ts as a standalone
function. Both BaseGenerator and MockGeneratorContext delegate to it. To add a new terminator
(e.g., invoke, indirectbr), update classifyTerminator() in that one file.
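A minimal sketch of the parallel-array idea described above, assuming the opcode is the first token of each emitted line. `MiniEmitter` and the body of `classifyTerminator` shown here are illustrative stand-ins, not the code in terminator-classifier.ts or BaseGenerator.

```typescript
// Hypothetical sketch; the real classifier lives in terminator-classifier.ts.
const TERMINATOR_OPCODES = ["ret", "br", "unreachable", "switch"];

function classifyTerminator(line: string): boolean {
  // The opcode is taken to be the first whitespace-delimited token of the trimmed line.
  const opcode = line.trim().split(/\s+/)[0] ?? "";
  return TERMINATOR_OPCODES.includes(opcode);
}

class MiniEmitter {
  readonly output: string[] = [];
  // Parallel to `output`: one flag per emitted line, classified at emit time.
  readonly outputIsTerminator: boolean[] = [];

  emit(line: string): void {
    this.output.push(line);
    this.outputIsTerminator.push(classifyTerminator(line));
  }

  lastInstructionIsTerminator(): boolean {
    const n = this.outputIsTerminator.length;
    return n > 0 && this.outputIsTerminator[n - 1] === true;
  }
}

const emitter = new MiniEmitter();
emitter.emit("  %1 = add i32 %a, %b");
const afterAdd = emitter.lastInstructionIsTerminator();
emitter.emit("  ret i32 %1");
const afterRet = emitter.lastInstructionIsTerminator();
console.log(afterAdd, afterRet);
```

Supporting another terminator in this sketch means adding one entry to `TERMINATOR_OPCODES`, which mirrors the single-file update described above.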
Builder methods (emitRet, emitRetVoid, emitBr, emitBrCond, emitUnreachable, emitLabel) are
available on BaseGenerator, LLVMGenerator, and MockGeneratorContext for type-safe terminator emission.
method-calls.ts → generateMethodCall() checks object type and method name:
- Static methods first (Object.keys, Array.from, Promise.all, etc.)
- Built-in objects (console, process, fs, path, JSON, Math, Date)
- String methods (trim, indexOf, split, replace, etc.)
- Array methods (push, pop, map, filter, find, etc.)
- Map/Set methods
- Class/interface method dispatch (vtable lookup)
For complex runtime logic (nested loops, string manipulation, data structure building), prefer writing
C bridge functions in c_bridges/ over raw LLVM IR string concatenation. C bridges are easier to read,
debug, and maintain.
Pattern:
- Create `c_bridges/your-bridge.c` with `cs_` prefixed functions
- Add build step in `scripts/build-vendor.sh` (compile to `.o`)
- Declare extern functions in LLVM IR and call them from codegen
- Add conditional linking in `src/compiler.ts` and `src/native-compiler-lib.ts`
- Add to `scripts/build-target-sdk.sh` bridge list (cross-compile SDK packaging)
- Add to `ci.yml` in all 4 places: Linux verify loop, Linux release copy, macOS verify loop, macOS release copy
Existing bridges: regex-bridge.c, yyjson-bridge.c, os-bridge.c, child-process-bridge.c,
child-process-spawn.c, lws-bridge.c, treesitter-bridge.c.
child_process.spawn() returns an opaque handle (i8*) for long-lived children with
interactive stdio (DAP adapters, REPL subprocesses, etc.):

    const h = child_process.spawn("lldb-dap", [], onStdout, onStderr, onExit);
    child_process.writeStdin(h, "Content-Length: 42\r\n\r\n{...}");
    child_process.endStdin(h); // sends EOF
    child_process.kill(h, 15); // optional signum, default SIGTERM

Handle lifecycle is refcounted in child-process-spawn.c (proc + 3 pipes each hold a ref;
freed when all closed). onExit fires once after stdout/stderr/proc all close; stdin close
doesn't gate exit (user controls it). Writes after child exit, kill after exit, and
writeStdin/endStdin on null handles are all no-ops and never crash.
- Hoist allocas to entry block, never in conditional branches
- Store pointers as `i8*`: `double` loses 64-bit precision
- Check class before interface: try `findClassImplementingInterface()` BEFORE `interfaceStructGen.hasInterface()`
- Load array values in objects: load the value, don't pass the alloca
- Type cast field order must match FULL struct layout: when the type extends a parent interface, the struct includes ALL parent fields. `as { name, closureInfo }` on a `LiftedFunction extends FunctionNode` (10 fields) reads index 1 instead of index 9. Include every field.
- `ret void` not `unreachable` at end of void functions
- Class structs: boolean is `i1`; Interface structs: boolean is `double`
- Propagate declared type before generating RHS for collection fields: when a class field is typed `Set<string>`, `Map<K,V>`, etc., call `setCurrentDeclaredSetType`/`setCurrentDeclaredMapType` before `generateExpression` on the RHS so that `new Set()`/`new Map()` without explicit type args picks the right generator. See `handleClassFieldAssignment` in `assignment-generator.ts`.
- Set feature flags when emitting gated extern calls: runtime declarations for C bridges (yyjson, curl, etc.) are conditionally emitted behind flags like `usesJson`, `usesCurl`. Any code path that emits `call @csyyjson_*` must call `ctx.setUsesJson(true)`, etc. Missing this causes "undefined value" errors from `clang` because the `declare` is never emitted.
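To make the last rule concrete, here is a toy model of flag-gated declarations. Only the `usesJson` flag and `setUsesJson` come from the text above; `MiniModule`, `emitJsonParse`, and the `@example_json_parse` symbol are invented for this sketch and do not correspond to real bridge functions.

```typescript
// Hypothetical sketch of flag-gated extern declarations.
class MiniModule {
  private usesJson = false;
  private readonly body: string[] = [];

  setUsesJson(value: boolean): void {
    this.usesJson = value;
  }

  emit(line: string): void {
    this.body.push(line);
  }

  // Emits a call into the (made-up) JSON bridge and records that its declare is needed.
  emitJsonParse(resultTemp: string, inputTemp: string): void {
    this.setUsesJson(true); // skipping this would leave the call without a declare
    this.emit(`  ${resultTemp} = call i8* @example_json_parse(i8* ${inputTemp})`);
  }

  // Declarations are only rendered when the matching flag was set.
  render(): string {
    const declares = this.usesJson ? ["declare i8* @example_json_parse(i8*)"] : [];
    return [...declares, ...this.body].join("\n");
  }
}

const mod = new MiniModule();
mod.emitJsonParse("%1", "%input");
const ir = mod.render();
console.log(ir);
```

Setting the flag inside the same helper that emits the call keeps the two from drifting apart, which is the failure mode the rule warns about.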
When building field lists for an interface (keys/types arrays for ObjectMetadata), always use
getAllInterfaceFields(interfaceDef) instead of interfaceDef.fields. The latter only returns the
interface's OWN fields, missing inherited fields from extends. This causes wrong GEP indices for
any interface with inheritance. All current allocation methods (allocateDeclaredInterface,
allocateMemberAccessInterface, allocateFunctionInterfaceReturn, etc.) use getAllInterfaceFields
correctly — maintain this when adding new ones.
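The difference between own fields and the full field list can be sketched as follows. The `InterfaceDef` shape, the registry map, and the helper name `collectAllFields` are assumptions for illustration; the real `getAllInterfaceFields` may be structured differently.

```typescript
// Hypothetical shapes; the real InterfaceDef and getAllInterfaceFields may differ.
interface Field {
  name: string;
  type: string;
}

interface InterfaceDef {
  name: string;
  extends: string[];
  fields: Field[]; // OWN fields only
}

// Collects inherited fields first, then the interface's own fields.
function collectAllFields(def: InterfaceDef, registry: Map<string, InterfaceDef>): Field[] {
  const all: Field[] = [];
  for (const parentName of def.extends) {
    const parent = registry.get(parentName);
    if (parent) all.push(...collectAllFields(parent, registry));
  }
  all.push(...def.fields);
  return all;
}

const baseNode: InterfaceDef = { name: "BaseNode", extends: [], fields: [{ name: "type", type: "string" }] };
const namedNode: InterfaceDef = { name: "NamedNode", extends: ["BaseNode"], fields: [{ name: "name", type: "string" }] };
const registry = new Map<string, InterfaceDef>([
  ["BaseNode", baseNode],
  ["NamedNode", namedNode],
]);

const ownIndex = namedNode.fields.findIndex((f) => f.name === "name");
const fullIndex = collectAllFields(namedNode, registry).findIndex((f) => f.name === "name");
console.log(ownIndex, fullIndex);
```

In this toy case the own-field list puts `name` at index 0 while the full layout puts it at index 1, which is the kind of GEP mismatch the paragraph above describes.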
Prefer for...of over index-based for loops when iterating arrays; it's fully supported and more idiomatic:

    for (const item of items) { ... } // good
    for (let i = 0; i < items.length; i++) {} // only when index is needed

- Prettier auto-formats code; run `npm run format` to fix, `npm run format:check` to verify
- One-line comments are helpful on dense codegen blocks: explain the "why" or the LLVM IR pattern, not the "what"
- Use named AST types from `src/ast/types.ts` for type assertions instead of inline `as { ... }` structs
- There are several MASSIVE files. Where possible, do not add to them. Create a new file instead, and leave a comment in the large file saying not to add to it and to progressively break it down into smaller files.
- `new` in class field initializers: codegen handles simple `new X()` in field initializers (both explicit and default constructors), but complex nested class instantiation may have edge cases. Prefer initializing in constructors for safety.
- Type assertions must match real struct field order AND count: `as { type, left, right }` on a struct that's `{ type, op, left, right }` causes GEP to read wrong fields. Fields must be a PREFIX of the real struct in EXACT order. Watch out for `extends`: if `Child extends Parent`, the struct has ALL of Parent's fields first, then Child's. A type assertion on a Child must include Parent's fields too, even optional ones the object literal doesn't set (the compiler allocates slots for them anyway, filled with null/0).
- `alloca` for collection structs stored in class fields: `%Set`, `%StringSet`, `%Map`, and similar structs must be heap-allocated via `GC_malloc`, not `alloca`. Stack-allocated structs become dangling pointers when stored in a class field after the constructor returns. Use `emitCall("i8*", "@GC_malloc", "i64 N") + emitBitcast(...)` instead of `emit("... = alloca %Foo")`.
- Never invent a subset/partial type for a type assertion: always use the real AST type from `src/ast/types.ts`. An invented `type FunctionMeta = { name, returnType, parameters }` silently generates wrong GEP indices when its field order doesn't match the actual struct. The `object-method.js` SIGSEGV was caused by exactly this: `FunctionMeta.parameters` sat at index 2, but `FunctionNode.parameters` is at index 6, so every access read the wrong field. The real type always works: TypeScript's structural typing lets you access any subset of fields safely without redefining a partial interface.
- `||` fallback makes member access opaque: `const x = foo.bar || { field: [] }` stores the result as `i8*` (opaque pointer) because the `||` merges two different types. Subsequent `.field` access on `x` does NOT generate a GEP; it just returns `x` itself. Fix: use a ternary that preserves the typed path: `const y = foo.bar ? foo.bar.field : []`. This applies to any `||` or `??` where the fallback is an inline object literal.
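The index mismatch behind these type-assertion rules can be modeled with plain arrays of field names. The `realLayout` below is a made-up layout chosen only so that `parameters` lands at index 6; it is not the actual `FunctionNode` definition, and `fieldIndex` and `isPrefixOf` are hypothetical helpers.

```typescript
// Hypothetical layouts for illustration; field names other than
// name, returnType, and parameters are invented here.
const assertedLayout = ["name", "returnType", "parameters"];
const realLayout = ["type", "name", "body", "returnType", "isAsync", "typeParameters", "parameters"];

// Positional lookup: the index a GEP would use if it trusted this layout.
function fieldIndex(layout: readonly string[], field: string): number {
  const index = layout.indexOf(field);
  if (index < 0) throw new Error(`unknown field: ${field}`);
  return index;
}

// An asserted shape is only safe when it is an exact-order prefix of the real struct.
function isPrefixOf(asserted: readonly string[], real: readonly string[]): boolean {
  return asserted.length <= real.length && asserted.every((name, i) => name === real[i]);
}

const assertedIndex = fieldIndex(assertedLayout, "parameters");
const realIndex = fieldIndex(realLayout, "parameters");
const safe = isPrefixOf(assertedLayout, realLayout);
console.log(assertedIndex, realIndex, safe);
```

The asserted shape resolves `parameters` to a different slot than the real layout does, and the prefix check reports the assertion as unsafe.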
Self-hosting limitations:
- No import aliases: `import { foo as bar }` compiles `bar(...)` as `@_cs_bar` which doesn't match the original `@_cs_foo`. Use the original name.
- No union type parameters in standalone functions: `fn(x: Expression)` where `Expression` is a union emits the TS type name literally. Keep union-typed parameters in class methods.
ChadScript merges all imported files into one flat AST. export default maps the local import name to the exported name via importAliases (resolved in resolveImportAlias()). Re-exports synthesize ImportDeclaration entries — semantically equivalent to imports.
Enum declarations emit a compile error with a suggestion to use as const objects instead. The semantic
pass enum-checker.ts rejects all enums before codegen. The compiler's own internal "enums" (SymbolKind,
VarKind, LogLevel) were converted to plain const number values (e.g., const SymbolKind_Number = 0).
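A short example of the two replacement styles mentioned above. `Direction` and `directionName` are hypothetical names used only for illustration, and the example is ordinary TypeScript; it has not been checked against the ChadScript compiler itself.

```typescript
// Instead of: enum Direction { Up, Down }
// an `as const` object keeps named members without an enum declaration.
const Direction = {
  Up: 0,
  Down: 1,
} as const;

// The flattened style, matching the SymbolKind_Number example above.
const Direction_Up = 0;

function directionName(direction: number): string {
  if (direction === Direction.Up) return "up";
  if (direction === Direction.Down) return "down";
  return "unknown";
}

const downLabel = directionName(Direction.Down);
const upLabel = directionName(Direction_Up);
console.log(downLabel, upLabel);
```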
MethodCallNode.optional (must be at END of interface — see rule #3 above) triggers null-check branching in generateOptionalMethodCall in method-calls.ts. Pattern: evaluate object → icmp null → branch → phi merge, similar to generateOptionalChain for property access.
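The evaluate, icmp, branch, phi shape can be sketched with a tiny string-based emitter. `IrSketch` and `emitOptionalCall` are illustrative stand-ins that assume an `i8*` result and a single-argument callee; the real `generateOptionalMethodCall` handles far more cases and would use the builder methods rather than raw strings.

```typescript
// Hypothetical mini context; only nextTemp/nextLabel/emit mirror names from the real ctx.
class IrSketch {
  readonly lines: string[] = [];
  private temp = 0;
  private label = 0;

  nextTemp(): string {
    this.temp += 1;
    return `%${this.temp}`;
  }

  nextLabel(prefix: string): string {
    this.label += 1;
    return `${prefix}${this.label}`;
  }

  emit(line: string): void {
    this.lines.push(line);
  }
}

// Emits: null check, conditional branch, call on the non-null path, phi merge.
function emitOptionalCall(ctx: IrSketch, objectTemp: string, callee: string): string {
  const isNull = ctx.nextTemp();
  const nullLabel = ctx.nextLabel("opt_null");
  const callLabel = ctx.nextLabel("opt_call");
  const mergeLabel = ctx.nextLabel("opt_merge");
  ctx.emit(`  ${isNull} = icmp eq i8* ${objectTemp}, null`);
  ctx.emit(`  br i1 ${isNull}, label %${nullLabel}, label %${callLabel}`);
  ctx.emit(`${callLabel}:`);
  const callResult = ctx.nextTemp();
  ctx.emit(`  ${callResult} = call i8* ${callee}(i8* ${objectTemp})`);
  ctx.emit(`  br label %${mergeLabel}`);
  ctx.emit(`${nullLabel}:`);
  ctx.emit(`  br label %${mergeLabel}`);
  ctx.emit(`${mergeLabel}:`);
  const merged = ctx.nextTemp();
  ctx.emit(`  ${merged} = phi i8* [ ${callResult}, %${callLabel} ], [ null, %${nullLabel} ]`);
  return merged;
}

const sketch = new IrSketch();
const mergedTemp = emitOptionalCall(sketch, "%obj", "@example_method");
console.log(sketch.lines.join("\n"));
```

Each block in the sketch ends with exactly one branch, and the phi takes one incoming value per predecessor block.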
allocateAwaitResult in variable-allocator.ts must inspect the awaited expression to determine the correct SymbolKind. Default is i8*/string, but Promise.all() resolves to %ObjectArray*. For each new async API that resolves to a specific type, add a detection case to allocateAwaitResult.
Semantic passes live in src/semantic/ and run before codegen (called from LLVMGenerator.generateParts()).
They catch errors that would produce silently wrong native code — the native compiler can't throw exceptions
at runtime, so these must be compile-time errors.
Current passes:
- `closure-mutation-checker.ts`: ChadScript closures capture by value. Mutating a variable after capture produces silently wrong results. This pass detects post-capture assignments and emits a compile error.
- `union-type-checker.ts`: Type alias unions like `type Mixed = string | number` bypass the inline union check. This pass resolves aliases and rejects unions whose members map to different LLVM representations.
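As an illustration of the closure rule, the snippet below shows the kind of pattern the first pass is described as rejecting (in comments) next to a form that behaves the same under capture by value or by reference. `makeReporter` is a hypothetical example function, not code from the repository.

```typescript
// Per the rule above, this pattern would be rejected: `count` is assigned after capture.
//   let count = 0;
//   const show = () => console.log(count);
//   count = 5;
//   show();

// Safe pattern: finish all assignments first, then capture a binding that never changes.
function makeReporter(values: number[]): () => number {
  let total = 0;
  for (const v of values) {
    total += v; // `total` is mutated, but it is never captured by a closure
  }
  const finalTotal = total;
  return () => finalTotal;
}

const report = makeReporter([1, 2, 3]);
const reported = report();
console.log(reported);
```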
To add a new semantic pass: create src/semantic/your-check.ts, export a checkX(ast: AST): void function,
and call it from generateParts() in llvm-generator.ts.
New per-function state goes in BaseGenerator.reset() if it's a base field, or after super.reset() in the override for LLVMGenerator-only fields.
orchestrator.ts must never silently generate null pointers (inttoptr i64 0 to i8*) for unrecognized
expressions. These nulls are UB that LLVM -O2 can exploit to prune unrelated code paths. Both fallback
paths (empty type, unsupported type) now call ctx.emitError() which is never-typed — it exits the
compiler immediately. If a new expression type is added to the parser, add a handler in the orchestrator;
don't rely on a fallback.