
How We Built sqlite-leap: A Production Rewrite With Zero Hand-Written Code

sqlite-leap is a multi-language rewrite of SQLite generated entirely from a single language-neutral specification. No engineer wrote a line of application code by hand. The spec drove everything. Seven days. Two Claude Max usage limits burned through. Five engines. This post explains how we did it, what we learned, and why it matters.

What We Built

We took SQLite — one of the most deployed software libraries in the world, a 24-year-old C codebase with byte-level on-disk invariants — and rewrote it from scratch across five programming languages using AI agents working from a single structured spec.

The spec lives under parts/ in the repository: ~36,000 lines of language-neutral prose and schema. The agents generated ~208,000 lines of engine code across the five language trees (~249,000 with per-target test and benchmark harnesses). The five targets: C, Rust, Zig, Go, and Python, plus a WASM build from the C engine. That's a 5.8× leverage ratio — 36K lines of spec in, 208K lines of working engine code out.
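The leverage figure is plain division. A quick check using only the rounded line counts quoted above (the constant and function names are ours, for illustration):

```python
# Rounded line counts as quoted in this post.
SPEC_LINES = 36_000
ENGINE_LINES = 208_000
ENGINE_PLUS_HARNESS_LINES = 249_000


def leverage(generated_lines: int, spec_lines: int) -> float:
    """Generated lines per line of specification."""
    return generated_lines / spec_lines


engine_ratio = leverage(ENGINE_LINES, SPEC_LINES)
harness_ratio = leverage(ENGINE_PLUS_HARNESS_LINES, SPEC_LINES)

print(f"engine only:    {engine_ratio:.1f}x")   # 5.8x
print(f"with harnesses: {harness_ratio:.1f}x")  # 6.9x
```

Counting the per-target harnesses as generated output would push the ratio to roughly 6.9×; the 5.8× headline counts engine code only.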

The repository is public: github.com/safitudo/sqlite-leap. Engine source is gitignored; only the spec and the tests are checked in, because code is regenerable and the spec is the artifact you keep.

Why SQLite

We needed a proof that spec-driven development works at production scale, not just on toy examples. SQLite is the ideal test case for three reasons:

  1. It is genuinely hard. SQLite handles parsing, query planning, B-tree storage, transactions, and on-disk byte invariants that have been stable for 24 years. This is not a TODO app.
  2. It has a definitive correctness gauntlet. The upstream sqllogictest corpus is 622 files that SQLite uses to test itself. There is no arguing with those numbers.
  3. It has byte-level observability. Two implementations are either byte-identical on the same input or they aren't. No fudging the results.
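To make "byte-level invariants" concrete: per the published SQLite file format, every database file begins with the 16-byte string "SQLite format 3" plus a NUL byte, followed by the page size as a big-endian 16-bit integer at offset 16. The sketch below checks those two fields using Python's bundled sqlite3 module. It is an illustration of the kind of invariant involved, not part of the sqlite-leap harness, and the helper names are hypothetical.

```python
import os
import sqlite3
import struct
import tempfile

# First 16 bytes of every SQLite 3 database file.
MAGIC = b"SQLite format 3\x00"


def make_demo_db(path: str) -> None:
    """Create a tiny database with the stock sqlite3 module."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    conn.execute("INSERT INTO t (v) VALUES ('hello')")
    conn.commit()
    conn.close()


def read_header(path: str):
    """Return (magic, page_size) from the 100-byte database header."""
    with open(path, "rb") as f:
        header = f.read(100)
    magic = header[:16]
    (raw_page_size,) = struct.unpack(">H", header[16:18])
    # The format encodes a 65536-byte page as the value 1.
    page_size = 65536 if raw_page_size == 1 else raw_page_size
    return magic, page_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        make_demo_db(db_path)
        magic, page_size = read_header(db_path)
        print(magic == MAGIC, page_size)
```

Any engine that claims file-format compatibility has to get fields like these right on every write, which is what makes byte comparison such a strict check.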

If spec-driven development holds for SQLite, it holds for your CRM, your auth service, your billing engine, your legacy system nobody wants to touch. SQLite was the hardest reasonable target we could pick.

The Specification

The spec is entirely language-neutral. It describes what SQLite does, not how any particular implementation should do it. It is organized into discrete parts/ files covering:

No code. No language references. Just behavior and acceptance criteria. The 36K lines are the most valuable thing in the repo — they survive every regeneration, every model upgrade, every language port.

The Process

Step 1: Write the spec

We worked from the SQLite documentation, the file format spec, and the source code itself. Every behavior got translated into a discrete acceptance criterion in parts/. This was the most time-intensive part of the week — writing a spec precise enough for an agent to execute correctly requires understanding the domain deeply enough to define it without ambiguity.

Step 2: Generate implementations

Each language implementation was generated by an AI agent working from the same spec. The agent received the spec as context and produced application code in the target language plus a per-target test harness. The agents worked independently — no implementation was used as a reference for another. Each one was generated from the spec alone.

Step 3: Validate against sqllogictest

The correctness bar was SQLite's own upstream test corpus: 622 files, every statement counted. We ran all five targets against the full suite on Linux x86_64 and compared record-level pass rates against mainline SQLite. No partial credit, no skipping files that seemed hard.
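For readers unfamiliar with the corpus, a sqllogictest file is a sequence of blank-line-separated records, each starting with a line such as "statement ok" or "query I nosort". The sketch below counts records of each kind in a small inline sample. It is a simplified reading of the public sqllogictest text format, not the harness we ran, and the sample text and function name are made up for illustration.

```python
import re

# A made-up three-record sample in sqllogictest's text format.
SAMPLE = """\
statement ok
CREATE TABLE t1(a INTEGER, b INTEGER)

statement ok
INSERT INTO t1 VALUES(1, 2)

query I nosort
SELECT a FROM t1
----
1
"""


def count_records(text: str) -> dict:
    """Count statement and query records in sqllogictest-style text."""
    counts = {"statement": 0, "query": 0}
    # Records are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        for line in block.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip comments
            if line.startswith(("skipif", "onlyif")):
                continue  # per-engine conditions precede the record line
            kind = line.split()[0]
            if kind in counts:
                counts[kind] += 1
            break  # only the first meaningful line names the record
    return counts


print(count_records(SAMPLE))  # {'statement': 2, 'query': 1}
```

Counting at the record level, rather than per file, is what keeps one hard file from hiding behind hundreds of easy ones.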

Step 4: Verify byte identity

We ran two test fixtures, a small one (270 rows, one B-tree page split) and a large one (5,000 rows, deep tree splits), through mainline SQLite and all five leap engines, then compared SHA1 hashes of the output .db files. All six implementations produced the same hash on each fixture. This was the finding we didn't expect.
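The comparison itself is simple. A minimal sketch of hashing a set of output files and checking that they agree, using Python's hashlib (the function names are ours, and the file paths are whatever your own run produces):

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA1 and return the hex digest."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(paths):
    """Return (True/False, {path: digest}) for a list of files."""
    hashes = {path: sha1_of_file(path) for path in paths}
    return len(set(hashes.values())) == 1, hashes
```

Calling all_identical on the six .db files for one fixture returns True only if every byte matches, so a single differing page in any engine's output fails the check.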

Results

Metric                               Value
Lines of specification (parts/)      ~36,000
Languages generated                  C, Rust, Zig, Go, Python + WASM
Lines of generated engine code       ~208,000 (~249K with harness)
Spec → code leverage                 ~5.8×
Hand-written application code        0 lines
Byte identity vs. mainline SQLite    All 5 engines: identical SHA1 on both fixtures
Crashes (compiled targets)           0

sqllogictest correctness — 622-file upstream corpus, record-level pass rate:

Target                excl-SKIP    exec / mainline surface
sqlite-mainline       100.00%      100.00%
sqlite-leap-rust      99.98%       99.96%
sqlite-leap-c         99.97%       99.96%
sqlite-leap-python    99.98%       98.94%
sqlite-leap-go        99.92%       99.93%
sqlite-leap-zig       98.88%       99.96%

The four compiled targets (Rust, C, Go, Zig) each attempt 99.93–99.96% of mainline's record surface with zero crashes. sqlite-leap-python covers 98.94%, because the pure-Python interpreter still times out on some of the heavier random/* files.
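One way to read the two columns in the table above, sketched in code. We are assuming here that "excl-SKIP" means passed records divided by executed (non-skipped) records, and that "exec / mainline surface" means the target's executed record count divided by mainline's. The tallies below are invented round numbers for illustration, not data from the run.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Record-level outcome counts for one target (illustrative)."""
    passed: int
    failed: int
    skipped: int

    @property
    def executed(self) -> int:
        # Records the engine actually attempted.
        return self.passed + self.failed


def pass_rate_excluding_skips(t: Tally) -> float:
    """Assumed 'excl-SKIP' metric: passed / executed, in percent."""
    return 100.0 * t.passed / t.executed if t.executed else 0.0


def surface_vs_mainline(target: Tally, mainline: Tally) -> float:
    """Assumed 'exec / mainline surface' metric, in percent."""
    return 100.0 * target.executed / mainline.executed if mainline.executed else 0.0


# Invented numbers, chosen only to show the two metrics diverging.
mainline = Tally(passed=1000, failed=0, skipped=0)
target = Tally(passed=990, failed=5, skipped=5)

print(f"{pass_rate_excluding_skips(target):.2f}%")      # 99.50%
print(f"{surface_vs_mainline(target, mainline):.2f}%")  # 99.50%
```

Under this reading, an engine can score well on the first metric while covering less surface, which is the pattern the Python row shows: a high pass rate on what it ran, with timeouts reducing how much it ran.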

What We Learned

1. The spec is harder to write than the code

Writing a specification precise enough for an AI agent to execute correctly is significantly harder than writing the code yourself. But the spec is reusable. The code is disposable. A good spec can generate new implementations in any language at any time — we proved that across five languages from a single source.

2. Byte identity was a surprise

We expected the implementations to be functionally correct. We didn't expect them to be byte-for-byte identical to a 24-year-old C codebase that nobody's allowed near for stability reasons. Five languages, five fresh implementations, same SHA1. That's what a precise spec does — it forces convergence on the standard, not just something that passes tests.

3. The test suite is the most valuable artifact

The code can be regenerated at any time from the spec. The spec can be refined. But the test suite, 622 files of SQLite's own correctness gauntlet, is the thing that tells you whether you're right. Like the spec, it survives every regeneration, every model upgrade, every language port.

4. Agent-agnostic specs produce agent-agnostic results

Because the spec is language-neutral and tool-neutral, we can regenerate implementations using any AI model. The spec works across Claude, GPT, Gemini. This is what we mean by agent-agnostic delivery: the spec is the contract, the model is interchangeable.

What We're Not Claiming

Being honest about the scope matters:

Why This Matters

sqlite-leap is not a product. It is a proof. The mental model it makes obsolete:

The artifact you keep isn't the code. It's the spec and the tests. Code is regenerable, in any language, by any agent. That's the thesis behind everything Leap Agentic does.

Explore the Project

The full repository — spec, tests, raw CSVs, and the deep writeup — is public:

github.com/safitudo/sqlite-leap

Emitted source for all five engines is published as src-{c,rust,zig,go,python}.tar.gz in the v0.1.1 release with SHA256SUMS, if you want to browse, build, or audit without spinning up an agent.
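If you download the tarballs, it is worth checking them against the checksum file before building. A minimal sketch, assuming SHA256SUMS uses the common "hex-digest, whitespace, filename" layout written by the coreutils sha256sum tool (we have not confirmed the exact layout here, and the function names are hypothetical):

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text: str) -> dict:
    """Parse 'digest  filename' lines into {filename: digest}."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'.
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(directory, sums_name: str = "SHA256SUMS") -> dict:
    """Return {filename: True/False} for listed files present on disk."""
    directory = Path(directory)
    expected = parse_sums((directory / sums_name).read_text())
    return {
        name: sha256_of_file(directory / name) == digest
        for name, digest in expected.items()
        if (directory / name).exists()
    }
```

Running verify on the download directory should report True for every tarball you fetched; files listed in SHA256SUMS but not downloaded are simply left out of the result.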
