How I Ship Code Without Reading It

Yesterday I shipped 1,300 lines of infrastructure code and I didn't read most of it. Docker images, a 9-phase build script, an egress proxy with per-phase network isolation, Nomad job templates, the whole thing. And no, it's not because I don't care. It's because I found something that (mostly) works better than reading.

Let me explain.


I'm building Zerokie, an EU-sovereign deployment platform. Think Vercel, but without the part where your data lives on US servers and you have to trust that they pinky-promise to follow EU law (which in certain circumstances they literally cannot do). The Go API for the build service was already deployed, but it needed the actual infrastructure to run builds: a builder image with 7 toolchains, an egress proxy that restricts network access per build phase, and an 841-line entrypoint script that orchestrates the whole thing from clone to deploy.

That's a lot of moving parts. The kind of thing where every experienced engineer has a war story about spending three days debugging a race condition.

Here's what I did instead of writing it myself: I wrote a spec. Not pseudocode, not a TODO list, a real specification. What the builder image must contain, what the egress proxy must do per phase, what happens on timeout, what the error messages say. 269 lines of "this is what I want."

Then I let two AIs fight about it.


My household AI (yeah, I have one of those, his name is Lares and he's a whole story for another day) acts as the architect. He took the spec and ran it through VSDD, a (soon to be open source) tool I built that pits a "builder" AI against an "adversary" AI in structured review rounds.

The adversary's job is simple: find everything wrong. Be hostile. Assume every ambiguity will cause a production incident at 03:00 on a Sunday. The builder's job is to fix what the adversary finds. They go back and forth until neither can find anything wrong with the other's work (which in practice sometimes means: until the adversary starts hallucinating problems).

11 rounds of spec review later, the spec had grown from 269 to 591 lines. Every single round found real issues. Not nitpicks, but actual problems that would have bitten me in production.

One of the good ones: the adversary found that the health check endpoint would return OK even if the proxy had crashed, because busybox httpd was still serving a static 200 regardless of whether tinyproxy was alive behind it. That would have been fun to debug. "The health check says everything is fine but builds keep timing out." I would have lost my mind.

Another good one: the build script signaled 9 phases, but the proxy only knew about 4. Silent network blocks on 5 out of 9 phases. Builds would have failed in ways that look like DNS resolution errors or timeouts, and I would have looked at literally everything else before finding the root cause.
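The fix, as I understand it, is two-part: enumerate every phase the build script can signal, and make the unknown-phase branch fail loudly instead of silently denying all egress. A sketch (the phase names and domains here are made up for illustration, not the real configuration):

```shell
#!/usr/bin/env bash
# Per-phase egress allowlist, sketched. Phase names and domains are
# illustrative, not the real Zerokie configuration.

phase_allowlist() {
    case "$1" in
        clone)                    echo "github.com gitlab.com" ;;
        deps|build|test)          echo "proxy.golang.org crates.io registry.npmjs.org" ;;
        package|push)             echo "registry.internal.example" ;;
        callback|deploy|cleanup)  echo "api.internal.example" ;;
        # default-deny is fine, but deny *loudly*: a silent block looks
        # exactly like a DNS failure or a timeout from inside the build
        *) echo "FATAL: unknown build phase '$1'" >&2; return 1 ;;
    esac
}
```

With the loud default, signaling a phase the proxy doesn't know about fails the build immediately with a clear message, instead of producing mystery timeouts five phases in.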


After the spec converged, and while I was playing Helldivers II online with a friend, Lares spawned a coding agent to implement it, then ran the adversary against the actual code. 7 more rounds.

Round 1 found 3 blockers, and they were all the same bug in different clothes: environment variable name mismatches between the Go code and the bash script. The Go side used PLATFORM_API_URL. The bash side expected BUILD_CALLBACK_URL. Both are perfectly sensible names. The Go code compiled. The bash script validated its env vars at startup. Everything looks fine until you actually try to run a build and it fails immediately because the variable is empty.

I most likely wouldn't have caught this until I tried to run the whole thing. Be honest with yourself: you likely wouldn't have either. To find it, you need to hold two files in your head simultaneously (sorta), something that LLMs are better at than we are. The adversary just...did.

Round 4 found a pipe bug with kaniko. You know this one if you've been writing shell scripts long enough: run_timed 15m /kaniko/executor ... 2>&1 | tail -100 always returns exit code 0 because tail succeeds even when kaniko fails. Container build failures would have been silently swallowed. I've been bitten by this exact thing before, in the good old days working in Network Monitoring at AWS, and I still would have written it the same way, because it looks correct and I apparently don't learn from experience.
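In case you haven't hit this one: a pipeline exits with the status of its *last* command, so `tail` succeeding masks the failure. bash's `pipefail` option is the standard fix. Here `false` stands in for a failing kaniko invocation:

```shell
#!/usr/bin/env bash
# A pipeline's exit status is the last command's, so tail's success
# swallows the failure. 'false' stands in for a failing
# /kaniko/executor invocation.

false | tail -n 100
echo "without pipefail: $?"    # prints 0 -- the failure is swallowed

set -o pipefail                # pipeline now fails if any stage fails
false | tail -n 100
echo "with pipefail: $?"       # prints 1 -- the failure surfaces
```

Under `set -e` (which a build script should be using), that first form silently continues past a failed container build; with `pipefail` it aborts, which is what you want.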

Round 6 found a cache extraction path bug. Go and Rust dependencies live at absolute paths — /go/pkg/mod, /usr/local/cargo. When you tar them, the leading / gets stripped. Extracting into /build/repo puts them at /build/repo/go/pkg/mod instead of /go/pkg/mod. Cache restore produces empty caches. Every Go and Rust build is slow and nobody knows why.
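The mechanics, reproduced in a temp sandbox standing in for the filesystem root (directory names illustrative): tar strips the leading `/` when archiving, so the restore has to extract relative to the root, not the repo checkout.

```shell
#!/usr/bin/env bash
# Minimal reproduction of the cache-path bug. $root stands in for the
# real filesystem root; the real script caches /go/pkg/mod and
# /usr/local/cargo.
set -euo pipefail

root=$(mktemp -d)
mkdir -p "$root/go/pkg/mod"
echo cached > "$root/go/pkg/mod/dep.txt"

# Archive entries are stored without the leading '/': 'go/pkg/mod/...'
tar -czf "$root/cache.tgz" -C "$root" go/pkg/mod

# Buggy restore: extracting inside the repo checkout nests the cache at
# <repo>/go/pkg/mod, where the Go toolchain never looks.
mkdir -p "$root/build/repo"
tar -xzf "$root/cache.tgz" -C "$root/build/repo"
ls "$root/build/repo/go/pkg/mod/dep.txt"   # wrong place, silently

# Fixed restore: extract relative to the root, so the cache lands back
# at (the sandboxed equivalent of) /go/pkg/mod.
rm -rf "$root/go/pkg/mod"
tar -xzf "$root/cache.tgz" -C "$root"
ls "$root/go/pkg/mod/dep.txt"              # where Go expects it
```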

By round 7, the adversary had nothing above minor severity. Done. Converged.


So what did I actually do during all of this?

I wrote the spec. That's my job; I understand what a build service needs because I've been doing this for a while. I reviewed the adversary's findings, which is mostly pattern recognition (are these real issues, or is the AI hallucinating problems?). I made architectural decisions. I decided when to call convergence. I played video games with a friend.

I did not read the 841-line build script line by line. I did not trace the environment variable flow from Go through Nomad through Docker through bash. I did not check whether echo "clone" has a trailing newline that breaks a string comparison in a different container (it did... as I hinted earlier, the system is not perfect).
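That last one in miniature, for the curious: echo appends a newline, so a reader comparing raw file bytes against the 5-byte string "clone" never matches. printf '%s' writes exactly the string. (The file name is illustrative.)

```shell
#!/usr/bin/env bash
# The trailing-newline bug in miniature. A phase signal written with
# echo carries 6 bytes ('clone' + '\n'); a reader comparing raw bytes
# against "clone" (5 bytes) never matches.
set -euo pipefail
sig=$(mktemp)

echo "clone" > "$sig"
wc -c < "$sig"                 # 6 bytes

printf '%s' "clone" > "$sig"   # writes exactly the phase name
wc -c < "$sig"                 # 5 bytes
```

The nasty part: command substitution like `$(cat "$sig")` strips trailing newlines, so the writer side looks fine when you poke at it interactively. The mismatch only shows up in a reader that compares raw bytes.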

My role was judgment, not inspection.


Now, this isn't magic. The adversary hallucinates sometimes. Around round 8 of the spec review, it expanded its scope instead of converging, and the finding count jumped from 12 to 31 because it started inventing problems. You need to recognize when that's happening and pull the plug. It also doesn't replace understanding: I wrote the spec because I know what I'm building. The adversary found bugs in the implementation, not flaws in the concept, if that makes sense.

But for the space between "I know what this should do" and "this code actually does it" (which is the space where every production incident I've ever had lives) adversarial verification is the best tool I've found so far.

14 real bugs caught. About 4 hours from spec to converged code. Compare that to writing it myself (days), or reviewing someone else's code and missing the subtle bugs anyway, or shipping it and discovering the cache extraction bug after a week of "why are the Go builds so slow?"

This isn't perfect, the code isn't perfect, and there are definitely bugs we haven't caught, but it got me to "good enough" fast enough. And there would have been bugs I hadn't caught even if I'd written it myself.


The VSDD tool I built (using VSDD itself) will be open source. It's a Rust CLI that works with multiple AI backends. The build infrastructure spec and all 18 review rounds are in a private monorepo which I cannot share. So if you want to see what adversarial convergence actually looks like...be patient until I release the tool I guess?

This is a real build session from March 19, 2026. The code is running. The bugs were real. I still haven't read most of it.

Tags: vsdd, ai, verification, zerokie