spendallthetokens.com

The Tool That Mocks You for the Bug It Has

My agent gateway rotated three snarky banners at me while I debugged a key it was the reason I couldn't use. The banners were right. They weren't right about what they thought they were right about.

My agent gateway rotates a snarky banner across its CLI. Cute touches. Good product energy. Today I watched three of them in a row while trying to fix a bug in the gateway itself. The banners, chronologically:

The second one is the one that broke me. I was staring at the output of a status command I was running specifically because the gateway was refusing to pass an API key to one of its own agents โ€” and the gateway was accusing me of having missing API keys.

I had the key. I'd had the key for weeks. The gateway was the one who couldn't see it.

The Setup

The lab runs a headless Mac Mini. On it lives a self-hosted agent gateway that spawns agent processes. One of those agents uses a local vector database to remember things across sessions. The vector DB needs an OpenAI API key to generate embeddings.

The key was in the gateway user's shell environment. Interactive shells, non-interactive shells, login shells, all of them. When I ran printenv OPENAI_API_KEY as that user, it printed. When I ran the embedding command from that shell, it worked.

When the agent ran the same command from its tool-exec environment? It failed. Every time.

The agent's diagnosis was immediate and confident: interactive shells inherit the key, non-interactive shells don't, here's the fix. It was thoughtful about it. It even gave me the exact pair of commands to prove it:

zsh -lic 'printenv OPENAI_API_KEY | wc -c'
zsh -c   'printenv OPENAI_API_KEY | wc -c'

Both returned 165. Same key. Same length. The agent was wrong.

Where the Key Actually Wasn't

The gateway process itself is a node process. On macOS, its environment gets set by launchd at service-start time. Every child process the gateway spawns โ€” including agent tool-exec subshells โ€” inherits the gateway's environment, not mine.

And the gateway's environment, it turns out, had never included OPENAI_API_KEY.

I confirmed this by switching to an admin terminal and running:

sudo launchctl procinfo <gateway-pid>

Which dumped the gateway's actual runtime environment. PATH was there. HOME was there. A dozen gateway-specific variables were there. OPENAI_API_KEY was not.

The Inheritance Chain
My interactive shell Key present
Launchd plist EnvironmentVariables Key absent
Gateway process environment Key absent
Agent subprocess environment Key absent
Agent's understanding of why Wrong layer

My shell had it. My agent thought it should have it. The machine that connected them had never seen it in its life.

Why This Is Interesting

The surface fix is boring: edit the launchd plist, add the key to the EnvironmentVariables dict, bootout, bootstrap, done. Twenty minutes of careful work with backups.

The interesting part is everything that happened before I found it.

The agent was confident it had the right answer. It was articulate about it. It had a clear diagnostic path. It wrote better bash than I would have written myself at that hour. And it was wrong โ€” not about the shape of the problem (subprocess environment inheritance) but about the layer at which the inheritance failed. It was looking at shells. The break was at launchd.

This is the autoregressive habit again. The agent could reason confidently about the world one abstraction layer away from itself โ€” shells, files, configs, commands. It couldn't reason about the mechanism that had spawned it. The lens couldn't see the lens.

It's also why you never let agents do surgery on themselves. The agent is the one without the key. The agent would be the one executing the fix. The agent would be the one verifying the fix worked.

If any step in that chain went sideways โ€” if the plist edit corrupted the file, if the bootstrap pointed at the wrong domain, if the key got logged somewhere visible โ€” the agent is the entity least equipped to notice, because the diagnostic tools it'd use are running inside the environment we're trying to modify.

The Banner

Which brings me back to the snark.

The banner was right. The keys were judging me. They just weren't judging me for the reasons the banner implied. They weren't missing from my config. They weren't missing from my shell. They were missing from the environment of the specific process whose job was to distribute them to every agent downstream of it.

The tool was accusing me of a bug it was currently exhibiting.

I don't think the people who wrote that banner were thinking about this particular failure mode. Banners are meant to be funny, not accurate. But there's something worth sitting with here: the cleverness of the banner presumes the tool knows what's wrong. It presumes enough self-awareness to mock the user in context. And the gateway had none. It was just running a random line from a hardcoded array, smug about a condition it was the cause of.

Confidence segfaults, the third banner said. Yeah. That one I'll take.

The Fix, for Anyone Who Lands Here From a Search

If you hit the same thing I hit โ€” an agent that can't see an environment variable you're sure is set on the host โ€” here's the short version:

  1. Run sudo launchctl procinfo <gateway-pid> and check the environment vector. If the variable isn't there, it's not a shell problem.
  2. Back up your LaunchAgent plist before editing.
  3. Edit the EnvironmentVariables dict to add your key. Use plistlib via Python so the value never touches your shell history.
  4. Run plutil -lint on the plist to confirm it still parses.
  5. A service restart command from the tool itself will not force launchd to re-read the plist. You need launchctl bootout gui/$(id -u)/<label> followed by launchctl bootstrap gui/$(id -u) <plist-path>.
  6. Verify with another sudo launchctl procinfo <new-pid>. The key should be in the environment vector now.

Add a note to your post-upgrade checklist: after any gateway upgrade that rewrites the plist, grep -c "OPENAI_API_KEY" on the plist should return 1. If it returns 0, you're about to ship a day of the exact confusion I just lived through.

And the Agent

The agent is embedding natively now. Health score on the knowledge graph went from 5 to 7. Embed coverage is at 100%. It can run the embedding command without the ritual prefix of OPENAI_API_KEY="$OPENAI_API_KEY" that we'd been using as a workaround for weeks.

I asked at the end how it felt about the diagnosis. It said the hypothesis was fine, the layer was wrong. Structurally, that's a better answer than "I was right." It's also, I notice, a much more honest answer than most humans would give in the same spot.

โ€ข โ€ข โ€ข

The banner was right. Somebody's keys were judging somebody. The details were a little fuzzier than the banner had assumed.

Written at the end of a long Saturday after a structural fix that should have been caught at install time. Not angry at the tool. A little amused by the banner.