Let me set the scene.

It's 2am. My ThinkCentre M75n is humming on the desk. I've got an AI agent running — or at least I think I do. OpenClaw is installed, the gateway is live, and Mewtwo (that's what I named him) is connected to Telegram. I send a test message. He responds. I'm gassed.

Then I check my Anthropic billing dashboard.

$18. In four hours.

I sat there for a second just staring at the number. Not because $18 is going to break me — but because I had no idea what caused it, and that feeling right there? That's the feeling you need to get comfortable with if you're going to build in this space. The moment where the thing you built is running, but you don't fully understand how yet.

That's what this post is about. Not the polished version of building AI agents. The real version.


How I Got Here

Twelve years playing professional basketball. Seven countries. When I retired, I had no CS degree, no bootcamp, no network in tech. What I had was a refusal to accept that this world wasn't for me.

I started Null Limit as a company built on one thesis: that the skills that make someone elite at anything — discipline, pattern recognition, the ability to operate under pressure — are the same skills that make someone a great builder. You don't need a four-year degree to understand systems. You need reps.

So I got reps.

I built things that broke. I rebuilt them. I shipped six live products in under a year — an AI companion app, a competitive intelligence engine, a BTC trading radar, a knowledge API, a lead generation service, an eBook series. None of it with a single line of venture capital. All of it solo.

The $18 night was one of the most important nights in that process.


The Loop Nobody Warns You About

Here's what happened: I had configured my AI agent with a fallback model. The idea was smart — if Claude hits a limit or has an issue, fall back to another model automatically. What I didn't know was that I had the wrong model ID in my config. Not wrong as in bad — wrong as in invalid. The model string didn't exist.

So when Claude hit a limit and the runtime tried to fall back, it called a model that didn't exist. That call failed, which triggered another fallback attempt. That failed too. Again. And again. An infinite retry loop, burning API tokens on every cycle, running all night while I slept.

$18.

The fix was a single line in a JSON config file. Change gemini-3-flash to gemini-2.5-flash. That's it. Three characters. But I only found it because I had to go looking — and going looking taught me more about how these systems actually work than any tutorial ever did.
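For context, the relevant corner of the config looked something like this. This is a paraphrase, not OpenClaw's actual schema — the field names here are illustrative:

```json
{
  "model": "claude-sonnet-4-5",
  "fallback": {
    "model": "gemini-2.5-flash",
    "max_retries": 3
  }
}
```

The second lesson hiding in there: if your runtime supports a retry cap, set one. A bounded fallback fails loudly after a few attempts. An unbounded one fails silently, all night, at your expense.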

This is the part of building AI agents that nobody's writing blog posts about. Not because it's a secret — because most of the people writing about AI agents haven't actually run one 24/7 on their own hardware and watched the billing dashboard in real time.


What Actually Runs an Agent

Most people think of an AI agent as a chatbot. You send a message, it responds, end of story. That model is fine for demos. It's not what's actually happening when you're running an agent in production.

A real AI agent has five layers:

01 — Runtime

The system that keeps the agent alive, processes incoming messages, manages context, and routes requests to the right model. I use OpenClaw, which runs as a system service on my M75n Linux machine. It starts when the machine boots and never stops unless I tell it to.

02 — Model

The actual intelligence. Claude Sonnet 4.6 is my primary. Gemini 2.5 Flash is my fallback for high-volume, lower-stakes tasks. Model selection isn't just a preference decision — it's a cost architecture decision.

03 — Tools and Skills

What the agent can actually do. Web search, file access, code execution, API calls. Without tools an agent is just a text box. With them it's an autonomous operator.

04 — Memory

How the agent maintains context across conversations. This is one of the hardest problems in agent design and most tutorials gloss over it completely.

05 — Communication Layer

How you actually talk to it. Mine uses Telegram. When I message @vibeymewto_bot, Mewtwo picks it up, processes it, and responds. From anywhere in the world, on any device, with full context of our history.

Understanding all five layers is the difference between having a demo and having infrastructure.
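To make the layering concrete, here's a toy sketch in Python. None of this is OpenClaw's real code — every function and structure is illustrative — but it shows how the five layers fit together in a single message cycle:

```python
# Toy sketch of the five agent layers. All names are illustrative,
# not OpenClaw's actual API.

MEMORY = []  # Layer 4: naive in-process memory (a real agent persists this)

def call_model(prompt: str) -> str:
    # Layer 2: the model call. Stubbed here; in production this hits
    # Claude, with Gemini as the fallback.
    return f"echo: {prompt}"

def run_tools(reply: str) -> str:
    # Layer 3: tool dispatch. A real agent would parse tool calls
    # (web search, file access, code execution) out of the reply here.
    return reply

def handle_message(text: str) -> str:
    # Layer 1: the runtime glues everything together.
    MEMORY.append({"role": "user", "content": text})
    context = "\n".join(m["content"] for m in MEMORY[-20:])  # trim context
    reply = run_tools(call_model(context))
    MEMORY.append({"role": "assistant", "content": reply})
    return reply

# Layer 5: the communication layer (Telegram, in my case) just feeds
# incoming messages into handle_message and ships the reply back.
print(handle_message("status report"))  # → echo: status report
```

The point of the sketch is the shape, not the contents: every layer is a seam where things can break independently, which is exactly why you want to understand each one before you stack them.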


The Two-Tier Trick That Cut My Costs by 97%

After the $18 night I became obsessed with token cost efficiency. Not because I couldn't afford the usage — because efficiency is respect for the system you're building. Wasteful architecture is a sign you don't understand what you built.

The breakthrough came from my BTC trading engine.

I had built a radar that runs every five minutes, analyzing Bitcoin price action and generating a signal: buy, sell, or hold. My first version called Claude on every single cycle. Every five minutes, a full LLM call. Even when the market was flat. Even when the signal was obviously neutral. That's 288 Claude calls per day just for one tool.

The fix was embarrassingly obvious once I saw it.

Tier 1 — Pure math

RSI, volume, momentum, trend direction — all calculated locally with zero API calls. This runs on every cycle. If the signal confidence is below 65%, the decision is no signal and the engine stops there. No LLM needed.

Tier 2 — LLM confirmation

Only when Tier 1 produces a high-confidence signal does the engine escalate to Claude or Gemini for deeper analysis. This happens maybe 10-15% of cycles.

Result: 97% cost reduction. Same signal quality. Sometimes better, because the LLM isn't being called to confirm obvious nothing.

This pattern — do the cheap computation first, only escalate to expensive computation when justified — applies to almost every AI agent use case. It's not a trick. It's good engineering.
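In code, the pattern is a few lines. The scoring function and thresholds below are illustrative, not the radar's real math — the structure is what matters:

```python
# Sketch of the two-tier escalation pattern. The confidence math is
# illustrative; the real radar uses RSI, volume, momentum, and trend.

LLM_CALLS = 0  # track how often we pay for Tier 2

def tier1_confidence(rsi: float, momentum: float) -> float:
    # Tier 1: pure local math, zero API cost. Score distance from
    # neutral (RSI 50, flat momentum) on a 0-1 scale.
    rsi_signal = abs(rsi - 50) / 50
    return min(1.0, rsi_signal + abs(momentum))

def llm_confirm(rsi: float, momentum: float) -> str:
    # Tier 2: the expensive call. Stubbed; in production this is
    # Claude or Gemini doing the deeper read.
    global LLM_CALLS
    LLM_CALLS += 1
    return "buy" if rsi < 50 else "sell"

def run_cycle(rsi: float, momentum: float, threshold: float = 0.65) -> str:
    conf = tier1_confidence(rsi, momentum)
    if conf < threshold:
        return "no signal"  # engine stops here; no LLM touched
    return llm_confirm(rsi, momentum)

print(run_cycle(52, 0.01))  # flat market → "no signal", zero API cost
print(run_cycle(18, -0.4))  # oversold → escalates to the LLM → "buy"
```

Run 288 cycles a day through that gate and the LLM only gets invoked on the handful that clear the threshold. That's the whole 97%.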


What I'm Building Now

Null Limit today is a stack of interconnected systems, all running on the same philosophical foundation: build lean, build real, build things that actually work.

Mewtwo is my operations agent. He runs 24/7 on the M75n, handles research tasks, monitors the trading radar, and is my first point of contact for anything I need done when I'm away from the keyboard.

Makima is my content intelligence agent. She runs on-demand on my MacBook, writes blog posts and tweets, researches prospects for the ghost-writing service, and is slowly becoming the marketing engine for everything else I build.

The BTC radar runs on a cron job, feeding signals into a prediction market strategy I'm developing on Kalshi.

The Knowledge API serves structured intelligence to other developers and AI agents who want clean data without building the collection layer themselves.

None of this is theory. All of it is running right now, in production, generating real data.


The Thing About Being a Former Athlete

People ask me sometimes how I went from basketball to AI development without a technical background, and I always give the same answer: the same way I got good at basketball. Repetition, film study, and refusing to accept that the gap between where I was and where I wanted to be was permanent.

Basketball is a game of systems. Offense. Defense. Personnel. Every team you face has tendencies, and your job is to read them faster than they can hide them. Building software is the same thing. Systems with patterns. Inputs with predictable outputs. The ability to look at a broken thing and ask why before you ask how to fix it.

The $18 loop wasn't a failure. It was film study.


Start Here

If you're building your first AI agent and you want to avoid the mistakes I made, the most important thing I can tell you is this: run it small first. One model. One tool. One communication channel. Get it stable, understand every component, then add complexity.

The agents that compound in value are the ones built on foundations you actually understand.

I wrote a full beginner's guide to getting your first OpenClaw agent running — available now at nulllimit.gg/ebooks. Everything I know about setup, configuration, cost management, and the eight mistakes that will burn your budget before you realize what happened.

Want your own agent set up and running? That's exactly what we do at Null Limit — fully configured, secured, and connected to your workflow in one session.

If you want to follow the build in real time, every post lives at nulllimit.gg/blog and on @devCharizard — pulled directly from what's actually happening in the stack.

No filter. No polish. Just the work.