The Token Tax Is Coming For You
And most people won't know what hit them.
I've been building software for the last two years with one question in my head that I couldn't shake:
Why does using AI feel like leaving a tap running?
Not because it's slow. Not because the answers are bad.
Because every time I sent a file to Claude or ChatGPT, something about it felt… wasteful. Like filling a swimming pool with a firehose when you only needed a glass of water.
I couldn't explain it clearly back then. Now I can.
And what I'm about to share will either make you genuinely worried, or it'll make you feel smug that you read this before your AI bill tripled.
The party is ending. Everyone still has a drink in their hand.
For the last two years, Anthropic, OpenAI, and Google have been running a quiet scam.
Not on you. For you.
They have been absorbing the true cost of your AI usage.
You paid $20/month for ChatGPT Plus and felt like you were cheating the system. You paid $200/month for Claude Max and felt like you had an unfair advantage.
You weren't cheating.
They were subsidizing you.
A $200/month Claude Max subscription was delivering somewhere between $1,000 and $5,000 worth of actual compute to power users running agents, coding assistants, automated pipelines. Anthropic was eating the difference. Quietly. Every month.
This past April, they tried to shut off third-party agent access entirely. The developer community exploded. Anthropic backed down, partially.
But make no mistake about what they announced in May: from June 15, 2026, Agent SDK usage comes out of a metered credit. $200 credit at full API rates. Once it's gone, you pay.
The math is brutal. A workflow that was "free" last month will cost you $400–600 this month if you're running it at any real volume.
The subsidy era is over.
Uber's CTO said the quiet part out loud.
Uber spent $3.4 billion on R&D last year.
Their CTO went on record saying they blew through their entire AI budget in months.
Why?
Claude Code.
Engineers were using it like water. Why wouldn't they? It felt cheap. Unlimited. The feedback loop was immediate. Use more, get more done. So they did.
And then the bill landed.
Now they're "back to the drawing board."
If a company with $3.4 billion to spend on technology can be caught flat-footed by AI costs, what does that say about the solo developer? The small team? The startup burning through an API budget that felt generous at $500/month?
It says: nobody has been thinking clearly about what they're actually sending.
The real problem isn't AI. It's how you feed it.
Here's how most people use AI today:
They have a question. They have data. They dump the data into the chat window and ask the question.
Simple. Intuitive. Completely wrong.
Because the AI doesn't care that 99% of your data is irrelevant to your question.
It reads every token. You pay for every token. It often forgets the important parts because the irrelevant parts crowded them out.
You're not using AI badly. You're using it the way it was designed to be used before cost was real.
Now cost is real.
The Token Tax: a worked example
You're a fraud analyst. You want to know which transactions from last quarter look suspicious.
You have a CSV file. 500,000 rows. About 400MB.
You attach it to ChatGPT and type: "Which of these transactions look suspicious?"
What just happened?
That file is approximately 200–300 million tokens.
At an illustrative rate of roughly $0.60 per million input tokens (check your model's current price list; flagship models often cost more):
Your one question just cost around $120–180.
And here's the thing nobody tells you: GPT-4o can't even read 300 million tokens. Its context window is 128,000 tokens. The request either errors out or, if you're using a tool that auto-handles this, it silently reads only the first 128,000 tokens.
So you either pay $100+ to push the whole file through in chunks, or the model looks at maybe the first 300 rows.
The suspicious transactions? Probably in row 287,000.
You got an answer. A confident, well-formatted, completely wrong answer.
This isn't a rare edge case. This is how most people are using AI with data files right now.
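Here's that arithmetic as a quick sketch you can rerun with your own numbers. It assumes the figures from the example above (200–300 million tokens, 500,000 rows, a 128,000-token window, $0.60 per million input tokens); the function names are mine, and real token counts depend on the tokenizer and the file.

```python
# Back-of-envelope numbers for the fraud-analyst example.
# Every constant is an assumption taken from the text, not a measurement.

PRICE_PER_MILLION_USD = 0.60   # illustrative input price used in the example
CONTEXT_WINDOW = 128_000       # tokens the model can read in one request
TOTAL_ROWS = 500_000           # rows in the CSV


def input_cost_usd(tokens: int, usd_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Cost of sending `tokens` input tokens at a flat per-million price."""
    return tokens / 1_000_000 * usd_per_million


def rows_that_fit(total_tokens: int, total_rows: int = TOTAL_ROWS,
                  context_window: int = CONTEXT_WINDOW) -> int:
    """Rows that fit in one context window, assuming rows are similar in size."""
    tokens_per_row = total_tokens / total_rows
    return int(context_window / tokens_per_row)


if __name__ == "__main__":
    for total_tokens in (200_000_000, 300_000_000):
        print(f"{total_tokens:,} tokens: "
              f"${input_cost_usd(total_tokens):,.0f} to send everything, "
              f"about {rows_that_fit(total_tokens)} rows fit in one window")
```

On these assumptions, a single window covers somewhere between 213 and 320 rows out of 500,000.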
The real question isn't "how do I use AI more?"
It's "how do I send less?"
The teams winning at AI right now aren't the ones with the biggest budgets. They're not the ones with the fanciest models. They're the ones who figured out that the model is the last step, not the first.
The last step.
Before the model sees anything, you figure out what the model actually needs.
Big banks do this. Not because they're smart — because they're scared of compliance bills. They built internal tools that pre-filter data before it hits any cloud model.
Fortune 500 companies do this. They call it "data preprocessing." They have entire engineering teams dedicated to shrinking what goes to the model.
What does the solo developer do?
They paste the whole file in and hope for the best.
I've been sitting with this problem for two years.
I'm not a researcher. I'm not a VC-backed founder with a pitch deck.
I'm a builder.
And I couldn't stop thinking about this one constraint:
If the model only needs 200 rows out of 4 million, who finds those 200 rows?
The answer can't be the model. You'd have to send the 4 million rows to find out which 200 mattered.
The answer has to be something local. On your machine. Before anything leaves.
So I built it.
What Omna does, in plain English
You have a 4 million row dataset. NYC taxi trips, financial transactions, customer records — it doesn't matter.
You drop the file onto Omna. You ask your question.
Omna searches the file locally. On your machine. Without touching any cloud. It uses a combination of keyword search and semantic understanding to find the rows that actually answer your question.
It finds 200 rows. Maybe 500. Depending on the file and the question.
It packages those rows up. Sends them to Claude or ChatGPT.
The model reads 200 rows instead of 4 million.
Token cost: drops by more than 99%.
Answer quality: goes up. Because the model is reading the rows that matter, not drowning in noise.
The whole thing happens in seconds. Faster than you could manually search the file. Faster than waiting for a large upload to process.
And your data never leaves your machine during the search. Only the relevant slice goes to the cloud.
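To make the idea concrete, here is a minimal sketch of local pre-filtering. It is not Omna's implementation: it uses plain keyword overlap only (no semantic search), and the names `tokenize`, `top_rows`, and `SAMPLE` are hypothetical. The point is just the shape of the step: score rows on your machine, keep the top few, and send only those.

```python
import csv
import heapq
import io
import re

# Tiny made-up dataset standing in for a large local file.
SAMPLE = """id,merchant,amount,note
1,Corner Cafe,4.50,coffee
2,Wire Transfer,9800.00,offshore wire flagged by bank
3,Grocery Mart,62.10,weekly shopping
4,Wire Transfer,9900.00,second offshore wire same day
"""


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real keyword or semantic matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_rows(csv_text: str, question: str, k: int = 200) -> list[dict]:
    """Return up to k rows sharing the most terms with the question, best first."""
    query_terms = tokenize(question)
    scored = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Join only string cells; DictReader can yield None for missing fields.
        cells = " ".join(v for v in row.values() if isinstance(v, str))
        overlap = len(query_terms & tokenize(cells))
        if overlap > 0:
            # -index breaks ties in favour of earlier rows.
            scored.append((overlap, -index, row))
    best = heapq.nlargest(k, scored, key=lambda item: (item[0], item[1]))
    return [row for _, _, row in best]


if __name__ == "__main__":
    for row in top_rows(SAMPLE, "offshore wire transfers", k=2):
        print(row["id"], row["note"])
```

Only the rows returned by `top_rows` would go into the prompt. A real tool would stream the file instead of holding it in a string and would add some form of semantic matching on top.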
Two numbers that should make this concrete
4M-row parquet → ~755M tokens. And the model can't even read it: the context window fills up and it reads only the first sliver.
Same file, same question → 200 relevant rows. The model reads the right rows and gives you an actual answer.
Same question. Better answer. Ninety-nine percent fewer tokens.
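If you want to check the reduction yourself, this short sketch does it. It assumes the ~755M-token figure above and that rows are roughly equal in size, which real files won't exactly satisfy; the function names are mine.

```python
# Figures from the text; the even-row-size assumption is a simplification.
FULL_FILE_TOKENS = 755_000_000
FULL_FILE_ROWS = 4_000_000
SELECTED_ROWS = 200


def tokens_for_rows(rows: int, total_tokens: int = FULL_FILE_TOKENS,
                    total_rows: int = FULL_FILE_ROWS) -> int:
    """Estimated tokens for `rows` rows, assuming every row costs the same."""
    return round(rows * total_tokens / total_rows)


def reduction_percent(sent_tokens: int, full_tokens: int = FULL_FILE_TOKENS) -> float:
    """Percentage of tokens saved by sending only `sent_tokens`."""
    return (1 - sent_tokens / full_tokens) * 100


if __name__ == "__main__":
    sent = tokens_for_rows(SELECTED_ROWS)
    print(f"{sent:,} tokens sent, {reduction_percent(sent):.3f}% fewer")
```

Under those assumptions, 200 rows come to about 37,750 tokens, which fits comfortably inside a 128,000-token context window.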
That's not a product feature. That's a structural difference in how you work.
But this isn't really about money.
Money is the obvious thing. The thing that'll show up on your credit card statement in August and make you panic-Google "why is my Claude bill so high."
The real thing is something deeper.
When you send everything to the model, you're not using AI. You're using a very expensive search engine with a prettier interface.
When you send only what matters, you're using AI the way it was meant to be used: as a reasoning engine, not a retrieval engine.
The model's job is to think, not to scan.
Your job — or your tool's job — is to find the signal before the model sees the noise.
The pattern I've watched repeat itself, dozens of times
Someone discovers AI. It blows their mind.
They start using it for everything. CSV files, PDF contracts, entire codebases pasted in.
The results are inconsistent. Sometimes brilliant. Sometimes the model confidently makes things up. They chalk it up to the model being "not quite there yet."
They never question the quality of what they're feeding it.
Then the bill comes.
Or the June 15 change kicks in.
Or they read an article about how Uber burned through its AI budget in months and they realize they've been doing the same thing at a smaller scale.
And they ask the question they should have asked at the start:
Am I sending the model everything, or the right thing?
The teams that survive the next 12 months
Right now, most people using AI professionally are operating on borrowed time.
The time that was bought by Anthropic's subsidy. By OpenAI's competitive pricing. By the race to acquire users that made everyone feel like AI was essentially free.
It was never free. Someone was paying.
From June 15, that someone is you.
The teams that will thrive aren't the ones with more AI tools. They're the ones who built or adopted a layer between their data and the model.
A layer that asks: what does the model actually need to answer this question?
And sends only that.
The analogy that clicked for me
Omna is to your data what a good lawyer is to a court case.
The lawyer doesn't hand the judge every document in the filing room. The lawyer reads everything, decides what's relevant, and presents only that.
The judge's time is expensive. The lawyer's job is to make that time count.
Your API cost is the judge's time. Omna is the lawyer.
Send less. Get more.
This is what I've been building
I've been building Omna for months. In the open. Obsessively.
It's a browser extension and a native Mac app that runs locally. It intercepts files before they hit the AI. It finds the relevant slice. It sends that.
It works on CSV, parquet, Excel, PDF, Word documents, JSON, text files.
It works on Claude, ChatGPT, Gemini, Perplexity.
It masks PII before anything goes to the cloud, so your compliance team stops having cardiac events.
It runs on your machine. Zero phone home. Zero data sharing. Only the slice you choose to send ever reaches the cloud.
And it's being built with one obsessive principle:
The model should read what answers your question. Nothing else.
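For a feel of what masking PII before sending can look like, here is a deliberately simple sketch. It is not how Omna does it: the three regular expressions and the `mask_pii` name are assumptions for illustration, they only cover emails and US-style phone and Social Security number formats, and production-grade PII detection needs far more than this.

```python
import re

# Illustrative patterns only; real PII detection covers many more formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace matched emails, SSN-like and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    print(mask_pii("Contact jane.doe@example.com or 555-123-4567, SSN 123-45-6789."))
```

The masking runs on the selected rows locally, so the placeholders are what the cloud model sees.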
Who needs this right now
If you're a solo developer running Claude Code pipelines: you need this when your metered credit runs out in week one.
If you're a startup with engineers using AI assistants: you need this before your monthly API bill becomes a board-level conversation.
If you're in a regulated industry — legal, finance, healthcare: you need this because your raw files should never have been going to the cloud in the first place.
If you work with large datasets for any reason: you need this because the model was never designed to read your whole file. It was designed to reason about a focused slice of it.
The uncomfortable truth
Most people are using AI as an expensive guess engine.
They send everything. They get an answer. They don't really know if the answer is based on the relevant parts or the irrelevant ones.
The model sounds confident either way.
This was fine when the cost was subsidized. When $200/month delivered $2,000 in compute.
It's not fine when the meter is running at real rates.
The uncomfortable truth is this:
If you don't know what you're sending to the model, you don't know what the model is reasoning about.
And if you don't know what it's reasoning about, you're not using AI.
You're using autocomplete with a very expensive API call attached.
What I'd do if I were you
Read your upcoming Claude or OpenAI invoice carefully.
Trace one workflow — one file you attached, one question you asked — and work out roughly how many tokens went to the model.
Then ask: how many of those tokens actually contained the answer?
That gap between what you sent and what mattered — that's the Token Tax.
That's what Omna kills.
Not by using a cheaper model. Not by writing more efficient prompts. By finding the 200 rows before the model sees the 4 million.
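One way to put a number on that gap, as a rough sketch: the `token_tax` and `wasted_usd` names and the example figures are hypothetical, and deciding how many tokens "contained the answer" is your own judgment call.

```python
def token_tax(tokens_sent: int, tokens_needed: int) -> float:
    """Share of sent tokens that did not contribute to the answer (0.0 to 1.0)."""
    if tokens_sent <= 0:
        raise ValueError("tokens_sent must be positive")
    if not 0 <= tokens_needed <= tokens_sent:
        raise ValueError("tokens_needed must be between 0 and tokens_sent")
    return (tokens_sent - tokens_needed) / tokens_sent


def wasted_usd(tokens_sent: int, tokens_needed: int, usd_per_million: float) -> float:
    """Input spend on tokens that did not matter, at a flat per-million price."""
    return (tokens_sent - tokens_needed) / 1_000_000 * usd_per_million


if __name__ == "__main__":
    # Hypothetical workflow: 120,000 tokens sent, 6,000 of them relevant,
    # priced at an assumed $3.00 per million input tokens.
    sent, needed, price = 120_000, 6_000, 3.00
    print(f"Token Tax: {token_tax(sent, needed):.0%}")
    print(f"Wasted per question: ${wasted_usd(sent, needed, price):.3f}")
```

Multiply the per-question waste by how often the workflow runs in a month and you have the part of the invoice that pre-filtering targets.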
Last thing
Uber ran out of AI budget.
They spent $3.4 billion on R&D last year. They still ran out.
Not because AI stopped working. Because nobody asked what they were actually sending.
You have less than $3.4 billion.
You can't afford to figure this out in hindsight.
The meter starts running June 15th.
I'm building Omna for exactly this problem. Follow along if you want to see what the layer between your data and the model looks like. Or wait for your August invoice. Both will make the point.
— Gaurav
Drop a file. Send 200 rows instead of 4 million.
Free Mac app. macOS 13+. Zero account. Nothing ever leaves your machine until you ask it to.