
// USING OMNA

How Slicing Works

Understanding what Omna does inside the black box — so you know what to expect and how to get the best results.


The problem: the Token Tax

AI models (Claude, ChatGPT, Gemini) charge by the token. One token is roughly 4 characters of text. A 100,000-row CSV file might contain 10 million tokens. Sending the whole file to Claude would cost roughly $30, and the request often fails outright because Claude's context window holds only 200,000 tokens.
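
The arithmetic behind those figures can be sketched as follows. The price of $3 per million input tokens is an assumption inferred from the $30 figure, not a quoted rate, and the 400-character average row length is a hypothetical value chosen to reproduce the 10-million-token example.

```python
CHARS_PER_TOKEN = 4        # rough rule of thumb from the text
CONTEXT_WINDOW = 200_000   # Claude's context window, as stated in the text
PRICE_PER_MILLION = 3.00   # assumption: USD per million input tokens


def estimate_tokens(rows: int, avg_chars_per_row: int) -> int:
    """Estimate the token count of a file from its row count and row length."""
    return rows * avg_chars_per_row // CHARS_PER_TOKEN


def estimate_cost(tokens: int) -> float:
    """Estimate the input cost in USD for a given token count."""
    return tokens / 1_000_000 * PRICE_PER_MILLION


tokens = estimate_tokens(rows=100_000, avg_chars_per_row=400)
print(tokens)                    # 10000000
print(estimate_cost(tokens))     # 30.0
print(tokens > CONTEXT_WINDOW)   # True: the file cannot fit in one request
```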

Most people deal with this by:

  • Sending just the first N rows (the AI gets an arbitrary slice, not the relevant one)
  • Summarizing manually (time-consuming, loses detail)
  • Giving up (the question doesn't get answered)

Omna's approach: find the 200–500 rows that actually answer your question, and send only those. The AI gets full-resolution relevant data. You pay a fraction of the cost. The answer is better.


The two paths: Tier 1 and Tier 2

Tier 1 — Pre-indexed files (fast path)

If you've dropped this file before, or if it lives in a folder Omna watches, it's already been indexed in the background. "Indexed" means Omna has pre-computed a semantic embedding (a 384-number fingerprint of meaning) for every row and built an HNSW (Hierarchical Navigable Small World) approximate-nearest-neighbour graph over those embeddings.

When you drop the file with a question:

  1. Omna embeds your question (same 384-number format)
  2. Queries the HNSW graph for the nearest neighbours in roughly O(log N) time: the search jumps directly to nearby vectors instead of scoring every row
  3. Fetches the matched row text from a memory-mapped .rows random-access store (so Tier 1 reads only the rows that survived, not the whole file)
  4. Drops rows below a minimum similarity threshold (0.3 cosine similarity, validated against a 113-test benchmark)
  5. If more than 70% of rows survive the threshold (your question is very broad), applies a tighter filter: max(0.3, top_score × 0.6)
  6. Returns the survivors in order of relevance (no fixed token cap; see Token budget below)
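
Steps 4 to 6 can be sketched in a few lines. This is a minimal illustration, not Omna's code: the function name `apply_threshold` and its parameters are hypothetical, and it assumes the "70% of rows" test is measured against the candidate set that came back from the search.

```python
def apply_threshold(scored, floor=0.3, broad_fraction=0.7, tighten=0.6):
    """Filter (row_id, cosine_similarity) pairs as in steps 4 to 6.

    Assumption: the broad-question test compares survivors to the
    number of candidates passed in, not to the whole file.
    """
    if not scored:
        return []
    # Step 4: drop rows below the minimum similarity threshold.
    survivors = [(rid, s) for rid, s in scored if s >= floor]
    # Step 5: if most candidates survived, the question is broad,
    # so tighten the cutoff relative to the best score.
    if len(survivors) > broad_fraction * len(scored):
        top = max(s for _, s in survivors)
        cutoff = max(floor, top * tighten)
        survivors = [(rid, s) for rid, s in survivors if s >= cutoff]
    # Step 6: return survivors in order of relevance.
    return sorted(survivors, key=lambda pair: pair[1], reverse=True)


broad = [("a", 0.9), ("b", 0.8), ("c", 0.5), ("d", 0.35)]
print(apply_threshold(broad))   # [('a', 0.9), ('b', 0.8)]
```

In the example all four candidates clear 0.3, so the tighter cutoff of max(0.3, 0.9 × 0.6) applies and only the two strongest rows remain.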

If the HNSW + .rows artifacts haven't been back-filled yet (a file indexed before the fast-artifact upgrade and not yet rebuilt), Omna automatically falls back to a parallel brute-force cosine scan over every embedded row. Same answer, slower path.
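
For intuition, a brute-force cosine scan looks roughly like the sketch below. It is sequential and uses plain Python lists for clarity; the fallback described above runs in parallel, and the names `cosine` and `brute_force_scan` are illustrative rather than Omna's.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_scan(query_vec, row_vecs, k):
    """Score every row against the query and return the k best (index, score) pairs."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(row_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best = brute_force_scan([1.0, 0.0], rows, k=2)
print([i for i, _ in best])   # [0, 2]
```

The cost grows linearly with the number of rows, which is why the HNSW path is preferred once its artifacts exist.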

How fast is Tier 1 in practice?

  • Small files (a few thousand rows, a few MB of embeddings): sub-second.
  • Large files today (4-million-row parquet, ~6 GB of embeddings): about 20 seconds on the first query after Omna starts, then sub-second for every later query on the same file while Omna keeps running. The first-query cost is dominated by loading the HNSW segment graphs from disk.
  • An always-warm in-memory cache (so the first query is fast too) is a tracked open item — see USER_JOURNEY §5 Step 8 for the current spec.
  • Numeric range questions ("over $500", "between X and Y") take the brute-force arithmetic path instead of HNSW — see Numeric questions below.

The point of Tier 1 is that Omna never has to re-read or re-embed your file when you ask a question. The vectors and the search graph already exist on disk — Tier 1 just walks them.

Tier 2 — Live drop (15–45 seconds)

If the file has never been indexed, Omna processes it on the fly:

On a standard Mac (Slow profile — below 1,000 rows/second embedding speed):

  1. BM25 keyword scan — reads every row, scores by keyword frequency and rarity. Takes 1–7 seconds depending on file size.
  2. Keeps rows that score at least 15% of the top BM25 score. Drops everything else.
  3. From the survivors, takes up to 1,000 rows.
  4. Runs semantic embedding on those 1,000 rows.
  5. Reranks by cosine similarity to your question.
  6. Returns the top rows (no fixed token cap; see Token budget below).
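
Steps 1 to 3 of the Slow profile can be sketched with a small, self-contained BM25 scorer. This is an illustration under assumptions: whitespace tokenization, the common defaults k1 = 1.5 and b = 0.75, and taking the highest-scoring survivors when more than 1,000 remain are all choices made for this sketch, not documented Omna behaviour.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(rows, question, k1=1.5, b=0.75):
    """Return one BM25 score per row for the question's terms."""
    docs = [tokenize(r) for r in rows]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: in how many rows each term appears.
    df = Counter(term for d in docs for term in set(d))
    q_terms = set(tokenize(question))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            # Rare terms get a higher weight (inverse document frequency).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def bm25_prefilter(rows, question, keep_ratio=0.15, max_rows=1000):
    """Keep row indices scoring at least keep_ratio of the top score, capped at max_rows."""
    scores = bm25_scores(rows, question)
    top = max(scores, default=0.0)
    if top <= 0:
        return []  # no keyword overlap at all
    keep = [(i, s) for i, s in enumerate(scores) if s >= keep_ratio * top]
    keep.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in keep[:max_rows]]


rows = ["taxi trip airport fare", "bus ride downtown",
        "airport shuttle fare", "weather report"]
print(sorted(bm25_prefilter(rows, "airport fare")))   # [0, 2]
```

The indices returned here are what steps 4 and 5 would then embed and rerank by cosine similarity.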

On a fast Mac (Fast profile — 1,000+ rows/second):

  1. Skips BM25 entirely.
  2. Embeds up to 3,000 rows directly (stratified sample if the file has more).
  3. Ranks by cosine similarity.
  4. Returns the top rows (no fixed token cap; see Token budget below).
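
The text does not say how the stratified sample is drawn. One plausible reading, shown here purely as an assumption, is to split the file into equal-sized contiguous blocks of rows and pick one row at random from each block, so the sample covers the whole file instead of clustering at the start.

```python
import random


def stratified_sample(n_rows, max_rows=3000, seed=0):
    """Return up to max_rows row indices spread evenly across the file.

    Assumption: strata are contiguous blocks of row positions.
    """
    if n_rows <= max_rows:
        return list(range(n_rows))  # small file: embed everything
    rng = random.Random(seed)
    picks = []
    for s in range(max_rows):
        start = s * n_rows // max_rows
        end = (s + 1) * n_rows // max_rows  # exclusive upper bound
        picks.append(rng.randrange(start, end))
    return picks


sample = stratified_sample(1_000_000)
print(len(sample))   # 3000
```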

After a Tier 2 slice, the background indexer immediately starts embedding all rows. The next time you drop the same file, it's a Tier 1 slice.


Numeric questions — exact arithmetic filter

For questions with explicit comparisons ("transactions over $500", "between 100 and 200", "less than $10"), pure semantic search is the wrong tool, because embeddings capture meaning, not numeric order. $389 and $621 look almost identical to a semantic model ("they're both dollar amounts near $500") but are arithmetically different.

Omna detects range questions and runs an exact arithmetic filter before any semantic step:

  1. Identifies the numeric column (named in your question, or the only numeric column, or asks you to clarify if ambiguous)
  2. Filters to rows that arithmetically satisfy the condition: amount > 500
  3. This set is authoritative — cosine similarity only orders the results, it cannot remove a row that passes the arithmetic test

Why this matters: If you ask "find transactions over $500" and 11 of your 50 rows qualify, you get exactly those 11 rows — not 36 rows where some are over $500 and some aren't, which is what pure semantic search would return.
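
A minimal sketch of the idea follows. The phrase patterns, the function names, and the choice to treat "between" as inclusive are assumptions made for illustration; Omna's actual question parser is not described here.

```python
import re

# Matches an optional dollar sign followed by a number such as 500, 1,200 or 9.99.
NUM = r"\$?\s*(-?\d[\d,]*(?:\.\d+)?)"


def _num(text):
    return float(text.replace(",", ""))


def parse_range(question):
    """Return (op, a, b) for a recognised comparison, or None."""
    q = question.lower()
    m = re.search(r"between\s+" + NUM + r"\s+and\s+" + NUM, q)
    if m:
        return ("between", _num(m.group(1)), _num(m.group(2)))
    m = re.search(r"(?:over|above|more than|greater than)\s+" + NUM, q)
    if m:
        return ("gt", _num(m.group(1)), None)
    m = re.search(r"(?:under|below|less than)\s+" + NUM, q)
    if m:
        return ("lt", _num(m.group(1)), None)
    return None


def range_filter(rows, column, question, similarity=None):
    """Keep rows that satisfy the comparison; similarity only orders them."""
    cond = parse_range(question)
    if cond is None:
        return None  # not a range question
    op, a, b = cond

    def passes(value):
        if op == "gt":
            return value > a
        if op == "lt":
            return value < a
        return a <= value <= b  # assumption: "between" is inclusive

    kept = [r for r in rows if passes(r[column])]
    if similarity is not None:
        kept.sort(key=similarity, reverse=True)  # reorder, never remove
    return kept


rows = [{"amount": 389.0}, {"amount": 621.0}, {"amount": 500.0}]
print(range_filter(rows, "amount", "find transactions over $500"))
# [{'amount': 621.0}]
```

Note that the similarity function is applied only after the arithmetic filter, so it can change the order of the result but not its membership.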


What the result card shows

After every slice, a result card appears. It classifies the result into one of four scenarios:

| Scenario | Condition | What it means |
|---|---|---|
| Token Tax killed | File fits in Claude's window; Omna sent a small relevant slice | Real dollar savings: you pay for the slice, not the file |
| All rows matched | File fits; question matched most rows (broad question) | Minimal filtering: card shows actual cost on both sides |
| Token Tax crushed | File is bigger than Claude's 200K window; Omna's slice fits | Big savings: baseline is the truncated 200K, not the full file |
| Token Tax redirected | File is bigger than 200K; even Omna's slice fills the window | Same token cost, but Omna's 200K contains the relevant rows, not the first arbitrary rows |

The savings math always uses Claude's real context window (200K tokens) as the baseline. We never show savings based on a number Claude would have rejected — that would be fictional.
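
The four scenarios and the capped baseline can be expressed as a small decision function. This is a sketch under assumptions: the 70% cutoff used for "matched most rows" and the exact boundary between "fits" and "fills the window" are hypothetical, since the text does not give them.

```python
CLAUDE_WINDOW = 200_000   # tokens, as stated in the text
BROAD_FRACTION = 0.7      # assumption: what counts as "matched most rows"


def classify(file_tokens, slice_tokens, rows_total, rows_kept):
    """Pick the result-card scenario for a slice."""
    if file_tokens <= CLAUDE_WINDOW:
        if rows_total and rows_kept / rows_total > BROAD_FRACTION:
            return "All rows matched"
        return "Token Tax killed"
    if slice_tokens < CLAUDE_WINDOW:
        return "Token Tax crushed"
    return "Token Tax redirected"


def tokens_saved(file_tokens, slice_tokens):
    """Savings against the capped baseline, never against the full file size."""
    baseline = min(file_tokens, CLAUDE_WINDOW)
    sent = min(slice_tokens, CLAUDE_WINDOW)
    return baseline - sent


print(classify(10_000_000, 40_000, rows_total=100_000, rows_kept=350))
# Token Tax crushed
print(tokens_saved(10_000_000, 40_000))   # 160000
```

In the example the file holds 10 million tokens, but the savings are reported against 200,000, because that is the most Claude could have accepted.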


Token budget

Omna does not trim the slice to a fixed token target. Every row that matches your question — by keyword, by semantic similarity, or by the exact arithmetic filter — is kept in the output. If your question has 11 matches, you get all 11. If it has 50,000 matches, you get all 50,000.

The AI you're sending the slice to (Claude, ChatGPT, etc.) enforces its own context-window limit. Claude Sonnet's window is 200,000 tokens; if the slice exceeds that, the AI itself truncates from the end. The result card calls this "Token Tax redirected" — same token cost, but the rows in the window are the relevant ones, not the first arbitrary ones.

The earlier "100,000 token output" cap that lived in different files for different surfaces caused row counts to disagree between the website, the desktop app, and the browser extension. It was removed on 2026-05-20 so every surface returns the same slice for the same question. See USER_JOURNEY §12.1 for the current rule.


Weak keyword results (BM25 found nothing)

If fewer than 10 rows survive the BM25 step, your question likely used words that don't appear in the data. Omna shows suggestions based on your file's actual column names and sample values:

"Your question didn't match any keywords in this file. Based on your data, try asking:
  • 'trips where distance is over 20 miles'
  • 'fares above $50 with zero tip'"

Try rewording and drop the file again. If BM25 still finds nothing on the second attempt, Omna falls back to pure semantic search on a random sample of 5,000 rows.
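
The decision flow in this section can be sketched as below. The function name `next_step`, the attempt counter, and the returned labels are hypothetical; only the thresholds (10 survivors, 5,000 sampled rows) come from the text.

```python
import random

MIN_BM25_SURVIVORS = 10   # fewer than this counts as a weak keyword result
FALLBACK_SAMPLE = 5000    # rows sampled for the semantic fallback


def next_step(bm25_survivors, attempt, n_rows, seed=0):
    """Decide what happens after the BM25 step.

    Returns (action, row_indices), where action is one of
    "rerank", "suggest" or "semantic" (labels invented for this sketch).
    """
    if len(bm25_survivors) >= MIN_BM25_SURVIVORS:
        return ("rerank", list(bm25_survivors))
    if attempt < 2:
        return ("suggest", [])  # show reworded question suggestions
    # Second weak attempt: fall back to semantic search on a random sample.
    rng = random.Random(seed)
    sample = rng.sample(range(n_rows), min(FALLBACK_SAMPLE, n_rows))
    return ("semantic", sample)


print(next_step([3, 8], attempt=1, n_rows=20_000)[0])   # suggest
print(next_step([3, 8], attempt=2, n_rows=20_000)[0])   # semantic
```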


Background indexing

The background indexer runs silently while you work. It:

  • Watches your chosen folders for new or changed files
  • Processes files in order of size (smaller files first, so you can query them sooner)
  • Pauses automatically when you're on battery, when CPU usage is over 70%, or when you're on a Zoom/Meet/Teams call
  • Runs at low priority — capped at half your Mac's cores by default (30% in Low-power mode) and scheduled at a background QoS so macOS steers the work toward efficiency cores
  • Stores a small set of artifacts per indexed file: index.bm25 (keyword index), index.embed (384-dim vectors), index.hnsw.segs (segmented HNSW graph), index.rows (memory-mapped row-text store for fast Tier-1 lookup), and index.fingerprint (content hash for change detection)
  • Never re-indexes a file unless its content changes (it uses a content fingerprint, not the file name)
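
Change detection by content fingerprint can be illustrated with a streaming hash. The use of SHA-256 and the function names are assumptions for this sketch; the text says only that index.fingerprint holds a content hash.

```python
import hashlib


def content_fingerprint(path, chunk_size=1 << 20):
    """Hash a file's bytes in 1 MB chunks so large files never load fully into memory."""
    digest = hashlib.sha256()  # assumption: the actual hash function is not documented
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reindex(path, stored_fingerprint):
    """True only when the file's content differs from what was indexed."""
    return content_fingerprint(path) != stored_fingerprint
```

Because the fingerprint depends only on the bytes, renaming or moving a file leaves it unchanged, while editing a single cell produces a different value.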

Each indexed file gets its own subfolder under ~/Library/Application Support/Omna/index/, named after the source file. The default disk cap is 20 GB across all subfolders.


PII masking during slicing

Before the sliced rows are sent to the AI, Omna masks personal information:

  • Names, emails, phone numbers, addresses
  • Social Security numbers, passport numbers
  • Credit card numbers, bank account numbers
  • Healthcare data, employment data, credentials

Masked text looks like: [PERSON_1], [EMAIL_1], [PHONE_1]. The AI receives the masked version. Omna's local token registry maps each placeholder back to the real value so results stay meaningful.
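
The placeholder scheme can be sketched with a small registry. This illustration covers only emails and one US-style phone format via regular expressions; detecting names, addresses and the other categories listed above needs more than regex, and the class name `TokenRegistry` and its methods are hypothetical.

```python
import re

# Assumption: simplified patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


class TokenRegistry:
    """Maps real values to numbered placeholders and back, held locally."""

    def __init__(self):
        self.to_placeholder = {}  # (kind, value) -> placeholder
        self.to_value = {}        # placeholder -> value
        self.counters = {}        # kind -> last number issued

    def placeholder_for(self, kind, value):
        key = (kind, value)
        if key not in self.to_placeholder:
            self.counters[kind] = self.counters.get(kind, 0) + 1
            placeholder = f"[{kind}_{self.counters[kind]}]"
            self.to_placeholder[key] = placeholder
            self.to_value[placeholder] = value
        return self.to_placeholder[key]

    def mask(self, text):
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, k=kind: self.placeholder_for(k, m.group(0)), text
            )
        return text

    def unmask(self, text):
        for placeholder, value in self.to_value.items():
            text = text.replace(placeholder, value)
        return text


registry = TokenRegistry()
masked = registry.mask("Contact ana@example.com or 555-123-4567")
print(masked)   # Contact [EMAIL_1] or [PHONE_1]
```

The same value always receives the same placeholder, so the AI can still tell that two rows refer to the same person or address without seeing the real value.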

You can turn PII masking off in the menu bar if you're working with non-sensitive data.