Supported File Formats
Omna can slice any of the file types below before sending them to the AI. Each format is handled differently, depending on its structure.
Tabular files (rows and columns)
These are sliced row-by-row. Omna finds the rows most relevant to your question and sends only those.
| Format | Extensions | Notes |
|---|---|---|
| CSV | .csv | Most common. All columns preserved. |
| Excel | .xlsx, .xls, .xlsb | All sheets are concatenated into one table. |
| OpenDocument Spreadsheet | .ods | Same as Excel handling. |
| Parquet | .parquet | Column-store format common in data engineering. Fully supported — no size limit beyond the 10 GB file cap. |
Output format: A clean CSV with a header row and only the relevant rows. Column names are preserved exactly as they appear in the original.
What counts as a "relevant row": Omna uses a combination of keyword matching (BM25) and semantic similarity (sentence embeddings) to score every row against your question. Rows with scores below a minimum threshold are dropped. For numeric questions ("over $500", "between 10 and 20"), an exact arithmetic filter runs first — all rows that arithmetically satisfy the condition are kept, regardless of semantic score.
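The selection rule above can be sketched in a few lines. This is a minimal illustration, not Omna's implementation: the function names, the `min_score` value of 0.3, and the tiny phrase parser are all hypothetical, the relevance scores are taken as given rather than computed with BM25 and embeddings, and it assumes "between" is inclusive and that rows failing the arithmetic filter are still eligible through their score.

```python
import re

def parse_numeric_condition(question):
    """Return a predicate for 'over N' or 'between A and B', else None.

    Hypothetical parser covering only the two phrasings used as examples.
    """
    q = question.replace(",", "").replace("$", "")
    m = re.search(r"between (\d+(?:\.\d+)?) and (\d+(?:\.\d+)?)", q)
    if m:
        low, high = float(m.group(1)), float(m.group(2))
        return lambda value: low <= value <= high  # inclusive bounds (assumption)
    m = re.search(r"over (\d+(?:\.\d+)?)", q)
    if m:
        limit = float(m.group(1))
        return lambda value: value > limit
    return None

def select_rows(rows, question, numeric_column, scores, min_score=0.3):
    """Keep rows that satisfy the arithmetic condition, plus rows whose
    precomputed relevance score reaches the (assumed) threshold."""
    predicate = parse_numeric_condition(question)
    kept = []
    for row, score in zip(rows, scores):
        passes_filter = predicate is not None and predicate(float(row[numeric_column]))
        if passes_filter or score >= min_score:
            kept.append(row)
    return kept
```

In this sketch a row with a very low relevance score still survives if its value satisfies the condition, which mirrors the "regardless of semantic score" behavior described above.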
Documents (text with structure)
These are sliced section-by-section. Omna finds the sections most relevant to your question.
| Format | Extensions | Notes |
|---|---|---|
| Word | .docx, .doc, .odt, .rtf | Sliced by paragraph or heading section. |
| PDF | .pdf | Sliced by page. Each page becomes one searchable unit. |
Output format: A .txt file with structural markers. Word docs include [§ Heading Name] markers before each section. PDFs include [p. N] markers before each page's content.
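To make the marker format concrete, here is a small sketch of how already-selected sections and pages could be written out. The function names are hypothetical, and the exact layout (marker on its own line, a blank line between units) is an assumption; only the `[§ Heading Name]` and `[p. N]` marker shapes come from the description above.

```python
def render_word_sections(sections):
    """sections: list of (heading, text) pairs already chosen as relevant."""
    return "\n\n".join(f"[§ {heading}]\n{text}" for heading, text in sections)

def render_pdf_pages(pages):
    """pages: list of (page_number, text) pairs already chosen as relevant."""
    return "\n\n".join(f"[p. {number}]\n{text}" for number, text in pages)

print(render_pdf_pages([(3, "Refund policy text"), (7, "Contact details")]))
```

Because each unit carries its own marker, the AI can still cite a page number or heading even though most of the document was left out.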
Why PDFs are sliced by page: Unlike tabular data, where each row is an independent fact, PDFs contain flowing text where context within a page matters. Splitting mid-paragraph would break meaning. Treating one page as one chunk gives a safe, predictable unit.
Plain text files
These are sliced line-by-line.
| Format | Extensions | Notes |
|---|---|---|
| Plain text | .txt, .md, .log | Each non-empty line is one searchable unit. |
| JSON | .json | Each top-level array element or top-level object key is one unit. |
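As an illustration of the JSON rule in the table, the sketch below splits a document into top-level units using Python's standard `json` module. The function name `json_units` and the `(key_or_index, value)` return shape are hypothetical, and the handling of a bare scalar document is an assumption.

```python
import json

def json_units(text):
    """Split a JSON document into top-level searchable units."""
    data = json.loads(text)
    if isinstance(data, list):
        # One unit per top-level array element, labeled by position.
        return list(enumerate(data))
    if isinstance(data, dict):
        # One unit per top-level key.
        return list(data.items())
    # A bare scalar document is treated as a single unit (assumption).
    return [(None, data)]
```

Nested structures stay intact inside their unit, so a top-level key holding a large array counts as one unit in this sketch.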
Markdown files: Treated as plain text. Omna doesn't parse markdown structure — each line is indexed independently. For large .md files with clear headings, results will typically cluster around the relevant sections naturally.
Log files: Log line format varies widely. Omna indexes each line as a unit and uses keyword + semantic search to surface the relevant entries. Works well for error logs, access logs, and structured log lines.
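For plain text, markdown, and log files, the unit rule is simple enough to show directly. A minimal sketch, assuming units are labeled with 1-based line numbers (the labeling and the name `line_units` are hypothetical):

```python
def line_units(text):
    """Return (line_number, line) for every non-empty line."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip()  # skip empty and whitespace-only lines
    ]
```

Each returned line would then be scored against the question independently, which is why markdown headings get no special treatment.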
File size limits
| Limit | Value | Why |
|---|---|---|
| Maximum file size | 10 GB | Hard cap per file |
| Maximum rows for full indexing | 5,000,000 rows | Above this, only the BM25 keyword index is built (no embeddings) |
| Minimum row length for embeddings | 50 tokens | Very short rows (like single IDs) are BM25-only |
Files above 5,000,000 rows can still be sliced: keyword search runs on every row, and semantic reranking is applied only to the rows that survive the BM25 pass.
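The two indexing limits in the table can be read as a simple decision per row. The sketch below encodes that reading; the constant and function names are hypothetical, and it assumes "above" means strictly more than 5,000,000 rows and that rows of exactly 50 tokens still get embeddings.

```python
MAX_ROWS_FULL_INDEX = 5_000_000   # from the limits table
MIN_TOKENS_FOR_EMBEDDING = 50     # from the limits table

def row_index_mode(total_rows, row_token_count):
    """Decide which indexes a row gets at index time."""
    if total_rows > MAX_ROWS_FULL_INDEX:
        return "bm25_only"            # file too large for full indexing
    if row_token_count < MIN_TOKENS_FOR_EMBEDDING:
        return "bm25_only"            # row too short to embed usefully
    return "bm25_and_embeddings"
```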
What is NOT supported
Screenshots and images (OCR-sliced)
.png, .jpg, .jpeg, .webp, and .heic files are supported via OCR.
When you drop a screenshot onto the capsule or attach it in the browser extension, Omna uses macOS Vision to extract the text, then slices and masks it exactly like a .txt file. The AI receives extracted text (~200–400 tokens) instead of the raw image (~1,600 Vision API tokens).
AI Vision APIs charge ~1,600 tokens per image regardless of content. OCR + slicing can cut that by 70–90% when the image contains relevant text.
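The savings range follows from the token figures quoted above. A quick check, using only those approximate numbers (the function name is hypothetical):

```python
def token_savings(ocr_tokens, image_tokens=1600):
    """Fraction of tokens saved by sending OCR text instead of the image."""
    return 1 - ocr_tokens / image_tokens

print(f"{token_savings(200):.1%}")  # 200 tokens of extracted text
print(f"{token_savings(400):.1%}")  # 400 tokens of extracted text
```

With 200 to 400 tokens of extracted text, the saving works out to between 75% and 87.5%, which sits inside the quoted 70–90% range.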
Unsupported image formats: .gif, .bmp, .svg — these are not screenshot formats and are released to the AI unchanged.
Other unsupported formats
| Format | Status |
|---|---|
| Images (.gif, .bmp, .svg) | Not supported — not screenshot formats |
| Audio (.mp3, .m4a, .wav) | Not supported |
| Video (.mp4, .mov) | Not supported |
| Encrypted / password-protected files | Not supported — Omna cannot read encrypted content |
| ZIP / RAR archives | Not supported — extract first, then drop the individual file |
| Binary files (.exe, .bin, .dmg) | Not supported |
Extension vs. desktop app — same formats?
Yes. The browser extension uses the same slicing engine as the desktop capsule (the Mac app). When you attach a file on claude.ai or chatgpt.com, the extension sends it to the Mac app for slicing and swaps the result in — same formats, same quality, same output.
One difference: The extension does its own PDF and Word text extraction in Chrome (using pdfjs-dist and mammoth) before sending the file to the Mac app. The Mac app handles all formats natively. The end result for the user is identical.
How Omna picks the output format
| Input | Output |
|---|---|
| Tabular (CSV, Excel, ODS, Parquet) | .csv — header + relevant rows |
| PDF | .txt — page markers + relevant page content |
| Word / RTF / ODT | .txt — section markers + relevant sections |
| Plain text / log / markdown | .txt — relevant lines |
| JSON | .json — relevant top-level elements |
| Image (PNG/JPG/JPEG/WEBP/HEIC) | .txt — OCR'd text, relevant lines sliced |
The output file is attached to the AI chat as if you'd attached it yourself. The AI sees the sliced file — it never sees the original.
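The input-to-output mapping can be summarized as a lookup by file extension. This is a sketch assembled from the extensions listed on this page; the dictionary and function names are hypothetical, and returning `None` for anything unlisted is an assumption about how unsupported files would be signaled.

```python
from pathlib import Path

OUTPUT_BY_EXTENSION = {
    # Tabular inputs produce a CSV with a header and the relevant rows.
    ".csv": ".csv", ".xlsx": ".csv", ".xls": ".csv", ".xlsb": ".csv",
    ".ods": ".csv", ".parquet": ".csv",
    # Documents produce text with section or page markers.
    ".docx": ".txt", ".doc": ".txt", ".odt": ".txt", ".rtf": ".txt", ".pdf": ".txt",
    # Plain text, markdown, and logs produce the relevant lines.
    ".txt": ".txt", ".md": ".txt", ".log": ".txt",
    # JSON stays JSON.
    ".json": ".json",
    # Images are OCR'd, then sliced like text.
    ".png": ".txt", ".jpg": ".txt", ".jpeg": ".txt", ".webp": ".txt", ".heic": ".txt",
}

def output_extension(filename):
    """Return the sliced file's extension, or None if the input is unsupported."""
    return OUTPUT_BY_EXTENSION.get(Path(filename).suffix.lower())
```

For example, `output_extension("Report.PDF")` gives `".txt"`, while `output_extension("clip.mp4")` gives `None`.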