Supported File Formats

Omna can slice any of these file types before sending them to the AI. Each format is handled differently, according to its structure.

What you can drop in

Tabular files (rows and columns)

These are sliced row-by-row. Omna finds the rows most relevant to your question and sends only those.

Format	Extensions	Notes
CSV	`.csv`	Most common. All columns preserved.
Excel	`.xlsx`, `.xls`, `.xlsb`	All sheets are concatenated into one table.
OpenDocument Spreadsheet	`.ods`	Same as Excel handling.
Parquet	`.parquet`	Column-store format common in data engineering. Fully supported — no size limit beyond the 10 GB file cap.

Output format: A clean CSV with a header row and only the relevant rows. Column names are preserved exactly as they appear in the original.
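As a rough illustration of that output shape, the sketch below writes a header row followed by only the kept rows. The function name `render_csv_slice` and the `keep` set of row indices are hypothetical, introduced here for illustration; they are not Omna's API.

```python
import csv
import io

def render_csv_slice(header, rows, keep):
    """Return CSV text: the original header, then only rows whose index is in `keep`."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)  # column names are written unchanged
    for i, row in enumerate(rows):
        if i in keep:
            writer.writerow(row)
    return buf.getvalue()

# Hypothetical sample data
header = ["Order ID", "Customer", "Total ($)"]
rows = [
    ["1001", "Avery", "120.00"],
    ["1002", "Blake", "640.50"],
    ["1003", "Casey", "75.25"],
]
print(render_csv_slice(header, rows, keep={1}))
```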

What counts as a "relevant row": Omna uses a combination of keyword matching (BM25) and semantic similarity (sentence embeddings) to score every row against your question. Rows with scores below a minimum threshold are dropped. For numeric questions ("over $500", "between 10 and 20"), an exact arithmetic filter runs first — all rows that arithmetically satisfy the condition are kept, regardless of semantic score.
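The selection logic described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Omna's implementation: the regexes in `numeric_predicate`, the `threshold` and `alpha` values, and the way the two scores are blended are all invented for the example, and the sentence-embedding similarity is left as a caller-supplied `semantic` function. It also assumes that rows failing the arithmetic filter still go through normal scoring, which the description does not state either way.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document against the query."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avg_len = sum(len(t) for t in toks) / n if n else 0.0
    df = Counter(term for t in toks for term in set(t))
    query_terms = sorted(set(tokenize(query)))
    scores = []
    for t in toks:
        tf = Counter(t)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def numeric_predicate(question):
    """Parse 'over N' or 'between A and B' into a predicate; None if absent (toy parser)."""
    q = question.lower().replace(",", "")
    m = re.search(r"between \$?(\d+(?:\.\d+)?) and \$?(\d+(?:\.\d+)?)", q)
    if m:
        lo, hi = float(m.group(1)), float(m.group(2))
        return lambda v: lo <= v <= hi
    m = re.search(r"over \$?(\d+(?:\.\d+)?)", q)
    if m:
        limit = float(m.group(1))
        return lambda v: v > limit
    return None

def select_rows(question, rows, value_of, semantic, threshold=0.4, alpha=0.5):
    """Return indices of kept rows: arithmetic matches first, then score >= threshold."""
    pred = numeric_predicate(question)
    texts = [" ".join(str(cell) for cell in row) for row in rows]
    kw = bm25_scores(question, texts)
    top = max(kw, default=0.0) or 1.0  # normalise keyword scores to [0, 1]
    kept = []
    for i, row in enumerate(rows):
        if pred is not None and pred(value_of(row)):
            kept.append(i)  # kept regardless of keyword or semantic score
            continue
        score = alpha * kw[i] / top + (1 - alpha) * semantic(question, texts[i])
        if score >= threshold:
            kept.append(i)
    return kept

# Hypothetical data; the semantic scorer is stubbed out to 0.0 for the demo.
rows = [("Laptop stand", 45.0), ("Office chair", 650.0), ("Chair mat", 30.0)]
no_semantic = lambda question, text: 0.0
print(select_rows("chair orders over $500", rows, lambda r: r[1], no_semantic))
```

In this toy run the 650.0 row is kept by the arithmetic filter alone, while the "Chair mat" row is kept only because its keyword score clears the assumed threshold.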

Documents (text with structure)

These are sliced section-by-section. Omna finds the sections most relevant to your question.

Format	Extensions	Notes
Word	`.docx`, `.doc`, `.odt`, `.rtf`	Sliced by paragraph or heading section.
PDF	`.pdf`	Sliced by page. Each page becomes one searchable unit.

Output format: A .txt file with structural markers. Word docs include [§ Heading Name] markers before each section. PDFs include [p. N] markers before each page's content.
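To make the marker layout concrete, here is a small sketch that renders kept pages and sections with the `[p. N]` and `[§ Heading Name]` markers. The function names are hypothetical, and the exact whitespace (marker on its own line, blank line between units) is an assumption, since only the markers themselves are specified above.

```python
def render_pdf_slice(pages):
    """pages: list of (page_number, text) for the pages that were kept."""
    return "\n\n".join(f"[p. {n}]\n{text.strip()}" for n, text in pages)

def render_doc_slice(sections):
    """sections: list of (heading, body) for the sections that were kept."""
    return "\n\n".join(f"[§ {heading}]\n{body.strip()}" for heading, body in sections)

# Hypothetical sample content
print(render_pdf_slice([(3, "Refunds are issued within 14 days."), (7, "Contact support.")]))
print(render_doc_slice([("Pricing", "Plans start at the basic tier.")]))
```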

Why PDFs are sliced by page: Unlike tabular data, where each row is an independent fact, PDFs contain flowing text where context within a page matters. Splitting mid-paragraph would break meaning. Treating each page as one chunk gives a safe, predictable unit.

Plain text files

These are sliced line-by-line.

Format	Extensions	Notes
Plain text	`.txt`, `.md`, `.log`	Each non-empty line is one searchable unit.
JSON	`.json`	Each top-level array element or top-level object key is one unit.
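For JSON, the unit rule in the table can be sketched like this. The helper names `json_units` and `json_slice` are hypothetical, and the handling of a bare scalar document (treated as a single unit) is an assumption that the table does not cover.

```python
import json

def json_units(text):
    """Split a JSON document into (key_or_index, value) units."""
    data = json.loads(text)
    if isinstance(data, list):
        return list(enumerate(data))   # one unit per top-level array element
    if isinstance(data, dict):
        return list(data.items())      # one unit per top-level object key
    return [(None, data)]              # assumption: a scalar is a single unit

def json_slice(text, keep):
    """Rebuild a JSON document containing only the units whose key or index is in `keep`."""
    data = json.loads(text)
    if isinstance(data, list):
        sliced = [item for i, item in enumerate(data) if i in keep]
    elif isinstance(data, dict):
        sliced = {k: v for k, v in data.items() if k in keep}
    else:
        sliced = data
    return json.dumps(sliced)

print(json_units('{"a": 1, "b": [2, 3]}'))
print(json_slice('{"a": 1, "b": [2, 3]}', {"b"}))
```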

Markdown files: Treated as plain text. Omna doesn't parse markdown structure — each line is indexed independently. For large .md files with clear headings, results will typically cluster around the relevant sections naturally.

Log files: Log line format varies widely. Omna indexes each line as a unit and uses keyword + semantic search to surface the relevant entries. Works well for error logs, access logs, and structured log lines.
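The line-based unit rule for text, markdown, and log files amounts to something like the sketch below. The function name is hypothetical, and keeping the original line number alongside each unit is an assumption made for illustration.

```python
def line_units(text):
    """Return (line_number, line) for every non-empty line, numbering from 1."""
    return [
        (n, line)
        for n, line in enumerate(text.splitlines(), start=1)
        if line.strip()  # whitespace-only lines are skipped
    ]

# Hypothetical log excerpt
sample_log = "boot ok\n\n   \nERROR disk full\n"
print(line_units(sample_log))
```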

File size limits

Limit	Value	Why
Maximum file size	10 GB	Hard cap per file
Maximum rows for full indexing	5,000,000 rows	Above this, only a BM25 keyword index is built (no embedding index)
Minimum row length for embeddings	50 tokens	Very short rows (like single IDs) are BM25-only

Files above 5M rows can still be sliced. Keyword search runs on every row, and semantic reranking is applied only to the rows that survive the BM25 pass rather than to an embedding index of the whole file.
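The limits in the table translate into a few simple checks, sketched below. The constant and function names are hypothetical. Two details are assumptions: 10 GB is read as 10 × 10^9 bytes, and tokens are counted by whitespace splitting, which is only a stand-in for whatever tokenizer is actually used.

```python
MAX_FILE_BYTES = 10 * 10**9          # assumption: decimal gigabytes
MAX_ROWS_FULL_INDEX = 5_000_000
MIN_TOKENS_FOR_EMBEDDING = 50

def file_accepted(size_bytes):
    """Files above the hard cap are rejected."""
    return size_bytes <= MAX_FILE_BYTES

def index_mode(row_count):
    """Above the row limit, only the keyword index is built."""
    return "bm25+embeddings" if row_count <= MAX_ROWS_FULL_INDEX else "bm25-only"

def row_gets_embedding(row_text):
    """Very short rows (for example single IDs) are keyword-only."""
    return len(row_text.split()) >= MIN_TOKENS_FOR_EMBEDDING

print(index_mode(1_200_000))
print(index_mode(8_000_000))
print(row_gets_embedding("id-48213"))
```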

Images and unsupported formats

Screenshots and images (OCR-sliced)

.png, .jpg, .jpeg, .webp, .heic files are supported via OCR.

When you drop a screenshot onto the capsule or attach it in the browser extension, Omna uses macOS Vision to extract the text, then slices and masks it exactly like a .txt file. The AI receives extracted text (~200–400 tokens) instead of the raw image (~1,600 Vision API tokens).

AI Vision APIs charge ~1,600 tokens per image regardless of content. OCR + slicing can cut that by 70–90% when the image contains relevant text.
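The arithmetic behind that range is straightforward to check with the figures quoted above (about 1,600 tokens per raw image versus roughly 200 to 400 tokens of extracted text). The function name is illustrative only.

```python
def token_savings(image_tokens, text_tokens):
    """Fraction of tokens saved by sending OCR'd text instead of the image."""
    return 1 - text_tokens / image_tokens

best = token_savings(1600, 200)   # 0.875
worst = token_savings(1600, 400)  # 0.75
print(f"{worst:.1%} to {best:.1%}")
```

With these inputs the saving falls between 75% and 87.5%, which sits inside the quoted 70–90% band.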

Unsupported image formats: .gif, .bmp, .svg. These are not screenshot formats and are passed to the AI unchanged.

Other unsupported formats

Format	Status
Images (`.gif`, `.bmp`, `.svg`)	Not supported — not screenshot formats
Audio (`.mp3`, `.m4a`, `.wav`)	Not supported
Video (`.mp4`, `.mov`)	Not supported
Encrypted / password-protected files	Not supported — Omna cannot read encrypted content
ZIP / RAR archives	Not supported — extract first, then drop the individual file
Binary files (`.exe`, `.bin`, `.dmg`)	Not supported

Extension vs. desktop app — same formats?

Yes. The browser extension uses the same slicing engine as the desktop capsule (the Mac app). When you attach a file on claude.ai or chatgpt.com, the extension sends it to the Mac app for slicing and swaps the result in — same formats, same quality, same output.

One difference: The extension does its own PDF and Word text extraction in Chrome (using pdfjs-dist and mammoth) before sending the text to the Mac app, whereas the Mac app handles all formats natively. The end result for the user is identical.

How Omna picks the output format

Input	Output
Tabular (CSV, Excel, ODS, Parquet)	`.csv`: header + relevant rows
PDF	`.txt`: page markers + relevant page content
Word / RTF / ODT	`.txt`: section markers + relevant sections
Plain text / log / markdown	`.txt`: relevant lines
JSON	`.json`: relevant top-level elements
Image (PNG/JPG/JPEG/WEBP/HEIC)	`.txt`: OCR'd text, relevant lines sliced
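The table above is effectively a lookup from input extension to output type. A minimal sketch of that lookup follows, using the extensions listed on this page. The names `OUTPUT_BY_EXT` and `output_extension` are hypothetical, and returning `None` for anything unlisted is an assumption about how unsupported files might be signalled.

```python
from pathlib import Path

OUTPUT_BY_EXT = {
    **dict.fromkeys([".csv", ".xlsx", ".xls", ".xlsb", ".ods", ".parquet"], ".csv"),
    **dict.fromkeys([".pdf", ".docx", ".doc", ".odt", ".rtf"], ".txt"),
    **dict.fromkeys([".txt", ".md", ".log"], ".txt"),
    ".json": ".json",
    **dict.fromkeys([".png", ".jpg", ".jpeg", ".webp", ".heic"], ".txt"),
}

def output_extension(filename):
    """Return the sliced output extension, or None if the format is not sliced."""
    return OUTPUT_BY_EXT.get(Path(filename).suffix.lower())

for name in ["Report.XLSX", "scan.heic", "data.json", "clip.mp4"]:
    print(name, "->", output_extension(name))
```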

The output file is attached to the AI chat as if you'd attached it yourself. The AI sees the sliced file — it never sees the original.