understand_df()

`omna.understand_df(df)` gives you a fast, LLM-free read on any DataFrame — column labels, dtypes, null rates, and sample values — so you know what you're working with before you embed, search, or mask anything.

1 min read · Schema inference, no LLM

Signature

python

omna.understand_df(df)

This is a top-level function, not a `.omna` namespace method. It makes no network call and uses no language model: it is pure local schema inference.
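To make "pure local schema inference" concrete, here is a minimal sketch of the kind of per-column summary such a function can compute without any model or network call. It is not omna's implementation: `profile_columns` is a hypothetical helper, and it works on plain Python lists so it runs without dependencies.

```python
from typing import Any


def profile_columns(columns: dict[str, list[Any]], n_samples: int = 2) -> list[dict[str, Any]]:
    """Summarize each column: type names seen, null percentage, sample values.

    Hypothetical helper for illustration only; not part of omna.
    """
    report = []
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        # Share of missing values, as a percentage of all rows.
        null_pct = 100.0 * (len(values) - len(non_null)) / len(values) if values else 0.0
        # Distinct Python type names present in the column.
        types = sorted({type(v).__name__ for v in non_null})
        # First few distinct values, in order of appearance (values must be hashable).
        samples = list(dict.fromkeys(non_null))[:n_samples]
        report.append(
            {
                "column": name,
                "dtype": "/".join(types) or "null",
                "null_pct": round(null_pct, 1),
                "sample": samples,
            }
        )
    return report


# Made-up data. With Polars, df.to_dict(as_series=False) yields this shape.
columns = {
    "domain": ["insurance", "healthcare", "insurance", None],
    "amount": [120.5, 80.0, 99.9, 42.0],
}

for row in profile_columns(columns):
    print(row)
```

The sketch reports Python type names, whereas `understand_df` reports DataFrame dtypes such as `String`. Everything here is computed from the values already in memory, which is why no network access is needed.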

Example

python

import polars as pl
import omna

df = pl.read_csv("documents.csv")
omna.understand_df(df)

code

 column                dtype    null_pct   label     sample
 uid                   String     0.0%     category  24bb757...
 domain                String     0.0%     category  insurance, healthcare...
 document_type         String     0.0%     category  Invoice, ClaimForm...
 document_description  String     0.0%     text      An insurance claim...
 text                  String     0.0%     text      **Claim ID: 285-14...

Labels

Each column is tagged with an inferred semantic label, so you can quickly spot which columns hold free text (good candidates for [embed](/docs/embed) and [search](/docs/search)) and which hold identifiers:

email · phone · name · id · date · text · numeric · boolean · category · unknown
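For intuition about how labels like these can be assigned without an LLM, the sketch below applies a few simple rules to a column's values. The rules, regular expressions, and thresholds are assumptions made for illustration, `infer_label` is a hypothetical function, and the `name` and `id` labels are left out. omna's actual rules may differ and can produce different labels for the same data.

```python
import re
from datetime import date

# Deliberately loose patterns; real-world validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,20}$")


def _is_iso_date(value: str) -> bool:
    """True if the string parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def infer_label(values: list, max_categories: int = 20, text_min_len: int = 50) -> str:
    """Guess a semantic label for one column (hypothetical heuristics)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "unknown"
    # bool is a subclass of int, so test for booleans first.
    if all(isinstance(v, bool) for v in non_null):
        return "boolean"
    if all(isinstance(v, (int, float)) for v in non_null):
        return "numeric"
    if all(isinstance(v, date) for v in non_null):
        return "date"
    if not all(isinstance(v, str) for v in non_null):
        return "unknown"
    if all(EMAIL_RE.match(v) for v in non_null):
        return "email"
    # Check dates before phones: "2024-01-15" also fits the phone pattern.
    if all(_is_iso_date(v) for v in non_null):
        return "date"
    if all(PHONE_RE.match(v) for v in non_null):
        return "phone"
    # Long strings are treated as free text, short repeated ones as categories.
    avg_len = sum(len(v) for v in non_null) / len(non_null)
    if avg_len >= text_min_len:
        return "text"
    if len(set(non_null)) <= max_categories:
        return "category"
    return "unknown"


print(infer_label(["Invoice", "ClaimForm", "Invoice"]))
print(infer_label(["a@example.com", "b@example.org", None]))
```

The order of the checks matters: specific patterns run before the general text and category fallbacks, so a column of email addresses is not mislabeled as a category.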

Tip: Start here. Knowing which column is your main text column tells you what to pass as the `on` argument to [search](/docs/search) and [filter](/docs/filter).
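When more than one column is labeled `text`, as in the example output above, you still have to choose one. One possible rule, sketched below, is to take the text column with the longest average value. `main_text_column` is a hypothetical helper, the labels are typed in by hand from the example output, and the column values are made up.

```python
from typing import Optional


def main_text_column(labels: dict[str, str], columns: dict[str, list]) -> Optional[str]:
    """Return the text-labeled column with the longest average value, or None."""

    def avg_len(name: str) -> float:
        vals = [v for v in columns[name] if isinstance(v, str)]
        return sum(map(len, vals)) / len(vals) if vals else 0.0

    candidates = [name for name, label in labels.items() if label == "text"]
    return max(candidates, key=avg_len, default=None)


# Labels copied by hand from the example output; values are invented.
labels = {
    "uid": "category",
    "domain": "category",
    "document_type": "category",
    "document_description": "text",
    "text": "text",
}
columns = {
    "uid": ["a1", "b2"],
    "domain": ["insurance", "healthcare"],
    "document_type": ["Invoice", "ClaimForm"],
    "document_description": ["A short summary.", "Another short summary."],
    "text": ["A much longer document body " * 10, "Another long document body " * 10],
}

print(main_text_column(labels, columns))
```

Average length is only a rough proxy for "main content", so treat the result as a suggestion and check the sample values before using it.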

What's next

Audit for PII: pii_report()
Vectorize a column: embed()