// PYTHON LIBRARY
filter()
`df.omna.filter(query, on, threshold)` returns **every** row whose semantic similarity to your query is above a threshold, not just the top k. Use it when you want all the matches, not a ranked sample.
Signature
df.omna.filter(query, on, threshold=0.3)

| Parameter | Description |
|---|---|
| query | The query string, in plain language |
| on | The column to filter |
| threshold | Minimum similarity (0–1). Default 0.3. Raise for precision, lower for recall |
Note: Run [df.omna.embed("column")](/docs/embed) once before filtering.

Example
import polars as pl
import omna
df = pl.read_csv("documents.csv")
df.omna.embed("text") # once
filtered = df.omna.filter("insurance claim denied", on="text", threshold=0.73)
# → every document above 0.73 similarity, all semantically related to claim denials

search() vs. filter()
| Use | When |
|---|---|
| [search(query, on, k)](/docs/search) | You want the top k most relevant rows, ranked |
| filter(query, on, threshold) | You want every row above a similarity cutoff |
Raise the threshold for higher precision (fewer, tighter matches); lower it for higher recall (more, looser matches).
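
To make the difference between a top-k search and a threshold filter concrete, here is a minimal sketch in plain Python. It does not use omna or show how omna works internally. The helpers `cosine`, `top_k`, and `above_threshold`, the toy two-dimensional vectors, and the choice of cosine similarity with an inclusive cutoff are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical toy "embeddings"; real ones have far more dimensions.
query = [1.0, 0.0]
rows = {
    "claim denied by insurer": [0.9, 0.1],
    "appeal a rejected claim": [0.8, 0.3],
    "policy renewal reminder": [0.5, 0.5],
    "office lunch menu": [0.1, 0.9],
}

# One similarity score per row, computed against the query.
scores = {text: cosine(query, vec) for text, vec in rows.items()}

def top_k(scores, k):
    """search()-style: the k best rows, ranked, whatever their scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def above_threshold(scores, threshold):
    """filter()-style: every row at or above the cutoff (inclusive by assumption)."""
    return [text for text, s in scores.items() if s >= threshold]

print(top_k(scores, 2))                 # always exactly 2 rows
print(above_threshold(scores, 0.3))     # 3 rows: a loose cutoff favors recall
print(above_threshold(scores, 0.95))    # 1 row: a tight cutoff favors precision
```

The top-k result has a fixed size regardless of how good the matches are, while the size of the threshold result depends on the data and shrinks as the cutoff rises.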