// PYTHON LIBRARY
mask_pii()
`df.omna.mask_pii()` redacts personal information across your DataFrame in one line — replacing values with reversible `[PERSON_1]`-style tokens, irreversibly redacting secrets, and saving a full audit log. It runs on Omna's six-layer Rust engine, the same kernel as the Mac app and browser extension, entirely on-device.
Signature
clean = df.omna.mask_pii() # L1 + L2: instant, deterministic
clean = df.omna.mask_pii(model=True) # adds L3, the on-device AI model

| Parameter | Description |
|---|---|
| `model` | When `True`, adds the on-device AI layer for contextual PII (bare names, addresses, medical context). Downloads the model (~809 MB) once. |
Returns the redacted DataFrame. An audit log mapping each placeholder back to its source is saved to .omna/pii_audit.parquet automatically.
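Since the audit log is an ordinary Parquet file, it can be inspected with Polars like any other dataset. The sketch below assumes only the path stated above. `load_audit_log` is a hypothetical helper name, not part of the omna API, and because the log's column layout is not documented here, the code prints the schema instead of assuming column names.

```python
from pathlib import Path

AUDIT_PATH = Path(".omna/pii_audit.parquet")  # path stated in this section


def load_audit_log(path=AUDIT_PATH):
    """Return the audit log as a Polars DataFrame, or None if no log exists yet.

    Hypothetical helper for illustration; not part of the omna API.
    """
    path = Path(path)
    if not path.exists():
        return None
    # Imported lazily so the missing-file check works without Polars installed.
    import polars as pl

    return pl.read_parquet(path)


if __name__ == "__main__":
    audit = load_audit_log()
    if audit is None:
        print("No audit log found; run mask_pii() first.")
    else:
        print(audit.schema)  # column names and dtypes; the layout is not assumed here
        print(audit.head())
```

Printing the schema first is a deliberate choice: it shows what the log actually records before any filtering or joining is written against it.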
Example
import polars as pl
import omna
df = pl.read_csv("patient_notes.csv")
clean = df.omna.mask_pii()
# → reversible [PERSON_1]-style tokens; secrets always irreversibly [REDACTED:KIND]
# → audit log saved to .omna/pii_audit.parquet
The six-layer engine
Most "PII for DataFrames" tools are a single regex or a Presidio wrapper. Omna's masking is a six-layer detection engine, pure Rust, fully on-device:
- L1: patterns + checksum validators. Emails, SSNs, cards (Luhn), IBANs, and 30+ international IDs are verified, not just pattern-matched
- L2: secrets. 220+ rules (AWS keys, GitHub tokens, JWTs, private keys) with entropy checks
- L3: on-device AI model, enabled with `mask_pii(model=True)`, for contextual PII no regex can catch (bare names, addresses, medical context)
- L4–L6: entity resolution, reversible `[PERSON_1]` tokens (secrets always irreversibly redacted), and a full audit trail
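To make the difference between "pattern-matched" and "verified" concrete, here is a minimal Python sketch of the general technique behind the L1 card check: a regex proposes candidates, and the Luhn checksum rejects digit runs that merely look like card numbers. This is an illustration of the idea, not Omna's Rust implementation; `CARD_PATTERN`, `luhn_valid`, and `find_cards` are hypothetical names.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list[str]:
    """Return regex matches that also pass the checksum."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


sample = "Card 4111 1111 1111 1111 on file, ref 4111 1111 1111 1112"
print(find_cards(sample))  # → ['4111 1111 1111 1111']
```

Both numbers in the sample match the pattern, but only the first passes the checksum, so the second is not flagged as a card.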
What it detects
PERSON · EMAIL · PHONE · CREDIT_CARD · US_SSN · IP_ADDRESS · IBAN · MEDICAL_RECORD_NUMBER · BANK_ACCOUNT, plus 220+ secret types (API keys, tokens) and 30+ international IDs — checksum-validated where applicable.
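The entropy checks mentioned for the secrets layer can be pictured with a short Python sketch. Shannon entropy per character is one common way to tell random-looking tokens from ordinary words. The helper names and the 3.5-bit threshold below are assumptions chosen for illustration; the engine's actual rules and thresholds are not described here.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of `s` in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_random(token: str, threshold: float = 3.5) -> bool:
    """Hypothetical heuristic: flag tokens whose entropy exceeds `threshold`."""
    return shannon_entropy(token) > threshold


print(looks_random("password"))          # ordinary word, low entropy
print(looks_random("q7Zp2LxV9mB4tK8w"))  # made-up random-looking string
```

In practice an entropy score is usually combined with a format rule (prefix, length, character set), since entropy alone also flags hashes and UUIDs that are not secrets.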
Benchmarked openly
Measured on the Gretel PII Masking Benchmark — same scoring, only the engine changed:
| Gretel benchmark | Before (Presidio) | After (Omna engine) |
|---|---|---|
| Core-PII recall | 0.69 | 0.84 |
| All-types recall | 0.35 | 0.79 |
| All-types F1 | 0.50 | 0.82 |
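For readers less familiar with the metrics in the table, the following Python sketch shows the standard definitions of precision, recall, and F1. It assumes the benchmark scoring uses these textbook formulas, which the text above does not spell out. The counts are made up for illustration and are not benchmark data.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged spans that really were PII."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real PII spans that were flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts: 75 correct detections, 25 false alarms, 50 misses.
p = precision(tp=75, fp=25)
r = recall(tp=75, fn=50)
print(f"precision={p:.2f} recall={r:.2f} f1={f1(p, r):.2f}")
```

Recall is the number to watch for masking: a missed span (false negative) leaks data, while a false alarm only over-redacts.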
Mask before you send to an LLM
The most important use of mask_pii() is as a pre-processing step before any text reaches a hosted model:
import polars as pl
import omna
from openai import OpenAI
notes = pl.read_csv("patient_notes.csv")
clean = notes.omna.mask_pii() # redact locally, in-process
client = OpenAI()
for row in clean.iter_rows(named=True):
    # assumes the CSV has a "text" column holding the note body
    resp = client.responses.create(model="gpt-4o", input=f"Summarize: {row['text']}")
    print(resp.output_text)

mask_pii() is a privacy layer, not a compliance certification. No automated system achieves 100% recall on all data shapes. For high-stakes workflows, review the audit log, and consult your legal team about your specific HIPAA / GDPR / CCPA obligations.
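Because the placeholder tokens are reversible, a hosted model's response can in principle be un-masked locally once it comes back. The Python sketch below shows one way to do that with a plain dictionary. The `unmask` helper, the token regex, and the hand-written `mapping` are all assumptions: in real use the mapping would be built from the audit log, whose column layout is not documented here, and only the `[PERSON_1]` and `[REDACTED:KIND]` token shapes are confirmed above.

```python
import re

# Assumed token shape: [KIND_N], e.g. [PERSON_1] or [EMAIL_2].
# [REDACTED:KIND] markers do not match and are left untouched.
TOKEN_RE = re.compile(r"\[[A-Z_]+_\d+\]")


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Replace known tokens with their originals; leave unknown tokens as-is."""
    return TOKEN_RE.sub(lambda m: mapping.get(m.group(), m.group()), text)


# Hypothetical mapping, written by hand for this example.
mapping = {"[PERSON_1]": "Jane Doe"}
summary = "[PERSON_1] was seen on Monday; key [REDACTED:API_KEY] was rotated."
print(unmask(summary, mapping))
```

Leaving unknown tokens in place is the safer default: a token the model invented or altered stays visibly masked instead of being silently dropped.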