
mask_pii()

`df.omna.mask_pii()` redacts personal information across your DataFrame in one line — replacing values with reversible `[PERSON_1]`-style tokens, irreversibly redacting secrets, and saving a full audit log. It runs on Omna's six-layer Rust engine, the same kernel as the Mac app and browser extension, entirely on-device.


Signature

python

clean = df.omna.mask_pii()           # L1 + L2 — instant, deterministic
clean = df.omna.mask_pii(model=True) # adds L3 — the on-device AI model

Parameter	Description
`model`	When `True`, adds the on-device AI layer (L3) for contextual PII such as bare names, addresses, and medical context. The model (~809 MB) is downloaded once.

Returns the redacted DataFrame. An audit log mapping each placeholder back to its source is saved to .omna/pii_audit.parquet automatically.
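
The audit log pairs each placeholder with the value it replaced. The following is a minimal sketch of that idea in plain Python, not Omna's implementation: `mask_column` and the record fields `placeholder`, `kind`, and `source` are hypothetical names chosen for illustration, and the actual schema of `.omna/pii_audit.parquet` may differ.

```python
def mask_column(values, kind="PERSON"):
    """Replace each distinct value with a numbered placeholder and record the mapping.

    Hypothetical helper for illustration only; it is not part of the omna package.
    """
    seen = {}    # original value -> placeholder already assigned to it
    audit = []   # one record per distinct placeholder
    masked = []
    for value in values:
        if value not in seen:
            placeholder = f"[{kind}_{len(seen) + 1}]"
            seen[value] = placeholder
            audit.append({"placeholder": placeholder, "kind": kind, "source": value})
        masked.append(seen[value])
    return masked, audit


masked, audit = mask_column(["Ada Lovelace", "Alan Turing", "Ada Lovelace"])
print(masked)
for record in audit:
    print(record)
```

The repeated name receives the same placeholder both times, which is what lets a single audit record map it back to its source.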

Example

python

import polars as pl
import omna  # not referenced by name below; presumably the import registers the df.omna namespace

df = pl.read_csv("patient_notes.csv")

clean = df.omna.mask_pii()
# → reversible [PERSON_1]-style tokens; secrets always irreversibly [REDACTED:KIND]
# → audit log saved to .omna/pii_audit.parquet

The six-layer engine

Most "PII for DataFrames" tools are a single regex or a Presidio wrapper. Omna's masking is a six-layer detection engine, pure Rust, fully on-device:

L1 — patterns + checksum validators: emails, SSNs, cards (Luhn), IBANs, and 30+ international IDs verified, not just pattern-matched
L2 — secrets: 220+ rules (AWS keys, GitHub tokens, JWTs, private keys) with entropy checks
L3 — on-device AI model (mask_pii(model=True)): contextual PII no regex can catch — bare names, addresses, medical context
L4–L6: entity resolution, reversible [PERSON_1] tokens (secrets always irreversibly redacted), and a full audit trail
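
To make the first two layers concrete, here is a small sketch of the two generic techniques they name: a Luhn checksum that rejects card-shaped numbers a regex alone would accept, and a Shannon entropy score of the kind often used to separate random-looking secrets from ordinary words. The function names, the regex, and the sample strings are assumptions for illustration. Omna's actual rules and thresholds are not shown in this page.

```python
import math
import re
from collections import Counter

# Loose "looks like a card" pattern: 13 to 19 digits with optional spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


for candidate in ["4111 1111 1111 1111", "4111 1111 1111 1112"]:
    looks_like_card = CARD_PATTERN.search(candidate) is not None
    print(candidate, looks_like_card, luhn_valid(candidate))

# A dictionary word scores lower than a random-looking token of mixed characters.
print(shannon_entropy("password"), shannon_entropy("q8Zr2LmX0vTb5NcYp1Ds"))
```

Both sample numbers match the pattern, but only the first passes the checksum, which is the difference between pattern-matching and validating.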

What it detects

PERSON · EMAIL · PHONE · CREDIT_CARD · US_SSN · IP_ADDRESS · IBAN · MEDICAL_RECORD_NUMBER · BANK_ACCOUNT, plus 220+ secret types (API keys, tokens) and 30+ international IDs — checksum-validated where applicable.

Benchmarked openly

Measured on the Gretel PII Masking Benchmark — same scoring, only the engine changed:

Gretel benchmark	Before (Presidio)	After (Omna engine)
Core-PII recall	0.69	0.84
All-types recall	0.35	0.79
All-types F1	0.50	0.82
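
For readers less familiar with the metrics in the table, this sketch shows how precision, recall, and F1 are conventionally derived from detection counts. The counts are toy values chosen for illustration, not benchmark data, and the benchmark's exact matching rules (for example, how partial span overlaps are scored) are not reproduced here.

```python
def precision_recall_f1(true_positives: int, false_positives: int, false_negatives: int):
    """Compute precision, recall, and F1 from raw detection counts."""
    predicted = true_positives + false_positives   # everything the engine flagged
    actual = true_positives + false_negatives      # everything that was really PII
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: 100 labelled entities, 80 of them found, plus 10 spurious detections.
precision, recall, f1 = precision_recall_f1(80, 10, 20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recall is the figure that matters most for masking, because every missed entity is a value that leaves the machine unredacted.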

Mask before you send to an LLM

The most important use of mask_pii() is as a pre-processing step before any text reaches a hosted model:

python

import polars as pl
import omna
from openai import OpenAI

notes = pl.read_csv("patient_notes.csv")
clean = notes.omna.mask_pii()          # redact locally, in-process

client = OpenAI()
for row in clean.iter_rows(named=True):  # each row is a dict keyed by column name; "text" is this file's notes column
    resp = client.responses.create(model="gpt-4o", input=f"Summarize: {row['text']}")
    print(resp.output_text)
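
Because the person-style tokens are reversible, a response that comes back from the hosted model can in principle be re-identified locally. The sketch below shows one way to do that with a plain dictionary. It is an assumption-laden illustration: `restore`, the `mapping` contents, and the `AWS_KEY` kind are hypothetical, and in practice the mapping would be built from the audit log rather than written by hand.

```python
import re

# Matches reversible tokens such as [PERSON_1] or [EMAIL_12].
# Irreversible markers like [REDACTED:KIND] contain a colon and are not matched.
PLACEHOLDER = re.compile(r"\[[A-Z_]+_\d+\]")


def restore(text: str, mapping: dict) -> str:
    """Swap known placeholders back to their source values; leave unknown ones as-is."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


# Hypothetical mapping for illustration.
mapping = {"[PERSON_1]": "Ada Lovelace", "[EMAIL_1]": "ada@example.com"}

summary = "[PERSON_1] asked for results to be sent to [EMAIL_1]. [REDACTED:AWS_KEY] was removed."
restored = restore(summary, mapping)
print(restored)
```

Restoring puts the original values back into the text, so this step belongs on the local machine, after the hosted model has returned.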

Warning: mask_pii() is a privacy layer, not a compliance certification. No automated system achieves 100% recall on all data shapes. For high-stakes workflows, review the audit log, and consult your legal team about your specific HIPAA / GDPR / CCPA obligations.

What's next

Audit before you redact: pii_report()
Ask questions in plain English: ask()