LLM Safety · PII

Mask PII Before Sending Data to ChatGPT, Claude, or Gemini

Hosted LLMs are powerful, but prompts are not private by default. Anything you put in a prompt may be logged, sampled for review, or retained by the provider. The fix: strip PII locally, before the prompt ever leaves your machine.

The pattern

import polars as pl
import omna
from openai import OpenAI

notes = pl.read_parquet("patient_notes.parquet")

# 1. Redact PII locally: names, emails, MRNs, SSNs, phone numbers, addresses.
safe = notes.omna.mask_pii()

# 2. Send only the masked text to the LLM.
client = OpenAI()
for row in safe.iter_rows(named=True):
    resp = client.responses.create(
        model="gpt-5",
        input=f"Summarize: {row['note']}",
    )
    print(resp.output_text)
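
A fail-closed check just before the send can catch a note that slipped through unmasked. The following is a minimal sketch using only the standard library; ensure_masked and ResidualPIIError are hypothetical names, not part of Omna or the OpenAI SDK, and the two patterns are illustrative rather than exhaustive.

```python
import re

# Hypothetical residual-PII check: a few high-signal patterns, not a full detector.
RESIDUAL = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email address
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                           # US SSN layout
]


class ResidualPIIError(ValueError):
    """Raised when a prompt still contains a PII-like pattern."""


def ensure_masked(prompt: str) -> str:
    """Return the prompt unchanged if no pattern matches; otherwise raise."""
    for pattern in RESIDUAL:
        if pattern.search(prompt):
            # The matched text is deliberately left out of the message
            # so the error itself does not leak PII into logs.
            raise ResidualPIIError("prompt still contains a PII-like pattern; not sending")
    return prompt
```

In the loop above, this would wrap the prompt: input=ensure_masked(f"Summarize: {row['note']}"). A regex check of this kind will not catch names or free-form addresses, so it complements the masking step rather than replacing it.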

Why this matters

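· HIPAA / PHI: a hosted LLM vendor that receives PHI is acting as a business associate, so sending raw notes requires a BAA. Data de-identified to the HIPAA standard (45 CFR 164.514) is no longer PHI, but masking only gets you there if the output actually meets that standard, so validate it before relying on it.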
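· GDPR: redaction supports the data minimisation principle in Article 5(1)(c). Masked data can still be personal data if individuals remain identifiable, so treat masking as one control, not an exemption.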
· Customer data: emails, names, and account IDs in prompts may end up in vendor logs.

Why local masking beats API-based redaction

Cloud redaction APIs (Amazon Comprehend, Google Cloud DLP, Azure AI Language PII detection) require shipping the raw record to a third party, which is exactly the egress event you're trying to avoid. Omna runs in-process with a Rust kernel, so the raw PII never crosses a network boundary.
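
To make "in-process" concrete, here is a minimal sketch of local masking using only Python's standard library. It is not Omna's implementation: the pattern set, the mask_text helper, and the [LABEL] placeholder format are all assumptions for illustration, and plain regexes will miss names, addresses, and MRNs. The point is only that the raw text is transformed in memory, with no network call.

```python
import re

# Illustrative patterns only; a real detector covers far more categories.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def mask_text(text: str) -> str:
    """Replace each match with a typed placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


masked = mask_text("Call 555-867-5309 or mail jane.doe@example.com, SSN 123-45-6789.")
print(masked)  # Call [PHONE] or mail [EMAIL], SSN [SSN].
```

Typed placeholders keep enough context for the LLM to summarize sensibly ("the patient's phone number was updated") without exposing the value itself.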

Try it now

pip install omna