Mask PII Before Sending Data to ChatGPT, Claude, or Gemini
Hosted LLMs are powerful, but every prompt is data leaving your control. Anything you put in a prompt may be logged, sampled for review, or retained by the vendor. The fix: strip PII locally before the prompt ever leaves your machine.
The pattern
import polars as pl
import omna
from openai import OpenAI
notes = pl.read_parquet("patient_notes.parquet")
# 1. Redact PII locally — names, emails, MRNs, SSNs, phone, addresses.
safe = notes.omna.mask_pii()
# 2. Send only the masked text to the LLM.
client = OpenAI()
for row in safe.iter_rows(named=True):
    resp = client.responses.create(
        model="gpt-5",
        input=f"Summarize: {row['note']}",
    )
    print(resp.output_text)
Why this matters
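- HIPAA / PHI: a hosted LLM vendor is not a covered entity, and sending it PHI normally requires a BAA. Text de-identified to the HIPAA standard (Safe Harbor or Expert Determination) is no longer PHI, so masking can remove the need for a BAA, provided the masking actually meets that standard.
- GDPR: redacting fields the model does not need supports data minimisation under Article 5(1)(c).
- Customer data: emails, names, and account IDs in prompts may end up in vendor logs.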
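To make the masking step concrete, here is a minimal sketch of pattern-based redaction in plain Python. It is not how Omna works internally; the `PATTERNS` table and the `mask_text` helper are hypothetical names chosen for illustration, and the regexes cover only a few US-style formats.

```python
import re

# Illustrative patterns only (assumption): a production detector also needs
# entity recognition for names, addresses, and free-text identifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def mask_text(text: str) -> str:
    """Replace each pattern match with a bracketed type label, e.g. [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Reach Jane at jane.doe@example.com or 555-867-5309. SSN 123-45-6789."
masked = mask_text(note)
print(masked)
# → Reach Jane at [EMAIL] or [PHONE]. SSN [SSN].
```

Note that the name "Jane" survives: regexes catch structured identifiers but not names, which is why a dedicated PII detector is worth using for real data.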
Why local masking beats API-based redaction
Cloud redaction APIs (Amazon Comprehend, Google Cloud DLP, Azure AI Language PII detection) require shipping the raw record to a third party, which is exactly the egress event you're trying to avoid. Omna runs in-process with a Rust kernel, so the raw PII never crosses a network boundary.
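Because the check runs in-process, it can also act as a last-line guard right before the network call. The sketch below is one possible way to do that, under stated assumptions: `guarded_send`, `ResidualPIIError`, and the `RESIDUAL_PII` patterns are hypothetical names, not part of Omna or any LLM SDK, and `fake_send` stands in for the real client call so the example runs offline.

```python
import re
from typing import Callable

# Hypothetical residual-PII patterns (assumption): tune these to your own data.
RESIDUAL_PII = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN
]


class ResidualPIIError(ValueError):
    """Raised when a prompt still contains something that looks like PII."""


def guarded_send(prompt: str, send: Callable[[str], str]) -> str:
    """Call `send` only if no residual-PII pattern matches the prompt."""
    for pattern in RESIDUAL_PII:
        if pattern.search(prompt):
            # Report the pattern, not the match: error messages get logged too.
            raise ResidualPIIError(f"prompt blocked by pattern {pattern.pattern!r}")
    return send(prompt)


def fake_send(prompt: str) -> str:
    """Stand-in for the real network call."""
    return f"sent {len(prompt)} chars"


print(guarded_send("Summarize: patient [NAME] reports mild headache", fake_send))
try:
    guarded_send("Summarize: contact jane.doe@example.com", fake_send)
except ResidualPIIError as exc:
    print("blocked:", type(exc).__name__)
```

In a real pipeline, `send` would wrap the LLM client call, so a prompt that slipped past masking fails loudly on your machine instead of reaching the vendor.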
Try it now
pip install omna