1. Sanitised PII needs to be de-sanitised on the client in order to keep the UX somewhat functional. For example, if you say "my name is John", which gets redacted to [NAME], and the model responds with "Hi [NAME]", it needs to be converted back to "Hi John". This means you need a mechanism for reversing PII at the layer where the user is interacting. Of course, that is only true if you care about user experience.
2. Redacted PII data is practically useless for most purposes. The model won't be able to do much without some data, and there are many things that count as PII. For a simple chat system this is fine. For something more complex, where the user needs to interact with the LLM, this becomes extremely challenging, as the LLM may not be able to do anything at all. There is also the chance of hallucination.
Overall, it is a feature that we support at the platform level, but it is not something people tend to use, due to these limitations.
In my mind the only practical thing to do is to remove some types of PII that represent a security risk and make sure that you use a trusted model that purges PII data as quickly as possible. This will require a very different type of system.
> Privacy Filter is a bidirectional token-classification model with span decoding. It begins from an autoregressive pretrained checkpoint and is then adapted into a token classifier over a fixed taxonomy of privacy labels. Instead of generating text token by token, it labels an input sequence in one pass and then decodes coherent spans with a constrained Viterbi procedure.
> The released model has 1.5B total parameters with 50M active parameters.
> [To build it] we converted a pretrained language model into a bidirectional token classifier by replacing the language modeling head with a token-classification head and post-training it with a supervised classification objective.
It works pretty well for the use cases I was playing with.
The OpenAI model is small enough that I might enhance my tool to use it.
I quite like Moxie's Confer[1] approach to just encrypt the whole thing in such a way that no one except the end-user sees the plaintext.
For anything touching security or privacy, even small inconsistencies can quickly erode trust.
1. Pass the raw text through the filter to obtain the spans.
2. Map all the spans back to the original text.
Now you have all the PII information.
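A minimal sketch of those two steps with the Hugging Face transformers pipeline; the repo id is a placeholder (check the model card for the real one), and the aggregation step is what maps token tags back to character offsets in the original text:

```python
# Sketch only: "openai/privacy-filter" is a placeholder repo id.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="openai/privacy-filter",   # hypothetical checkpoint name
    aggregation_strategy="simple",   # merge token tags into whole spans
)

text = "My name is Harry and my email is harry@example.com."
spans = ner(text)

# Step 2: each span carries character offsets into the raw text,
# so the PII surface forms can be recovered directly.
for s in spans:
    print(s["entity_group"], repr(text[s["start"]:s["end"]]), round(s["score"], 3))
```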
I fed it a ~100-line markdown document; it took about 10 seconds, and it decided that "matter" (as in frontmatter), "end" (as in frontend), and MCP (as in MCP server) were organizations.
Most of them don't even make grammatical sense, e.g. "Following the discussion in <PERSON_1>, blahblah".
Brings me back to what NLP was like a decade ago. I always thought spaCy was a very nice project in that space.
Bringing the Open back to OpenAI...
How would you actually use this if it can fail to redact 4% of the data? How do you reliably know which 4% failed?
Even small mistakes can make something dealing with sensitive data hard to trust. It seems useful as a first pass, but I’d probably still want some deterministic checks or a human in the loop to feel confident using it.
Since you can't be 100% certain that a filter redacts all personal data, you'd have to make sure that you have measures in place which allow OpenAI to legally process personal data on your behalf. Otherwise you'd technically have a data breach (from a GDPR pov).
And if OpenAI can legally process personal data on your behalf, why bother filtering if processing with filtering is also compliant?
It does work better on plain text than markdown because of casing. I can't see what you used (kinda the point, because it all runs in your browser), but if you can share the markdown as a gist or something I can take a look and comment more concretely.
You need to do that part yourself after the model runs. The filter gives you spans; for each one, assign a stable ID (PERSON_1, PERSON_2) and keep {PERSON_1: "Harry", PERSON_2: "Ron"} next to the document. Swap IDs in before the LLM call, swap originals back in the reply.
Scoping that map to a document/project keeps the same person consistent across calls, so Harry stays PERSON_1 instead of becoming PERSON_3 the next time he's mentioned.
(Disclosure: I'm building a Mac privacy tool, RedMatiq, that does exactly this. The mapping layer turned out substantially harder than detection.)
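A minimal sketch of that mapping layer in Python, assuming the filter has already returned (start, end, label) spans (values below are illustrative):

```python
from collections import defaultdict

def pseudonymize(text, spans):
    """Replace detected spans with stable IDs, keeping the reverse map.

    spans: list of (start, end, label) tuples from the filter.
    The same surface form always receives the same ID within one map,
    so scoping the map to a document keeps IDs consistent across calls.
    """
    counters = defaultdict(int)
    forward = {}                       # "Harry" -> "PERSON_1"
    out, last = [], 0
    for start, end, label in sorted(spans):
        value = text[start:end]
        if value not in forward:
            counters[label] += 1
            forward[value] = f"{label.upper()}_{counters[label]}"
        out.append(text[last:start])
        out.append(forward[value])
        last = end
    return "".join(out) + text[last:], forward

def restore(reply, forward):
    """Swap the original values back into the LLM's reply."""
    for value, token in forward.items():
        reply = reply.replace(token, value)
    return reply

masked, mapping = pseudonymize("Harry emailed Ron.",
                               [(0, 5, "person"), (14, 17, "person")])
print(masked)                             # PERSON_1 emailed PERSON_2.
print(restore("Hi PERSON_1!", mapping))   # Hi Harry!
```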
Check it out: https://redact.cabreza.com
Anyway, I have no idea what the underlying data here looks like, but I bet it's pretty unusual.
When I was working my first job out of college, we were given a large contract and told to redact with black Sharpie every name of a company; it was a basic document prep exercise ahead of a strategy session for a competitor. Standard practice was to share general information but not specifics. Our redaction accuracy on 200 pages of contract was ... not 100%.
I'm suggesting that a model designed for high-accuracy redaction can also be used to find all PII in unredacted text. For example, if I don't already know how to find PII (e.g., regex, NLP, etc.) I can use OpenAI's Privacy Filter model to do the work for me.
And because each span has a type (private_person, etc.), I don't even need to do any work to find only the specific information I am looking for; that's something simple diffing wouldn't do.
I'm not saying it's an issue, I just think it is interesting that a tool designed to protect PII can also be used to find it with minimal effort. And it looks like someone already implemented it: https://github.com/chiefautism/privacy-parser.
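Concretely, once the spans are typed, pulling out a single category takes one line (span values below are illustrative):

```python
text = "Write to harry@example.com; Ron's number is 555-0100."
spans = [  # as returned by the filter (illustrative offsets/labels)
    {"label": "private_email", "start": 9, "end": 26},
    {"label": "private_phone", "start": 44, "end": 52},
]
emails = [text[s["start"]:s["end"]] for s in spans if s["label"] == "private_email"]
print(emails)  # ['harry@example.com']
```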
The submission "OpenAI Privacy Filter" that you posted to Hacker News (https://news.ycombinator.com/item?id=47870901) looks good, but hasn't had much attention so far. We put it in the second-chance pool, so it will get a random placement on the front page some time in the next day or so.
This is a way of giving good HN submissions multiple chances at the front page. If you're curious, you can read about it at https://news.ycombinator.com/item?id=26998308 and other links there. Also, care to share your app link/homepage? I googled, but couldn't find it.
Sure, there's some math that says the difference between really close and exact isn't a big deal; but then you're also saying your secrets don't need to be exact when decoding them, and they absolutely do atm.
Sure looks like a weird privacy veil that sorta might work for some things, like frosted glass. But think of a toilet stall made entirely of frosted glass: are you still comfortable going to the bathroom in there?
The use case for this is that many enterprise customers want SaaS products to strip PII from ingested content, and there's no non-model way to do it.
Think of ingesting call transcripts, where those calls may include credit card numbers or other private data. The call transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.
Sure they do, computers repeatedly, quickly, and predictably do what they are programmed to do. Which includes any human errors in that programming.
Credit card numbers are deterministic. A five-year-old could write a script to strip out credit card numbers.
As for other PII? You're seriously expecting an LLM to find every instance of every random piece of PII? Worldwide? In multiple languages? I've got an igloo I'd like to sell you ...
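(For what it's worth, the deterministic check alluded to above is the Luhn checksum; a minimal sketch:)

```python
def luhn_ok(number: str) -> bool:
    """Luhn checksum: the format-level validity check behind card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 if it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0

print(luhn_ok("4111 1111 1111 1111"))  # True -- a well-known test number
```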
And now they predictably do what they are not programmed to do.
Today we’re releasing OpenAI Privacy Filter, an open-weight model for detecting and redacting personally identifiable information (PII) in text. This release is part of our broader effort to support a more resilient software ecosystem by providing developers practical infrastructure for building with AI safely, including tools and models that make strong privacy and security protections easier to implement from the start.
Privacy Filter is a small model with frontier personal data detection capability. It is designed for high-throughput privacy workflows, and is able to perform context-aware detection of PII in unstructured text. It can run locally, which means that PII can be masked or redacted without leaving your machine. It processes long inputs efficiently, making redaction decisions in a quick, single pass.
At OpenAI, we use a fine-tuned version of Privacy Filter in our own privacy-preserving workflows. We developed Privacy Filter because we believe that with the latest AI capabilities, we could raise the standard for privacy beyond what was already on the market. The version of Privacy Filter we are releasing today achieves state-of-the-art performance on the PII-Masking-300k benchmark, when corrected for annotation issues we identified during evaluation.
With this release, developers can run Privacy Filter in their own environments, fine-tune it to their own use cases, and build stronger privacy protections into training, indexing, logging, and review pipelines.
Privacy protection in modern AI systems depends on more than pattern matching. Traditional PII detection tools often rely on deterministic rules for formats like phone numbers and email addresses. They can work well for narrow cases, but they often miss more subtle personal information and struggle with context.
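For instance, a rules-based pass might look like the sketch below (illustrative patterns, not any particular tool's). It reliably catches well-formed strings, but it has no way to tell a public support line from a personal number:

```python
import re

# Deterministic format rules: strong on well-formed strings, blind to context.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}")

text = "Call our public line at (800) 555-0199, or reach Dana at (503) 555-0142."
for pattern in (EMAIL, PHONE):
    for match in pattern.finditer(text):
        print(match.group())  # both numbers match; only the second is personal
```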
Privacy Filter is built with deeper language and context awareness for more nuanced performance. By combining strong language understanding with a privacy-specific labeling system, it can detect a wider range of PII in unstructured text, including cases where the right decision depends on context. It can better distinguish between information that should be preserved because it is public, and information that should be masked or redacted because it relates to a private individual.
The result is a model that is strong enough to deliver frontier-level privacy filtering performance. At the same time, the model is small enough to run locally, meaning data that has yet to be filtered can remain on device, with less risk of exposure, rather than needing to be sent to a server for de-identification.
Privacy Filter is a bidirectional token-classification model with span decoding. It begins from an autoregressive pretrained checkpoint and is then adapted into a token classifier over a fixed taxonomy of privacy labels. Instead of generating text token by token, it labels an input sequence in one pass and then decodes coherent spans with a constrained Viterbi procedure.
This architecture gives Privacy Filter a few useful properties for production use:
The released model has 1.5B total parameters with 50M active parameters.
Privacy Filter predicts spans across eight categories:
- private_person
- private_address
- private_email
- private_phone
- private_url
- private_date
- account_number
- secret

The account_number category helps mask a wide variety of account numbers, including banking info like credit card numbers and bank account numbers, while secret helps mask things like passwords and API keys.
These labels are decoded with BIOES span tags, which helps produce cleaner and more coherent masking boundaries.
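As an illustration of the scheme (tags below are hypothetical examples, not model output), BIOES marks each token as Begin, Inside, Outside, End, or Single, and a valid tagging decodes unambiguously into spans:

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (start, end, label) token spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":                          # single-token span
            spans.append((i, i + 1, label))
        elif prefix == "B":                        # span opens
            start = i
        elif prefix == "E" and start is not None:  # span closes
            spans.append((start, i + 1, label))
            start = None
    return spans

tags = ["O", "B-private_person", "E-private_person", "O", "S-private_email"]
print(bioes_to_spans(tags))
# [(1, 3, 'private_person'), (4, 5, 'private_email')]
```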
We developed Privacy Filter in several stages.
First, we built a privacy taxonomy that defines the types of spans the model should detect. This includes personal identifiers, contact details, addresses, private dates, many different kinds of account numbers such as credit and banking information, and secrets such as API keys and passwords.
Second, we converted a pretrained language model into a bidirectional token classifier by replacing the language modeling head with a token-classification head and post-training it with a supervised classification objective.
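In Hugging Face terms, that conversion is roughly what AutoModelForTokenClassification does when pointed at a pretrained checkpoint. This is a sketch, not OpenAI's code: "gpt2" is a stand-in, and making the model truly bidirectional would additionally require disabling the causal attention mask, which is omitted here:

```python
from transformers import AutoModelForTokenClassification

# Illustrative label space: "O" plus BIOES tags for each of the 8 categories.
categories = [
    "private_person", "private_address", "private_email", "private_phone",
    "private_url", "private_date", "account_number", "secret",
]
labels = ["O"] + [f"{p}-{c}" for c in categories for p in "BIES"]

# Loading this way discards the language modeling head and attaches a fresh,
# randomly initialized token-classification head sized to the label set.
model = AutoModelForTokenClassification.from_pretrained(
    "gpt2",                                        # stand-in checkpoint
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
```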
Third, we trained on a mixture of publicly available and synthetic data designed to capture both realistic text and difficult privacy patterns. In parts of the public data where labels were incomplete, we used model-assisted annotation and review to improve coverage. We also generated synthetic examples to increase diversity across formats, contexts, and privacy subtypes.
At inference time, the model's token-level predictions are decoded into coherent spans using constrained sequence decoding. This approach preserves the broad language understanding of the pretrained model while specializing it for privacy detection.
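The "constrained" part can be pictured as a mask over tag transitions: the Viterbi search only scores paths in which consecutive BIOES tags can legally follow each other, so the best path is always a well-formed tagging. A sketch of such a constraint (not the model's actual decoder):

```python
def allowed(prev: str, nxt: str) -> bool:
    """Legal BIOES transition: an open span may only continue or close."""
    p_pre, _, p_lab = prev.partition("-")
    n_pre, _, n_lab = nxt.partition("-")
    if p_pre in ("B", "I"):            # inside an open span of label p_lab
        return n_pre in ("I", "E") and n_lab == p_lab
    return n_pre in ("B", "S", "O")    # after O, E, or S anything may start

print(allowed("B-private_person", "E-private_person"))  # True
print(allowed("B-private_person", "O"))                 # False: span left open
```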
We evaluated Privacy Filter on standard benchmarks and on additional synthetic and chat-style evaluations designed to test harder, more context-sensitive cases.
On the PII-Masking-300k benchmark, Privacy Filter achieves an F1 score of 96% (94.04% precision and 98.04% recall). On a corrected version of the benchmark that accounts for dataset annotation issues identified during review, the F1 score is 97.43% (96.79% precision and 98.08% recall).
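(For reference, F1 is the harmonic mean of precision and recall, which is how the headline figure follows from the other two:)

```python
p, r = 0.9404, 0.9804
print(round(2 * p * r / (p + r), 4))  # 0.96, matching the reported F1
```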
We also found that the model can be adapted efficiently. Fine-tuning on even a small amount of data quickly improves accuracy on domain-specific tasks, increasing the F1 score from 54% to 96% and approaching saturation on the domain-adaptation benchmark we evaluated.
Beyond benchmark performance, Privacy Filter is designed for practical privacy filtering in noisy, real-world text. That includes long documents, ambiguous references, mixed-format strings, and software-related secrets. The model card also reports targeted evaluation on secret detection in codebases and stress tests across multilingual, adversarial, and context-dependent examples.
Privacy Filter is not an anonymization tool, a compliance certification, or a substitute for policy review in high-stakes settings. It is one component in a broader privacy-by-design system.
Its behavior reflects the label taxonomy and decision boundaries it was trained on. Different organizations may want different detection or masking policies, and those policies may require in-domain evaluation or further fine-tuning. Performance may also vary across languages, scripts, naming conventions, and domains that differ from the training distribution.
Like all models, Privacy Filter can make mistakes. It can miss uncommon identifiers or ambiguous private references, and it can over- or under-redact entities when context is limited, especially in short sequences. In high-sensitivity domains such as legal, medical, and financial workflows, human review and domain-specific evaluation and fine-tuning remain important.
We are releasing OpenAI Privacy Filter to support stronger privacy protections across the ecosystem.
The model is available today under the Apache 2.0 license on Hugging Face and GitHub. It is intended for experimentation, customization, and commercial deployment, and it can be fine-tuned for different data distributions and privacy policies.
Alongside the model, we are sharing documentation covering the model architecture, label taxonomy, decoding controls, intended use cases, evaluation setup, and known limitations, so teams can understand both what the model does well and where it should be used carefully.
Privacy protection for AI systems is an ongoing effort across research, product design, evaluation, and deployment.
Privacy Filter reflects one direction we believe is important: small, efficient models with frontier capability in narrowly defined tasks that matter for real-world AI systems. We are releasing it because we think privacy-preserving infrastructure should be easier to inspect, run, adapt, and improve.
Our goal is for models to learn about the world, not about private individuals. Privacy Filter helps make that possible.
We’re releasing this preview of Privacy Filter to receive feedback from the research and privacy community and iterate further on model performance.