I could think of quite a few things. I know that my bank and brokerage use voice ID.
The deeper problem is that most of these companies collected this data because they could, not because they needed it for the core service. 'Datensparsamkeit' is the right frame: the voice samples were a liability sitting on a server waiting for exactly this.
I love that the answer here is basically: "you don't."
But maybe you can mitigate, at unreasonable personal cost.
How about services simply stop taking public information as proof of identity?
Half the time I call a company they say “we are recording your voice for security / authentication purposes”.
The companies that do that have all the information on me that they require for me to set up an account, so their data breaches will be just like this one, but 1000x larger.
Can we just fast forward through the part where this works for ID theft, past the firefox age verification plugin that uses these datasets, and even through the part where people in the plugin dataset are digital outcasts (this voice has been used too many times. Want to try another?)
At the end of this dark predictable tunnel, maybe there will be a ban on biometrics for important stuff, a repeal of the age verification laws, and actual privacy legislation with teeth.
The scarier piece is that an attacker pulls a contractor from the dump, finds their employer on LinkedIn, then calls that company's IT helpdesk for a password reset with the cloned voice.
Fwiw we put up a free realtime face swap demo a while back at https://www.callstrike.ai/deepfake-security-training .. worth a look if you want to actually feel how trivial this has gotten.
Awesome, if you're a victim of an AI company having your voice, you can help yourself by sending another AI company your voice!
> Audio is never used to train commercial models without explicit consent
I'm sure Mercor has explicit consent as well, legal teams are reasonably good at legally covering their asses with license terms.
Happy to discuss the forensic detection side: AudioSeal watermarks, AASIST anti-spoofing, and how the detection landscape changes once voice biometrics start leaking at scale.

Germans (because of course) have a word for this: "Datensparsamkeit". Being frugal with your data.
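Since AudioSeal and anti-spoofing came up: the core intuition behind audio watermark detection is correlation against a secret pattern. Below is a toy spread-spectrum sketch in plain NumPy; it is not AudioSeal's actual scheme (that one embeds and detects localized neural watermarks), just an illustration of why a detector can flag marked audio even when the mark is inaudible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared secret: a pseudo-random +/-1 sequence known to the detector.
n = 16_000                      # one second at 16 kHz
watermark = rng.choice([-1.0, 1.0], size=n)

def embed(audio, alpha=0.01):
    """Add the watermark at low amplitude (inaudible in practice)."""
    return audio + alpha * watermark

def detect(audio):
    """Normalized correlation against the secret sequence.
    Near zero for unmarked audio, clearly positive for marked audio."""
    return float(np.dot(audio, watermark) / (np.linalg.norm(audio) * np.sqrt(n)))

clean = rng.standard_normal(n) * 0.1    # stand-in for real speech
marked = embed(clean)

score_clean = detect(clean)             # hovers near zero
score_marked = detect(marked)           # clearly above the noise floor
```

Real systems additionally have to survive compression, resampling, and re-recording, which is where the neural approaches earn their keep.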
I jest but the majority of the "normal" people I know are happy to hand over biometrics because _it's easier_. We need to start branding biometrics as "forever passwords" or something to help people understand just what they're handing over when they validate access to their checking account or enter Disney World or whatever else.
Good luck with this. Most finance people deal with hundreds to thousands of clients. They obviously can't remember everyone's code word. Commonly used finance systems aren't set up to securely store these code words, and they don't have processes or policies in place to implement or adhere to any sort of code-word verification.
>Rotate where voiceprints are still in use. [...] Do that now, ideally from a new recording in a different acoustic environment than the leaked sample.
Would this even have an effect? I have never heard of "rotating" a voice print. Isn't the whole point of a voice print that you can't really change it? If simply switching your environment completely changes your voice print, that would make voice prints utterly useless to begin with.
I had to open a bank account for a company here a few years ago, right on the bubble of this happening, and they still had an option to come by in person with the proper documentation, which I did. Now it is all outsourced.
These companies are the fattest targets and they're run by incompetents. You should assume that anything you give them will eventually be part of some hack.
Can't wait for them to crash and burn.
A lot of people were basically wiretapping themselves AND their businesses!
While a lot of Mercor "contractors" claim Mercor over-reached with data gathering via Insightful, it's kind of smart because people are too afraid to complain too much knowing they'll not only lose their primary job, but also open themselves up to uncapped liability for willful misconduct.
[0] https://www.wsj.com/tech/ai/mercor-ai-startup-personal-data-...
The good thing about the grift economy is it grifts itself, like the turtles!
I don’t even use biometrics on apple devices, I use a 6 digit pin.
It was always a stupid idea.
The thing about being willing to trade convenience for security is that you get called paranoid, and then when the other shoe does drop and you are still doing it, you still get called paranoid for the current thing you're not doing that "everyone does".
Even have a nice UI on top.
The other use cases (like calling payroll, etc) likely don’t have the same protections and probably would be more effective.
Now 40k people have learned that biometrics aren't passwords. You can't rotate your voice.
In the idealized world, the legal system is meant to provide an accessible alternative to violence for reconciling disputes, but it's increasingly wielded as an impossibly kafkaesque system meant to maintain corporate power over individuals.
I think "CYA" is an overly-flowery term for the reality that they're blocking every avenue for legal recourse, while a variety of other avenues still exist for which adding friction requires the maintenance of expensive and ongoing costs (owning multiple residences, hiring security, etc.)
(To be clear, I am advocating for a more accessible and level legal system, not for UHC-style violence.)
Selling the solution to the problem you caused ought to be illegal.
Mercor hasn't released many public statements about the incident. Social media posts aren't necessarily public; but I did find this breach notification sample filed with CA - https://oag.ca.gov/ecrime/databreach/reports/sb24-621099 . I guess we'll see if our legislators finally take data privacy seriously.
Voices aren't strong.
There just aren't that many unique characteristic parameters behind a voice - it's largely dictated by an evolutionarily shared larynx and vocal tract. They aren't fingerprints.
The fact that human voice impersonation is not only widely possible but popular should give you an indication of this. Prosody, intonation, range, etc. - it's all flexible and can be learned and duplicated.
The signals are simple too, because we have to encode and decode them quickly. You may or may not be able to picture and rotate an apple tree in your head, but you can easily read this sentence in the voice of David Attenborough.
Moreover, you can easily fine-tune a voice model to fit any other speaker. You can store the unique speaker embeddings in a very thin layer. Zero- and few-shot sampling of unseen speakers can come close to full reproduction. You can measure all of this quantitatively.
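To make the "thin layer" point concrete: a speaker embedding is just a small vector, and speaker comparison is cosine similarity between such vectors. A minimal sketch, with the trained encoder faked by per-speaker random vectors (real systems use d-vectors or x-vectors from a neural network, but the geometry is the same):

```python
import numpy as np

DIM = 256  # typical speaker-embedding size
rng = np.random.default_rng(1)

def utterance_embedding(speaker_id, noise=0.3):
    """Toy stand-in for an encoder: a fixed per-speaker vector
    plus per-utterance variation (channel, content, mood)."""
    base = np.random.default_rng(speaker_id).standard_normal(DIM)
    return base + noise * rng.standard_normal(DIM)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a1 = utterance_embedding(42)   # speaker A, utterance 1
a2 = utterance_embedding(42)   # speaker A, utterance 2
b1 = utterance_embedding(7)    # speaker B

same_speaker = cosine(a1, a2)      # high: same voice
diff_speaker = cosine(a1, b1)      # near zero: different voices
```

That one small vector is the "voiceprint": enough to verify you, and enough to condition a cloning model.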
Voices are not, and never have been, fingerprints. They're just not that unique.
Mercor has definitely released statements with boilerplate "investigations are underway."
Court records are public in the US. If creditors want to know if you’ve been in financial trouble, they should check for bankruptcies and lawsuits, not the extrajudicial version of those that the credit reporting companies run based on hearsay.
Except no company is learning this lesson.
The enterprise threat model includes "our own users", and the modus operandi is to maintain as much information on that threat as possible.
in a certain light, it's kind of admirable. they live like the world is the way it should be
There are automated systems for this already. My bank, ISP, etc. use them when you call in, to skip the traditional verification steps. This fact is also highlighted in the article.
The problem is that there typically isn't a system in place for setting up or validating code words, so the advice given is not practical to implement.
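For what it's worth, the storage half is a solved problem: a code word can be stored exactly like a password, salted and hashed, compared in constant time. A minimal sketch using only Python's standard library (the function names and parameters here are illustrative, not from any real finance system):

```python
import hashlib
import hmac
import os

def hash_codeword(codeword: str) -> tuple[bytes, bytes]:
    """Store only the salt and the scrypt digest, never the code word."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(codeword.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_codeword(codeword: str, salt: bytes, digest: bytes) -> bool:
    """Recompute and compare in constant time to avoid timing leaks."""
    candidate = hashlib.scrypt(codeword.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_codeword("blue giraffe 7")
```

The hard part, as the comment says, is process: getting the code word enrolled over a trusted channel in the first place.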
"My voice is my passport. Verify me."
I have to renew my passport every 10 years or so. How do I do that with my voice? I guess it's time to take some vocal lessons.
Ah, I see. So, when discussing ways to ensure customers cannot utilize our warranty process, I'll make sure to do so in ways that are not traceable and won't show up in discovery.
Most tech solutions are built on the problems they created. This includes phones, cars, computers, every software upgrade, and almost every electronic gadget. You are forced to use them because the world around you is no longer compatible with the way of life that existed before these technologies were introduced.
This is an overly flowery way of saying: violence.
The worst of the consequences are the same. People end up dead, destitute, and/or with long-term health consequences, unable to enjoy the fruits of their labor in the worst cases. In the milder cases I think I'd prefer a bruise for a week to a huge financial loss.
I don't know if it's the reason you imply. In the '70s, there were big debates in Germany about privacy and data storage. They spoke of one's data shadow (Datenschatten). I suspect this word comes from that tradition. The reason the word exists would then be the reckoning (Bewältigung) with WW2.
Fingerprints, DNA, iris scans, gait patterns, etc. are all something you can't change (much like a permanent account ID) and are constantly being presented to the world (much like an email address). In addition, under US law, police can compel presentation of fingerprints, but passwords are protected under the 5th Amendment.
So I could easily see a lot of people viewing this as a positive.
I mean, just look at what happened to Crowdstrike....
The fediverse take on that was "customers are advised to rotate their faces and birthdays."
Heard that first from a US mil commander who once ran for a minor political office like state rep.
In the US of course the government buys this sort of information legally from corporations.
So yeah, of course they've developed that type of distrust. Americans should have too, after the paranoia of the Red Scare and the surveillance of Black people in the '50s and '60s. Instead they just spent a few decades building an anti-social state.
Nowadays you just throw all the data into a black box and believe whatever it says blindly.
[0] https://www.opensourceshakespeare.org/views/plays/play_view....
Kind of nuts all the ways audio data can be used now.
The bigger the company, the more speculation there is about stuff people don't actually understand.
Similarly, phones are required now for some activities, like online banking. First it was an option, then it became the norm.
But yes, we Americans know Germans more for their silly big words. Statements like that can be misinterpreted, though, since the German perspective on themselves doesn't quite match the American stereotypes.
There is also the rather famous example of how earlier census data was used in the '40s.
Once the government has your data, they have it. The next generation of representatives may not follow all the same rules and norms.
- we learned the hard way that data will be used to kill people, during the Nazi regime
- we learned it again in the GDR with the Stasi being a little less obvious but still ruining people's livelihoods
- and German comes up with compound words for such things
Who doesn’t want that old post going extinct forever when they were shit faced outside of a bar in Nashville but now they are in their mid-life and are “respectable” members of society.
Relatedly, "monetizing user data" seems to just mean ads. Ads on everything, forever, until the userbase gets fed up and moves to a new service that definitely won't do that, and the cycle repeats about every 3 years.
Do you need to calibrate it to be able to repeat it, and does that calibration change if you are at a different altitude and in different conditions, such as humidity?
Does merely changing altitude (or ambient pressure) change voice enough to be considered different by a recognition or synthesizing system?
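Rough physics answer, as a sketch: formant frequencies scale with the speed of sound inside the vocal tract, and for an ideal gas that speed depends on temperature and gas composition, not ambient pressure. So altitude by itself shifts the voice very little (the air in your tract stays near body temperature), while humidity nudges it slightly and helium shifts it dramatically. The 17 cm tract length and quarter-wave model below are textbook approximations, not measurements.

```python
import math

def speed_of_sound(t_celsius, gamma=1.4, r_gas=8.314, molar_mass=0.028964):
    """Ideal-gas speed of sound, c = sqrt(gamma * R * T / M).
    Note there is no pressure term: altitude alone barely matters."""
    t_kelvin = t_celsius + 273.15
    return math.sqrt(gamma * r_gas * t_kelvin / molar_mass)

def first_formant(c, tract_length=0.17):
    """Quarter-wave resonance of a ~17 cm vocal tract (neutral vowel)."""
    return c / (4 * tract_length)

c_room = speed_of_sound(20)                  # ~343 m/s
c_tract = speed_of_sound(35)                 # air in the tract is near body temp
formant_shift = first_formant(c_tract) / first_formant(c_room) - 1  # ~2.5%
```

Swap in helium's gamma and molar mass and the formants jump dramatically, which is the party-balloon voice; pressure never enters the formula.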
In other cases I have heard people who ought to know better speculating about “what if” they didn’t have to follow the letter of some corporate policy that was rooted in risk avoidance. Again, it looks bad but it doesn’t mean anything concrete (except that the person might have iffy judgment).
I see this whenever an LLM’s impact is assessed. We know. The issue is scale and the ability for smaller and smaller groups (down to individuals) to execute at scale.
Fake news always existed. Now one dude in India can flood multiple sock puppet media accounts with right wing content/images (actual example) at a scale previously unimaginable.
Them being forever passwords is the value prop. The risk scene has changed, but that was essentially always the pitch.
I guess you don't listen to Sinatra.
Forensic intelligence // Breach analysis
By the ORAVYS forensic desk · Published April 24, 2026 · ~7 min read
On April 4, 2026, the extortion group Lapsus$ posted Mercor on its leak site. The dump is reported at roughly four terabytes and bundles a payload that breach analysts have been warning about for two years: voice biometrics paired with the same person's government-issued identity document. According to the leaked sample index, the archive covers more than 40,000 contractors who signed up to label data, record reading passages, and run through verification calls for AI training.
Five contractor lawsuits were filed within ten days of the post. The plaintiffs argue that the company collected voice prints under a "training data" framing without making clear they were also a permanent biometric identifier. The lawsuits matter, but the people whose voices were already exfiltrated have a more immediate question. What does an attacker actually do with thirty seconds of someone's clean read voice plus a scan of their driver's license?
Most voice leaks in the last decade fell into one of two buckets. Either a call center got popped and recordings were stolen with no easy way to map them back to identity. Or an ID-document broker leaked driver's licenses and selfies without any audio attached. Mercor merged both columns. The contractor onboarding pipeline asked for a passport or driver's license scan, then a webcam selfie, then a sit-down voice recording reading scripted prompts in a quiet room. That sequence, in one row of one database, is exactly what a synthetic voice cloning service needs as input.
The Wall Street Journal reported in February 2026 that high-quality voice cloning now requires roughly fifteen seconds of clean reference audio for tools available off the shelf. The Mercor recordings are reported to average two to five minutes of studio-clean speech per contractor. That is far past the threshold. Pair it with a verified ID document and the attacker has both the clone and the credential needed to put the clone to work.
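A quick back-of-envelope on those reported figures, assuming plain 16 kHz mono 16-bit PCM (an assumption; the actual formats in the dump are not public):

```python
# Reported: ~4 TB dump, ~40,000 contractors, 2-5 minutes of speech each.
dump_bytes = 4e12
contractors = 40_000
per_contractor = dump_bytes / contractors          # ~100 MB per person

bytes_per_second = 16_000 * 2                      # 16 kHz mono, 16-bit PCM
five_minutes_audio = 5 * 60 * bytes_per_second     # ~9.6 MB

audio_fraction = five_minutes_audio / per_contractor   # ~10%
```

Even at the top of the reported range, raw speech is a modest slice of each record; the rest of each ~100 MB would be ID scans, selfies, and metadata.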
The threat models below are not speculative. Each is a documented technique already used in the wild before this breach.
If you ever uploaded a voice sample to Mercor, or to any of the other AI training brokers that operated through 2025, treat your voice the way you would treat a leaked password. You cannot rotate it, but you can change what it unlocks. Here is the short list.
When a sample lands on a forensic analyst's desk, the following artifacts are the first pass. Each is something a synthetic voice tends to get slightly wrong, even when the perceptual quality is high.
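One of the cheapest checks in that first pass is the bandwidth ceiling: many vocoders generate at 16 or 24 kHz natively, so a file delivered at 48 kHz can show a hard spectral cutoff far below Nyquist. A sketch of that single check, with synthetic noise standing in for speech (real forensic tooling is considerably more involved):

```python
import numpy as np

def bandwidth_cutoff(signal, sr, floor_db=-60.0):
    """Highest frequency with meaningful energy. A hard ceiling well
    below sr/2 suggests upsampled synthetic audio (e.g. a 24 kHz
    vocoder output resampled to 48 kHz)."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    db = 20 * np.log10(spectrum / (spectrum.max() + 1e-12) + 1e-12)
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    above = freqs[db > floor_db]
    return float(above.max()) if above.size else 0.0

sr = 48_000
rng = np.random.default_rng(0)
full_band = rng.standard_normal(sr)  # one second of full-band noise

# Simulate a 24 kHz-native synthesizer upsampled to 48 kHz:
# content stops at 12 kHz even though the container says 48 kHz.
mask = np.fft.rfftfreq(sr, d=1 / sr) < 12_000
band_limited = np.fft.irfft(np.fft.rfft(full_band) * mask)

cut_real = bandwidth_cutoff(full_band, sr)      # near 24 kHz
cut_fake = bandwidth_cutoff(band_limited, sr)   # near 12 kHz
```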
If you were a Mercor contractor and you believe your voice may already be in circulation, ORAVYS will analyze the first three suspect samples free of charge. You will receive a forensic report covering watermark detection, anti-spoofing score, and the artifact checklist above. No card required, no quota gate.
Sources cited in this article: Lapsus$ leak site index (April 2026), Wall Street Journal voice cloning report (February 2026), Pindrop Voice Intelligence Report 2025, FBI IC3 Elder Fraud Report 2026, Krebs on Security archives. Lawsuit references are matters of public record. ORAVYS does not host or redistribute the leaked dataset and does not accept it as input.
Or did you mean the "big data" crowd which thought 500GB was noteworthy? I don't think anyone took those seriously, in the 2010s or now. That was always "small" data.
I have the faintest possible hope that such things are going to be the death knell of social media. Yeah, a lot of credulous idiots are happily giving AI thirst traps their money for stroking their confirmation bias, but that's just who's left at this point. It feels like every social media app I use is gradually bleeding users who aren't hopelessly addicted to the dopamine treadmill, because what's left is just plain unappealing to them. That selects for the people who are most vulnerable to AI slop, which is far from ideal, but it also means those platforms are composed ever more of that vulnerable population and nobody else. The problem with all these businesses going through that is that without a diverse, growing audience, you just become InfoWars, slinging the same slop to the same people every day; every ounce of it is great for what's left of your audience but absolute garbage for attracting anyone new. And it just goes on that way until you sputter out and die (or harass the wrong group of parents, I guess).
I wish all social media sites a very haha die in a fire.
500GB is in the "fits" category.
My concern is that I can open up ChatGPT and, even with a free, "anonymous" account, run an assembly line generating tens of thousands of words a day to pump to Twitter that are good enough to prop up multiple fake accounts and cause mayhem.
Now make it thousands of people like me doing it. Now add funding and political orgs. Add company leadership that turns a blind eye so long as it drives engagement. This scale and pipeline wasn’t possible 5 years ago, even if we clearly see the throughline.
I’m not even getting into fake images either. That used to require some know how. There are basically no hurdles and even if most people learn it’s fake, millions likely won’t. If you’re a little lucky, less scrupulous “news” outlets will amplify it for you as well for free.
No media uploading, memes are few and far between (usually punished), etc.
https://www.tomshardware.com/pc-components/ssds/kioxia-unvei...
Or 250 of these ~$400 4TB flash drives and an insane number of dongles to connect them all:
https://www.slashgear.com/1847725/largest-usb-thumb-drive-hi...