Reading the MPEG1 specs back in the 90s as a child opened my eyes to how to define complex systems. For a media coding standard, they spent most of their time saying how to interpret encoded bytes, which I realized is genius. Be descriptive about decoding and you don't have to be prescriptive about encoding. Encoding is where you can apply all the creativity, but you need to provide a way to have a shared understanding of the encoded bytes.
HN hug of death
The AV2 Video Standard Has Released (Final v1.0 Specification)
AV1 software decoding is already very intensive so AV2 decoding benchmarks are the next thing that would be really interesting (or mortifying) to see.
... improvements around 25% compared to AV1
AV2 decoding is roughly five times more complex than AV1 decoding
I'm not sure what these two lines mean or if we can compare them, any help?
I guess 5 years ago (around the time when Intel stopped making SSE-only chips) is technically "older", but I wouldn't prioritize avx2 when devices intended for consuming media definitely experience much less pressure to upgrade than workstations…
But now with AV2 and Dav2d, that completely breaks. Are we eventually going to get AV3/Dav3d and AV4/Dav4d, which will read like Ave/Daved and Ava/Davad? Seems a bit awkward. Was the idea from the start to have the 1 be the version number, and have it specifically be part of the name?
Like we had weird examples like C compilers and Bun. This is a much more interesting example because it's highly nontrivial.
AV1 exists, Dav1d exists. Let's see AI take the AV2 spec and Dav1d code and try to make a working high performance AV2 decoder.
AV1 was designed as royalty-free, but Sisvel’s pool and the recent Dolby/Snap proved the contrary.
https://accessadvance.com/2026/03/24/access-advance-licensor...
Yes, this is going to be fun to watch.
Studios still release new dvds with mpeg2 video. Online videos tend to be available in many codecs. Video conferencing tends to negotiate to best available or has settled on ancient codecs and won't change quickly.
https://www.youtube.com/watch?v=XqZsoesa55w
That extra 25% becomes worth it.
Nothing will become obsolete. AV1 will stick around for a long time. And YouTube still does H.264 encodes to support old devices.
Tangent but I cannot wait for h269 (or h267 for the younger gen)
*Da5id
Rust does not bring more performance. Just more safety.
```
Too Many Requests
The page you have tried to access is not available because the owner of the file you are trying to access has exceeded our short term bandwidth limits. Please try again shortly.
Details: Actioning this file would cause "jbkempf.com//blog/2026/dav2d/" to exceed the per-day file actions limit of 160000 actions, try again later
```
Hope we get a similar option with future lineups that support AV2, especially given how popular video creation and streaming are now.
https://gist.github.com/MartinEesmaa/2f4b261cb90a47e9c41ba11...
Devices with AV1 hardware decoding - rare as they are - won't be obsoleted for a long time.
Shrek 1 at 8.34MB including audio.. insane
Netflix uses AV1: https://netflixtechblog.com/av1-now-powering-30-of-netflix-s...
YouTube uses AV1. It's tough to be more mainstream than that.
Right click on a YouTube video and select Stats for Nerds. If your system is capable of it, chances are it will be playing back in AV1.
Most of the YouTube videos I watch these days are AV1 encodes. Sometimes it's in VP9 and occasionally it's H.264.
The point of encoding is to reduce downstream bandwidth for the viewer, and upstream bandwidth for the distribution network.
The content creator only needs to upload it once.
This is an odd signoff. Are people having a go at dav2d?
The answer is probably the same as for why not AV2 everything; a lot of hardware couldn't support it today. But in 10 years?
It seems we're running up against fundamental limits of human-engineered video codecs at this point. There might be a lesson in there.
AV2 saves 25% bandwidth at the cost of 5x more decoding complexity.
If you can stand Lex Fridman for a bit, the VLC authors talk about why you use ASM for a video decoder instead of pure C or Rust.
It's a semi-common last name.
Either go back and read the answers there first, or I will assume you are part of a FUD campaign (yes, I know HN guidelines, but again every single AV2 news in the last week has seen the same rhetorical "questions" as top "comments").
[1] https://archive.org/details/Shrek-Video-GBA [2] https://www.youtube.com/watch?v=CyOfPZQl4MI
The way they weave these instructions can be very hard to express with a high level language.
Further, there's a ton of work with arrays and, importantly, parts of arrays. They can, for example, need to extract every other element up to 1/2 the array. Unfortunately, Rust has runtime array bounds checks which make writing that sort of code slower. The compiler can elide those checks, but usually only in simple cases.
The authors would be writing a bunch of unsafe rust to get the performance they want and rust makes that more painful on purpose.
I like rust, but C/ASM really is the right choice here. This is one of the few cases where rust's safety is a major detriment.
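To make the "every other element up to 1/2 the array" case concrete, here is a minimal Rust sketch. The function names are made up for illustration, and whether the optimizer actually removes the checks in the indexed version depends on the compiler version and surrounding code, so treat the comments as the general expectation rather than a guarantee.

```rust
/// Index-based version: each `src[i]` access carries a bounds check unless
/// the optimizer can prove that `i < src.len()`.
fn every_other_indexed(src: &[u8]) -> Vec<u8> {
    let half = src.len() / 2;
    let mut out = Vec::with_capacity(half / 2 + 1);
    let mut i = 0;
    while i < half {
        out.push(src[i]);
        i += 2;
    }
    out
}

/// Slice-and-iterator version: one range check when taking `&src[..half]`,
/// after which the iterator walks the sub-slice without per-element index checks.
fn every_other_iter(src: &[u8]) -> Vec<u8> {
    let half = src.len() / 2;
    src[..half].iter().step_by(2).copied().collect()
}

fn main() {
    let data: Vec<u8> = (0..16).collect();
    let a = every_other_indexed(&data);
    let b = every_other_iter(&data);
    // Both versions must agree; only the shape of the loop differs.
    assert_eq!(a, b);
    println!("{:?}", b);
}
```

The second form is the kind of "simple case" where safe Rust stays cheap; the argument in the comment above is that real codec kernels are often not this regular.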
it's not much slower than the original C/ASM implementation (last i checked ~5%?) but that matters here
https://code.videolan.org/videolan/dav2d/-/blob/main/src/dat...
There is a project to write an AV1 decoder in Rust: Rav1d (really stretching the name here).
https://github.com/memorysafety/rav1d
They got within 5% of the performance of dav1d and held a contest to close the gap but I think I read somewhere that this wasn't achieved.
https://www.memorysafety.org/blog/rav1d-perf-bounty/
They claimed
> This is enough of a difference to be a problem for potential adopters, and, frankly, it just bothers us.
But in my opinion nobody actually cares about 5% in absolute terms. It's likely just Rust naysayers using that as an excuse.
I think the likely reason for dav2d using C is that they can reuse lots of code and infrastructure from dav1d. But I agree it would be much better if they worked on Rav2d instead (these names!). You can hardly complain about a 5% overhead if you're opting in to 5x more decoding complexity.
They're claiming that there are patents, but that doesn't mean there are.
I think you might be misunderestimating how incredible the dav1d AV1 decoder is. Not only does it require less total time than the reference decoder to decode the same video, but it can spread that out over far more threads. I was unable to watch 4k 60fps av1 video on my media center PC (it's from 2019, so predates hardware av1 decoding, and, well, the CPU was a little long in the tooth) until I switched to dav1d. With dav1d I am now able to watch 4k 60fps av1 using software decoding, and my machine uses 10% CPU while doing so. Really amazing piece of software.
With any luck, the dav2d 5x claim will hold true, and 10% CPU usage will scale to 50% CPU usage, meaning I'm still able to watch 4k 60fps video on my media center without a hardware upgrade. (that machine doesn't have hyperthreading, so 50% cpu is actually 50%, not 100% in a fancy suit)
That sounds like one of these high-risk, high-reward things that are great for people / projects / companies who have nothing to lose, but is not a great baseline strategy for an established market player. AV2 is here with support from aomedia and its members. AV2 will be used, and we need a production-grade decoder regardless of where AI is at, so it makes much more conservative business sense to use established approaches (language: c/asm, devteam: ffmpeg/dav1d) as a starting point. While that's happening, we can dabble in AI and other risky stuff and see if it helps. If so, great, and if not, nothing lost.
What's missing mostly: live streams which are h264.
Currently, and I say currently, dav1d is so fast, no worries on that side.
This could work / works for video too, give it lower resolution / quality images and AI upscale. Its predecessor would calculate intermediate frames for example.
Consumer Display Device: EUR 0.32
Consumer Non-Display Device: EUR 0.11
(source here: https://www.sisvel.com/licensing-programmes/audio-and-video-...)
And it's not really hardware hitting limits, it's specifically software decoding on somewhat weaker machines.
Even on 1080p videos running on AV1 on 1x, the TV system bogs down and any kind of interaction has a variable 1-3s lag. On some TVs if you do 1.25x the TV automatically "downgrades" the resolution to 480p to avoid dropping frames.
I wish there was an option to still use VP9 / H.264 on those systems (even limited to 1080p).
Adding custom hardware like tensor cores to the stack would serve a different use case.
I assume there will be an SVT-AV2 too which will semi-automatically gain from the SVT foundation for working with lots of cores but will still need specific work to support and then tune AV2 encoding.
I am not sure if it is that much safer than the C version when raw assembly is still required.
I didn't mean that the Dav1d people should yolo vibe code Dav2d. My point was that this is a very interesting possible experiment since there is no existing Dav2d contamination in the training data.
for other cases, I can just wait more for my cpu/gpu/cloud to do the job
Dolby is somewhat more interesting in that rather than scare tactics, media hype, and attempting to form a pool about it they are actually taking a patent assertion claim to court.
So they seem to be attempting to pull a fast one and use unproven claims to try and convert their competitor into a replacement revenue source.
It'll probably be a case of whoever has the best lawyers + contacts + persistence wins.
But it'll be interesting if discovery shows evidence they know they don't have a case and are trying it anyway. "Piercing the corporate veil" can theoretically be a consequence of that AFAIK.
I can claim the same and offer licenses per device.
Just “AV”
Next, AV Series 1 and 2 (released simultaneously)
Later, AV Edition but it costs $10,000
Forgive the ignorance, I have worked entirely in the abstracted layers of the stack, and mostly web.
The host here has a limit of 160000 files served each day. That is extremely low. If the site has an icon, css, a js file and a few images it's 10 files each visit. That will limit it to 16k visits/day. If there are more files loaded it might just handle a few thousand visits, and they have received more than that from HN now.
An uncompressed 1080p, 60fps video with 24-bit color depth would need around 3Gbps to be streamed. And even if you don't need to stream it, that would still consume a sizeable portion of the write throughput of the fastest SSDs currently available; if you go up to 4K, you'd actually exceed that by a lot (not to mention, 1tb of storage would last for about 10 minutes of video).
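The arithmetic behind those figures can be checked with a few lines of Rust. This is only a back-of-the-envelope sketch: it assumes 1 TB means 10^12 bytes, ignores container overhead and audio, and the function names are made up for illustration.

```rust
/// Raw (uncompressed) video bitrate in bits per second.
fn raw_bits_per_second(width: u64, height: u64, fps: u64, bits_per_pixel: u64) -> u64 {
    width * height * fps * bits_per_pixel
}

/// Whole seconds of raw video that fit into `capacity_bytes`.
fn seconds_of_video(capacity_bytes: u64, bits_per_second: u64) -> u64 {
    capacity_bytes * 8 / bits_per_second
}

fn main() {
    let hd = raw_bits_per_second(1920, 1080, 60, 24);
    let uhd = raw_bits_per_second(3840, 2160, 60, 24);
    println!("1080p60 raw: {:.2} Gbit/s", hd as f64 / 1e9);
    println!("2160p60 raw: {:.2} Gbit/s", uhd as f64 / 1e9);

    // Assumption: 1 TB = 10^12 bytes (decimal terabyte).
    let one_tb: u64 = 1_000_000_000_000;
    let secs = seconds_of_video(one_tb, uhd);
    println!("1 TB holds about {:.1} minutes of raw 2160p60", secs as f64 / 60.0);
}
```

1080p60 comes out just under 3 Gbit/s, and 4K is four times that, which is where the "about 10 minutes per terabyte" estimate comes from.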
It does if you ask them, or at least research the topic at hand.
Though more safety can in some cases bring a bit more performance. For instance, with Rust you can often avoid "defensive copies" of objects.
And as soon as you walk into concurrency territory for a complex codec like this then it seems almost impossible for humans to do correctly while retaining safety.
For a web browser, or a server in a bank, sure. For anything else, questionable.
> adding a sandbox around a memory-unsafe codec is going to be way more expensive
In the modern world, the overhead of strong sandboxes is surprisingly small. The nuclear but most reliable option is a hardware-assisted VM. On modern computers with SLAT and virtualized IO, the overhead for most use cases is negligible. If you want something lighter weight, you can use the multi-user nature of all modern OS kernels and isolate the code into a separate process with restricted permissions. Sandboxing overhead is approximately zero.
Young AV?
A codec does not really exist until everyone can decode it.
Today, we announce dav2d, a fast decoder for the new AV2 codec, developed by members of the VideoLAN community.
A few weeks ago, we opened the repository and started development in public. Since then, AV2 itself has reached its first official specification release, making this a good moment to explain what dav2d is, why we started it, and where the project stands today.
dav2d is the continuation of the work we started with dav1d, our AV1 decoder.
The goal is similar: provide a small, fast, portable and correct decoder, suitable for real applications, media players, browsers, test tools and operating systems.
AV2 is the successor to AV1 and the latest royalty-free video codec from the Alliance for Open Media.
The specification is now publicly available at:
AV1 was finalized in 2018 and became one of the most successful video codecs ever deployed. Today it is available in browsers, mobile devices, operating systems, televisions, streaming services and video applications around the world.
AV2 builds on that success. The codec introduces new coding tools across prediction, transforms, entropy coding, filtering and chroma processing, while continuing the goal of improving compression efficiency.
The reported gains vary depending on the test conditions, but improvements around 25% compared to AV1 are commonly seen, with some evaluations reporting even larger gains.
AV2 decoding is roughly five times more complex than AV1 decoding. In practice, that means software running on today’s hardware will struggle to decode AV2 in real time without careful, architecture-specific optimization.
This is why we started dav2d early rather than waiting for the specification to stabilize.
The origins of dav2d go back to the beginning of dav1d.
When AV1 was being finalized, we pushed for a fast software decoder, because we did not believe hardware decoding would become available quickly enough, or on enough devices.
Not everyone agreed with that assessment. Some members of the AOM community felt that hardware implementations and the reference decoder would be sufficient.
We thought otherwise. Browsers, media players, operating systems and mobile devices would need a production-quality decoder long before dedicated hardware became commonplace.
In the end, AOM itself funded part of the initial development work and some members of the Alliance eventually joined that effort.
The result was dav1d.
In hindsight, the need for a fast software decoder proved larger than many people expected.
Today, dav1d is the most widely deployed AV1 software decoder.
It is used in VLC, FFmpeg, mpv, Firefox, Chrome, Safari, Android, Windows, Linux and many other applications and platforms.
The project has also become the reference AV1 decoder implementation for many developers working on AV1 deployment, testing and optimization.
You can read the full history of dav1d on this blog: Introducing dav1d, the road to the first release, First release, dav1d 1.2 and 1.5 “Sonic”.
With AV2, we are trying to start that work earlier.
A codec specification is important, but it is not enough. Developers need a decoder that can be built, tested, benchmarked, integrated and compared against other implementations.
This is what dav2d is meant to provide.
The current dav2d tree already contains a feature-complete AVM v15 decoder supporting both 8-bit and 10-bit decoding.
Most major parts of the codec are already implemented and are now being optimized, including:
This is still early work, and the AV2 ecosystem itself is still young, but the decoder is already functional and far beyond an empty announcement repository.
A growing part of the work is now focused on correctness, conformance, optimization and platform support.
One reason the project has progressed so quickly is that dav2d does not start from scratch. AV2 shares many concepts with AV1, and dav1d already solved a number of architectural questions around threading, SIMD organization, testing, portability and API design.
While AV2 requires substantial new decoder code, a lot of the experience accumulated over years of dav1d development transfers directly to dav2d.
The performance work has already started.
On x86, dav2d already contains AVX2 code for several inverse transform sizes, as well as work around CCTX, deblock, intra prediction and CfL-related paths.
On ARM, there is already AArch64 NEON work for entropy decoding, SAD, intra prediction, palette prediction, DC predictors, smooth predictors and motion-related functions. Some arm32 work has also started.
There is also early RISC-V work, mostly around re-enabling and adapting existing intra prediction and motion compensation assembly.
This is the same kind of progression we had with dav1d: first a clean C implementation, then validation infrastructure, then architecture-specific optimized code for the most important hot paths.
One important difference compared to the early days of dav1d is tooling.
During the development of dav1d, we created checkasm, a framework used to validate and benchmark optimized implementations against their C equivalents.
dav2d benefits from that infrastructure from day one.
Combined with the architectural experience gained from dav1d, this has allowed the project to progress considerably faster than dav1d did at a comparable stage.
The current tree already contains checkasm coverage for several areas, including inverse transforms, motion compensation, film grain, CfL and reference motion-vector code.
This should make future optimization work both faster and safer.
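The idea of validating an optimized routine against a plain reference can be sketched in a few lines. The following is not checkasm and does not use its API; it is a hypothetical Rust illustration of the same principle, using a sum-of-absolute-differences (SAD) routine and a tiny built-in pseudo-random generator so that it needs no external crates.

```rust
/// Reference implementation: sum of absolute differences, written plainly.
fn sad_reference(a: &[u8], b: &[u8]) -> u32 {
    let mut sum = 0u32;
    for i in 0..a.len().min(b.len()) {
        sum += (a[i] as i32 - b[i] as i32).unsigned_abs();
    }
    sum
}

/// Stand-in for an optimized kernel: handles 8 values per step, then the tail.
fn sad_chunked(a: &[u8], b: &[u8]) -> u32 {
    let n = a.len().min(b.len());
    let (a, b) = (&a[..n], &b[..n]);
    let mut sum = 0u32;
    let mut ca = a.chunks_exact(8);
    let mut cb = b.chunks_exact(8);
    for (x, y) in ca.by_ref().zip(cb.by_ref()) {
        for k in 0..8 {
            sum += u32::from(x[k].abs_diff(y[k]));
        }
    }
    for (x, y) in ca.remainder().iter().zip(cb.remainder()) {
        sum += u32::from(x.abs_diff(*y));
    }
    sum
}

/// Tiny deterministic pseudo-random generator (an LCG), for test inputs only.
struct Lcg(u64);

impl Lcg {
    fn next_u8(&mut self) -> u8 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 56) as u8
    }
}

/// Run both implementations on random inputs; return the first failing round.
fn check_against_reference(rounds: usize) -> Result<(), usize> {
    let mut rng = Lcg(0x1234_5678);
    for round in 0..rounds {
        let len = 1 + (rng.next_u8() as usize % 64);
        let a: Vec<u8> = (0..len).map(|_| rng.next_u8()).collect();
        let b: Vec<u8> = (0..len).map(|_| rng.next_u8()).collect();
        if sad_reference(&a, &b) != sad_chunked(&a, &b) {
            return Err(round);
        }
    }
    Ok(())
}

fn main() {
    match check_against_reference(1000) {
        Ok(()) => println!("all rounds match"),
        Err(round) => println!("mismatch in round {round}"),
    }
}
```

A real harness of this kind would also time both versions and compare against hand-written assembly rather than a second high-level routine; the sketch only covers the correctness half.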
Like dav1d before it, dav2d is developed as an open source project.
The decoder is released under the same BSD-style license as dav1d, making it easy to integrate into open source and proprietary applications alike.
As with most VideoLAN projects, development happens in public from day one:
We believe open implementations are essential for the healthy deployment of new media technologies. They provide interoperability, independent validation of specifications, easier experimentation, and a common foundation for the ecosystem.
There is still a lot of work ahead.
We need to continue tracking the AV2 specification, improve conformance, extend test coverage, further optimize for x86 and ARM, work on RISC-V, improve high bit-depth performance, improve threading, reduce memory usage and prepare future releases.
But the foundations are already there: the tooling, the architecture and the experience gained from dav1d, with additional improvements.
dav1d helped make AV1 practical long before hardware support became ubiquitous.
We intend to do the same for AV2.
Let dav2d be. From VideoLAN, with love.
What I'm saying is the performance problem is a "code smell". The algorithms are getting so complicated that perhaps we are approaching fundamental limits of heuristics; we might get better + faster results ditching "smart" algorithms and just learning the codec in a much higher dimensional space.
Again specialized hardware, but a different approach to it.
rav1d is not a full rewrite of dav1d to rust. So it really doesn't show that. It's currently C + rust + asm.
I don't think we can say anything about what this does or does not prove about the performance of safe code.
> Performance should not be priority #1. Security should be.
Entirely depends on the application. The reason rust has `unsafe` is because there's some situations where performance needs to preempt potential security problems.
How can you claim nobody cares about 5%? A 5% performance increase is significant. And video decoding is not always for playback, where 5% may not matter as much.
Do you not have 98% high speed 5G coverage?
Compressing to AV1/h264/265 etc is really only done for the final version, but that doesn't mean that videos are stored in RAW format during editing, where it is very common to store frames locally in Apple ProRes, Avid DNxHD, or some other compressed format that's targeted towards professional editing.
Unlike AV1 or similar formats, which offer compression ratios of 1000x and more, these formats have a compression ratio of around 10x. They are very simple, and the quality loss is low enough that it doesn't matter. They also tend to store images with 30 bits per pixel instead of the 24 bpp that's normally used for streaming.
I noticed that too. When I tried extreme screen recording compression with AV1, audio became a noticeable part of the bottleneck.
C makes it easy to be fast but hard to be safe. Rust makes it easy to be safe but hard to be fast.
Also note that video codecs tend to wrap C or Rust around handcrafted ASM. Performance is king.
Rust can only prove a limited subset of correct programs to be safe; when you're doing bare metal stuff you're often not in that subset and drop down to unsafe. I'm guessing there's always stuff that's not perf critical and can live in the Rust sandbox - so not saying no wins - but it doesn't sound like Rust is a no-brainer.
Difficult to tell - that's the point!
Why shouldn't safety be the default? If you really want to, it wouldn't be too hard to maintain a patch on top of rustc to drop the bounds checks if you want to compile object files without them.
Software decoding has a safety culture problem, and we need to talk about it.
If so, FFmpeg's stance is very understandable in my opinion.
Leaner delivery is not just ethical, but it also makes better business sense.
You also will need _some_ sort of encoding locally before uploading, even if it's minimal, which could lead to issues when encoded again (although there are codecs available to minimize this).
ProRes and the like are used for proxies or quick and dirty productions that are mostly shooting their look in camera because of a fast turnaround time. This is usually event work on a budget or something for social media.
I still think no one should in 2026 be writing a nontrivial codec or anything parsing untrusted data, in C. There's just no excuse.
The gains are re-use of skill and code. And I hope that's the reason this is continuing with C, this is basically a v2 of an existing project, not a greenfield codec, even if it's much larger.
Because safe code isn't fast enough to decode live video.
> If you really want to, it wouldn't be too hard to maintain a patch on top of rustc to drop the bounds checks if you want to compile object files without them.
Yeah, but then you are undermining safety in a critical way that does lead to security vulnerabilities (buffer overflow). And you are also now maintaining and requiring other devs for a project to use a custom version of rustc. That's certainly part of the reason that's simply not happened.
But another major part of it is that encoders end up with a lot of custom ASM regardless. That custom ASM is going to be where vulnerabilities end up. You don't really escape that by using rust.
If you are already abandoning where you critically need safety the most for performance, then why pick a language that additionally penalizes you for using unsafe constructs?
> Software decoding has a safety culture problem, and we need to talk about it.
Compilers and languages have an optimization problem that we need to talk about. SIMD optimizations remain a very hard thing for compilers to get right. We should talk about what it'd take to make compilers better and the reasons for why codec devs need to drop down to asm instead of using a high level compiler.
There might not be a solution to this problem, there are reasons for it.
Bounds checking as a source of slowdown is overrated in a niche where you're working on fixed size blocks. It feels like the C developers are getting the parts outside the ASM kernels wrong.
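As an illustration of the fixed-size-block argument, here is a small Rust sketch (the function name and the 8x8 transpose are chosen for the example, not taken from any decoder). Because both the array length and the loop bounds are compile-time constants, the optimizer can usually prove every index is in range; that is the typical outcome, not something the language guarantees.

```rust
/// Transpose an 8x8 block stored row-major in a fixed-size array.
/// The type `[u8; 64]` tells the compiler the exact length up front.
fn transpose_8x8(block: &[u8; 64]) -> [u8; 64] {
    let mut out = [0u8; 64];
    for row in 0..8 {
        for col in 0..8 {
            // Largest index is 7 * 8 + 7 = 63, always inside the array.
            out[col * 8 + row] = block[row * 8 + col];
        }
    }
    out
}

fn main() {
    let mut block = [0u8; 64];
    for (i, px) in block.iter_mut().enumerate() {
        *px = i as u8;
    }
    let t = transpose_8x8(&block);
    // Transposing twice must give back the original block.
    assert_eq!(transpose_8x8(&t), block);
    println!("{:?}", &t[..8]);
}
```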
It is not disingenuous given the context. Gp was responding to ggp's hypothetical:
>> Is there a compelling reason encoding needs to be done locally?
Media decoders are one of the highest risk programs since they deal with untrusted user input and are incredibly complex. So just because a large project like ffmpeg uses C, doesn't mean there isn't very good reason to consider a language like Rust for safety reasons.
For any specific bitrate and quality target, there's a good chance it'll be faster.
Hand written assembly. It's quite easy to accidentally start reading or manipulating a block of memory you didn't intend to when doing complex SIMD transformations.
> Bounds checking as a source of slowdown is overrated in a niche where you're working on fixed size blocks.
I think you don't really understand how codecs work. It is not uncommon to see a transformation like `a = b[c[i] * 3 + offset];`. There's no way for a compiler to omit the bounds check because it can't prove the contents of `c` aren't going to exceed the bounds of `b`.
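A Rust rendering of that gather makes the trade-off visible. This is a hedged sketch with made-up function names: the first version keeps a check on every access, the second validates once up front and then uses `get_unchecked`, which is exactly the kind of `unsafe` the thread is arguing about.

```rust
/// Compute the index `ci * 3 + offset`, returning None on overflow.
fn index_for(ci: u8, offset: usize) -> Option<usize> {
    (usize::from(ci) * 3).checked_add(offset)
}

/// Gather with a data-dependent index. The index depends on the runtime
/// contents of `c`, so a range check stays on every access.
fn gather_checked(b: &[u8], c: &[u8], offset: usize) -> Option<Vec<u8>> {
    c.iter()
        .map(|&ci| index_for(ci, offset).and_then(|i| b.get(i)).copied())
        .collect()
}

/// Same result, but validated once: the largest entry of `c` yields the
/// largest index, so a single comparison covers the whole gather.
fn gather_prevalidated(b: &[u8], c: &[u8], offset: usize) -> Option<Vec<u8>> {
    if let Some(&max_c) = c.iter().max() {
        if index_for(max_c, offset)? >= b.len() {
            return None;
        }
    }
    // SAFETY: indices grow with `ci`, and the largest one was shown above
    // to be in range and not to overflow.
    let out: Vec<u8> = c
        .iter()
        .map(|&ci| unsafe { *b.get_unchecked(usize::from(ci) * 3 + offset) })
        .collect();
    Some(out)
}

fn main() {
    let b: Vec<u8> = (0..32).collect();
    let c = [0u8, 2, 5, 9];
    println!("{:?}", gather_checked(&b, &c, 1));
    println!("{:?}", gather_prevalidated(&b, &c, 1));
    // An index table that points outside `b` is rejected, not read.
    println!("{:?}", gather_checked(&b, &[200], 1));
}
```

The pre-validated form costs an extra pass over `c`, so whether it wins depends on how often the same index table is reused; that is a design judgment, not something the sketch settles.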
This isn't a "crappy C developer" problem. This is a "There isn't a language that does a great job at capturing high level SIMD expressions" problem.
I strongly doubt that.
And if any implementation of AV2 can be "fast enough", then there should be no question at all that we can write "fast enough" safe decoders for every other codec. Absolutely no way safe code is inherently that much slower.
Whether that counts is up to you. I suppose it's still "sandboxed" in that it runs in a less privileged context than the kernel.
The disadvantage in speed when using Rust is pretty obvious.[1] When it comes to video encoding and decoding, I and FFmpeg care a lot more about speed than memory safety. So those reasons have been considered and largely discounted.
[1] https://xcancel.com/FFmpeg/status/1924137645988356437 (to be fair, this is only transpiled from C, so it could probably be optimised further, but that apparently needed a 20k USD bounty to then not even happen (as far as I can tell))
I've actually done a version of this for some multi-system live AV at an event before. Between the main software mixer workstations at various fields in the event it was a dumb but simple encoding they could do in hardware at a high bitrate and then in the machine compositing for the livestream out it did AV1 software encoding to upload to the streaming site to minimize bandwidth requirement from the venue and maximize quality on the streaming site. We've since upgraded to hardware with AV1 encode though.
The practical downside is AV2 is only providing a 30% advantage over AV1. For the streaming providers their bandwidth costs are pretty cheap compared to revamping the transcoding infrastructure, so it'd probably only make financial sense once the remote end can do the most complex and quality encoding used and the rest are all simpler.
You can doubt all you like. Ultimately, there's a reason why dav1d includes hand coded SIMD for common platforms.
It's simply impossible to get a compiler to emit something like this [1].
[1] https://github.com/videolan/dav1d/blob/master/src/x86/ipred_...
More importantly, if you can show that your assembly code isn't altering pointers it shouldn't alter, and isn't going out of bounds on its reads, you're most of the way to having assembly in your verified safe code. And rough bounds checking with padding can be as cheap as a bitmask.
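The "padding plus bitmask" idea can be sketched at the language level as follows. This is an illustrative Rust example with assumed names and a 256-entry table, not code from any decoder: the table is padded to a power-of-two length, so masking the index keeps every access inside it. Note that an out-of-range index then wraps to some other entry instead of failing, which is why the comment calls it "rough" checking.

```rust
/// Table length padded to a power of two so that `index & MASK` is in range.
const TABLE_LEN: usize = 256;
const MASK: usize = TABLE_LEN - 1;

/// Masked lookup: memory-safe for any `index`, at the cost of one AND.
/// Out-of-range indices wrap around instead of being rejected.
fn lookup_masked(table: &[u8; TABLE_LEN], index: usize) -> u8 {
    table[index & MASK]
}

fn main() {
    let mut table = [0u8; TABLE_LEN];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = i as u8;
    }
    println!("{}", lookup_masked(&table, 10));
    // 300 is outside the table; the mask maps it to 300 & 255.
    println!("{}", lookup_masked(&table, 300));
}
```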
1. I didn't make that claim.
2. A negative assertion doesn't require evidence. If I say "this is impossible to do" the burden to disprove me is showing it's actually possible. You can't prove a negative. For example, if I say "the tooth fairy doesn't exist" I don't need to provide evidence of the tooth fairy's non-existence. If you disagree, you need to provide evidence to the contrary.
Then you didn't read my previous comment correctly. AV2 must be "fast enough" if the designers aren't crazy. And AV2 is 5x slower than AV1. Therefore if compiled code is within a factor of five of hand-written assembly, it's "fast enough" for AV1, and h.264, and probably h.265 too.
You were disagreeing with my claim that other codecs could be "fast enough" with a safe compiler, right? If you weren't disagreeing, I don't know why you challenged me to show you some particular kind of code.
> 2. A negative assertion doesn't require evidence. If I say "this is impossible to do" the burden to disprove me is showing it's actually possible. You can't prove a negative. For example, if I say "the tooth fairy doesn't exist" I don't need to provide evidence of the tooth fairy's non-existence. If you disagree, you need to provide evidence to the contrary.
You're saying it's "simply impossible" for a compiler to optimize instructions to a certain level. But anything one person can code, another person can teach a compiler to do in similar situations. I don't need to show you an example, I just need to point you at the Church-Turing thesis and related documents.