My GameBoy emulator generates one "audio sample" per clock tick (which is ~1 MHz, so massive 'oversampling'), decimates that signal down to roughly 100 ksamples/s, then uses a low-pass biquad filter or two to remove beyond-Nyquist frequencies and go down to 16-bit / 48 kHz. It doesn't have any of the "muffling" properties this guy is seeing, aside from those literally caused by the low-pass.
Audio was the thing I could never figure out on my Gameboy emulator. I couldn’t get it to pass basic tests, even without bothering to output sound on the computer.
Besides, I personally prefer to play my VGM files at the original sample rate, and my sound card switches to the correct rate for each song through foobar2000 (fb2k) plugins.
I'll add some context here: why don't more games run their audio at 32768 Hz, if that's such a natural rate to run audio? The answer lies in how you fill the buffers. In any modern, sensible audio system, you can check how much space is available in the audio buffer and simply fill it. The GBA lacks a mechanism to query this. Instead, you calculate it yourself and figure out when to trigger additional audio DMA from the VBlank interrupt. You know that VBlank occurs every 280896 cycles and that the processor runs at 16777216 Hz, so you can do some math to work out how much data remains in the audio DMA stream.
A lot of games simplify the math: it's easier to start a new audio DMA in your VBlank handler, but that means running at a lower sample rate (one that fits a whole number of samples into each frame), which will sound pretty crispy.
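The VBlank arithmetic is easy to check with exact fractions. At 32768 Hz, one 280896-cycle frame holds 548.625 samples, so a buffer restarted every VBlank can never line up with it. The ~13379 Hz and ~10512 Hz rates that are so common on the GBA work out to exactly 224 and 176 samples per frame, which is plausibly why games use them. This is a small Python sketch; the constants come from the comment above and the helper name is made up:

```python
from fractions import Fraction

CPU_HZ = 16_777_216          # GBA CPU clock (2**24 Hz)
CYCLES_PER_FRAME = 280_896   # CPU cycles from one VBlank to the next


def samples_per_frame(cycles_per_sample: int) -> Fraction:
    """Exact number of samples consumed per video frame when the audio timer
    fires every `cycles_per_sample` CPU cycles."""
    return Fraction(CYCLES_PER_FRAME, cycles_per_sample)


for cycles in (512, 1254, 1596):
    rate = CPU_HZ / cycles
    spf = samples_per_frame(cycles)
    kind = "whole" if spf.denominator == 1 else "fractional"
    print(f"{rate:9.2f} Hz: {float(spf)} samples/frame ({kind})")
```

A timer period of 512 cycles is 32768 Hz. Periods of 1254 and 1596 cycles divide 280896 exactly.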
YMMV; some people like the crispy aliased audio. If the audio weren't crispy, the sound designers probably would have adjusted the samples to compensate. Other factors being equal, I'd rather listen to what the original artists heard when they were testing on real hardware, because that is probably closer to what they intended, even though it has a lot of artifacts in it.
But this is a bit like people who use smoothing filters. It's ultimately about taste, but it should be recognized that unless the filter is attempting to accurately recreate the original hardware of the era, the original design intent is not being adhered to, and so something may be lost in the "enhancement".
It's basically doing an accidental, low-quality form of spectral band replication (https://en.wikipedia.org/wiki/Spectral_band_replication), a technique used in modern codecs.
I've written some code to play back 8-bit samples (and to do wavetable, FM, and VA synthesis) on 8-bit Arduinos, using the PWM to output 8-bit audio. That runs at 31373 Hz, which is a pretty crazy sample rate.
Why?
Because the chip is clocked at 16 MHz. If you program the PWM with no prescaler in "phase correct" mode, the counter counts up and then back down (so you get a widening pulse in the middle of a "burst"), and one full cycle takes 510 "steps". It's an 8-bit counter, so it counts from 0 up to 255. The next step counts back down to 254, and so on down to 0, after which the next step takes it back up to 1.
And 16000000/510 is 31372.55 ;-)
In the mid-1980s the first really affordable sampler was the Ensoniq Mirage, which used the Bob Yannes-designed ES5503 DOC (Digital Oscillator Chip) to generate its waveforms. It played back 8-bit samples and used a fairly simple phase accumulator that didn't do any form of interpolation (I don't count "leftmost neighbour" as interpolation). Particularly when you pitch it down, you get a rough, clanky, gritty "whine" to samples, that the analogue filters didn't necessarily do a lot to remove.
Later on they released the EPS, which had 13-bit sampling. Why 13-bit? I don't know; I guess because the Emulator I and II used 8-bit samples with μ-law coding, giving effectively 13-bit-equivalent resolution. It also used linear interpolation to smooth the "jumps" between samples, and even if you loaded in and converted a Mirage disk, the "graininess" when you pitched things down was gone.
I'm currently writing some code to play back Mirage samples from disk images, and I've actually added a linear interpolator to it. Some things sound better with it, some things sound worse. I think I'll make it a front panel control, so you can turn it on and off as you want.
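The difference between those two playback modes comes down to one line in a phase-accumulator loop. This is a minimal Python sketch with a made-up function name. It uses floats for brevity, whereas sampler hardware would use fixed-point integer arithmetic:

```python
def play_sample(samples, step, num_out, interpolate=False):
    """Read `samples` with a fractional phase accumulator that advances by
    `step` input samples per output sample (step < 1 pitches the sample down)."""
    out = []
    phase = 0.0
    last = len(samples) - 1
    for _ in range(num_out):
        i = int(phase)
        if i >= last:
            break  # one-shot playback: stop at the end of the sample
        frac = phase - i
        if interpolate:
            # Linear interpolation between the two neighbouring samples.
            out.append(samples[i] + (samples[i + 1] - samples[i]) * frac)
        else:
            # "Leftmost neighbour": hold the current sample, no interpolation.
            out.append(samples[i])
        phase += step
    return out
```

With step = 0.5 (an octave down), the non-interpolated path repeats every sample twice, producing the stair-step "grit". The interpolated path fills in the midpoints instead.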
I don't think so. I think you're just getting a high end that isn't in the original audio, and in the places where there are high frequencies, the aliasing and the hiss just get in the way.
that drives emotional energy
Seems like a hyperbolic rationalization.
This post describes an audio enhancement that a Game Boy Advance emulator can implement to reduce audio aliasing and noise, at a fairly high level.
To start with, here’s a comparison from Metroid: Zero Mission as an example of what this can do:
Metroid: Zero Mission - Brinstar (Accurate interpolation)
Metroid: Zero Mission - Brinstar (Enhanced interpolation)
Much cleaner! The second recording does sound a little more muffled, but I’ll take that over the horrible audio aliasing in the first recording.
There are some alternative approaches to improving GBA audio that produce higher-quality results, such as NanoBoyAdvance's excellent MP2K HQ feature, but the interpolation approach is notable in that it works with any GBA game (though quality can vary by game). MP2K HQ, for example, only works with games that use the MP2K audio driver (aka M4A aka Sappy), which covers many games but not every game.
This approach is not particularly novel; VBA-M has supported enhanced audio interpolation for a very long time. I believe the implementation details are a bit different, though.
The previous post goes into more detail on how the GBA audio hardware works, but to summarize the part that’s most relevant for improving audio interpolation:
The GBA audio hardware outputs the final mixed audio samples using PWM at 1 of 4 possible sampling frequencies: 32768, 65536, 131072, or 262144 Hz. The vast majority of GBA games use 65536 Hz, though occasionally you'll see 32768 Hz (e.g. Castlevania: Circle of the Moon).
The GBA resamples from each audio channel's frequency to the PWM sampling frequency by applying nearest neighbor interpolation, i.e. just outputting the channel's current sample. This causes extremely noticeable audio aliasing in the final audio output, particularly when games use very low sample rates with the 2 PCM channels, which many games unfortunately do: sample rates in the 10000-14000 Hz range are very common (e.g. the Metroid example above is 13379 Hz).
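The accurate behavior can be modeled offline as a sample-and-hold. This is a simplified sketch with a made-up function name. Real hardware advances on timer overflows and PWM cycles, not on a ratio of two rates:

```python
def nearest_neighbor_resample(samples, src_rate: float, dst_rate: float):
    """Sample-and-hold resampling: each output sample repeats whichever input
    sample is current at that output time. No filtering is applied, so the
    stair-step edges add high-frequency content the source never had."""
    n_out = int(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    return [samples[min(int(k * src_rate / dst_rate), last)] for k in range(n_out)]
```

Going from 13379 Hz to 65536 Hz, each input sample is simply repeated about 4.9 times. Nothing removes the stair-step edges that cause the aliasing described above.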
The core idea behind enhancing interpolation is fairly simple: what if, instead of accurately emulating how the GBA PWM hardware works, the emulator uses its own interpolation algorithm to resample from audio channels’ sample rates directly to the emulator’s audio output sample rate?
The first step is figuring out the source sample rate to resample from.
As mentioned in the previous post, you can compute a PCM channel’s sample rate using the clock divider and counter reload value of the GBA timer that it’s tracking:
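$$\text{sample rate} = \frac{16777216\ \text{Hz}}{(\texttt{0x10000} - \text{reload}) \times \text{divider}}$$

Here, reload is the timer's counter reload value and divider is its prescaler (1, 64, 256, or 1024). 16777216 Hz is the GBA CPU clock.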
The sample rate can be a fractional number, though the final clock divider value ((0x10000 - reload) * divider) is always a positive integer.
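The same calculation in Python, as a sketch. The prescaler encoding follows GBATEK's description of TMxCNT_H, where bits 0-1 select F/1, F/64, F/256, or F/1024. The function name is hypothetical:

```python
CPU_HZ = 16_777_216
PRESCALER_DIVIDERS = (1, 64, 256, 1024)  # indexed by TMxCNT_H bits 0-1


def pcm_sample_rate(reload: int, prescaler_select: int) -> float:
    """Sample rate of a PCM channel tracking a timer with the given reload
    value (0..0xFFFF) and prescaler selection (0..3)."""
    divider = PRESCALER_DIVIDERS[prescaler_select]
    period = (0x10000 - reload) * divider  # CPU cycles per timer overflow, always an integer
    return CPU_HZ / period
```

For example, a reload value of 0x10000 - 1254 with no prescaler gives 16777216 / 1254 ≈ 13378.96 Hz, which is the ~13379 Hz rate from the Metroid example.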
Games generally play most of their PCM audio at the same sample rate, but you will occasionally see games use different sample rates for different songs (e.g. Castlevania: Circle of the Moon again, it uses a much higher sample rate for its main menu music).
You need to recalculate the channels’ sample rates (or at least check if they’ve changed) whenever any of the following occurs:
When a PCM channel is tracking a disabled timer, it constantly outputs the last sample that it popped from its FIFO. While not completely accurate, you could emulate this as the channel being muted. This will cause problems if a game frequently disables and re-enables its audio timer, but that is extremely uncommon. (I wish I could say I've never seen it, but…Driver 2 Advance.)
I’ve never seen a game use timer 1 in cascading mode as an audio timer, but the hardware theoretically supports it. You could calculate the effective sample rate like so:
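$$\text{sample rate} = \frac{16777216\ \text{Hz}}{(\texttt{0x10000} - \text{reload}_0) \times \text{divider}_0 \times (\texttt{0x10000} - \text{reload}_1)}$$

The subscripts refer to timers 0 and 1. In cascading mode, timer 1 ignores its own prescaler and ticks once each time timer 0 overflows.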
However, in my own implementation I just fall back to nearest neighbor resampling when a channel is tracking timer 1 in cascading mode (for that channel only, not for everything). I haven’t seen a game do this and I can’t think of a reason why a game would want to do this.
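If you did want to handle the cascading case instead of falling back, the effective-rate calculation translates directly to code. This is a sketch with a made-up function name:

```python
CPU_HZ = 16_777_216
PRESCALER_DIVIDERS = (1, 64, 256, 1024)  # indexed by TMxCNT_H bits 0-1


def cascaded_sample_rate(reload0: int, prescaler0: int, reload1: int) -> float:
    """Overflow rate of timer 1 in count-up (cascade) mode.

    Timer 1's own prescaler is ignored in this mode: it increments once per
    timer 0 overflow, so its period is timer 0's period times its own count."""
    timer0_period = (0x10000 - reload0) * PRESCALER_DIVIDERS[prescaler0]
    timer1_period = timer0_period * (0x10000 - reload1)
    return CPU_HZ / timer1_period
```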
Once you know each PCM channel’s source sample rate, you then need to resample from that to your output sample rate (e.g. 48000 Hz). You can ignore the PWM sampling frequency; if you’re not going for accurate emulation, there’s no reason to resample to the PWM sampling frequency as an intermediate step when you can resample directly to the output sample rate.
At this point this is no longer a GBA-specific problem, so you can plug in whatever resampling algorithm you want (or use a library), and then send it an input sample each time the PCM channel pops from its sample FIFO. I use my own resampling implementation so that I can control exactly how interpolation is performed. (Maybe a topic for a future post…)
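This is a minimal Python sketch of the "send it an input sample each time the FIFO pops" structure. It is not the author's resampler: the class and method names are made up, and it uses plain linear interpolation for brevity. A cubic or sinc interpolator would need a longer history of input samples but would sit in the same place:

```python
class LinearResampler:
    """Streaming resampler: call push() once per input sample (e.g. each time
    a PCM channel pops its FIFO). Finished output samples accumulate in .output."""

    def __init__(self, in_rate: float, out_rate: float):
        self.out_rate = out_rate
        self.step = in_rate / out_rate  # input samples advanced per output sample
        self.pos = 0.0     # next output time, in input samples after `prev`
        self.prev = 0.0    # most recent input sample (starts as silence)
        self.output = []

    def set_input_rate(self, in_rate: float) -> None:
        # Call this when the channel's timer settings change.
        self.step = in_rate / self.out_rate

    def push(self, sample: float) -> None:
        # Emit every output sample whose time falls between `prev` and `sample`.
        while self.pos < 1.0:
            self.output.append(self.prev + (sample - self.prev) * self.pos)
            self.pos += self.step
        # Re-express the next output time relative to the new `prev`.
        self.pos -= 1.0
        self.prev = sample
```

Feeding one second of 13379 Hz input into LinearResampler(13379.0, 48000.0) yields about 48000 output samples. Because the output times are tracked in input-sample units, a rate change only needs a new step value.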
You’ll need to separately resample the PSG channels down to your output sample rate. You could do nearest neighbor, which will sound somewhat similar to actual hardware, though not exactly the same since actual hardware nearest neighbor resamples to the PWM sampling frequency (usually 65536 Hz).
For higher quality PSG resampling you could do the same thing you can do in a GB/GBC emulator: sample all 4 PSG channels at 2097152 Hz (every 8 GBA CPU cycles), mix them at 2097152 Hz, then interpolate down from 2097152 Hz to your output sample rate. This works because every possible PSG channel frequency is an even divisor of 2097152 Hz, and due to how PSG audio generation works you can safely assume that samples are infinitely repeated in between sample changes. (That’s not really true in physical reality, but it’s true if you’re sampling at 2097152 Hz.) Whatever interpolation you perform will need to involve a low-pass filter to avoid audio aliasing.
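As an illustration of the low-pass-then-decimate step, here is the crudest possible version: a box (moving-average) filter that averages every 2097152 Hz sample falling within one output period. This is an assumption for illustration, not what the author describes. A box filter's first sidelobe is only about 13 dB down, so a proper FIR or windowed sinc design will attenuate aliases far better:

```python
def boxcar_decimate(samples, in_rate: int, out_rate: int):
    """Average each run of input samples that falls within one output period.

    Integer phase accounting keeps the long-run output rate exact even though
    in_rate / out_rate is not a whole number. Requires out_rate <= in_rate."""
    out = []
    acc = 0.0
    count = 0
    phase = 0
    for s in samples:
        acc += s
        count += 1
        phase += out_rate
        if phase >= in_rate:  # one output period's worth of input consumed
            phase -= in_rate
            out.append(acc / count)
            acc, count = 0.0, 0
    return out
```

Called as boxcar_decimate(mixed_psg, 2_097_152, 48_000), where mixed_psg is a hypothetical list of mixed 2 MHz PSG samples, each output is the average of 43 or 44 input samples.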
For final mixing, you probably want to leave everything as signed samples and completely ignore the GBA sound bias functionality. There’s also no reason to truncate the lowest bits as you’d do with accurate emulation, or even to round to the nearest integer; the resampling process will give you floating-point output samples, so you can just leave them that way and do floating-point mixing. If you want you can even scale the samples down to make clipping impossible (unlike on actual hardware), though that will make the audio output very quiet most of the time.
In my enhanced resampling implementation I support two different interpolation algorithms: 6-point cubic Hermite interpolation and windowed sinc interpolation.
Cubic Hermite is implemented exactly as described in a previous post on Sega CD, originally from here:
This algorithm works pretty well for upsampling, where your output sample rate is higher than the source sample rate. This is almost always the case for the GBA PCM channels, with the notable exception of Golden Sun: The Lost Age with its exceptionally high 63072 Hz sample rate. It does not work as well for downsampling unless you apply a low-pass filter to the source signal before interpolating; otherwise you’ll introduce some audio aliasing.
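For reference, a 6-point cubic Hermite interpolator fits in a few lines. This Python sketch is a standard cubic Hermite spline between the middle two of six samples, with tangents estimated by five-point central differences. It may not be term-for-term identical to the version the author links to, and the function name is made up:

```python
def hermite6(y, x: float) -> float:
    """6-point, 3rd-order Hermite interpolation between y[2] and y[3].

    `y` holds six consecutive input samples y[-2..3] (stored as y[0..5]);
    `x` in [0, 1] is the fractional position between y[2] and y[3]."""
    ym2, ym1, y0, y1, y2, y3 = y
    c0 = y0
    # Tangent at y0 from a five-point central difference.
    c1 = (ym2 - y2) / 12.0 + (y1 - ym1) * (2.0 / 3.0)
    c2 = ((5.0 / 4.0) * ym1 - (7.0 / 3.0) * y0 + (5.0 / 3.0) * y1
          - 0.5 * y2 + y3 / 12.0 - ym2 / 6.0)
    c3 = (ym2 - y3) / 12.0 + (y2 - ym1) * (7.0 / 12.0) + (y0 - y1) * (4.0 / 3.0)
    # Horner evaluation of c3*x^3 + c2*x^2 + c1*x + c0.
    return ((c3 * x + c2) * x + c1) * x + c0
```

It is called once per output sample with the six inputs surrounding the output time. It passes exactly through y[2] at x = 0 and y[3] at x = 1, and reproduces any cubic polynomial exactly. For downsampling it still needs a low-pass filter on the input first.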
Sinc interpolation, or band-limited interpolation, is extremely high-quality but much more complex. Two approaches to implementing that:
Note that with the low sample rates common on GBA, band-limited interpolation doesn’t necessarily produce a more pleasant sound than simpler interpolation algorithms! Yes, it’s time for some comparisons.
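The core of a windowed sinc interpolator can be sketched in Python as well. This is the direct, unoptimized form: it recomputes the kernel for every output sample, whereas a real-time implementation would typically use a precomputed table. The Blackman window, the 16-tap half-width, and the function name are arbitrary choices for illustration, not what the author uses:

```python
import math


def windowed_sinc(samples, t: float, half_width: int = 16, cutoff: float = 1.0) -> float:
    """Band-limited interpolation of `samples` at fractional index `t`.

    `cutoff` is the low-pass cutoff as a fraction of the input Nyquist
    frequency: 1.0 when upsampling, out_rate / in_rate when downsampling.
    A Blackman window tapers the sinc to zero at +/- half_width taps."""
    center = math.floor(t)
    total = 0.0
    for n in range(center - half_width + 1, center + half_width + 1):
        if not 0 <= n < len(samples):
            continue  # treat samples outside the buffer as silence
        d = t - n  # distance from tap n to the output time
        if abs(d) >= half_width:
            continue
        # Scaled sinc, cutoff * sinc(cutoff * d), so the kernel has unit DC gain.
        arg = math.pi * cutoff * d
        sinc = cutoff if d == 0 else cutoff * math.sin(arg) / arg
        # Blackman window centred on the output time.
        u = d / half_width
        w = 0.42 + 0.5 * math.cos(math.pi * u) + 0.08 * math.cos(2.0 * math.pi * u)
        total += samples[n] * sinc * w
    return total
```

The sinc's cutoff sits exactly at the source Nyquist frequency. That is why this approach removes everything above ~6689 Hz for a 13379 Hz source, which is the muffling heard in the comparisons that follow.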
First, Brinstar again, but this time also with a windowed sinc version - the example at the top used cubic Hermite. For reference, this game uses a sample rate of ~13379 Hz:
Metroid: Zero Mission - Brinstar (Accurate resampling)
Metroid: Zero Mission - Brinstar (Cubic Hermite interpolation)
Metroid: Zero Mission - Brinstar (Windowed sinc interpolation)
The windowed sinc version sounds less aliased but really muffled. I personally prefer the sound of the cubic version.
In general, sinc interpolation does a much better job at eliminating audio aliasing and noise than cubic interpolation does, but the complete removal of wave frequencies above the source signal’s Nyquist frequency (~6689 Hz here) muffles the sound. This is technically more accurate as far as resampling the original 13379 Hz signal, but in this case at least I don’t think it sounds better.
Here’s another 13379 Hz example, this one from Fire Emblem: The Blazing Blade:
Fire Emblem - Strike (Accurate resampling)
Fire Emblem - Strike (Cubic Hermite interpolation)
Fire Emblem - Strike (Windowed sinc interpolation)
I also prefer the cubic version for this one, again because the sinc version sounds very muffled.
Windowed sinc doesn’t always sound like that though! Here’s a song from Mega Man Zero 4, sample rate ~21024 Hz:
Mega Man Zero 4 - Esperanto (Accurate resampling)
Mega Man Zero 4 - Esperanto (Cubic Hermite interpolation)
Mega Man Zero 4 - Esperanto (Windowed sinc interpolation)
All three versions of this have noticeable static/hissing (thank you 8-bit sample quantization), but the windowed sinc version definitely has the least, and unlike the Metroid and Fire Emblem examples it doesn’t sound significantly more muffled than the cubic version.
Here’s an interesting example from Castlevania: Circle of the Moon, the aforementioned main menu music. It uses a PCM sample rate of 42048 Hz here but with the PWM sampling frequency set to 32768 Hz, which sounds fairly awful with accurate emulation:
Castlevania: Circle of the Moon - Main Menu (Accurate resampling)
Castlevania: Circle of the Moon - Main Menu (Cubic Hermite interpolation)
Castlevania: Circle of the Moon - Main Menu (Windowed sinc interpolation)
(If you’ve played Rondo of Blood, yes, it’s the same song.)
Finally, an example from Golden Sun: The Lost Age with its crazy high 63072 Hz sample rate. This song has really noticeable audio noise despite the high sample rate and the game’s high-quality audio mixing code:
Golden Sun: The Lost Age - Gloomy Caves (Accurate resampling)
Golden Sun: The Lost Age - Gloomy Caves (Cubic Hermite interpolation)
Golden Sun: The Lost Age - Gloomy Caves (Windowed sinc interpolation)
It’s not as noticeable as in the other examples, but sinc interpolation does remove some of the audio noise! Cubic interpolation doesn’t fare quite as well here, though the difference isn’t massive.
Here’s a song from Castlevania: Aria of Sorrow that uses both the PCM channels and the PSG channels:
Castlevania: Aria of Sorrow - Castle Corridor (Accurate resampling)
Castlevania: Aria of Sorrow - Castle Corridor (Cubic Hermite interpolation)
Castlevania: Aria of Sorrow - Castle Corridor (Windowed sinc interpolation)
I picked out Aria of Sorrow here because it uses a PCM sample rate of ~10512 Hz, very low.
Higher-quality interpolation removes most of the high-frequency aliasing from the PCM channel output, but the PSG output still accurately contains high-frequency wave components. This makes the PSG channels stand out much more than they’re supposed to in music that uses both the PCM and PSG channels, particularly at low PCM sample rates.
One possible solution to this is to try removing the high-frequency waves from the PSG output, i.e. a low-pass filter.
Here’s what happens after applying a low-pass filter to all PSG output with a cutoff frequency of 5256 Hz, right at the Nyquist frequency of sample rate 10512 Hz:
Castlevania: Aria of Sorrow - Castle Corridor (Cubic Hermite interpolation + PSG LPF)
Castlevania: Aria of Sorrow - Castle Corridor (Windowed sinc interpolation + PSG LPF)
I think that sounds a lot better than not low-pass filtering the PSG, and better than bluntly reducing PSG volume relative to PCM (I tried that; it didn't work well). The windowed sinc version might benefit from a low-pass filter with either a lower cutoff frequency or a steeper post-cutoff attenuation slope, but I think it still sounds a lot better than the version with unfiltered PSG.
This doesn’t work quite as well when games do the reverse, with the PSG leading the music while the PCM channels play secondary instruments (this one is also 10512 Hz):
Mega Man Battle Network 2 - Title Screen (Accurate resampling)
Mega Man Battle Network 2 - Title Screen (Cubic Hermite interpolation)
Mega Man Battle Network 2 - Title Screen (Cubic Hermite interpolation + PSG LPF)
This particular song has an additional issue where higher-quality PSG resampling makes the noise channel sound much quieter than it does on actual hardware (because it’s playing at an ultrasonic frequency), but even aside from that, I don’t think the interpolated version sounds better here.
In my implementation I dynamically set the low-pass cutoff frequency to (0.5 * pcm_sample_rate), designing a new 2nd-order Butterworth filter each time the sample rate changes. This doesn’t do a perfect job, but it works well enough for the task of moderately attenuating high-frequency PSG output.
I apply the filter after resampling from 2097152 Hz to 48000 Hz, both because it attenuates much more sharply (a mere 2nd-order Butterworth filter does not work well at very high source frequencies) and because it’s significantly less computationally expensive.
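This is a sketch of that filter, using the well-known RBJ Audio EQ Cookbook low-pass biquad with Q = 1/√2, which gives a 2nd-order Butterworth response. The class name and the clamp just below the output Nyquist frequency are assumptions for this sketch. Without the clamp, a cutoff of 0.5 × 63072 Hz (Golden Sun: The Lost Age) would land above 24000 Hz at a 48000 Hz output rate, and the formulas would break down:

```python
import math


class ButterworthLowPass:
    """2nd-order Butterworth low-pass: RBJ cookbook LPF biquad with Q = 1/sqrt(2)."""

    def __init__(self, cutoff_hz: float, sample_rate_hz: float):
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0  # Direct Form I history
        self.redesign(cutoff_hz, sample_rate_hz)

    def redesign(self, cutoff_hz: float, sample_rate_hz: float) -> None:
        # Keep the cutoff safely below the output Nyquist frequency.
        cutoff_hz = min(cutoff_hz, 0.45 * sample_rate_hz)
        w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
        cos_w0 = math.cos(w0)
        alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2 * Q) with Q = 1/sqrt(2)
        a0 = 1.0 + alpha
        # Coefficients are pre-divided by a0.
        self.b0 = (1.0 - cos_w0) / 2.0 / a0
        self.b1 = (1.0 - cos_w0) / a0
        self.b2 = self.b0
        self.a1 = -2.0 * cos_w0 / a0
        self.a2 = (1.0 - alpha) / a0

    def process(self, x: float) -> float:
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y
```

Create it once with ButterworthLowPass(0.5 * pcm_sample_rate, 48000.0). Call redesign() whenever the PCM sample rate changes, and call process() on each PSG sample after it has been resampled to 48000 Hz. redesign() keeps the filter's history, so a mid-song rate change doesn't reset its state.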
I think this approach works pretty well for cleaning up GBA audio in real time, though the performance impact can be pretty significant (mainly because of the PSG downsampling). It can’t completely eliminate audio noise (static/hissing sound) due to still receiving noisy 8-bit samples as input, but it significantly reduces audio aliasing and it does still remove some noise.