Same principle applies to other "looks safe" redactions — pixelation with small block sizes, partial masking of credentials, etc. If you can describe the transform as a linear operation, there is probably a pseudoinverse waiting to undo it.
Well, even if you have a large averaging window (as is the case with blurring letters), the letters have constraints (a fixed number of shapes), and information about those constraints is partly retained.
Not very different from the information retained in minesweeper games.
Enhance really refers to combining multiple images (stacking). Each pixel in a low-res image was a kernel over the same high-res image, so undoing a 100-pixel blur is equivalent to combining 10,000 images for 100x super resolution.
This makes it all seem a little too pat. In fact, this probably doesn't get us the original pixel value, because quantization deletes information when the blur is applied, and that can never be recovered afterwards. We can at best get an approximation of the original value, which is rather obvious given that we can already vaguely make out figures in a blurred image.
> Nevertheless, even with a large averaging window, fine detail — including individual strands of hair — could be recovered and is easy to discern.
The reason for this is that he's demonstrating a box blur. A box blur is roughly equivalent to taking the frequency transform of the image and multiplying it by a sort of decaying sine wave (a sinc function). This achieves a "blur" in that the lowest frequency is multiplied by 1 and hence retained, while higher frequencies are attenuated. However, visually we can see that a box blur doesn't look very good, and importantly it doesn't necessarily attenuate the very highest frequencies much more than far lower frequencies. Hence it isn't surprising that the highest frequencies can be recovered in good fidelity. Compare a Gaussian blur, which is usually considered to look better, and whose frequency transform concentrates all the attenuation at the highest frequencies. You would be far less able to recover individual strands of hair in an image that was Gaussian blurred.
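To make the frequency argument concrete, here's a minimal sketch (my own, with arbitrary kernel widths) comparing the magnitude response of a box kernel and a Gaussian kernel:

```python
import numpy as np

# Compare how a box (moving average) kernel and a Gaussian kernel attenuate
# each frequency. The box response is a rippling sinc-like curve that leaves
# some high frequencies barely attenuated; the Gaussian response decays
# smoothly toward zero at the highest frequencies.
n = 256                       # length used for the frequency analysis
width = 15                    # box kernel width, in pixels
sigma = width / 3.0           # a roughly comparable Gaussian width

box = np.zeros(n)
box[:width] = 1.0 / width     # normalized box kernel

x = np.arange(n)
gauss = np.exp(-0.5 * ((x - width // 2) / sigma) ** 2)
gauss /= gauss.sum()          # normalized Gaussian kernel

box_response = np.abs(np.fft.rfft(box))
gauss_response = np.abs(np.fft.rfft(gauss))

# Near the highest representable frequencies, the box response is still
# clearly non-zero while the Gaussian response is essentially gone.
print("box, highest bins:     ", box_response[-3:])
print("gaussian, highest bins:", gauss_response[-3:])
```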
> Remarkably, the information “hidden” in the blurred images survives being saved in a lossy image format.
Remarkable, maybe, but unsurprising if you understand that JPEG operates on basically the same frequency logic as described above. Specifically, it further attenuates and quantizes the highest frequencies of the image. Since the box blur has barely attenuated them already, this doesn't affect our ability to recover the image.
If, however, you observe after turbulence has set in, then some of the information has been lost; it's in the entropy now. How much depends on the turbulent flow.
Don't miss this video by Smarter Every Day:
https://youtu.be/j2_dJY_mIys?si=ArMd0C5UzbA8pmzI
Treat the dynamics and time of evolution as your private key; laminar flow is a form of encryption.
I mean knowledge like "a human face, but the potential set of humans is known to the attacker" or, even worse, "text, but the font is obvious from the unblurred part of the doc".
Frequency-domain deconvolution is frequency-domain deconvolution, right? It doesn’t really matter what your kernel is.
It's essentially like "cracking" a password when you have its hash and know the hashing algorithm. You don't have to know how to reverse the blur; you just need to know how to do it the normal way. You can then essentially brute force through all possible characters one at a time to see if the result looks the same after applying the blur.
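As a rough sketch of that brute-force loop (render_text and blur below are hypothetical stand-ins that would have to reproduce the victim's exact font, layout, canvas size, and blur settings):

```python
import string

def recover_text(blurred_target, render_text, blur, max_len=12,
                 alphabet=string.ascii_letters + string.digits + " "):
    """Greedy brute-force sketch: extend the candidate string one character
    at a time, keeping whichever character makes blur(render_text(candidate))
    most similar to the blurred redaction we are attacking. Both stand-in
    functions are assumed to return images as flat sequences of pixel values
    of the same size as `blurred_target`."""
    def distance(a, b):
        # sum of squared pixel differences between two equally sized images
        return sum((pa - pb) ** 2 for pa, pb in zip(a, b))

    candidate = ""
    for _ in range(max_len):
        candidate = min(
            (candidate + ch for ch in alphabet),
            key=lambda trial: distance(blur(render_text(trial)), blurred_target),
        )
    return candidate
```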
Thinking about this, adding randomness to the blurring would likely help.
Or, far more simply, just mask the sensitive data with a single color, which is impossible to reverse (for rasterized images, that is; it's not a good idea for PDFs, which tend to keep the text "hidden" underneath).
If, however, one just blindly uses the (generalized) inverse of the point-spread function, then you are absolutely correct for the common point-spread functions that we encounter in practice (usually very poorly conditioned).
One way to deal with this is to cut off those frequencies where the signal-to-noise ratio in that frequency bin is poor. This, however, requires some knowledge about the spectra of the noise and the signal. The Wiener filter uses that knowledge to work out an optimal filter.
https://en.wikipedia.org/wiki/Wiener_deconvolution
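Concretely, a minimal FFT-based sketch of Wiener deconvolution in 1D (assuming a known kernel, circular convolution, and a single noise-to-signal constant instead of the full spectra):

```python
import numpy as np

def wiener_deconvolve(blurred, kernel, nsr=1e-4):
    """blurred: observed (blurred + noisy) signal; kernel: known point-spread
    function, zero-padded to the same length; nsr: noise-to-signal power
    ratio, here a single constant rather than a per-frequency spectrum."""
    H = np.fft.fft(kernel)
    Y = np.fft.fft(blurred)
    # Wiener filter: conj(H) / (|H|^2 + NSR). Where |H| is small (frequencies
    # the blur nearly destroyed), the NSR term keeps us from amplifying noise.
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft(G * Y))

# Tiny demo: blur a signal with a circular 9-tap box kernel, add a little
# noise, then deconvolve.
rng = np.random.default_rng(0)
x = np.repeat(rng.uniform(size=16), 8)            # piecewise-constant "row"
kernel = np.zeros_like(x)
kernel[:9] = 1 / 9
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(kernel)))
y += rng.normal(scale=1e-4, size=y.size)          # sensor noise
x_hat = wiener_deconvolve(y, kernel)
print(np.max(np.abs(x - x_hat)))                  # small, but not exactly zero
```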
If one knows neither the statistics of the noise nor the point-spread function, then it gets harder and you are in the territory of blind deconvolution.
So just a word of warning: if you are relying only on sprinkling a little noise into blurred images to save yourself, you are on very, very dangerous ground.
The reason the filters used in the post are easily reversible is that none of them are binomial (i.e. the discrete equivalent of a Gaussian blur). A binomial blur uses the coefficients of a row of Pascal's triangle, and thus is what you get when you repeatedly average each pixel with its neighbor (in 1D).
When you do, the information at the Nyquist frequency is removed entirely, because a signal of the form "-1, +1, -1, +1, ..." ends up blurred _exactly_ into "0, 0, 0, 0...".
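A quick numerical check of that claim:

```python
import numpy as np

# The alternating signal at the Nyquist frequency...
x = np.array([-1, 1] * 8, dtype=float)

# ...is annihilated by averaging each sample with its neighbor ([1, 1] / 2),
# and therefore by any binomial kernel built from repeated neighbor averages
# ([1, 2, 1] / 4, [1, 3, 3, 1] / 8, and so on).
pair_average = np.convolve(x, [0.5, 0.5], mode="valid")
binomial_121 = np.convolve(x, [0.25, 0.5, 0.25], mode="valid")

print(pair_average)   # all zeros: the Nyquist component is gone for good
print(binomial_121)   # all zeros as well
```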
All the other blur filters, in particular the moving average, are just poorly conceived. They filter out the middle frequencies the most, not the highest ones. It's equivalent to doing a bandpass filter and then subtracting that from the original image.
Here's an interactive notebook that explains this in the context of time series. One important point is that the "look" that people associate with "scientific data series" is actually an artifact of moving averages. If a proper filter is used, the blurriness of the signal is evident. https://observablehq.com/d/a51954c61a72e1ef
Emphasis mine. Quote from the beginning of the article.
This isn't meant to be a textbook about blurring algorithms. It was supposed to be a demonstration of how what may seem destroyed to a casual viewer is recoverable by a simple process, intended to give the viewer some intuition that maybe blurring isn't such a good information destroyer after all.
Your post kind of comes off like criticizing someone who shows how easy it is to crack a Caesar cipher for not using AES-256. But the whole point was to be accessible, and to introduce the idea that just because it looks unreadable doesn't mean it's not very easy to recover. No, it's not a mistake to use the Caesar cipher for the initial introduction. Or a dead-simple one-dimensional blurring algorithm.
Other than that, you're not wrong about theoretical Gaussian filters with infinite windows over infinite data, but this has little to do with the scenario in the article. That's about the information that leaks when you have a finite window with a discrete step and start at a well-defined boundary.
FWIW, this does not read as constructive.
If you apply a fake motion blur like in photoshop or after effects then that could probably be reversed pretty well.
You note the pitfall of text remaining behind the redaction in PDFs (and other layered formats), but there are also pitfalls here around alpha channels. There have been several incidents where folks drew not-quite-opaque redaction blocks over their images.
Also not a good idea for masking already-compressed images of text, like JPEGs, because some of the information might bleed into uncovered areas.
https://en.wikipedia.org/wiki/Wiener_deconvolution
If one blindly inverts the linear blur transform then yes, the reconstruction would usually be a completely unrecognisable mess, because the inverse operator is going to dramatically boost the noise as well.
This is how one of the more notorious pedophiles[1] was caught[2].
I've seen my phone camera's real-time viewfinder show text on a sign with one letter different from the real sign. If I wasn't looking at the sign at the same time, I might not have noticed the synthetic replacement.
(My grandmother always told me to "never get old." I wish I followed her advice.)
For instance: https://deepinv.github.io/deepinv/auto_examples/blind-invers...
https://helpx.adobe.com/photoshop/using/reduce-camera-shake-...
Or... from the note at the top, had it? Very strange, features are almost never removed. I really wonder what the architectural reason was here.
It's somewhere here: https://www.microsoft.com/en-us/research/product/computation...
That's relatively easy if you're assuming simple translation and rotation (simple camera movement), as opposed to a squiggle movement or something (e.g. from vibration or being knocked), because you can simply detect how much sharper the image gets and home in on the right values.
Like the JBIG2 algorithm used in a zero click PDF-as-GIF exploit in iMessage a while back: https://projectzero.google/2021/12/a-deep-dive-into-nso-zero...
The vulnerability of that algorithm to character-swapping caused incorrect invoices, incorrect measurements in blueprints, incorrect metering of medicine, etc. https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
Generally, when you're dealing with a blurry image, you're going to be able to reduce the strength of the blur up to a point, but there's always some amount of information that's impossible to recover. At that point you have two choices: either you leave it a bit blurry and call it a day, or you introduce (hallucinate) information that's not there in the image. Diffusion models generate images by hallucinating information at every stage to produce crisp images at the end, but in many deblurring applications you prefer to stay faithful to what's actually there, so you leave the tiny amount of blur that remains at the end.
Except the size of the blocked section, of course. E.g. if you know it's a person's name, from a fixed list of people, well, "Huckleberry" and "Tom" are very different lengths.
JPEG compression can only move information at most 16px away, because it works on 8x8 pixel blocks, on a 2x down-sampled version of the chroma channels of the image (at least the most common form of it does).
[1] Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise, Bansal et al., NeurIPS 2023
If you follow information security discussions on the internet, you might have heard that blurring an image is not a good way of redacting its contents. This is supposedly because blurring algorithms are reversible.
But then, it’s not wrong to scratch your head. Blurring amounts to averaging the underlying pixel values. If you average two numbers, there’s no way of knowing if you’ve started with 1 + 5 or 3 + 3. In both cases, the arithmetic mean is the same and the original information appears to be lost. So, is the advice wrong?
Well, yes and no! There are ways to achieve non-reversible blurring using deterministic algorithms. That said, in other cases, blur filters can preserve far more information than would appear to the naked eye — and do so in a pretty unexpected way. In today’s article, we’ll build a rudimentary blur algorithm and then pick it apart.
If blurring is the same as averaging, then the simplest algorithm we can choose is the moving mean. We take a fixed-size window and replace each pixel value with the arithmetic mean of n pixels in its neighborhood. For n = 5, the process is shown below:
[Figure: Moving average as a simple blur algorithm.]
Note that for the first two cells, we don’t have enough pixels in the input buffer. We can use fixed padding, “borrow” some available pixels from outside the selection area, or simply average fewer values near the boundary. Either way, the analysis doesn’t change much.
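For reference, here's a minimal Python sketch of this filter (using a fixed padding value c, matching the analysis below; the function name is mine, not from any library):

```python
import numpy as np

def moving_average_blur(img, n=5, c=0):
    """Blur a 1D row of pixels by replacing each value with the mean of the
    n-pixel window centered on it; positions outside the row are treated as
    a fixed padding value c."""
    half = n // 2
    padded = np.concatenate([np.full(half, c, dtype=float),
                             img.astype(float),
                             np.full(half, c, dtype=float)])
    return np.array([padded[i:i + n].mean() for i in range(img.size)])

row = np.array([10, 50, 30, 90, 20, 70, 40, 60], dtype=float)
print(moving_average_blur(row, n=5, c=0))
```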
Let’s assume that we’ve completed the blurring process and no longer have the original pixel values. Can the underlying image be reconstructed? Yes, and it’s simpler than one might expect. We don’t need big words like “deconvolution”, “point spread function”, “kernel”, or any scary-looking math.
We start at the left boundary (x = 0). Recall that we calculated the first blurred pixel by averaging the following pixels in the original image:
\(blur(0) = {img(-2) \ + \ img(-1) \ + \ img(0) \ +\ img(1)\ +\ img(2) \over 5}\)
Next, let’s have a look at the blurred pixel at x = 1. Its value is the average of:
\(blur(1) = {img(-1)\ +\ img(0)\ +\ img(1)\ +\ img(2)\ +\ img(3) \over 5}\)
We can easily turn these averages into sums by multiplying both sides by the number of averaged elements (5):
\(\begin{align} 5 \cdot blur(0) &= img(-2) + \underline{img(-1) + img(0) + img(1) + img(2)} \\ 5 \cdot blur(1) &= \underline{img(-1) + img(0) + img(1) + img(2)} + img(3) \end{align} \)
Note that the underlined terms repeat in both expressions; this means that if we subtract the expressions from each other, we end up with just:
\(5 \cdot blur(1) - 5 \cdot blur(0) = img(3) - img(-2) \)
The value of img(-2) is known to us: it’s one of the fixed padding pixels used by the algorithm. Let’s shorten it to c. We also know the values of blur(0) and blur(1): these are the blurred pixels that can be found in the output image. This means that we can rearrange the equation to recover the original input pixel corresponding to img(3):
\(img(3) = 5 \cdot (blur(1) - blur(0)) + c\)
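As a quick sanity check, here's the same identity evaluated on a few made-up pixel values (with c = 0):

```python
c = 0
row = [10, 50, 30, 90, 20, 70, 40, 60]

# blur(0) averages [c, c, row[0], row[1], row[2]]
blur0 = (c + c + row[0] + row[1] + row[2]) / 5        # 18.0
# blur(1) averages [c, row[0], row[1], row[2], row[3]]
blur1 = (c + row[0] + row[1] + row[2] + row[3]) / 5   # 36.0

# img(3) = 5 * (blur(1) - blur(0)) + c
print(5 * (blur1 - blur0) + c)                        # 90.0, i.e. row[3]
```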
We can also apply the same reasoning to the next pixel:
\(img(4) = 5 \cdot (blur(2) - blur(1)) + c\)
At this point, we seemingly hit a wall with our five-pixel average, but the knowledge of img(3) allows us to repeat the same analysis for the blur(5) / blur(6) pair a bit further down the line:
\(\begin{align} 5 \cdot blur(5) &= img(3) + \underline{img(4) + img(5) + img(6) + img(7)} \\ 5 \cdot blur(6) &= \underline{img(4) + img(5) + img(6) + img(7)} + img(8) \\ \\ img(8) &= 5 \cdot (blur(6) - blur(5)) + img(3) \end{align} \)
This nets us another original pixel value, img(8). From the earlier step, we also know the value of img(4), so we can find img(9) in a similar way. This process can continue to successively reconstruct additional pixels, although we end up with some gaps. For example, following the calculations outlined above, we still don’t know the value of img(0) or img(1).
These gaps can be resolved with a second pass that moves in the opposite direction in the image buffer. That said, instead of going down that path, we can also make the math a bit more orderly with a good-faith tweak to the averaging algorithm.
The modification that will make our life easier is to shift the averaging window so that one of its ends is aligned with where the computed value will be stored:
[Figure: Moving average with a right-aligned window.]
In this model, the first output value is an average of four fixed padding pixels (c) and one original image pixel; it follows that in the n = 5 scenario, the underlying pixel value can be computed as:
\(img(0) = 5 \cdot blur(0) - 4 \cdot c\)
If we know img(0), we now have all but one of the values that make up blur(1), so we can find img(1):
\(img(1) = 5 \cdot blur(1) - 3 \cdot c - img(0)\)
The process can be continued iteratively, reconstructing the entire image — this time, without any discontinuities and without the need for a second pass.
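Here's a minimal sketch of both the right-aligned filter and the recovery loop (my own naming; exact in floating point, before any 8-bit quantization of the blurred image):

```python
import numpy as np

def blur_right_aligned(img, n=5, c=0):
    """Right-aligned moving average: output i is the mean of the n values
    ending at position i, with out-of-range positions treated as c."""
    padded = np.concatenate([np.full(n - 1, c, dtype=float), img.astype(float)])
    return np.array([padded[i:i + n].mean() for i in range(img.size)])

def unblur_right_aligned(blurred, n=5, c=0):
    """Invert blur_right_aligned: n * blur(i) is the sum of the window, and
    every term of that sum except img(i) is either the padding constant or a
    pixel we have already reconstructed."""
    recovered = np.empty_like(blurred)
    for i in range(blurred.size):
        window_sum = n * blurred[i]
        for j in range(1, n):
            window_sum -= c if i - j < 0 else recovered[i - j]
        recovered[i] = window_sum
    return recovered

row = np.array([10, 50, 30, 90, 20, 70, 40, 60], dtype=float)
blurred = blur_right_aligned(row, n=5, c=0)
print(unblur_right_aligned(blurred, n=5, c=0))   # recovers the original row
```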
In the illustration below, the left panel shows a detail of The Birth of Venus by Sandro Botticelli; the right panel is the same image run through the right-aligned moving-average blur algorithm with a 151-pixel averaging window that moves only in the x direction:
[Figure: Venus, x-axis moving average.]
Now, let’s take the blurry image and attempt the reconstruction method outlined above — computer, ENHANCE!
[Figure: The Rebirth of Venus.]
This is rather impressive. The image is noisier than before as a consequence of 8-bit quantization of the averaged values in the intermediate blurred image. Nevertheless, even with a large averaging window, fine detail — including individual strands of hair — could be recovered and is easy to discern.
The problem with our blur algorithm is that it averages pixel values only in the x axis; this gives the appearance of motion blur or camera shake.
The approach we’ve developed can be extended to a 2D filter with a square-shaped or a cross-shaped averaging window. That said, a more expedient hack is to apply the existing 1D filter in the x axis and then follow with a complementary pass in the y axis. To undo the blur, we’d then perform two recovery passes in the inverse order.
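With the 1D routines sketched earlier, the 1D + 1D variant amounts to applying them along each axis and undoing the passes in reverse (a sketch for a 2D grayscale array; exact in floating point, with the quantization problem appearing once the intermediate image is stored at 8 bits per channel):

```python
import numpy as np

def blur_2d(image, n=5, c=0):
    # Apply the right-aligned 1D blur along x (rows), then along y (columns).
    out = np.apply_along_axis(blur_right_aligned, 1, image, n, c)
    return np.apply_along_axis(blur_right_aligned, 0, out, n, c)

def unblur_2d(blurred, n=5, c=0):
    # Undo the passes in reverse order: columns first, then rows.
    out = np.apply_along_axis(unblur_right_aligned, 0, blurred, n, c)
    return np.apply_along_axis(unblur_right_aligned, 1, out, n, c)
```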
Unfortunately, whether we take the 1D + 1D or the true 2D route, we’ll discover that the combined amount of averaging per pixel causes the underlying values to be quantized so severely that the reconstructed image is overwhelmed by noise unless the blur window is relatively small:
[Figure: Reconstruction from a 1D + 1D moving-average blur (x followed by y).]
That said, if we wanted to develop an adversarial blur filter, we could fix the problem by weighting the original pixel a bit more heavily in the calculated mean. For the x-then-y variant, if the averaging window has a size W and the current-pixel bias factor is B, we can write the following formula:
\(blur(n) = {img(n - W) + \ldots + img(n - 1) + B \cdot img(n) \over W + B}\)
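A sketch of this biased filter and its inverse in the same 1D right-aligned form (the article applies it along x and then y; the recovery changes only in that the leftover term is B times the current pixel):

```python
import numpy as np

def biased_blur(img, W=200, B=30, c=0):
    """Right-aligned moving average in which the current pixel enters the
    mean with weight B, alongside the W preceding pixels (padded with c)."""
    padded = np.concatenate([np.full(W, c, dtype=float), img.astype(float)])
    out = np.empty(img.size, dtype=float)
    for i in range(img.size):
        window = padded[i:i + W]                  # the W pixels before img(i)
        out[i] = (window.sum() + B * padded[i + W]) / (W + B)
    return out

def biased_unblur(blurred, W=200, B=30, c=0):
    """Invert biased_blur: (W + B) * blur(i), minus the already-known window
    values, leaves B * img(i)."""
    recovered = np.empty_like(blurred)
    for i in range(blurred.size):
        total = (W + B) * blurred[i]
        for j in range(1, W + 1):
            total -= c if i - j < 0 else recovered[i - j]
        recovered[i] = total / B
    return recovered
```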
This filter still does what it’s supposed to do; here’s the output of an x-then-y blur for W = 200 and B = 30:
[Figure: Venus, heavy X-Y blur.]
Surely, there’s no coming back from tha— COMPUTER, ENHANCE!
[Figure: Venus, recovered from a heavy blur.]
As a proof of concept for skeptics, we can also make an adversarial filter that operates in two dimensions simultaneously. The following is a reconstruction after a 2D filter with a simple cross-shaped window:
[Figure: Reconstruction from a simultaneous 2D filter (W = 600×600, B = 10).]
Remarkably, the information “hidden” in the blurred images survives being saved in a lossy image format. The top row shows images reconstituted from an intermediate image saved as a JPEG at 95%, 85%, and 75% quality settings:
[Figure: Recovery from a JPEG file (1D + 1D filter, W = 200, B = 30).]
The bottom row shows less reasonable quality settings of 50% and below; at that point, the reconstructed image begins to resemble abstract art.
For more weird algorithms, click here or here. A thematic catalog of posts on this site can be found on this page.