Okmain: How to pick an OK main colour of an image

As a designer, I've built variants of this several times throughout my career.

The author's approach is really good, and he hits on pretty much all the problems that arise from more naive approaches. In particular, using a perceptual colorspace, and how the most representative colour may not be the one that appears the most.

However, image processing makes my neck tingle because there are a lot of footguns. PNG bombs, anyone? I feel like any library needs to either be defensively programmed or explicit in its documentation.

The README says "Finding main colors of a reasonably sized image takes about 100ms" -- that's way too slow. I bet the operation takes a few hundred MB of RAM too.

For anyone that uses this, scale down your images substantially first, or only sample every N pixels. Avoid loading the whole thing into memory if possible, unless this handled serially by a job queue of some sort.

You can operate this kind of algorithm much faster and with less RAM usage on a small thumbnail than you would on a large input image. This makes performance concerns less of an issue. And prevents a whole class of OOM DoS vulnerabilities!

As a defensive step, I'd add something like this https://github.com/iamcalledrob/saferimg/blob/master/asset/p... to your test suite and see what happens.

OKPalette by David Aerne is my favorite tool for this, it chooses points sensibly but then also lets you drag around or change the number of colors you want: https://okpalette.color.pizza/

I've wanted something like this for level of detail processing.

This is a render from Second Life, in which all the texture images were shrunk down to one pixel, the lowest possible level of detail, producing a monocolor image. For distant objects, or for objects where the texture is still coming in from the net, there needs to be some default color. The existing system used grey for everything. I tried using an average of all the pixels, and, as the original poster points out, the result looks murky.[1] This new approach has real promise for big-world rendering.

[1] https://media.invisioncic.com/Mseclife/monthly_2023_05/monoc...

I've been doing something similar! I've got a Home Assistant dashboard on my desk and wanted the media controls to match the current album art. I need three colors: background, foreground, and something vibrant to set my desk lamp to [1].

The SpotifyPlus HA integration [2] was near at hand and does a reasonably good job clustering with a version of ColorThief [3] under the hood. It has the same two problems you started with though: muddying when there's lots of gradation, even within a cluster; and no semantic understanding when the cover has something resembling a frame. A bit swapped from okmain's goal, but I can invert with the best of them and will give it a shot next time I fiddle. Thanks for posting!

[1] https://gist.github.com/kristjan/b305b83b0eb4455ee8455be108a... [2] https://github.com/thlucas1/homeassistantcomponent_spotifypl... [3] https://github.com/thlucas1/SpotifyWebApiPython/blob/master/...

It reminds me a bit of this post from the Facebook engineering blog (2015) [1] where they discuss embedding a very tiny preview of images into the html itself so they show immediately while loading the page, especially with very slow connections.

[1] https://engineering.fb.com/2015/08/06/android/the-technology...

I’m surprised the baseline to compare against is shrinking the image to one pixel, that seems extremely hacky and very dependent on what your image editor happens to do (and also seems quite wasteful… the rescaling operation must be doing a lot of extra pointless work keeping track of the position of pixels that are all ultimately going to be collapsed to one point).

So, making a library that provides an alternative is a great service to the world, haha.

An additional feature that might be nice: the most prominent colors seem like they might be a bad pick in some cases, if you want the important part of the image to stand out. Maybe a color that is the close (in the color space) to the edges of your image, but far away (in the color space) from the center of your image could be interesting?

This is nice! I looked into this quite a lot some years back when I was trying to summarize IKEA catalogs using color and eventually wrote an R package if you want to look into an alternative to e.g. k-means: https://github.com/lemonad/colorhull (download https://github.com/lemonad/ikea-colors-through-time/blob/mas... for more details on how it works)

Really interesting read. Thanks for sharing. Is the performance bottleneck around the resizing to 250k pixels? Would it still work if you sampled 15,625 4x4 patches evenly around the image to gather those pixels instead of resizing?

In the past when i tried just using image magick's built in -kmeans for this, i found chosing the second most prominent colour often looked really good. The primary was too much of the same thing.

I’d be interested in trying this out as a command-line tool. It would be useful on its own and the fastest way to evaluate results.

> simple 1x1 resize

How is it "simple"? There are like a ton of different downscaling algorithms and each of them might produce a different result.

Cool article otherwise.

I really like this approach. I worked on this problem (create a nice background for an image) for a couple weeks many years ago while organizing my desktop wallpaper collection, and never came up with a good answer. Unfortunately, I think that it's been "solved" in the tiktok era; an enlarged and blurred version of the image is used to fill the background space.

The blurred mirror is inoffensive to almost everyone, and yet it always strikes me as gauche. Easy to ignore and yet I feel that it adds a lot of useless visual noise.

I’d be interested in trying this out as a command-line tool. It would be useful on its own and the fastest way to evaluate results.

ImageMagick is a wonderful command line tool, IMO. You could use it to extract various information, e.g. the 5 most used colors of an image, as in

    convert $IMG -colors 5 -depth 8 -format "%c" histogram:info: | sort -nr

If needed you can easily remove colored borders first (trim subcommand with fuzz option) or sample only xy% from the image's center, or where the main subject might be.

looks like it's a rust lib with a python wrapper. making a CLI tool should be just a few lines of code.

Good idea, I'll add a CLI tool over the weekend.

looks like it's a rust lib with a python wrapper. making a CLI tool should be just a few lines of code.

Yeah, but then I’d have to be working with Python (which I don’t enjoy) and be pulling in dependencies (which I avoid) to have a custom system with moving parts (Python interpreter, library, script) (which I don’t want).

A rust CLI would make a lot of sense here. Single binary.

OKPalette by David Aerne is my favorite tool for this, it chooses points sensibly but then also lets you drag around or change the number of colors you want: https://okpalette.color.pizza/

I've wanted something like this for level of detail processing.

[1] https://media.invisioncic.com/Mseclife/monthly_2023_05/monoc...

In the past when i tried just using image magick's built in -kmeans for this, i found chosing the second most prominent colour often looked really good. The primary was too much of the same thing.

The blurred mirror is inoffensive to almost everyone, and yet it always strikes me as gauche. Easy to ignore and yet I feel that it adds a lot of useless visual noise.

As a designer, I've built variants of this several times throughout my career.

The README says "Finding main colors of a reasonably sized image takes about 100ms" -- that's way too slow. I bet the operation takes a few hundred MB of RAM too.

As a defensive step, I'd add something like this https://github.com/iamcalledrob/saferimg/blob/master/asset/p... to your test suite and see what happens.

I really wish people would read the article, the library does exactly this:

> Okmain downsamples the image by a power of two until the total number of pixels is below 250,000.

your gh link returned 404

EDIT: then (when url refreshed) triggered a redir loop culminating in a different error ("problem occurred repeatedly")...

ah, ofc, your intent was to demonstrate a problematic asset.

> I've built variants of this several times throughout my career.

Got any to share? A self-contained command-line tool to get a good palette from an image is something I’d have a use for.

Author here: the library just accepts RGB8 bitmaps, probably coming either from Rust's image crate [1] or Python's Pillow [2], which are both mature and widely used. Dealing with codecs is way out of scope.

As for loading into memory at once: I suppose I could integrate with something like libvips and stream strips out of the decoded image without holding the entire bitmap, but that'd require substantially more glue and complexity. The current approach works fine for extracting dominant colours once to save in a database.

You're right that pre-resizing the images makes everything faster, but keep in mind that k-means still requires a pretty nontrivial amount of computation.

[1]: https://crates.io/crates/image

[2]: https://pypi.org/project/pillow/

[1] https://engineering.fb.com/2015/08/06/android/the-technology...

Good idea, I'll add a CLI tool over the weekend.

ImageMagick is a wonderful command line tool, IMO. You could use it to extract various information, e.g. the 5 most used colors of an image, as in

    convert $IMG -colors 5 -depth 8 -format "%c" histogram:info: | sort -nr

If needed you can easily remove colored borders first (trim subcommand with fuzz option) or sample only xy% from the image's center, or where the main subject might be.

> simple 1x1 resize

How is it "simple"? There are like a ton of different downscaling algorithms and each of them might produce a different result.

Cool article otherwise.

At 1x1 I don't expect any difference. It would be the average of all pixels in the image if you don't unevenly weight them (which you might decide when choosing a main color, but no downscaling algorithm would do) and the only difference is whether you remembered to gamma-correct.

So, making a library that provides an alternative is a great service to the world, haha.

Tbh shrinking the image is probably the cheapest operation you can do that still lets every pixel influence the result. It’s just the average of all pixels, after suitable color conversion.

I really wish people would read the article, the library does exactly this:

> Okmain downsamples the image by a power of two until the total number of pixels is below 250,000.

Somehow I missed that, oops. I see that the library samples a maximum of 250K pixels from the input buffer (I jumped over to the project readme)

That being said, this is sampling the fixed-size input buffer for the purposes of determining the right colour. You still have to load the bitmap into memory, with all the associated footguns that arise there. The library just isn't making it worse :) I suppose you could memmap it.

Makes me wonder if the sub-sampling is actually a bit of a red herring, as ideally you'd want to be operating on a small input buffer anyway. Or some sort of interface on top of the raw pixel data, so you can load what's needed on-demand.

That's 500x500, I'm sure you can get good results at 32x32 or 64x64 but then part of your color choice is also getting done by the downsampling algorithm. I wonder if you could get away with just using a downsampling algorithm into a 1x1 and just use that as the main color.

> I've built variants of this several times throughout my career.

Got any to share? A self-contained command-line tool to get a good palette from an image is something I’d have a use for.

your gh link returned 404

EDIT: then (when url refreshed) triggered a redir loop culminating in a different error ("problem occurred repeatedly")...

ah, ofc, your intent was to demonstrate a problematic asset.

Fred's dominantcolor script for imagemagick might work for you:

https://www.fmwconcepts.com/imagemagick/dominantcolor/index....

Back in the late 1980s people thought about color quantization a lot because a lot of computers of the time had 16 or 256 colors you could choose out of a larger palette and if you chose well you could do pretty well with photographic images.

Realizing I intentionally opened a png bomb made me chuckle, like what did I think was going to happen?

You're right that pre-resizing the images makes everything faster, but keep in mind that k-means still requires a pretty nontrivial amount of computation.

[1]: https://crates.io/crates/image

[2]: https://pypi.org/project/pillow/

If you ever did want to wrap this in code processing untrusted images there's a library called "glycin" designed for that purpose (it's used by Loupe, the default Gnome image viewer).

https://gnome.pages.gitlab.gnome.org/glycin/

A rust CLI would make a lot of sense here. Single binary.

  > uvx --with pillow --with okmain python -c "from PIL import Image; import okmain; print(okmain.colors(Image.open('bluemarble.jpg')))"
  [RGB(r=79, g=87, b=120), RGB(r=27, g=33, b=66), RGB(r=152, g=155, b=175), RGB(r=0, g=0, b=0)]

It would make sense to add an entrypoint in the pyproject.toml so you can use uvx okmain directly.

This sounds like a job for <ta-ta-ta-taaaa> contrib-directory-man!

Nearest-neighbor interpolation may pick just one pixel closest to the center.

Tbh shrinking the image is probably the cheapest operation you can do that still lets every pixel influence the result. It’s just the average of all pixels, after suitable color conversion.

It might work decently well, but I wonder if it makes it "visually" match - sometimes the perfect average is not what our eyes see as the color.

The author of the article seems to assume there is no color conversion (e.g., the resizing of the image is done with sRGB-encoded values rather than converting them to linear first). Which is a stupid way to do it but I'd believe most handwritten routines are just that.

Somehow I missed that, oops. I see that the library samples a maximum of 250K pixels from the input buffer (I jumped over to the project readme)

That last one is talked about in the article -- it sucks!

I think if you were going to "downsample" for the purpose of creating a color set you could just scan through the picture and randomly select 10% (or whatever) of the pixels and apply k-means to that and not do any averaging which costs resources and makes your colors muddy.

Fred's dominantcolor script for imagemagick might work for you:

https://www.fmwconcepts.com/imagemagick/dominantcolor/index....

Realizing I intentionally opened a png bomb made me chuckle, like what did I think was going to happen?

If you ever did want to wrap this in code processing untrusted images there's a library called "glycin" designed for that purpose (it's used by Loupe, the default Gnome image viewer).

https://gnome.pages.gitlab.gnome.org/glycin/

This sounds like a job for <ta-ta-ta-taaaa> contrib-directory-man!

So your solution to “I’d be interested in having a small ready-made tool and try this out” is “spend a bunch of time to get acquainted with the code base of something you may not even like, create a separate tool, and submit it without even knowing if it’ll be accepted”?

That’s like having someone looking at a display of ice cream in a supermarket saying “I’d be interested in trying a few samples before committing” and then getting a reply like “here are the recipes for all the ice creams, you can try to make them at home and taste them for yourself”.

I know I could theoretically spend my weekend working on a CLI tool for this or making ice cream. Every developer knows that, there’s no reason to point that out except snark. But you know who might do it even faster and better and perhaps even enjoy it? The author.

Look, the maintainer owes me nothing. I owe them nothing. This project has been shared to HN by the author and I’m making a simple, sensible, and sensical suggestion for something which I would like to see and believe would be an improvement overall, and I explained why. The author is free to agree or disagree, reply or ignore. Every one of those options is fine.

  > uvx --with pillow --with okmain python -c "from PIL import Image; import okmain; print(okmain.colors(Image.open('bluemarble.jpg')))"
  [RGB(r=79, g=87, b=120), RGB(r=27, g=33, b=66), RGB(r=152, g=155, b=175), RGB(r=0, g=0, b=0)]

It would make sense to add an entrypoint in the pyproject.toml so you can use uvx okmain directly.

Nearest-neighbor interpolation may pick just one pixel closest to the center.

It might work decently well, but I wonder if it makes it "visually" match - sometimes the perfect average is not what our eyes see as the color.

That last one is talked about in the article -- it sucks!

Random sampling makes a lot of intuitive sense, but unfortunately doesn't work well. I just answered over at lobsters: https://lobste.rs/s/t43mh5/okmain_you_have_image_you_want_co...

I should probably add this nuance to the post itself.

Edit: added a footnote

You could likely have written that tool in the time that it took to write that comment.

If it would be useful to you, and to others, then why not?

And if it is not accepted then publish it yourself. Stop your whining.

You’re not wrong, but you probably could have built the thing with Claude in the time it took you to write this comment.

Random sampling makes a lot of intuitive sense, but unfortunately doesn't work well. I just answered over at lobsters: https://lobste.rs/s/t43mh5/okmain_you_have_image_you_want_co...

I should probably add this nuance to the post itself.

Edit: added a footnote

How about sampling more from the more "prominent" areas of the image[0] and less from less "prominent" areas?

[0]: https://dgroshev.com/blog/okmain/img/distance_mask.png?hash=...

You could likely have written that tool in the time that it took to write that comment.

If it would be useful to you, and to others, then why not?

And if it is not accepted then publish it yourself. Stop your whining.

You’re not wrong, but you probably could have built the thing with Claude in the time it took you to write this comment.

How about sampling more from the more "prominent" areas of the image[0] and less from less "prominent" areas?

[0]: https://dgroshev.com/blog/okmain/img/distance_mask.png?hash=...

Your app has a card with an image. You want the back of the card to be a solid colour that is somewhat representative of the image and also visually pleasant. How would you do that?

A company I consult for did that by resizing the entire image to 1x1 (a single pixel) and using the colour of the pixel. This is a super popular approach! However, the colours were often dull and muddy even when the original image had vivid colours. It irked me, so I spent a weekend searching for prior art and trying a few tricks to do better. Then, I wrote a library. Inspired by Oklab's naming, it's called Okmain because it looks for an OK main colour:

Here are the tricks I came up with:

colour clustering
Oklab colour calculations
chroma + position cluster sorting

The rest is just implementing the tricks in Rust, writing a Python wrapper, making everything fast and robust, writing documentation, releasing to crates.io and PyPI, and writing this blogpost. Easy1!

Colour clustering

Most images have multiple clusters of colours, so simply averaging all colours into one doesn't work well. Take this image: while the green of the field and the blue of the sky are beautiful colours, simply averaging the colours produces a much less exciting colour (source):

Instead, we can find groups of similar colours and average inside the group. K-means is a well-known algorithm for exactly that. We can run it on all pixels, clustering their colours and ignoring the pixel positions (for now).

For Okmain, I decided to only allow up to four clusters. In my testing, it was enough for decent quality, and limiting the number was handy to make clustering more performant. We will come to that later.

Here's how the image looks after colour clustering (including the extracted groups as swatches):

Note, however, that not all images have four meaningfully distinct clusters. Picking the number of clusters in a general case is a non-trivial problem. However, with just four colours it's simple enough to check if all clusters have different enough colours and re-run with fewer clusters if some clusters are too similar. Here's an example of an image with three distinct clusters (source):

Oklab

Another reason for the muddiness is the resizing library operating directly on sRGB colours.

In either clustering or resizing, colours need to be averaged. In a naïve implementation, this is done in the same colour space the image is in, which is most likely to be sRGB: red, green, and blue subpixel values with gamma correction applied. This is not ideal for two reasons.

First, gamma correction is non-linear, and applying linear operations over the correction leads to incorrect results.

Second, perceived colour intensity is also non-linear, which is why a sweep through all colours without correcting for perceptual differences produces vertical strips in the gradient (source):

To solve both problems at once, Okmain operates in the Oklab colour space.

The result of averaging colours in Oklab is smoother mixing with fewer muddy browns:

Here, a pixel at (X, Y) is a mix of colour X (from the top gradient) with colour Y (from the left gradient). In the top right triangle, I'm mixing the two colours in sRGB. In the bottom left, the colours are first transformed to Oklab, mixed, and then transformed back to sRGB. The sRGB triangle is visibly less smooth, with too many muddy browns in green+yellow and blue+orange areas.

This over-representation is what skews sRGB-averaged "main colours" towards unattractive, dirty-looking colours.

Cluster sorting

After colours are clustered in Oklab, the clusters need to be sorted by their visual prominence. After all, the user likely wants the more prominent, dominant colour and not just four colours with no idea which one is more prominent.

I came up with three heuristics for how prominent a cluster is:

how many pixels are in the cluster?
how central the pixels are?
how visually prominent the colour is in itself?

Okmain combines the first two heuristics into one and calculates the number of pixels per cluster, discounting pixels that are closer to the periphery using a mask that looks like this (by default):

Intuitively, pixels that are closer to the centre of the image are more prominent, but only to an extent. If a pixel is central enough, it doesn't matter where it is exactly.

This weighting ensures that the most prominent colour on an image like this is the foreground green and not the background grey. The swatches are sorted top-down, the most prominent at the top (source):

Finally, Okmain tries to guess how visually prominent a particular colour is. This is tricky because prominence depends on how much a colour contrasts with other colours. However, using Oklab chroma (saturation) as a proxy for prominence seems to help on my test set, so it's now a factor in Okmain.

Performance

I wanted Okmain to not just produce nice colours but also be reasonably fast, ideally comparable to a simple 1x1 resize. I spent some time optimising it.

The simplest optimisation is to reduce the amount of data. Okmain downsamples the image by a power of two until the total number of pixels is below 250,000, simply averaging pixel values in Oklab. This also helps to remove noise and "invisible colours": on a photo of an old painting, paint cracks can create their own colour cluster, but ideally they should be ignored.

The downsampling is also an opportunity to de-interleave the pixels from an RGBRGBRGB… array into a structure-of-arrays (three separate arrays of L, a, and b floats), which helps to make a lot of downstream code trivially auto-vectorisable. Having a low fixed number of clusters that fits into a SIMD register (f32x4) seems to help, too.

One of the biggest hurdles for auto-vectorisation is Rust's insistence on correct floating point math. This is a great default, but it's impossible to opt out on stable Rust yet.

Another complexity is runtime dispatch based on the available instruction set. Rust defaults to a very conservative SIMD instruction set (SSE2), and it's a correct solution for a library like Okmain. However, AVX2 seems to help even for auto-vectorisation, so eventually I'll just add a dispatch with something like target_feature_dispatch. For now, Okmain is fast enough, extracting dominant colours from multi-megapixel images in around 100ms.

Initially, I implemented mini-batch k-means clustering, but for this particular case it proved slower after accounting for the sampling step. The entire dataset is small enough to fit into the cache, so going through the full dataset is quicker than having to first extract a mini-batch with unpredictable branching, even if the mini-batch itself is much smaller. K-means++ initialisation, on the other hand, helps a lot, despite the upfront cost of picking good starting points.

A tangent on LLMs

I was curious how LLM agents would work on this project. It felt like a good fit for agentic development: a small, well-constrained problem, greenfield development, and a lot of pre-existing data in the training set since k-means is a very popular algorithm. Armed with Opus (4.5 and 4.6) and sprites.dev for sandboxed, accept-everything autonomous development, I tried to retrace Mitchell Hashimoto's steps. The results are mixed, but I learned a lot.

With a good explanation and the planning mode, the very first version was ready really quickly. Unfortunately, it was subtly wrong in several places, and the code was awkward and hard to read. Additionally, I tried to make the code autovectorisation-friendly, and Opus seems confidently wrong about autovectorisation more often than it's right. Closing the loop with cargo asm helped, but the loop ate tokens frighteningly fast, and Opus was still struggling to be both idiomatic and verifiably vectorised.

After a few evenings and many tokens of trying to make Opus write as cleanly as I wanted, I gave up and rewrote the most crucial parts from scratch. In my opinion, the manual rewrite is cleaner and clearer, and this is a part where readability matters, since it's the hottest part of the library.

It seems that even frontier LLMs are struggling with intentful abstraction. LLMs split things out in the most mechanical way possible, instead of trying to communicate the intent with how things are split.

On the other hand, with the core API settled, Opus saved me a lot of time working autonomously on "debug" binaries that are easy to read through and don't need to be developed any further. I suppose that's exactly what Mitchell meant by "outsourcing slam dunks" — this works very well.

Throughout this experience, Sprites' stability was a thorn in my side. The UX and the idea are great when it works, but I had my sprite slow down to a crawl every few days. Once it went completely down and was unconnectable for most of the day. I hope fly.io folks make Sprites more stable. It's a super convenient way to run agents.

Good for now

I'm pretty satisfied with how this project turned out. You all got a decent library, and I learned more about k-means, SIMD, releasing mixed Python/Rust libraries, productive greenfield LLM use, and general performance.

Now go and extract all the main colours!

P.S. / Shameless plug: are you a manager at Apple London? Let's talk.

The non-code bits took about 70% of the time 😔 ↩

Hacker Times