seedubjay

Why Does JPEG Look So Weird?

6 March 2021

In the second or so before this page loaded, your device was downloading everything it needed like the background colour, all the icons, and this text you’re currently reading.

Oh, and this delightful picture of a macaw –

Macaw

A digital image is really just a grid of tiny dots (‘pixels’). Each one mixes red light (R), green light (G) and blue light (B) in different proportions to create any colour imaginable.

This image of a macaw is 1920 pixels wide and 1280 pixels tall, which comes to a grand total of 2,457,600 pixels. But if you were to look at what your device just downloaded, you would find less than 4% of the data those ~2.5 million pixels should have needed.
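To put a rough number on that (a back-of-the-envelope sketch, assuming the usual 24-bit colour, one byte per channel per pixel):

```python
width, height = 1920, 1280
pixels = width * height        # 2,457,600 pixels
raw_bytes = pixels * 3         # one byte each for R, G and B
print(raw_bytes)               # 7,372,800 bytes: roughly 7 MB uncompressed
print(int(raw_bytes * 0.04))   # 4% of that is under 300 KB
```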

This is all due to the wonderful world of compression.

Without it, listening to an hour of music on Spotify would use over a gigabyte of data, and watching a movie on Netflix would use over a terabyte (1000 gigabytes).

In the case of the macaw, it has been compressed using JPEG, an extremely widespread image compression format.

JPEG combines the biology of the human eye with some computational trickery to throw away as much extraneous data from the image as possible. It then stuffs the bare essentials of the image into a tiny package, achieving its remarkable reductions of 96% and beyond.

Here are a few of the ideas that make JPEG tick.

Representations of colour

Our eyes perceive the world in red, green and blue using photoreceptive cones on the retina. Each type of cone is tuned to one of these three colours, so it makes sense for computers to display pixels in a similar way.

In fact, it is often easier to think of a colourful picture as three entirely distinct images (known as ‘channels’), each targeting a different type of cone in our eye.

For instance, the green channel has been ‘turned off’ entirely here, but you can also toggle the other colours on or off to see how they mix.

Toggle the red (R), green (G) and blue (B) channels of the image

However, this does not mean we have to store the image in red, green and blue as well.

Instead, we can represent each colour using three somewhat peculiar channels: luminance, blue-difference chroma, and red-difference chroma.

Toggle the luminance (Y), blue-difference chroma (Cb), and red-difference chroma (Cr) channels of the image
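The conversion between the two representations is just a weighted sum of the channels. Here is a minimal sketch using the JFIF/BT.601 coefficients that JPEG files typically use:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel into luminance (Y) and the two chroma channels (Cb, Cr)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b          # perceived brightness
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128    # how blue vs. how yellow
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128    # how red vs. how green
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))      # pure red: modest Y, Cr pushed far above 128
print(rgb_to_ycbcr(180, 180, 180))  # grey: Y = 180 and both chroma channels sit at 128
```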

But why bother?

We don’t gain or lose any detail in the picture by splitting it up into a different set of channels.

However, you may notice that the luminance channel, which represents the image’s brightness, has the largest impact on the details of the picture by far. This makes sense, as shadows, lighting and texture in the real world rarely change a material’s true colour – only how bright it seems to your eye.

Our eyes are also especially attuned to changes in brightness through evolution – it has helped us find camouflaged prey and spot the shadows of predators in our periphery for thousands of years.

With this in mind, JPEG prioritises brightness details far above changes in colour.

For instance, instead of assigning every pixel its own colour, it can give an entire region a single hue and let only the brightness change, saving space. Or it can reduce the number of displayable colours from the usual palette of over 16 million down to only a few thousand, which requires much less space to store.

As long as the luminance channel stays intact, we will always have a fairly good idea of what we are looking at.
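The first of those tricks is usually called chroma subsampling: the two chroma channels are stored at a lower resolution than the luminance channel. A minimal NumPy sketch, assuming a chroma channel stored as an array whose width and height are even:

```python
import numpy as np

def subsample_chroma(channel):
    """Replace each 2x2 block of a chroma channel with its average (so-called 4:2:0 subsampling)."""
    h, w = channel.shape
    blocks = channel.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))      # one chroma value now covers four pixels

cb = np.random.rand(1280, 1920) * 255    # a stand-in Cb channel the size of the macaw photo
small = subsample_chroma(cb)
print(cb.shape, small.shape)             # (1280, 1920) -> (640, 960): a quarter of the chroma data
```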

Getting help from the neighbours

When we look at the world, we don’t see pixels and colour channels – we see contiguous chunks of colour and shape. We see the smooth gradients of blue in the sky and the repeating patterns of grass and concrete.

Ironically, when we’re interested in compression, pixels are almost the worst possible way to represent this.

If we simply stored pixels one-by-one, an entirely random grid of colours would take just as much space as a picture of a sunset.

But in the image of the sunset, colours change gradually and form repeating patterns. We can exploit this to store whole chunks of the image all as one unit.

JPEG accomplishes this by splitting the image up into 8x8 chunks of pixels, and applying something called a discrete cosine transform.

Properly deriving the transform requires a lot of calculus and something called Fourier analysis. But if you were to make it through all of the maths, you’d end up with a very curious result.

It just so happens that any 8x8 chunk of pixels can be converted into a combination of these 64 patterns:
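Each of those 64 patterns is nothing more exotic than a horizontal cosine wave multiplied by a vertical one. A small sketch of how pattern (u, v) in the grid could be generated (normalisation constants are left out for clarity):

```python
import numpy as np

def dct_pattern(u, v):
    """The 8x8 cosine pattern with horizontal frequency u and vertical frequency v (both 0..7)."""
    n = np.arange(8)
    horizontal = np.cos((2 * n + 1) * u * np.pi / 16)
    vertical   = np.cos((2 * n + 1) * v * np.pi / 16)
    return np.outer(vertical, horizontal)    # normalisation constants omitted

print(dct_pattern(0, 0))   # top-left pattern: completely flat, i.e. the chunk's average brightness
print(dct_pattern(7, 7))   # bottom-right pattern: the fastest-changing checkerboard of all
```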

For example, take one 8x8 chunk of pixels from this cat wearing a rather fetching scarf:

Instead of storing each of these pixels individually, we can represent the whole chunk by layering together just a few of the above patterns, tinted by varying amounts.

Adjust how many patterns get layered together

Crucially, not all patterns are created equal – some patterns influence the brightness and colour of this chunk enormously, while others (the mostly grey ones) make almost no difference at all.

Furthermore, the patterns contributing the most colour always seem to be clumped towards the top-left of the pattern grid. This is true not only for this 8x8 chunk, but for almost any other 8x8 chunk from any picture taken in the real world. The world is rarely unpredictable enough to need any of the very complex patterns found in the bottom-right of the pattern grid.
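You can see the clumping directly by running a 2-D DCT over a smoothly varying chunk. The sketch below uses SciPy and a hypothetical gradient block standing in for a patch of real photo, so the exact numbers are illustrative only:

```python
import numpy as np
from scipy.fft import dctn

# A hypothetical 8x8 chunk: a smooth diagonal gradient, like a patch of sky or scarf.
x = np.arange(8)
chunk = 12.0 * x[:, None] + 8.0 * x[None, :] + 50.0

coeffs = dctn(chunk, norm='ortho')   # how strongly each of the 64 patterns contributes
print(np.abs(coeffs).round(1))       # the big numbers all sit in the top-left corner
```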

This becomes even clearer when we do the same conversion on a completely random 8x8 chunk – we do not get any of the clumping we would expect from a photo of the real world, as there are no patterns or relationships between neighbouring pixels to exploit.

Adjust how many patterns get layered together

JPEG relies on this property of real-world images to throw away as much extraneous data as possible. It can ‘turn off’ the bottom-right patterns entirely when they are not doing anything useful to save space, and prioritise the patterns from the top-left instead.
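As a rough sketch of that idea: keep only the k-by-k square of top-left patterns, zero out the rest, and rebuild the chunk. For a smooth, photo-like chunk the result barely changes, while a random chunk falls apart (SciPy again; the gradient and the random block are both hypothetical stand-ins):

```python
import numpy as np
from scipy.fft import dctn, idctn

def keep_top_left(chunk, k):
    """Rebuild an 8x8 chunk from only its k-by-k low-frequency (top-left) patterns."""
    coeffs = dctn(chunk, norm='ortho')
    mask = np.zeros((8, 8))
    mask[:k, :k] = 1                       # 'turn off' every pattern outside the top-left square
    return idctn(coeffs * mask, norm='ortho')

x = np.arange(8)
smooth = 12.0 * x[:, None] + 8.0 * x[None, :] + 50.0   # photo-like: a smooth gradient
noise  = np.random.rand(8, 8) * 255                    # worst case: completely random pixels

for name, chunk in [('smooth', smooth), ('random', noise)]:
    error = np.abs(chunk - keep_top_left(chunk, 3)).mean()
    print(name, round(error, 1))   # the smooth chunk survives on 9 of 64 patterns far better
```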

Infinite possibilities

JPEG gives users complete control over how a picture gets compressed.

They can choose how much to prioritise luminance (brightness) over the other colour channels. They can choose how much to prioritise the top-left of the pattern grid over the bottom-right. They can choose how precisely each of those patterns’ brightnesses and hues can be adjusted (in a process called ‘quantisation’).

These settings all impact how much the JPEG will be compressed, and how true to the original image the compressed version will be.
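Quantisation itself is just division and rounding: each of the 64 pattern strengths is divided by a step size and rounded to the nearest whole number, and bigger steps (especially towards the bottom-right) mean coarser but more compressible values. A sketch with a made-up step table and quality factor; real encoders ship carefully tuned tables and a quality knob that scales them:

```python
import numpy as np

# A made-up quantisation table: step sizes grow towards the bottom-right (high-frequency) patterns.
u, v = np.meshgrid(np.arange(8), np.arange(8))
table = 8 + 6 * (u + v)

def quantise(coeffs, quality):
    """Divide each pattern strength by its step (scaled by a quality factor) and round."""
    return np.round(coeffs / (table / quality))

def dequantise(quantised, quality):
    """Undo the division; the rounding error is gone for good, and that is the 'loss' in lossy."""
    return quantised * (table / quality)

coeffs = np.random.randn(8, 8) * 100   # stand-in DCT coefficients for one chunk
for quality in (2.0, 1.0, 0.5):        # bigger quality factor = smaller steps = more kept detail
    q = quantise(coeffs, quality)
    error = np.abs(coeffs - dequantise(q, quality)).max()
    print(quality, np.count_nonzero(q), round(error, 1))   # lower quality: more zeros, more error
```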

For instance, here are all of the strategies from above being used together in a simple compression system:

Choose between better compression and better quality

In fact, there are so many ways to adjust these settings in JPEG that we are still discovering even better ones.

In 2017, Google introduced a new encoder that uses an artificial intelligence (AI) to select all of these settings automatically, and it beat the industry-standard, human-programmed encoders by over 35% in file size at equivalent quality.

And while this pursuit has spanned almost two decades now, we will likely never find a truly optimal system.

Why? While most problems have clear answers, JPEG is unique. It is trying to be an answer to a question so utterly human that it can be argued about for days, weeks, years or decades –

Mare with JPEG artifacts

“Does that picture look a bit weird or is it just me?”

