If your work involves sitting in front of a computer, chances are — at some point or another — you’ll have to work with digital images. For most people, it’s more than enough to know how to save images into the right formats. If you’re a game developer though, it’ll help to have a bit more in-depth knowledge, since you’ll be working with game engines that like to throw around obscure terms like true color, bit-depth or ARGB-16.
I struggled a lot with images during my early days in game development precisely because of these terms — Google searches and Wikipedia articles only led me to pages with big words that confused me even more. With time, I eventually figured things out — hence this primer. Hopefully, reading this means you won’t take as long as I did to figure things out.
Bits and bytes
Before we get to the more interesting bits (no pun intended), it is important to talk about bits and bytes — not just what they are, but also how they can be used to represent information.
A bit is a binary digit, which means it can take one of two values: 0 or 1 (bi- means two). Compare it with a decimal digit, which can take one of ten values, 0 to 9 (decimal comes from the Latin for ten). Just as decimal digits can be strung together to represent bigger numbers (e.g. 10, 456, 8421), binary digits can be strung together to represent numbers bigger than 1 (e.g. 10110100, which represents 180).
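To see how a string of binary digits becomes a number, here is a small Python sketch that converts the example above both ways. The variable names are only for illustration.

```python
# The example string of binary digits from the text.
bits = "10110100"

# Let Python interpret the string as a base-2 number.
value = int(bits, 2)
print(value)                      # 180

# The same conversion by hand: each digit is worth a power of two,
# just as each decimal digit is worth a power of ten.
manual = sum(int(digit) * 2 ** power
             for power, digit in enumerate(reversed(bits)))
print(manual == value)            # True

# And back from the number to its binary digits.
print(format(value, "b"))         # 10110100
```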
A byte is a “packet” of 8 bits, which means that it can represent 256 different numbers, from 0 (i.e. 0000 0000) to 255 (i.e. 1111 1111). Multiple bytes are packed together when bigger numbers need to be represented: 2 bytes (i.e. 16 bits) can represent 65,536 (= 2^16) numbers, and 4 bytes (i.e. 32 bits) can represent 4,294,967,296 (= 2^32) numbers. People don’t usually use more than 8 bytes to represent a number, because we rarely need numbers that large.
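The byte counts above all follow from one rule: n bytes give 2^(8n) possible values. A minimal sketch (the helper name `distinct_values` is made up for this example):

```python
def distinct_values(num_bytes):
    """Number of different values that num_bytes bytes can hold."""
    return 2 ** (8 * num_bytes)

for num_bytes in (1, 2, 4):
    count = distinct_values(num_bytes)
    print(f"{num_bytes} byte(s): {count} values, from 0 to {count - 1}")

# Packing a number into a fixed number of bytes (most significant byte first).
two_bytes = (65535).to_bytes(2, "big")
print(list(two_bytes))            # [255, 255]

# 256 is one more than a single byte can hold, so Python refuses to pack it.
try:
    (256).to_bytes(1, "big")
    fits_in_one_byte = True
except OverflowError:
    fits_in_one_byte = False
print(fits_in_one_byte)           # False
```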
By convention, bytes are written as hexadecimal (base-16) digits, both in prose and in code. Hexadecimal digits use 0 to 9, plus the letters a to f to represent the numbers 10 to 15. One hexadecimal digit can represent every possible value of 4 bits (0 to 15), and two of them can represent every possible value of a byte (ff is hexadecimal for 255).
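Python understands hexadecimal directly, which makes it easy to check these conversions. A short sketch:

```python
# A hexadecimal literal: two hex digits describe exactly one byte.
largest_byte = 0xff
print(largest_byte)               # 255

# Number to two hex digits, and back again.
as_hex = format(180, "02x")
print(as_hex)                     # b4
print(int(as_hex, 16))            # 180

# Several bytes written out as hex, two digits per byte.
data = bytes([0, 15, 180, 255])
print(data.hex())                 # 000fb4ff
```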
Also, even though bytes themselves are numbers, that is not the only thing they can represent. Bytes can also be used to represent text by mapping each of their possible values to a character, as shown in the map below.
There are many different ways to map text to bytes, and these mappings are called charsets. ASCII is one of the earliest and most primitive charsets. It uses 1 byte per character to represent control characters, the English alphabet and most of its punctuation marks. In an ASCII text file, the size of the file in bytes is the number of characters in it (i.e. the word “hello” is five bytes because there are 5 characters in it). However, not all charsets adhere to this 1-byte-per-character standard, as some of them need to represent more than 256 characters. You can find out more about charsets in this Wikipedia article about character encoding.
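The mapping between characters and bytes can be inspected in Python. This sketch encodes “hello” as ASCII, then uses UTF-8 as one example of a charset that does not stick to 1 byte per character:

```python
text = "hello"
encoded = text.encode("ascii")

# Each character has become one byte, i.e. one number from the ASCII table.
print(list(encoded))              # [104, 101, 108, 108, 111]
print(len(encoded))               # 5

# UTF-8 spends extra bytes on characters outside ASCII.
# "\u00e9" is an e with an acute accent, which takes 2 bytes in UTF-8.
accented = "h\u00e9llo".encode("utf-8")
print(len(accented))              # 6 bytes for 5 characters
```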
Storing the pixel in bits and bytes
If you zoom in on any digital image far enough, you will see that it is made up of many single-coloured square dots called pixels. To store a pixel, you basically need to record what colour it contains.
Generally, there are 3 varieties that pixels come in:
- Greyscale pixels can be represented by 1 byte each, with 0 representing a pitch black pixel and 255 representing a pure white pixel. An image made up of greyscale pixels is usually called an 8-bit image, because 1 byte is used to represent each pixel.
- RGB stands for Red, Green, Blue. RGB pixels can represent every colour a computer screen can show by mixing varying amounts of red, green and blue light, and they use 1 byte for each colour channel, for a total of 3 bytes per pixel. An image made up of RGB pixels is also called a 24-bit color image.
- RGBA adds an extra Alpha channel (and a byte) to an RGB pixel to represent the transparency of the pixel. Images made up of RGBA pixels are called 32-bit color images, because 4 bytes are used to represent each pixel.
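To make the three varieties concrete, here is a sketch that builds one pixel of each kind as raw bytes. The colour values are arbitrary, and it assumes the common convention that an alpha of 255 means fully opaque, which the list above does not spell out.

```python
# One pixel of each variety, stored as raw bytes.
grey_pixel = bytes([128])                  # a mid grey
rgb_pixel = bytes([255, 165, 0])           # red, green, blue -> an orange
rgba_pixel = bytes([255, 165, 0, 128])     # the same orange, roughly half transparent
                                           # (assumes alpha 255 = fully opaque)

for name, pixel in [("greyscale", grey_pixel),
                    ("RGB", rgb_pixel),
                    ("RGBA", rgba_pixel)]:
    print(f"{name}: {len(pixel)} byte(s) = {len(pixel) * 8} bits per pixel")
```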
In the most basic kind of image (i.e. a bitmap, or .bmp), you can calculate the image’s file size by multiplying the number of pixels by a pixel’s byte size. A 24-bit color RGB bitmap image with a resolution of 300 × 250 pixels, for example, will contain:
3 bytes × 300 × 250 = 225,000 bytes
The actual size of the file will be a little bit larger, as the bitmap will also have an image header that stores metadata such as the image’s resolution and bit depth. Think of the image header as a small information sheet attached to the front of the pixel data.
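The size calculation is easy to reproduce. In the sketch below, `raw_pixel_bytes` is a made-up helper, and the 54-byte header is an assumption based on the most common BMP layout (a 14-byte file header followed by a 40-byte info header); other BMP variants use larger headers. Real BMP files also pad each row of pixels to a multiple of 4 bytes, which this sketch ignores (a 300-pixel-wide RGB row is 900 bytes, so it happens to need no padding).

```python
def raw_pixel_bytes(width, height, bytes_per_pixel):
    """Bytes needed to store every pixel with no compression."""
    return width * height * bytes_per_pixel

pixel_data = raw_pixel_bytes(300, 250, 3)
print(pixel_data)                          # 225000

# Assumption: the common 14-byte + 40-byte BMP header layout.
ASSUMED_HEADER_BYTES = 14 + 40
print(pixel_data + ASSUMED_HEADER_BYTES)   # 225054
```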
Hexadecimal (or hex) colour codes
Remember that I said RGB pixels are usually 3 bytes each, and that 1 byte can be fully represented by 2 hexadecimal digits? This means that you can fully represent any RGB colour with six hexadecimal digits. If you’ve worked with image editing software, then you’ve probably come across them before (see the Hex column in the image below).
In a hex colour code, the first two hex digits represent red, while the next two represent green, and the last two represent blue. Hence, the hex codes #ff0000, #00ff00 and #0000ff will give you pure red, pure green and pure blue respectively.
In image-editing software that supports image transparency, you will also get 8-digit hex codes to represent RGBA colours. The last two digits in these hex codes represent the alpha value.
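Splitting a hex colour code into its channels takes only a few lines. This is a minimal sketch; `parse_hex_colour` is a hypothetical helper, not a function from any particular library.

```python
def parse_hex_colour(code):
    """Split a 6- or 8-digit hex colour code into its channel values.

    Returns (red, green, blue) or (red, green, blue, alpha),
    with each channel as a number from 0 to 255.
    """
    digits = code.lstrip("#")
    if len(digits) not in (6, 8):
        raise ValueError("expected 6 (RGB) or 8 (RGBA) hex digits")
    # Every pair of hex digits is one byte, i.e. one channel.
    return tuple(int(digits[i:i + 2], 16) for i in range(0, len(digits), 2))

print(parse_hex_colour("#ff0000"))     # (255, 0, 0)
print(parse_hex_colour("#00ff00"))     # (0, 255, 0)
print(parse_hex_colour("#0000ff80"))   # (0, 0, 255, 128)
```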
Bit depth
Although a colour channel is usually represented with 8 bits, it doesn’t have to be. The greater the number of bits you use to represent each colour channel, the finer the colour gradation you will have. Besides 8 bits per channel, images also commonly come in bit depths of 16 or 32 bits per channel.
With the exception of greyscale images, using anything more than 8 bits of bit depth in an image is usually overkill. In an RGB image, a bit depth of 8 bits per channel will give you more than 16 million possible colours (as opposed to only 256 shades in a greyscale image).
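The colour counts come from the same power-of-two rule as before: every extra bit doubles the number of possible values. A quick sketch, with `colour_count` as a made-up helper name:

```python
def colour_count(bits_per_channel, channels):
    """Number of distinct colours a pixel can take."""
    return 2 ** (bits_per_channel * channels)

print(colour_count(8, 1))    # 256 shades for 8-bit greyscale (0 through 255)
print(colour_count(8, 3))    # 16777216 colours for 8-bit-per-channel RGB
print(colour_count(16, 3))   # 281474976710656 colours for 16-bit-per-channel RGB
```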
Image compression
Although we’ve established that the size of an image is equal to pixel size × image resolution, you will find that images in common formats like JPG and PNG have much smaller sizes than our little equation above says they should. This is because these formats compress the pixel data. One of the simplest ways to do that, used by indexed-colour images such as GIF and palette-based PNG, is colour mapping: similar to the way characters are mapped, it reduces the number of bits needed to represent each pixel.
While the RGB format is capable of representing over 16 million colours, images rarely use all of these available colours. Hence, in a “colour mapped” image, instead of storing colour information for every single pixel, every unique colour in the image is recorded once onto a table and given a serial number. Each pixel in the image then maps itself to a serial number instead of an RGB code to represent colour information. Consider this 8 × 8 pixel RGB image:
If we were to store the pixel information of the image normally, each pixel would take up 3 bytes. There are 64 pixels in an 8 × 8 image, so the entire image would be 192 bytes. Now, if we were to “colour map”:
Every colour in the image would be mapped to a serial ID, and then each pixel only needs to be represented by the serial ID. Since there are only 4 possible values for the serial ID, you can represent one pixel in 2 bits (see the serial binary column). That means we are storing 4 pixels in a byte!
In this way, we only need to use 16 bytes to store pixel information, plus another 12 bytes of colour information (3 bytes per colour). This means the whole image can be stored in 28 bytes, which is about 15% of the size of the original!
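The colour mapping steps can be sketched in Python. The four colours and the pixel layout below are invented for the example (they are not the image from this article), and the sketch assumes the pixel count is a multiple of 4 so that every byte is completely filled.

```python
# A hypothetical 8 x 8 image that uses only four colours, as (R, G, B) tuples.
RED, GREEN, BLUE, WHITE = (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)
pixels = [RED, GREEN, BLUE, WHITE] * 16        # 64 pixels, row by row

def build_palette(pixels):
    """Record every unique colour once; its position is its serial ID."""
    palette = []
    for colour in pixels:
        if colour not in palette:
            palette.append(colour)
    return palette

def pack_indices(pixels, palette, bits_per_index=2):
    """Store each pixel as a palette index, several indices per byte."""
    per_byte = 8 // bits_per_index
    packed = bytearray()
    for start in range(0, len(pixels), per_byte):
        byte = 0
        for colour in pixels[start:start + per_byte]:
            byte = (byte << bits_per_index) | palette.index(colour)
        packed.append(byte)
    return bytes(packed)

def unpack_indices(packed, palette, bits_per_index=2):
    """Reverse of pack_indices: turn the packed bytes back into colours."""
    per_byte = 8 // bits_per_index
    mask = (1 << bits_per_index) - 1
    colours = []
    for byte in packed:
        for slot in range(per_byte):
            shift = bits_per_index * (per_byte - 1 - slot)
            colours.append(palette[(byte >> shift) & mask])
    return colours

palette = build_palette(pixels)
packed = pack_indices(pixels, palette)

raw_size = len(pixels) * 3                     # 3 bytes per RGB pixel
mapped_size = len(packed) + len(palette) * 3   # pixel indices + colour table
print(raw_size, mapped_size)                   # 192 28
print(unpack_indices(packed, palette) == pixels)   # True: nothing was lost
```

The last line is the important part of the design: because every pixel can be recovered exactly from its serial ID, colour mapping on its own throws away no information.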
Lossy compression and “true color”
In real-life situations, we deal with images that contain far more than 4 colours, and many of those colours will be similar to each other. There might be 10 different hues of red in a particular image, out of which 3 look pretty similar. Why not combine those 3 into 1 colour and save 6 bytes of colour information? This is the idea behind lossy compression, where you discard some insignificant colour information in order to save space. A “true color” image, by contrast, usually means one where every pixel keeps its full 24-bit RGB value, so none of its original colour information has been thrown away.
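The idea of merging similar colours can be illustrated with a deliberately naive sketch. This greedy merge is only an illustration of the principle, not how JPG or any other real format compresses images; the function name, the threshold and the sample colours are all made up.

```python
def merge_similar(palette, threshold):
    """Greedily fold each colour into the first kept colour that is close enough.

    Returns the reduced palette and a mapping from every original colour
    to the colour that now stands in for it.
    """
    kept = []
    mapping = {}
    for colour in palette:
        for candidate in kept:
            # Squared straight-line distance between the two colours in RGB space.
            distance_sq = sum((a - b) ** 2 for a, b in zip(colour, candidate))
            if distance_sq <= threshold ** 2:
                mapping[colour] = candidate
                break
        else:
            # No kept colour was close enough, so this one stays as it is.
            kept.append(colour)
            mapping[colour] = colour
    return kept, mapping

# Three similar reds and one blue (invented values).
palette = [(200, 0, 0), (204, 2, 1), (198, 3, 0), (0, 0, 255)]
reduced, mapping = merge_similar(palette, threshold=10)

print(reduced)                                  # [(200, 0, 0), (0, 0, 255)]
print((len(palette) - len(reduced)) * 3)        # 6 bytes of colour information saved
print(mapping[(204, 2, 1)])                     # (200, 0, 0): the original red is gone
```

Unlike colour mapping, this step cannot be undone: once the three reds share one entry, there is no way to tell which pixel originally had which hue.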
There’s more
Needless to say, the whole deal of how image compression works is more nuanced than this article makes it out to be. There is also a lot more to talk about when it comes to using images for video games, but let’s save that for a Part II in the distant future.