For pretty much any film or video project you might work on these days, you’ll spend at least part of the time dealing with compressed material. In order to decide which codec to use, you need to keep in mind what purpose you are using it for and how a particular codec fulfills that purpose. In this entry, I’ll explain some of the techniques I use for evaluating codecs. In later entries I’ll cover some more specific results.
Image compression (really, any kind of compression) involves a tradeoff among space or bandwidth (how much data you need to store or move around), complexity (how much computing power is needed to encode or decode the material), and quality. The very idea of compression implies a compromise of some sort – otherwise we’d only be dealing with uncompressed files in the first place.
Every codec balances these qualities differently and, as computing resources, bandwidth and storage become more abundant, the decision on which codec to use for which purpose will need to be reevaluated periodically. (For an excellent example of this, consider Panasonic’s transition from DVCPROHD to AVCI, which provides improved image quality at the same bandwidth, at the cost of increased complexity. More on this in a future installment.)
So, how do we go about evaluating a codec? Well, evaluating space and complexity (or at least their effects in your environment) is relatively straightforward – just compress some footage and monitor the file size and the resources used. But how do we judge quality?
This is what I do: I start with some uncompressed footage. I have a few clips set aside that I use to compare codecs. If you’re planning to test codecs for a specific project, make sure you pick footage whose attributes reflect what you will actually encounter, as many codecs work better with certain types of footage than others. The biggest factor is the amount of detail in the original footage, but long-GOP codecs are also affected by how much the frames differ from one another, and codecs that down-sample the luma and/or chroma components of the image will suffer around sharp color transitions. I then compress the footage in each codec to be tested so I have several versions of the same clip – the uncompressed version as well as the various compressed versions. After looking at the different clips to see if anything jumps out at me, I look at only the difference between the original and compressed versions of the clip, magnified to enhance any differences. (I use Shake for this. Here’s what my Shake node tree looks like.)
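The difference-and-magnify step doesn’t require Shake, incidentally – it can be sketched in a few lines of NumPy. (This is my illustrative sketch, not the author’s pipeline; the `gain` value is arbitrary and just needs to be large enough to make subtle loss visible.)

```python
import numpy as np

def amplified_difference(original, compressed, gain=8.0):
    """Absolute per-pixel difference between two frames, boosted by
    `gain` so that subtle compression loss becomes visible."""
    diff = np.abs(original.astype(np.float64) - compressed.astype(np.float64))
    return np.clip(diff * gain, 0, 255).astype(np.uint8)

# Tiny example: a 2x2 "frame" where compression nudged one pixel by 3.
original = np.array([[100, 100], [100, 100]], dtype=np.uint8)
compressed = np.array([[100, 103], [100, 100]], dtype=np.uint8)
print(amplified_difference(original, compressed))  # the error of 3 shows up as 24
```

Untouched pixels stay black in the amplified difference, which is why a lossier codec produces a busier difference image.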
I can now step through the clips and see how each codec handles different challenges, such as fine detail and fast movement. Here are some examples:
Here’s a crop from the center of our original image. (Note that, for the sake of bandwidth, all images in this post have been cropped and compressed as high-quality JPEGs. I think the quality is good enough to make my points, but if you want to see the original images, contact me and I’ll arrange it.)
Here’s the amplified difference image for Avid DNX 115:
Avid DNX 175:
Apple ProRes HQ:
Sony HDCAM SR:
Notice that the lossier a codec is, the more detail shows up in the difference image. Any scaling in a codec (for example, from sub-sampling the luma channel) creates very obvious loss around fine detail. Other artifacts, such as macroblocking and ringing, also become apparent.
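A quick sketch of why sub-sampling hurts sharp transitions: halve the resolution of a scanline by averaging neighbouring pairs (a crude stand-in for a codec’s sub-sampling filter), restore it by pixel repetition, and the error lands exactly on the edge. (My toy example, not any particular codec’s filter.)

```python
import numpy as np

# A scanline with a sharp transition (e.g. a hard color edge).
line = np.array([0, 0, 0, 0, 0, 255, 255, 255], dtype=np.float64)

# Halve the resolution by averaging neighbouring pairs...
half = line.reshape(-1, 2).mean(axis=1)

# ...then restore it by pixel repetition on playback.
restored = np.repeat(half, 2)

error = np.abs(line - restored)
print(error)  # zero in the flat areas, concentrated at the edge
```

The flat regions survive perfectly; all of the loss piles up where the value changes abruptly, which is exactly what the difference images show around fine detail.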
Note 1: As an example of why it’s important to look at the original images in addition to the difference image, check these images out. I’ve deliberately created these images for illustrative purposes, but I’ve come across exactly this situation with real codecs. First the differences:
Notice that the second image is somewhat lossier than the first. But take a look at the actual images:
Look at the maroon carpet and the rear surface of the cart in both images. Even though the second image is lossier (I’ve scaled it and added a fair bit of noise), it is still a much better-looking image than the first.
Note 2: It is possible to condense the entire difference image down to a single number called the Peak Signal-to-Noise Ratio (PSNR for short). This can be useful for certain purposes, but I find that looking at the actual image is much more informative. Firstly, a PSNR number is meaningless on its own: it depends on the signal being compressed, and there are a few slightly different ways of calculating it that yield different numbers, so it is only useful for comparing compressed signals that originate from the same source material. Secondly, it doesn’t tell you what is being lost, and often what is being lost matters more than the overall level of lossiness.
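For reference, the usual formulation of PSNR is easy to compute. A sketch, assuming 8-bit samples (so a peak value of 255 – other bit depths change the constant, which is one of the reasons bare PSNR numbers aren’t comparable across sources):

```python
import math
import numpy as np

def psnr(original, compressed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((original.astype(np.float64)
                   - compressed.astype(np.float64)) ** 2)
    if mse == 0:
        return math.inf  # identical images: lossless
    return 10 * math.log10(peak ** 2 / mse)

original = np.full((4, 4), 100, dtype=np.uint8)
compressed = original.copy()
compressed[0, 0] = 110           # one pixel off by 10 -> MSE = 100/16
print(round(psnr(original, compressed), 2))  # 40.17
```

Note that the whole 16-pixel image collapses to one number; the formula can’t tell you whether that error was a faint haze over everything or a single badly mangled edge.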
Note 3: This one is for the geeks out there. I noticed something interesting while preparing the images for this post. I compressed all of the examples as fixed-quality JPEGs, and it turns out that if you sort the difference images by file size (in decreasing order), they sort in order of the compressed clips’ PSNR (lowest to highest). This makes sense: the compressibility of a signal is inversely proportional to the amount of information it contains, so the more detail in an image, the less compressible (without further loss) it is.
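The detail-versus-compressibility relationship is easy to demonstrate with any general-purpose compressor. A sketch using `zlib` as a stand-in for JPEG’s entropy coding (my illustration, not part of the original test): two buffers of identical size, one flat and one full of noise.

```python
import random
import zlib

random.seed(42)  # make the noisy buffer reproducible

# A smooth "image": one flat value repeated 4096 times.
flat = bytes([128] * 4096)

# A detailed "image": the same size, but pure noise.
noisy = bytes(random.randrange(256) for _ in range(4096))

flat_size = len(zlib.compress(flat))
noisy_size = len(zlib.compress(noisy))
print(flat_size, noisy_size)  # the flat buffer collapses; the noise barely shrinks
```

A busier difference image is, in effect, a noisier one, so it resists compression – hence the file-size ordering lining up with PSNR.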