Preprocessing Video and Audio
You have shot and edited your video, and now it’s time to compress, encode, and deliver it. But one more step is left before final encoding can begin. This step is called preprocessing, and it consists of a variety of optimizations you need to perform on the video and audio before you can hand them off to the encoder. These optimizations can include deinterlacing, inverse telecining, cropping, scaling, aspect ratio adjustments, noise reduction, brightness and color corrections, and audio adjustments.
Preprocessing of some sort is almost always necessary to get your video to look its best. The goals of preprocessing are to clean up any noise in the video and to optimize its basic elements for playback on the devices you are targeting (TVs, mobile phones, computers, and so on). Preprocessing is sometimes referred to as the “magic” part of compression because it takes a lot of practice to achieve the desired results. This is the most artistic and frequently misused part of the compression process. It’s easy to go overboard with the changes until you get a feel for how much you really need to do to your video and audio to optimize it for encoding.
Fortunately, preprocessing is a craft that can be learned, and practice will only make you better at it. Understanding why you preprocess video and audio and how the various types of optimization that occur at this stage affect your final product will help you make your preprocessing choices.
Spatial/Geometric Preprocessing
Whether it’s simply changing the frame size and aspect ratio to match the destination or reframing the video entirely, spatial preprocessing is one of the most common types of preprocessing you’ll encounter as a compressionist.
Cropping: Title and Action Safe Zones
Cropping is a way of identifying a specific region of the video to use in the compression, excluding the other areas of the source frame. Cropping can be used to change the aspect ratio of a wide-screen 16 × 9 video to the older 4 × 3 format, or it can be used to crop out unwanted picture areas at the edge of the frame.
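As a rough illustration of the aspect-ratio case, the sketch below computes the largest centered crop rectangle for a target aspect ratio. The function name `center_crop` is hypothetical, and the rounding down to even dimensions is an assumption (many encoders prefer even frame sizes), not something the text requires.

```python
from fractions import Fraction

def center_crop(src_w, src_h, target_aspect):
    """Return (x, y, w, h) of the largest centered region with target_aspect.

    target_aspect is a (width, height) pair such as (4, 3).
    """
    target = Fraction(*target_aspect)
    if Fraction(src_w, src_h) > target:
        # Source is wider than the target: keep full height, trim the sides.
        h = src_h
        w = int(src_h * target)
    else:
        # Source is taller than the target: keep full width, trim top and bottom.
        w = src_w
        h = int(src_w / target)
    # Assumption: round down to even dimensions.
    w -= w % 2
    h -= h % 2
    return (src_w - w) // 2, (src_h - h) // 2, w, h

# Cropping a 16 x 9 HD frame to 4 x 3 trims 240 pixels from each side.
print(center_crop(1920, 1080, (4, 3)))  # (240, 0, 1440, 1080)
```

The returned offsets and size can be passed to whatever crop filter your compression tool provides.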
Most TVs do not display the entire image that is transmitted. Instead, the picture is overscanned, meaning the transmitted image is slightly larger than the viewable area of a consumer-grade television. This is done for several reasons, most of which come down to hiding irregularities at the edges of the video frames. Production people are aware of this and have created three regions of a video image that affect how they frame shots and incorporate graphic overlays: overscan, action safe, and title safe. Figure 4.1 will be familiar to anyone who has looked through a video camera viewfinder. It depicts the action- and title-safe areas that a camera operator needs to be conscious of when framing a shot.
Figure 4.1 Compressionists need to be aware of the action- and title-safe regions of the frame and use them as a general guideline for cropping nonbroadcast content.
The outermost region of the video is known as the overscan region. This area may not appear on standard consumer TV screens, and it often contains the edge of the set or cables and other equipment. Professional-grade monitors have a mode that allows this overscan area to be viewed, known as underscan mode. These monitors may also include action-safe and title-safe indicators.
The action-safe area is the larger rectangle within the image area. This area displays approximately 90 percent of the video image and is where camera operators will make sure to keep the primary action framed to keep it viewable on TVs.
The smaller rectangle in the image is the title-safe area (comprising about 80 percent of the visible image). It is far enough in from the four edges of a standard TV that text or graphics should show neatly without being cut off or distorted. The title-safe area started out as a guide for keeping text from being distorted by the rounded corners on old cathode ray tube (CRT) TVs. Most modern TVs display a lot more of the area outside of the title-safe zone than their CRT counterparts. However, as a rule of thumb, text and titles should still be contained within the title-safe area.
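The safe areas described above can be sketched as centered rectangles. This example assumes the 90 and 80 percent figures apply to each dimension (width and height) rather than to total area, which is one common reading; the helper names `safe_area` and `fits_inside` are hypothetical.

```python
ACTION_SAFE_PERCENT = 90  # approximate figure from the text
TITLE_SAFE_PERCENT = 80   # approximate figure from the text

def safe_area(width, height, percent):
    """Centered rectangle spanning `percent` of each dimension: (x, y, w, h)."""
    w = width * percent // 100
    h = height * percent // 100
    return (width - w) // 2, (height - h) // 2, w, h

def fits_inside(box, area):
    """True if box (x, y, w, h) lies entirely within area (x, y, w, h)."""
    bx, by, bw, bh = box
    ax, ay, aw, ah = area
    return bx >= ax and by >= ay and bx + bw <= ax + aw and by + bh <= ay + ah

action = safe_area(1920, 1080, ACTION_SAFE_PERCENT)
title = safe_area(1920, 1080, TITLE_SAFE_PERCENT)
print(action)  # (96, 54, 1728, 972)
print(title)   # (192, 108, 1536, 864)

# A lower-third graphic placed at x=200, y=900, sized 800 x 60.
print(fits_inside((200, 900, 800, 60), title))  # True
```

A check like `fits_inside` is a quick way to confirm that a title or graphic overlay stays within the title-safe zone before encoding.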
Cropping can be a useful tool to ensure that the video displays the same regardless of delivery and playback format. While TVs routinely overscan, mobile devices and online video players do not. If you want the video to appear to display the same on all devices, it may be necessary to crop the overscan area on the formats that do not overscan.
Scaling
Scaling is another key part of preprocessing, and it simply means enlarging or shrinking the frame size. A proportionate scale means that the same scaling factor is applied to both the horizontal and vertical axes. Many times you’ll crop an image and then scale it back up so that the frame size matches what you started with. Other times, it may mean shrinking a video from the original size to a size more appropriate for delivery (Figure 4.2).
Figure 4.2 These images demonstrate how a 720 × 480 source clip (left) can be scaled down to 320 × 240 (right) for web delivery with no visible loss in quality.
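A proportionate scale can be sketched as a single factor applied to both axes. The example below assumes square pixels and rounds the width to the nearest even number, which is an assumption about encoder preferences rather than a rule from the text; `scale_to_height` is a hypothetical helper name.

```python
from fractions import Fraction

def scale_to_height(src_w, src_h, target_h):
    """Scale a frame proportionately so its height becomes target_h.

    Assumes square pixels. Returns (width, height), with the width
    rounded to the nearest even number.
    """
    factor = Fraction(target_h, src_h)      # same factor for both axes
    width = 2 * round(src_w * factor / 2)   # nearest even width
    return width, target_h

print(scale_to_height(1920, 1080, 720))  # (1280, 720)
print(scale_to_height(1920, 1080, 480))  # (854, 480)
```

Using exact fractions avoids floating-point drift when the scaling factor is not a round number, as in the 1080-to-480 case.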
Scaling video up is called an upconvert. Going from a smaller frame size to a larger one (as shown in Figure 4.3) is not recommended but can be necessary in certain situations. In these situations, the scaling algorithms in different tools can produce dramatically different results. Some tools simply enlarge the existing pixels, while others interpolate, creating new pixels from the values of their neighbors. There are many tools in this area, ranging from lower-cost plug-ins for After Effects all the way up to dedicated hardware for real-time SDI conversions. As always, your mileage may vary, so it’s best to test your content against multiple tools to determine the best one for the job.
Figure 4.3 Upscaling, unlike downscaling, is a bad idea—you can’t add pixels that weren’t in the image to begin with without compromising quality.
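The difference between enlarging existing pixels and interpolating new ones can be shown on a single row of pixel values. This is a simplified one-dimensional sketch with hypothetical function names; real scalers work in two dimensions and often use more sophisticated filters than the linear blend assumed here.

```python
def upscale_nearest(row, factor):
    """Repeat each source pixel `factor` times (no new values are created)."""
    return [value for value in row for _ in range(factor)]

def upscale_linear(row, new_len):
    """Create new pixels by blending the two nearest source pixels."""
    if new_len == 1 or len(row) == 1:
        return [row[0]] * new_len
    out = []
    step = (len(row) - 1) / (new_len - 1)  # source distance per output pixel
    for i in range(new_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(row) - 1)
        t = pos - left                     # blend weight toward the right pixel
        out.append(round(row[left] * (1 - t) + row[right] * t))
    return out

row = [0, 100, 200]
print(upscale_nearest(row, 2))  # [0, 0, 100, 100, 200, 200]
print(upscale_linear(row, 5))   # [0, 50, 100, 150, 200]
```

Either way, the extra pixels are derived from the original three values; neither approach recovers detail that was never captured.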
A key question when scaling is what size you should scale to. The answer will be determined by your destination and the frame sizes it supports. As bandwidth has increased and technologies such as adaptive bitrate streaming have been introduced, the answers to these questions have gotten both simpler and more complex.
While there are all sorts of possibilities, two general facts apply: the larger the frame size, the bigger the output file will need to be to maintain good quality, and the slower it will play on some machines. Of course, small frame sizes can also look low quality, so you need to find a good balance that works for your projects. One thing to keep in mind, however, is that it is always best to scale down rather than scale up. That means if you have a video finished at 1920 × 1080, creating an upconverted 4K/UHD version won’t mean higher quality.
Here are some general guidelines, regardless of aspect ratio:
Height of 2160 pixels: At double the height of 1080p HD video (and four times the pixel count), this is a great high-quality encode if you don’t need to worry about file size or playback speed (some older computers will have difficulty playing back files this large without dropping frames).
Height of 1080 pixels: This is probably the most common delivery size today. Most platforms and players support some version of 1080p.
Height of 720 pixels: This is a good choice if you need to prioritize file size over spatial resolution. It can also be appropriate if your target machine is older and slower.
Height of 480 pixels and smaller: What used to be called standard def, these frame sizes are pretty small. These are more suited to mobile video, given the screen resolution and bandwidth.
For more details on general guidelines for frame sizes, see Table 4.1. These frame sizes assume that the pixel aspect ratio is 1:1 (square).
Table 4.1 Common frame sizes
Output | Full-Screen (4:3) | Wide-Screen (16:9)
Ultra-high definition | 2880 × 2160 | 3840 × 2160
Extra-large broadband | 1920 × 1440 | 2560 × 1440
Large broadband/high definition | 1440 × 1080 | 1920 × 1080
Small broadband and large mobile | 960 × 720 | 1280 × 720
Large mobile | 640 × 480 | 854 × 480
Medium mobile | 480 × 360 | 640 × 360
Small mobile | 320 × 240 | 426 × 240
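One way to apply the scale-down-only guideline is to list the frame sizes from Table 4.1 that do not exceed the source height. The sketch below copies the table values into a dictionary; the names `FRAME_SIZES` and `delivery_sizes` are hypothetical, and the selection rule is an illustration of the guideline rather than a fixed recipe.

```python
# Frame sizes from Table 4.1 (square pixels), largest first.
FRAME_SIZES = {
    "4:3": [(2880, 2160), (1920, 1440), (1440, 1080), (960, 720),
            (640, 480), (480, 360), (320, 240)],
    "16:9": [(3840, 2160), (2560, 1440), (1920, 1080), (1280, 720),
             (854, 480), (640, 360), (426, 240)],
}

def delivery_sizes(src_height, aspect="16:9"):
    """Return the table sizes that can be produced without scaling up."""
    return [(w, h) for (w, h) in FRAME_SIZES[aspect] if h <= src_height]

# A 1920 x 1080 master can feed every size at or below 1080 lines.
print(delivery_sizes(1080))
# [(1920, 1080), (1280, 720), (854, 480), (640, 360), (426, 240)]
```

For a 1080-line master, the 2160 and 1440 entries are skipped because producing them would require an upconvert.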
