Preprocessing Video and Audio

Jan 1, 2099

This chapter is from the book 

Video Compression Handbook, 2nd Edition

Learn More Buy

Legacy Video Issues

There has been a lot of change in the realm of digital video. We’ve moved from interlaced to progressive formats. The number of standard aspect ratios has exploded. So-called high-definition resolutions have gone from the minority to the majority, and even higher resolutions such as ultra-high definition, 4K, and 8K are now coming into play. But as a compressionist, you’ll be asked to work with all sorts of video from various periods in time. That means older video brings a whole host of issues that you’ll need to understand and be comfortable fixing.

Deinterlacing Video

In the past, most digital video was interlaced because it was assumed that at some point it was meant for playback on a standard CRT television, which had an interlaced display. We’ve now moved to a world where most displays are progressive scan, and they are not always televisions. Because interlaced video is unacceptable for web and mobile delivery, deinterlacing video for playback on the Web or other progressive displays is a fundamental and necessary step.

If the source video is left with interlacing lines intact, the output will show jagged lines along moving edges, sometimes referred to as combing. These lines do not make for a good viewing experience and are also difficult to encode for a couple of reasons. Moving objects will keep merging and reassembling (with the two fields moving out of step), so motion estimation becomes difficult and the encode inefficient. The interlaced image will also have more detail than is necessary to display the image, so additional bits are wasted relaying this redundant detail.

You can perform a deinterlace in several ways. Each is designed to optimize different types of images and different types of motion within the video. All in all, there are approximately eight ways to perform a deinterlace, though those eight have several different names by which they are recognized, depending on the tools you are working with.

Blend

The first common deinterlace method is referred to as blending, also known as averaging or combining fields. This method overlays the two fields. It is fast and gives good results when there’s little or no movement, such as in interviews, but movement looks unnatural and low quality because you get ghosting every time an image moves.
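As a rough illustration of blending, the sketch below averages every line with its neighbor from the opposite field. The function name `blend_deinterlace` and the frame representation (a list of rows of 8-bit luma values) are assumptions made for this example; real tools may blend differently.

```python
def blend_deinterlace(frame):
    """Average each line with the adjacent line from the other field.

    `frame` is a hypothetical representation: a list of rows, each row a
    list of 8-bit luma values. Even rows belong to one field, odd rows to
    the other, so every output row mixes both fields.
    """
    height = len(frame)
    out = []
    for y in range(height):
        # Use the line below; for the last line, fall back to the line above.
        neighbor = frame[y + 1] if y + 1 < height else frame[y - 1]
        out.append([(a + b) // 2 for a, b in zip(frame[y], neighbor)])
    return out


# A bright object sits at x=0 in one field and at x=2 in the other,
# which is what horizontal motion between fields looks like.
moving = [
    [200, 0, 0],
    [0, 0, 200],
    [200, 0, 0],
    [0, 0, 200],
]
for row in blend_deinterlace(moving):
    print(row)  # each row is [100, 0, 100]
```

The object shows up at half brightness in both positions, which is the ghosting described above.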

Weave

Another commonly used method is weaving, which shows both fields in each frame. This method basically doesn’t do anything to the frame: you keep the full vertical resolution, which looks fine where nothing moves, but moving edges stay jagged.
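Weaving can be sketched as interleaving the lines of the two fields. The function name `weave` and the list-of-rows field representation are assumptions for this example only.

```python
def weave(top_field, bottom_field):
    """Interleave two fields into one full-height frame.

    Each field is a hypothetical list of rows (lists of luma values).
    Top-field lines land on even rows, bottom-field lines on odd rows.
    """
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(list(top_line))
        frame.append(list(bottom_line))
    return frame


# The object moved from x=0 to x=2 between the two fields.
top = [[200, 0, 0], [200, 0, 0]]
bottom = [[0, 0, 200], [0, 0, 200]]
for row in weave(top, bottom):
    print(row)
```

The alternating rows in the output are the jagged, combed edge: full vertical resolution is kept, but the two moments in time are not reconciled.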

Area-Based Deinterlacing

Area-based deinterlacing blends nothing but the jagged edges. You do this by comparing frames over time or by space/position. It gives you good results in quiet scenes with little movement because in those circumstances there is nothing to blur.

Motion Blur

The motion blur method blurs the jagged edges where needed, instead of mixing (that is, blending) them with the other field. This way, you get a more filmlike look. You may have to apply motion blur with a program such as Apple Final Cut Pro or Adobe After Effects before using a compression application.

Discard

With the discarding method, you throw away every second line (leaving the movie at half the original height) and then resize the picture during playback. Because this is the same as skipping Field 2, Field 4, Field 6, and so on, you could also call this even fields only or odd fields only. Although you won’t get artifacts from trying to blend or merge the images, you’ll lose half the resolution, and motion will become less smooth.
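A minimal sketch of the discard method follows, assuming a frame is a list of rows. The names `discard_field` and `line_double` are made up for this example, and line doubling stands in for whatever resize the player would actually perform.

```python
def discard_field(frame, keep="top"):
    """Keep only one field, returning a half-height picture."""
    start = 0 if keep == "top" else 1
    return [list(row) for row in frame[start::2]]


def line_double(field):
    """Crude resize back to full height by repeating every line."""
    out = []
    for row in field:
        out.append(list(row))
        out.append(list(row))
    return out


frame = [[1, 1], [2, 2], [3, 3], [4, 4]]
half = discard_field(frame)   # [[1, 1], [3, 3]]
print(half)
print(line_double(half))      # [[1, 1], [1, 1], [3, 3], [3, 3]]
```

Rows 2 and 4 never reach the output, which is the lost half of the vertical resolution.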

Bob

The bob approach displays every field (so you don’t lose any information) one after the other (that is, without interlacing) but with double the frames per second. Thus, each interlaced frame is split into two frames (that is, the two former fields) at half the height. Sometimes bobbing is also called progressive scanning. However, since the bob approach doesn’t analyze areas or the differences between fields, the two approaches are not really the same (see the next section).
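The bob approach can be sketched as splitting each frame into its two fields and emitting them in sequence at twice the frame rate. The function name `bob`, the `top_field_first` parameter, and the list-of-rows frame representation are assumptions for this example.

```python
def bob(frames, fps, top_field_first=True):
    """Turn each interlaced frame into two half-height frames.

    Returns the new frame list and the doubled frame rate. No analysis of
    motion or of differences between fields is performed.
    """
    out = []
    for frame in frames:
        top = [list(row) for row in frame[0::2]]
        bottom = [list(row) for row in frame[1::2]]
        out.extend([top, bottom] if top_field_first else [bottom, top])
    return out, fps * 2


interlaced = [[[1], [2], [3], [4]]]      # one frame, four lines
frames_out, rate = bob(interlaced, 25)
print(frames_out)  # [[[1], [3]], [[2], [4]]]
print(rate)        # 50
```

Every line of the source survives, but each output frame has only half the height and must be scaled back up for display.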

Progressive Scan

Progressive scanning analyzes the two fields and deinterlaces only the parts that need to be deinterlaced. The main difference between progressive scanning and area-based deinterlacing is that progressive scanning gives you a movie with twice the frames per second instead of the standard 25 fps or 30 fps movie, thus leaving you with perfect fluidity of motion. To say it more academically, it has high temporal and vertical resolution.

This method is also variously called motion adaptive, bob and weave, and intelligent motion adaptive.
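The idea of deinterlacing only the parts that need it can be sketched per pixel: keep the woven value where nothing changed since the previous frame, and interpolate from the kept field where something moved. The sketch below is a simplification under stated assumptions: the function name `motion_adaptive`, the `threshold` value, and the use of a single previous frame as the motion test are all hypothetical, and real implementations typically examine more fields. It rebuilds a full-height frame for one field; a double-rate output would repeat the process for the other field.

```python
def motion_adaptive(prev_frame, cur_frame, keep_parity=0, threshold=10):
    """Rebuild a full-height frame around one field of `cur_frame`.

    Lines whose index parity equals `keep_parity` are copied unchanged.
    For the other lines, each pixel is woven if it is static (close to
    the same pixel in `prev_frame`) or interpolated from the lines above
    and below if it moved. Frames are lists of rows with at least 2 rows.
    """
    height = len(cur_frame)
    out = []
    for y in range(height):
        if y % 2 == keep_parity:
            out.append(list(cur_frame[y]))
            continue
        # Nearest lines from the kept field, mirrored at the frame edges.
        above = cur_frame[y - 1] if y - 1 >= 0 else cur_frame[y + 1]
        below = cur_frame[y + 1] if y + 1 < height else cur_frame[y - 1]
        row = []
        for x, value in enumerate(cur_frame[y]):
            if abs(value - prev_frame[y][x]) <= threshold:
                row.append(value)                       # static: weave
            else:
                row.append((above[x] + below[x]) // 2)  # moving: interpolate
        out.append(row)
    return out


prev = [[10, 10], [50, 10], [30, 30], [50, 50]]
cur = [[10, 10], [50, 90], [30, 30], [50, 50]]
print(motion_adaptive(prev, cur))
# [[10, 10], [50, 20], [30, 30], [50, 50]]
```

Only the pixel that changed (row 1, column 1) is replaced by an interpolated value; the static pixels keep their full vertical detail.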

Motion Compensation

The motion compensation method analyzes the movement of objects across the many frames that make up a scene. In other words, it tracks each object that moves around in the scene, effectively analyzing a group of consecutive frames instead of just single frames.

Although effective for horizontal motion, some software for this technique does not handle vertical motion at all and may fall back on selective blending or other techniques when it is unable to resolve the motion vectors.

Image Aspect Ratio Correction

The film world has dealt with the difficulty of matching the source aspect ratio to the screen ever since movies were invented. For the video world, this is a relatively new challenge. Much like Henry Ford’s Model T, which came in any color as long as it was black, if you worked with video, the aspect ratio was 4:3. This was a much simpler time because most video displays were also 4:3. We’re now living in a multiscreen/multiformat world, and it’s become a lot more complicated. Most of the time, it’s advisable to keep the aspect ratio the same as the source and allow the playback system to account for any difference in aspect ratio.

For example, suppose your source file is a movie trailer that was delivered to you at 2048 × 858. If you divide the width by the height, you get 2.3869, which is generally rounded and written as a ratio to one: 2.39:1. This is one of the standard aspect ratios for cinema. Let’s say you want to upload this to YouTube and have it look correct. You could reformat the source video to 16:9 by adding a letterbox, which is simply black bars that fill the space between the original 2.39:1 picture and the 1.78:1 player frame. The problem with this approach is that the black bars are “burned in,” so if the end user plays the YouTube video full-screen on a computer or tablet that doesn’t have a 16:9 display, the player may not format the video optimally for that display.

The better choice is to resize the video to a standard/optimal width and keep the original aspect ratio. In the previous example, this would mean keeping the 2.39:1 aspect while resizing to 1920 wide. The end result would be a file that is 1920 × 804. The YouTube player will automatically letterbox the video with the appropriate amount of black for the final output display.
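The arithmetic in this example can be checked with a short sketch. The function names `fit_to_width` and `letterbox_bars` are made up for this illustration, and rounding the height to a multiple of 2 is an assumption (many encoders prefer even dimensions), not something the text prescribes.

```python
def fit_to_width(src_w, src_h, target_w, multiple=2):
    """Scale to `target_w` while keeping the source aspect ratio.

    The height is rounded to the nearest multiple of `multiple`
    (an assumption; adjust to suit your encoder).
    """
    exact_height = target_w * src_h / src_w
    height = int(round(exact_height / multiple)) * multiple
    return target_w, height


def letterbox_bars(content_h, container_h):
    """Height of the top and bottom bars a player would add."""
    total = container_h - content_h
    top = total // 2
    return top, total - top


print(round(2048 / 858, 2))            # 2.39
print(fit_to_width(2048, 858, 1920))   # (1920, 804)
print(letterbox_bars(804, 1080))       # (138, 138)
```

On a 1920 × 1080 display, the player would add 138 lines of black above and below the 1920 × 804 picture; on a display with a different shape, it is free to choose different bars because none are baked into the file.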

Some facilities will also ship finished content in the anamorphic format—that is, wide-screen content that is horizontally squished to fit in 4:3. During playback, specialized hardware is used to restore the wide-screen aspect ratio. When working with content that is either letterboxed or anamorphic, it’s important to correct the image appropriately. In the case of letterboxing, this probably means cropping out the black bars, and in the case of anamorphic, it means changing the aspect ratio from 4:3 to 16:9 to correct the image.
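Cropping out burned-in letterbox bars can be sketched as finding the first and last rows that are not black. The names `find_letterbox` and `crop_letterbox` are hypothetical, and the default `black_level` of 16 assumes 8-bit video-range luma. A single dark frame can fool this check, so in practice you would confirm the result across several frames before cropping.

```python
def find_letterbox(frame, black_level=16):
    """Return (top, bottom) row indices bounding the non-black picture.

    `frame` is a list of rows of 8-bit luma values. Rows whose pixels are
    all at or below `black_level` are treated as letterbox bars.
    """
    def is_black(row):
        return all(pixel <= black_level for pixel in row)

    top = 0
    while top < len(frame) and is_black(frame[top]):
        top += 1
    bottom = len(frame)
    while bottom > top and is_black(frame[bottom - 1]):
        bottom -= 1
    return top, bottom


def crop_letterbox(frame, black_level=16):
    """Remove black rows from the top and bottom of the frame."""
    top, bottom = find_letterbox(frame, black_level)
    return [list(row) for row in frame[top:bottom]]


boxed = [[0, 0], [16, 0], [100, 0], [0, 0], [50, 50], [0, 0]]
print(find_letterbox(boxed))   # (2, 5)
print(crop_letterbox(boxed))   # [[100, 0], [0, 0], [50, 50]]
```

Note that the black row inside the picture (row 3) is preserved; only the bars at the outer edges are removed.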

Take care when dealing with aspect ratio conversions to ensure that you aren’t losing content by accident or distorting the picture in a way not intended by the creator.

Pixel Aspect Ratio Correction

An important element of scaling is pixel aspect ratio correction. What is pixel aspect ratio? You already know that aspect ratio just means the relationship between the height and width of the image. Pixel aspect ratio is the same concept, but instead of being applied to the image as a whole, it describes the actual shape of the pixels.

Three common pixel aspect ratios are in use today. The first is taller than wide; the second is the same height and width, which makes a perfect square; and the last is wider than tall. Let’s take a closer look at where you’ll most likely run into each of these now.

You’ll find pixels that are taller than they are wide when working with legacy standard-definition content. These pixels are referred to as nonsquare pixels or 0.9 pixels. Pixels on the old CRT televisions had this shape.

Computer displays and modern HD televisions use pixels that are square. Square pixels are the most common pixel aspect ratio today and by far the easiest to work with.

A few codecs store the image in an anamorphic (squeezed) way but use a special pixel aspect ratio to stretch that squeezed image back out upon display. These codecs use pixels that are wider than tall to do this. Depending on whether you’re in a standard-definition environment or an HD environment, the codec will stretch by slightly different amounts: 1.21 for SD and 1.33 for HD.

Make sure you get the aspect ratio correction right when you convert nonsquare-pixel sources to square-pixel output by verifying that the output frame size matches the source aspect ratio. So, if you use a 4:3 source, 4:3 frame sizes such as 1440 × 1080, 960 × 720, 640 × 480, and 320 × 240 are all acceptable choices, even if the source frame size is 720 × 480, 720 × 486, 640 × 480, or 352 × 480 (all of which use nonsquare pixels except 640 × 480).
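One way to check a square-pixel output size is to derive the width from the intended display aspect ratio and the chosen height. The sketch below uses hypothetical helper names (`square_pixel_size`, `implied_par`), and rounding the width to a multiple of 2 is an assumption. The pixel aspect ratio it derives is the simple ratio implied by the full stored frame; nominal values quoted for SD formats can differ slightly from it.

```python
from fractions import Fraction


def square_pixel_size(target_height, display_aspect, multiple=2):
    """Square-pixel frame size for a given height and display aspect ratio."""
    exact_width = Fraction(target_height) * display_aspect
    width = int(round(exact_width / multiple)) * multiple
    return width, target_height


def implied_par(stored_w, stored_h, display_aspect):
    """Pixel aspect ratio implied by a stored frame size and its display shape."""
    return display_aspect * Fraction(stored_h, stored_w)


four_by_three = Fraction(4, 3)
sixteen_by_nine = Fraction(16, 9)

# A 4:3 source (for example 720 x 480) converted to square pixels.
for height in (1080, 720, 480, 240):
    print(square_pixel_size(height, four_by_three))
# (1440, 1080), (960, 720), (640, 480), (320, 240)

# An anamorphic HD frame stored as 1440 x 1080 and displayed at 16:9.
print(implied_par(1440, 1080, sixteen_by_nine))  # 4/3
```

The four sizes printed for the 4:3 case match the acceptable frame sizes listed above, regardless of how many pixels the source stored per line.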