Uncompressed Digital Video
A refresher course in the basics and details of uncompressed video.
Compression — DV, DVCPRO, DVCPRO 50, IMX, and MPEG-2 — has been the focus of most of our attention over the last half-dozen years. Now, however, “uncompressed” video is gaining new attention.
Avid includes an uncompressed video option in Xpress Pro when used with Mojo. Apple's Final Cut Pro, in conjunction with boxes like AJA's Io, can work with either 8-bit or 10-bit uncompressed video. And HD is typically edited as uncompressed video.
So exactly what is uncompressed video? Purists would say it is almost never the video media we work with.
Interlaced video is inherently “compressed” because the process of dividing an image into odd and even fields is a clever technique to reduce bandwidth by a factor of two. Likewise, NTSC, “color-under,” and “analog component” recording all remove significant information from images. Based on the eye's inability to resolve fine color detail, the chroma signals are dramatically frequency-limited. Additionally, the luminance signal is low-pass filtered so it can be carried without interfering with the chroma signals.
To make progress, let's ignore the purists and start at the beginning, where analog video is converted to digital video. Devices that convert optical images to electrical signals are analog devices. So digital video always comes from the analog domain. The CCD sensor is an analog device that generates, during 1/50
Analog-to-digital conversion is done on a line-by-line basis. The total number of lines depends on the video standard: NTSC has 525 lines, and PAL has 625. These values are the total number of lines in the standards. The number of lines used for the image is far fewer: NTSC uses 483, and PAL uses 576.
As each line is scanned, the A/D converter samples the analog voltage and converts it to digital values. So how many samples are taken from each line of NTSC video and PAL video? We get these numbers by knowing the sampling rate — expressed in samples per second. Engineers decided to use the same sampling rate for both NTSC and PAL. That rate is 13.5 million samples per second. Or, expressed as a frequency, 13.5MHz. At this rate, the total number of samples per line is 858 for NTSC, while PAL has 864.
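The samples-per-line figures can be checked with a little arithmetic. Here is a minimal sketch that divides the sampling rate by the line rate. The line rates used below (4.5MHz/286 for NTSC and 15,625 lines per second for PAL) are not given in this article; they are assumed here as the commonly quoted values.

```python
from fractions import Fraction

# Sampling rate quoted in the text: 13.5 million samples per second.
SAMPLING_RATE_HZ = Fraction(13_500_000)

# Assumed line rates (not stated in the article).
NTSC_LINE_RATE_HZ = Fraction(4_500_000, 286)  # about 15,734.27 lines per second
PAL_LINE_RATE_HZ = Fraction(15_625)           # 625 lines x 25 frames per second


def samples_per_line(sampling_rate_hz, line_rate_hz):
    """Total samples taken during one scan line, blanking included."""
    return sampling_rate_hz / line_rate_hz


ntsc_samples = samples_per_line(SAMPLING_RATE_HZ, NTSC_LINE_RATE_HZ)
pal_samples = samples_per_line(SAMPLING_RATE_HZ, PAL_LINE_RATE_HZ)

print(ntsc_samples, pal_samples)  # 858 864
```

Using exact fractions rather than floating point keeps the NTSC result a whole number, which is part of why a single rate can serve both standards.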
Not all of these samples represent the image. For both NTSC and PAL standards, 720 samples are reserved for image data. (See Chart 1.) However, the typical number of source image elements varies from 704 to 712. And, interestingly, according to the Advanced Television Systems Committee (ATSC) definition for DTV, the number of samples for SD-DTV is 704. (This number was chosen because, unlike 720, it is evenly divisible by 64 and is ideal for cable-casting.)
Many engineers prefer to operate an A/D converter at a 27MHz sampling frequency, even though this requires parts that are more expensive. This process is called oversampling. The ultimate result is not two times more samples; rather, oversampling yields greater signal quality.
Let's assume that we are digitizing a monochrome image captured by an analog sensor. The brightest shade that needs to be digitized is white and the darkest shade is black. The range between black and white is a grayscale. The more levels (i.e., resolution) in the grayscale, the better the image quality.
There are two typical resolutions: 256 levels (values of 0 to 255) and 1,024 levels (values of 0 to 1,023). The former requires 8 A/D bits and the latter requires 10. When we are working with three color signals (red, green, and blue), the number of definable color values possible with 8-bit systems is approximately 16.7 million (the same number as 24-bit computer graphics cards), while 10-bit systems can define approximately 1.07 billion colors.
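The level and color counts are simply powers of two. A minimal sketch of the arithmetic follows; the function names are made up for illustration.

```python
def levels(bits_per_component):
    """Number of distinct grayscale levels an A/D converter of this width can produce."""
    return 2 ** bits_per_component


def rgb_colors(bits_per_component):
    """Number of distinct colors when red, green, and blue each get this many bits."""
    return levels(bits_per_component) ** 3


for bits in (8, 10):
    print(bits, levels(bits), rgb_colors(bits))
# 8 256 16777216
# 10 1024 1073741824
```

Adding two bits per component multiplies the number of levels by four, but it multiplies the number of colors by 64, because the gain applies to all three components.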
IRE is an old name for IEEE units, but is still used in the NTSC world. Legal video signals range between -40IRE (sync-tip) and +100IRE. With an 8-bit system, white is represented by a value of 200, while in a 10-bit system it is represented by 800. However, both the PAL and NTSC systems allow brightness to exceed the nominal intensity of white. The following ranges are available for “super-white” intensities: 201 to 235 (8 bit) and 801 to 940 (10 bit). Values can momentarily exceed 235 and 940, but are never allowed to become 255 and 1,023.
So, is the darkest shade black? Yes and no. In the digital world it is. Black has digital values of 16 (8 bit) and 64 (10 bit). However, when digital values are converted to analog signal levels, the situation is more complex.
In the PAL and Japanese NTSC systems, a digital value of 16 or 64 is converted to 0IRE. With the U.S. NTSC system, after 16 or 64 is converted to an analog signal, the output voltage is increased slightly. Using the IRE scale, black becomes 7.5IRE. The voltage increment, called “pedestal” or “set-up,” is a legacy of the way the U.S. television system was designed. It is not a good legacy, as it slightly reduces the resolution of the U.S. NTSC grayscale.
There is another problem with the U.S. system. Equipment imported from Japan into the United States does not have set-up, so black is 0IRE. That means a monitor correctly adjusted for a 7.5IRE pedestal will not be correct for non-pedestal sources. If the monitor is correctly set for 7.5IRE, dark-gray shadows from a 0IRE source will disappear into black. Conversely, if the monitor is adjusted for a 0IRE source, a legal NTSC source will have dark gray rather than black shadows.
What about the accuracy of the conversion process? While there are measures of accuracy, I am not going to talk about them. However, it is critical to understand that converters vary greatly in their accuracy. Thus, two converters that have the same sampling rate and sample size can vary greatly in their conversion quality. So when purchasing digital equipment, you cannot fully rely on specifications.
Once video data are in the digital domain, it is vital that math errors do not corrupt them. One of the simplest ways to define image-processing quality is to ask how many bits are used for calculations. This question is the equivalent of asking how many digits a hand calculator can handle. The general rule of thumb is that processing should be done with at least two more bits than used to represent the signal. Nowadays, when A/D converters offer 10 bits, an extra 4 bits are typically used in the DSP, so it is not uncommon to see 14-bit DSP circuits. The need for accuracy explains why Final Cut Pro offers 10-bit (per component) and High Dynamic Range (32-bit floating-point) rendering calculations.
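To see why extra processing bits matter, consider a toy example: halve every 8-bit value, then double it again. This is only a sketch under simple assumptions (integer math with truncation, a made-up function name, and a gain of one-half chosen for illustration), not a model of any real DSP.

```python
def halve_then_double(values, extra_bits):
    """Apply a gain of 1/2 and then 2 using integer math with some headroom bits.

    The input is shifted left by `extra_bits` before processing and shifted
    back afterward, which mimics a processor that carries extra precision.
    """
    working = [v << extra_bits for v in values]  # promote to the wider word
    working = [w // 2 for w in working]          # gain of 1/2 (truncating)
    working = [w * 2 for w in working]           # gain of 2
    return [w >> extra_bits for w in working]    # return to the original width


source = list(range(256))  # every possible 8-bit value

for extra in (0, 2):
    result = halve_then_double(source, extra)
    errors = sum(1 for a, b in zip(source, result) if a != b)
    print(extra, errors)
# 0 128
# 2 0
```

With no extra bits, every odd value comes back one level too low. With two extra bits, the round trip is exact.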
RGB digital data to YUV digital data
After conversion, the image is represented by an array of red, green, and blue digital values. Typically, this is referred to as 4:4:4 video. (More about this later.) Working with 8-bit values, each sampled point occupies 24 bits. With 10-bit values, 30 bits are required. The red signal, after conversion, is represented by 720 values for each of the 480 lines in an NTSC picture.
Likewise, the green signal, after conversion, is represented by 720 values for each of the 480 lines.
And the blue signal, after conversion, is represented by 720 values for each of the 480 lines — resulting in more than 1 million digital values.
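The totals are easy to verify. A minimal sketch, using the 720-by-480 picture size from the text (the constant and function names are made up for illustration):

```python
SAMPLES_PER_LINE = 720  # image samples per line, from the text
ACTIVE_LINES = 480      # picture lines, from the text
COMPONENTS = 3          # red, green, and blue


def values_per_frame():
    """Digital values needed for one full-resolution RGB frame."""
    return SAMPLES_PER_LINE * ACTIVE_LINES * COMPONENTS


def bits_per_frame(bits_per_component):
    """Storage for one frame at the given component width."""
    return values_per_frame() * bits_per_component


print(values_per_frame())  # 1036800
print(bits_per_frame(8))   # 8294400
print(bits_per_frame(10))  # 10368000
```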
This amount of data can be reduced by taking advantage of the fact that the human eye is not able to resolve the color of fine details. In fact, it is possible to reduce color resolution by a factor of four with only a minor loss of image quality. It takes digital processing to reduce color (chroma) resolution while keeping luminance (black and white) resolution at maximum. The first step is to obtain the luminance signal (symbolized by Y), and then obtain the chrominance signals (symbolized by C).
Obtaining the luminance signal
The luminance signal is obtained by adding red, green, and blue signals together. Each signal is multiplied by a coefficient that represents the amount the color signal will contribute to the luminance signal.
Y = 0.299R + 0.587G + 0.114B
You will note that most of the luminance signal is from the green signal. And, as represented in Chart 3, the results are 720 values for each of the 480 lines. To further reduce information, a pair of color “difference” signals is computed.
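As a minimal sketch, the luminance equation translates directly into code. The function name and the choice of a 0-to-1 scale for the test values are assumptions made for illustration.

```python
# Coefficients from the luminance equation in the text.
RED_WEIGHT = 0.299
GREEN_WEIGHT = 0.587
BLUE_WEIGHT = 0.114


def luma(r, g, b):
    """Weighted sum of red, green, and blue that forms the Y signal."""
    return RED_WEIGHT * r + GREEN_WEIGHT * g + BLUE_WEIGHT * b


print(luma(1.0, 1.0, 1.0))  # white: very close to 1.0
print(luma(0.0, 1.0, 0.0))  # pure green contributes the most
print(luma(0.0, 0.0, 1.0))  # pure blue contributes the least
```

Because the three coefficients add up to one, a white input (equal red, green, and blue) produces a luminance equal to that same level.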
Obtaining the blue and red ‘difference’ signals
The blue difference signal is computed by mixing the blue signal with an inverted luma signal: Cb = B - Y
This signal is also symbolized by (B - Y) or U. (Although Cb is becoming the most common symbol.) The results are 720 values for each of the 480 lines.
The red difference signal is computed by mixing the red signal with an inverted luma signal: Cr = R - Y
This signal is also symbolized by (R - Y) or V. (Although Cr is becoming the most common symbol.) The result is 720 values for each of the 480 lines.
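Here is a minimal sketch of the two difference signals exactly as written above, with no scaling applied. Practical systems also scale and offset these values before storing them, a step this article does not cover, so treat the function below as an illustration only.

```python
def luma(r, g, b):
    """Luminance (Y) as defined by the equation in the text."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def color_differences(r, g, b):
    """Return (Y, Cb, Cr) using the unscaled forms Cb = B - Y and Cr = R - Y."""
    y = luma(r, g, b)
    return y, b - y, r - y


# Any shade of gray has equal R, G, and B, so both differences vanish.
print(color_differences(0.5, 0.5, 0.5))

# Pure blue gives a large positive Cb and a small negative Cr.
print(color_differences(0.0, 0.0, 1.0))
```

Notice that a gray picture carries no chroma information at all, which is why a black-and-white image can be represented by the Y signal alone.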
Once the Cb and Cr signals have been computed, the final versions are obtained by discarding every other value of each color component.
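Discarding every other value is a one-line operation. The sketch below keeps the even-numbered samples, which is an assumption; the text does not say which of each pair is kept, and real equipment typically filters the chroma before discarding samples.

```python
def subsample_chroma(chroma_line):
    """Keep every other chroma value on a line (even-numbered samples assumed)."""
    return chroma_line[::2]


# A stand-in for one line of 720 Cb values.
cb_line = list(range(720))
cb_final = subsample_chroma(cb_line)

print(len(cb_line), len(cb_final))  # 720 360
print(cb_final[:4])                 # [0, 2, 4, 6]
```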
The final array of digital data is shown in Chart 8.
The pattern of digital data shown in Chart 8 is called “4:2:2” sampling. The “4” represents luma values, while the first and second “2” represent each of the chroma values. We are used to noting this as (Y, B - Y, R - Y) or YUV or YCbCr.
Where does the “4:2:2” come from? What's obvious is that the 2:1 relation between the “4” and “2” represents the ratio of luma to chroma samples. So where does the “4” come from?
To sample the 64-microsecond duration of a video line, the Nyquist sampling frequency requirement is 12MHz. However, digital video is sampled at 13.5MHz (or 27MHz). This higher frequency was chosen because it is greater than 12MHz and it works well with both NTSC and PAL standards. If you divide 13.5MHz by the 3.58MHz NTSC subcarrier frequency, you get approximately four.
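The division is quick to check. In this sketch the subcarrier is entered as 3.579545MHz, the commonly quoted value behind the rounded 3.58MHz figure; that extra precision is an assumption, not something stated in this article.

```python
SAMPLING_RATE_MHZ = 13.5
NTSC_SUBCARRIER_MHZ = 3.579545  # assumed; the text rounds this to 3.58

ratio = SAMPLING_RATE_MHZ / NTSC_SUBCARRIER_MHZ

print(round(ratio, 2))  # 3.77
print(round(ratio))     # 4
```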
As shown in Chart 9, 4:2:2 sampling reduces the data required by one-third. Each chroma component is sampled at 6.75MHz, one-half the luma sampling rate. The resulting luminance and chroma bandwidths are 5.75MHz and 3MHz (per color component), respectively. This type of sampling is defined in CCIR-601, now known as ITU-R BT.601.
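The one-third figure, and the data rates that follow from it, can be worked out from the sampling rates alone. This is a minimal sketch with made-up names; the rates it prints count every sample on every line, blanking included.

```python
from fractions import Fraction

LUMA_RATE_MHZ = Fraction(27, 2)     # 13.5MHz, from the text
CHROMA_RATE_MHZ = LUMA_RATE_MHZ / 2  # 6.75MHz per chroma component


def samples_per_group(scheme):
    """Total samples for a group of four pixels, e.g. (4, 2, 2) gives 8."""
    return sum(scheme)


full = samples_per_group((4, 4, 4))
reduced = samples_per_group((4, 2, 2))
saving = 1 - Fraction(reduced, full)

# One luma stream plus two chroma streams.
total_rate_msamples = LUMA_RATE_MHZ + 2 * CHROMA_RATE_MHZ

print(saving)                    # 1/3
print(total_rate_msamples)       # 27
print(total_rate_msamples * 8)   # 216 (Mbit/s at 8 bits per sample)
print(total_rate_msamples * 10)  # 270 (Mbit/s at 10 bits per sample)
```

The combined rate of 27 million samples per second explains why 27MHz keeps turning up alongside 13.5MHz.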
If you are protesting that CCIR-601 video really is not uncompressed, you are correct. However, relative to formats that employ a codec or an encoder to squeeze CCIR-601 video even further, this video can be considered uncompressed.