Compositing 101: The Basics of Rotoscoping, Tracking and Keying

The combination of elements from two or more photographic sources often produces a striking effect. In the fine arts, this layering technique is known as collage. At the turn of the 20th century, filmmakers began to experiment with blending layers of film, ushering in the era of film optical effects.

By the beginning of the 20th century, dozens of filmmakers around the world were experimenting with mattes, among them Edwin S. Porter, a producer and cinematographer for Thomas Edison. Porter's 1903 film The Great Train Robbery featured two extensive scenes created by masking the camera and shooting double exposures.

"In-camera" effects quickly led to film opticals, in which successive layers of film were laid down on an optical bench and printed onto a third piece of film. This gave filmmakers the creative freedom to explore visual possibilities without logical or geographic constraints.

With the advent of computers, combining layers became far easier but also more complex, because the possible combinations now seem limitless. Filmed images can be scanned into a computer running compositing software, enabling the digital blending of several, or literally hundreds of, layers of imagery. Digital compositing is now the standard for creating special effects in film, television, and TV commercials. In the digital realm, compositing is the umbrella term for the many processes required to technically accomplish image combination in the computer.

To illustrate tracking and compositing, we have chosen to examine two scenes from Free as a Bird, the award-winning 1995 Beatles music video. We refer to the two scenes as the "Sgt. Pepper Party" and "Paperback Writer." The video is a seamless mix of new footage and archival Beatles footage taken from films and newsreels produced 20 to 30 years earlier. This project was a team effort between Pacific Ocean Post (POP), Santa Monica; Quiet Man, Los Angeles; and CrewCuts, New York. In a little over three weeks, more than 280 hours of compositing were completed at POP on two Discreet Logic Flames running on a four-processor 250-MHz SGI Onyx, and 140 hours of rotoscoping were done on a Quantel Hal Paintbox.

Rotoscoping, tracking, and keying were the three major components of the postproduction on this video. Work on this project began before any principal photography had been shot, with over 80 hours of rotoscoping of archival footage of the Beatles.

Rotoscoping is frame-by-frame matte preparation. The object or figure intended to be extracted from the source clip is silhouetted by hand, isolating that object or figure on each frame of the source clip. All of the rotoscoping work was done on a Quantel Hal.
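The tracing itself is manual: an artist outlines the figure on every frame. What the computer contributes is turning each traced outline into a black-and-white matte. The following is a minimal Python sketch of that last step only, assuming each frame's outline is stored as a polygon of (x, y) points; the function names are hypothetical, and this is not a description of how the Quantel Hal works internally. It fills the polygon using the even-odd rule, testing the center of every pixel.

```python
def point_in_polygon(x, y, poly):
    """Even-odd rule: count polygon edges crossed by a ray cast to the right of (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_roto_shape(poly, width, height):
    """One frame's traced outline -> matte (1.0 = keep the figure, 0.0 = discard)."""
    return [
        [1.0 if point_in_polygon(col + 0.5, row + 0.5, poly) else 0.0
         for col in range(width)]
        for row in range(height)
    ]


def rotoscope_clip(shapes_per_frame, width, height):
    """Frame-by-frame: one hand-traced outline per frame gives one matte per frame."""
    return [rasterize_roto_shape(shape, width, height) for shape in shapes_per_frame]


# A 2 x 2 pixel square traced inside a 4 x 4 frame.
matte = rasterize_roto_shape([(1, 1), (3, 1), (3, 3), (1, 3)], 4, 4)
```

A real roto matte would also need soft edges (values between 0.0 and 1.0) to blend convincingly; this sketch produces hard-edged mattes only.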

The creation of mattes defines what part of an image is seen and what is not. Essentially, a matte is a silhouette in black and white. It is the necessary signal for the computer to cut out the part of the image intended to be visible.

In our case, images are combined using mattes in the keyer portion of Flame. A keyer is simply an electronic compositor. It is important to note that matte creation is a painstaking art form; without precise matte preparation, composited images will not work together.

In addition to rotoscoped mattes of objects or people, this project required hold-out mattes. These mattes define foreground and background layers within the same clip. They are used when an element from a different source clip is intended to slip behind an object in the background plate. For example, in the "Sgt. Pepper Party" scene, hold-out mattes had to be made for elements of the live-action background plate (the table with flowers, and the curtain) in order to seat Ringo Starr in a chair behind the table. With these hold-out mattes and a careful balance of contrast and color correction, Ringo could convincingly sit at a table shot 30 years later. The archival footage used in these composites came in varying formats (mainly 3/4-inch and D-1 videotape) from the archives at Apple Records.
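One way to think of a hold-out matte is as a subtraction from the inserted element's matte: wherever the hold-out marks a foreground object in the plate (the table, in the "Sgt. Pepper Party" example), the element's matte is forced to zero, so the plate shows through and the element appears to pass behind that object. The sketch below is a minimal illustration on grayscale frames stored as nested lists of values from 0.0 to 1.0, where a matte value of 1.0 means "show the element"; the function names and sample values are hypothetical, and this is not Flame's actual keyer.

```python
def apply_holdout(element_matte, holdout_matte):
    """Remove the hold-out region from the element's matte so the element passes behind it."""
    return [
        [e * (1.0 - h) for e, h in zip(e_row, h_row)]
        for e_row, h_row in zip(element_matte, holdout_matte)
    ]


def composite(fg, bg, matte):
    """Matte-weighted mix: 1.0 shows the foreground element, 0.0 shows the background plate."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(fg, bg, matte)
    ]


# One scan line: Ringo's matte covers the first two pixels,
# but the table (hold-out) occupies the middle pixel of the plate.
ringo = [[0.9, 0.9, 0.9]]
plate = [[0.2, 0.5, 0.2]]
ringo_matte = [[1.0, 1.0, 0.0]]
table_holdout = [[0.0, 1.0, 0.0]]

result = composite(ringo, plate, apply_holdout(ringo_matte, table_holdout))
# The middle pixel keeps the table's value, so Ringo appears to sit behind it.
```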

Once the matte cutting was completed, the actual compositing of layers could begin. In the keyer, a composite clip is generated from a foreground clip and a background clip, using an input matte clip to determine how the two sources are to be combined.
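The keyer operation just described (foreground clip, background clip, and matte clip in; one composite clip out) amounts to a per-pixel weighted mix. The following minimal Python sketch assumes RGB pixels stored as tuples of floats from 0.0 to 1.0 and mattes in which 1.0 means "show the foreground"; the function names are hypothetical, and a production keyer offers far more controls than this.

```python
def key_pixel(fg_px, bg_px, m):
    """Blend one RGB pixel: m = 1.0 shows the foreground, 0.0 the background."""
    return tuple(m * f + (1.0 - m) * b for f, b in zip(fg_px, bg_px))


def key_frame(fg, bg, matte):
    """Apply key_pixel across one frame (a list of pixel rows)."""
    return [
        [key_pixel(f, b, m) for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(fg, bg, matte)
    ]


def key_clip(fg_clip, bg_clip, matte_clip):
    """The composite clip holds one keyed frame per input frame."""
    return [key_frame(f, b, m) for f, b, m in zip(fg_clip, bg_clip, matte_clip)]


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
# A one-frame, one-row clip; the middle matte value of 0.5 mimics a soft edge.
clip = key_clip([[[red, red, red]]], [[[blue, blue, blue]]], [[[1.0, 0.5, 0.0]]])
```

Intermediate matte values are what let edges blend rather than cut hard: the middle pixel above comes out as an even mix of the two sources.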

In order to reconcile camera moves on both the live-action plates and archival footage, tracking and stabilization were necessary. Tracking and stabilization are preparation processes which allow multiple camera passes to be married together seamlessly, duplicating camera moves or eliminating any unwanted camera movement in the footage. These processes were used to reconcile archival source material and live action used in the "Paperback Writer" scene.
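Once a tracker has recorded where a reference mark sits in each frame, both uses described above reduce to applying offsets. The sketch below assumes the track is already available as a list of whole-pixel (x, y) positions, one per frame, and that frames are grayscale nested lists. Stabilization shifts each frame against the measured motion; duplicating a camera move (a match move) shifts a static element along with it. All names are hypothetical, and real trackers typically work to sub-pixel accuracy and may also account for rotation and scale, which this sketch ignores.

```python
def offsets_from_track(track):
    """Per-frame displacement of the tracked mark relative to the first frame."""
    x0, y0 = track[0]
    return [(x - x0, y - y0) for x, y in track]


def shift_image(img, dx, dy, fill=0.0):
    """Translate img by (dx, dy) whole pixels; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for row in range(h):
        for col in range(w):
            src_r, src_c = row - dy, col - dx
            if 0 <= src_r < h and 0 <= src_c < w:
                out[row][col] = img[src_r][src_c]
    return out


def stabilize(frames, track):
    """Undo the measured motion so the tracked mark stays put (removes jitter)."""
    return [shift_image(f, -dx, -dy) for f, (dx, dy) in zip(frames, offsets_from_track(track))]


def match_move(element, track):
    """Duplicate the measured motion on a static element: one shifted copy per frame."""
    return [shift_image(element, dx, dy) for dx, dy in offsets_from_track(track)]


# The mark drifts one pixel to the right between two 3 x 3 frames.
f0 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
f1 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
steady = stabilize([f0, f1], [(1, 1), (2, 1)])
```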

For "Paperback Writer," the video's director, Joe Pytka, shot a carefully art-directed scene with a coffee table, a chair, a TV set, clues from Beatles history, and a writer sitting at his desk. This was the background plate. In the final composite, archival footage from The Ed Sullivan Show was tracked into the TV set. John Lennon's image was rotoscoped (isolated) from archival footage and also tracked into this scene.

While this sounds simple, it proved quite difficult, because the two-dimensional footage of John Lennon had to be reconciled with the new camera move and its inherent perspective changes. This is where tracking helped create a seamless composite. After the camera move was tracked in Discreet Logic's Flame, we tried unsuccessfully to composite Lennon into the chair that had been shot practically. The angles of John's body and the chair did not match; it just did not look "right," even to the untrained eye. So we covered the live-action chair with the sofa John was sitting on in the source clip, and John and his sofa were then placed into the "Paperback Writer" scene.

An interesting note: after working on this composite for some time, it still did not look quite right. Joe Pytka, who had not seen this initial work, walked in one day and saw the footage. He said immediately: "Try flopping John Lennon." In Flame we simply "flopped" Lennon's image; instead of looking left to right, we made him look right to left. Suddenly, the composite looked believable. Sometimes it is just such a small, subtle change that makes the visual effect "work" so profoundly well for the human eye.
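In digital terms, the "flop" Joe Pytka suggested is a simple left-to-right mirror of every frame (by common post-production convention, a top-to-bottom mirror is called a "flip"). A minimal Python sketch, assuming frames stored as lists of pixel rows; the function names are hypothetical.

```python
def flop(frame):
    """Mirror a frame horizontally: the leftmost pixel of each row becomes the rightmost."""
    return [list(reversed(row)) for row in frame]


def flop_clip(clip):
    """Flop every frame of a clip so the subject faces the other way throughout."""
    return [flop(frame) for frame in clip]


frame = [[1, 2, 3],
         [4, 5, 6]]
print(flop(frame))  # [[3, 2, 1], [6, 5, 4]]
```

Because a flop mirrors everything in the element, any readable lettering in it would come out reversed.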

The possibilities available to those considering special effects are endless. The only real boundaries are budget and time. Careful planning of the desired effects will prevent a costly fix in post, where time would be better spent enhancing the final composite. With any 2-D effect, a flawless result is dependent on the materials supplied. Compositing demands a careful preliminary examination of all disparate elements intended to coexist in the same time and place.

Discrepancies in lighting and perspective angles destroy the intended illusion that all layers of elements live convincingly in the same physical world.

Kristin Johnson is a recipient of several awards for special effects, including a Clio for the Beatles music video Free as a Bird and the AICP award for both "Nike/Wall" and Pepsi's "Set Piece" with Shaquille O'Neal. (All of this work was directed by Joe Pytka.) Johnson has been a Flame/Inferno artist at POP for the last three years.

Alix Eglis is a producer of visual effects for POP, Santa Monica. She relocated from New York three years ago, where she worked for over nine years combined at Limelite Video and VSC Post/Manhattan Transfer. If you have any questions concerning this story, or other effects projects, Eglis can be reached at 310-319-1741; or, e-mail her at:

1. This is the "Paperback Writer" camera original background plate. The "X" taped on the monitor was used as a tracking point for POP to duplicate the camera move, enabling the archival Ed Sullivan Beatles footage to be placed into the TV. Also note the chair, which will be replaced by John Lennon and his couch from the archival footage. Other enhancements in the composite included correcting the illegible newspaper headline "10,000 Holes In Blackburn, Lancashire" (from the original "A Day in the Life" lyric) and boosting the weak color of the green apples.

2. Archival footage of John Lennon used in the "Paperback Writer" scene. After trying for hours to place John on the chair that was shot practically, POP ended up using the sofa John was sitting on in the original clip.

3. Archival footage of the famous Ed Sullivan Show appearance of The Beatles taped in 1964. This clip was tracked and matted into the TV set in the background plate.

4. The final composited "Paperback Writer" scene. Notice all the changes: Lennon's image is flopped to look more realistic in the environment; the original chair has been replaced; the Ed Sullivan footage is convincingly composited into the TV set; and the color of the apples, the newspaper headline, and the truffle box title have all been enhanced.

1. The "Sgt. Pepper Party" camera original scene. Hold-out mattes were created for the table, doorway, and curtains, so that the footage of Ringo Starr and the green screen footage of the elephant could be convincingly married into this plate.

2. Three days before the project was due, an elephant was filmed against a green screen on a stage in Hollywood, because Ringo remembered that an elephant had been present at the original Sgt. Pepper Party.

3. Archival footage of Ringo Starr used in the "Sgt. Pepper Party" scene. Ringo was rotoscoped frame by frame to create a matte, in order to isolate him from the background he was shot on.

4. The final composited "Sgt. Pepper Party" scene. The elephant has been tracked and composited into the background plate, utilizing the hold-out mattes of the doorway and curtain. Shadows were also added. Ringo now appears to be sitting behind the table, again thanks to hold-out mattes for the table and other objects. Ringo's image was also scaled and tracked to match the camera move in the background plate.

Compositing: The umbrella term for the many processes required to technically accomplish image combination in the computer.

Matte: A silhouette in black and white. It is the necessary signal for the computer to cut out the part of the image intended to be visible. (Note: A matte can also exist in many other physical forms, such as a masked-off camera composition; a painting on glass; and other pre-digital techniques.)

Hold-out mattes: These mattes define foreground and background layers within the same shot. They are used when an element from a different source clip is intended to slip behind an object in the background plate.

Keyer: An electronic compositor.

Keying: Electronically compositing one picture over another. The two types of keying are luminance keying, which derives the key signal from the brightness of the image, and chroma keying, which derives it from a specific color. The term came from the word "keyhole": the key is interpreted by the computer as a signal to cut a hole in a clip layer.
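A luminance key can be sketched in a few lines: convert each pixel to a brightness value, then map brightness to a matte value. The version below is a minimal illustration, not Flame's keyer. It assumes RGB pixels as floats from 0.0 to 1.0, uses the ITU-R BT.601 luma weights, and introduces hypothetical `low` and `high` thresholds with a linear ramp between them so edges stay soft.

```python
def luma(rgb):
    """Brightness of an RGB pixel using ITU-R BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def luminance_key(frame, low, high):
    """Matte from brightness: at or below `low` -> 0.0 (hole), at or above `high` -> 1.0 (keep)."""
    def ramp(y):
        t = (y - low) / (high - low)
        return min(1.0, max(0.0, t))  # clamp into the 0..1 matte range
    return [[ramp(luma(px)) for px in row] for row in frame]


frame = [[(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]]
matte = luminance_key(frame, low=0.25, high=0.75)
```

Inverting the result (1.0 minus each value) would key out bright areas instead of dark ones.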

Chroma keying: Another matte derivation method, in which the computer keys on a specific color (usually green, blue, or red) to create the key signal. This is a way of performing automatic matte extraction, using the colored background to create the key signal. Note the elephant shot against green screen for the "Sgt. Pepper Party." The computer interprets the green area surrounding the elephant as a hole, which is filled by the layer behind, in this case the background plate of the Adelphi Hotel.
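One simple way to derive a key from a green backdrop is a color-difference test: a pixel counts as backdrop to the extent that its green channel exceeds both red and blue. The sketch below uses that common textbook technique, which is not necessarily what Flame does; the `gain` parameter, function names, and sample colors are hypothetical. It produces a matte (0.0 on the screen, 1.0 on the subject) and then lets the plate fill the hole.

```python
def green_screen_matte(frame, gain=2.0):
    """Color-difference key: strong green excess -> 0.0 (hole), none -> 1.0 (keep)."""
    matte = []
    for row in frame:
        out_row = []
        for r, g, b in row:
            excess = g - max(r, b)  # how much greener than the other two channels
            out_row.append(1.0 - min(1.0, max(0.0, excess * gain)))
        matte.append(out_row)
    return matte


def composite_over_plate(fg, plate, matte):
    """Where the matte is 0.0 the plate shows through; where it is 1.0 the subject stays."""
    return [
        [tuple(m * f + (1.0 - m) * p for f, p in zip(fpx, ppx))
         for fpx, ppx, m in zip(f_row, p_row, m_row)]
        for f_row, p_row, m_row in zip(fg, plate, matte)
    ]


green, gray = (0.1, 0.9, 0.1), (0.5, 0.5, 0.5)  # backdrop pixel, elephant pixel
fg = [[green, gray]]
plate = [[(0.3, 0.2, 0.1), (0.3, 0.2, 0.1)]]
matte = green_screen_matte(fg)
result = composite_over_plate(fg, plate, matte)
```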

Rotoscoping: Frame-by-frame matte preparation or retouching of a clip. For matte preparation, the object or figure intended to be extracted from the source clip is silhouetted by hand, isolating that object or figure on each frame of the source. Other retouching, such as rig removal, in which wires or rods are removed from an image (e.g., wires which support a stuntman falling from a building), is also a kind of rotoscoping, because the wires are removed one frame at a time. Scratch removal is also a form of rotoscoping.

Tracking and stabilization: Pattern-recognition processes originally pioneered by the Defense Department to track missiles. In Flame, tracking defines a mark of distinct luminance in a source clip and follows its movements through the duration of the clip; the result is an analysis and duplication of the camera move in the source clip. Stabilization is likewise an analysis of the movement of a defined mark or pattern in the source clip; the detected movement can then be counteracted to eliminate undesirable camera jitter.
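Following a mark of distinct luminance can be illustrated with a basic block-matching tracker: store a small patch around the mark in the first frame, then, in each later frame, search a window around the last known position for the spot where the sum of absolute pixel differences (SAD) is smallest. This minimal Python sketch works on grayscale nested lists at whole-pixel precision, with positions given as (row, column). SAD block matching is a standard technique swapped in here for illustration, not necessarily the method Flame uses, and all names and parameters are hypothetical.

```python
def sad(frame, patch, top, left):
    """Sum of absolute differences between patch and the same-size region of frame at (top, left)."""
    total = 0.0
    for r, patch_row in enumerate(patch):
        frame_row = frame[top + r]
        for c, p in enumerate(patch_row):
            total += abs(frame_row[left + c] - p)
    return total


def track_mark(frames, top, left, size, search):
    """Follow a size x size pattern from (top, left) in frames[0] through the clip."""
    patch = [row[left:left + size] for row in frames[0][top:top + size]]
    h, w = len(frames[0]), len(frames[0][0])
    path = [(top, left)]
    for frame in frames[1:]:
        prev_top, prev_left = path[-1]
        best = None
        # Search only a window around the previous position, clipped to the frame.
        for t in range(max(0, prev_top - search), min(h - size, prev_top + search) + 1):
            for c in range(max(0, prev_left - search), min(w - size, prev_left + search) + 1):
                score = sad(frame, patch, t, c)
                if best is None or score < best[0]:
                    best = (score, t, c)
        path.append((best[1], best[2]))
    return path


def stabilizing_offsets(path):
    """(row, column) shift per frame that would return the mark to its first position."""
    t0, c0 = path[0]
    return [(t0 - t, c0 - c) for t, c in path]
```

The search window keeps the cost down and assumes the mark moves only a few pixels between frames; a fast camera move would need a larger window.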