
How was sound film technology constructed by its many users when it was first introduced into Hollywood mainstream production in 1927? The question is straightforward and apparently simple, but the response is decidedly neither. Two obstacles impede direct handling of this question. First, contemporaries rarely confronted this question straight on. They didn’t talk or write all that much about their understanding of the new technology, thus forcing us to read between the lines in order to grasp their attitudes and proclivities. Second, we are not dealing with a single user, but with multiple user groups and approaches. In order to analyze this or any other technology, a great deal of interpretation is required.

For the purposes of this article, I propose to “read” the soundtrack of an exemplary early film, Warner’s The First Auto (Roy Del Ruth, 1927), in order to understand the various ways in which sound technology was understood at the point when it was first introduced. Based on a story by producer Darryl F. Zanuck, The First Auto opened at New York’s Colony Theatre in late June 1927, some three months before The Jazz Singer (Alan Crosland, 1927). Utterly fascinating because of its wide variety of sound usage, this is a film that deserves to be far better known. In the pages that follow, I will treat The First Auto as an apt indicator of the ways in which Warner’s Vitaphone technology was understood when it was first deployed. Careful consideration of The First Auto’s sound strategies will then serve, in the second half of the article, as an appropriate backdrop for further remarks on the ways in which understanding of sound film technology changed over the course of the next decade.

When filmmakers first gained access to Vitaphone sound film technology, how did they understand that technology? For the directors and sound engineers of the 1926-1927 period, just what was sound? Or, to put it another way, what sounds were needed or at least considered useful in a cinema context? Surprisingly, this question is rarely addressed in the discourse of the period. Not for several years would the film industry get beyond the “techie gear head” approach characteristic of the Bell Labs personnel responsible for developing the Vitaphone system. Engineers like E.C. Wente and Joseph Maxfield wrote repeatedly about microphones, amplification and frequency ranges, but they were far less loquacious about just how the new technology should be used. Filmmakers, including sound personnel, had even less to say about their understanding of the new technology. Often, the best witnesses are the films themselves.

The opening scenes of the Vitaphone version of The First Auto offer useful insight. Over the title and credits we hear the familiar tones of “In My Merry Oldsmobile.” No 1920s spectator could fail to make the connection between the lyrics of Gus Edwards’ 1905 smash hit and the title of the film at hand. In this film, sound means music, but not just any kind of music. Recalling a long-standing silent film tradition, The First Auto uses the familiar lyrics of a popular song to establish the film’s tone and topic.

Once the race begins, a new model appears. Instead of recording the sound of horses’ hooves, the film’s sound crew has recourse to a familiar vaudeville representation of horses running, where the sound of hooves striking the earth is represented by the rhythmic pounding of coconut shells. And as in vaudeville, where the volume of sound effects varies little, we find that a cut-in to a medium shot of sulky driver Hank Armstrong produces no change in the volume of the horses’ hooves effects, even though the larger image apparently places us substantially closer to the sound source.

Some thirty seconds into the scene, we cut from the trotting horses to the onlooking crowd. Just as an image of running horses elicited the sound of running horses (as filtered through vaudeville conventions), so an image of the crowd calls forth loud crowd sounds. But it is important to note that the crowd sounds are heard not under or over the race sounds, but instead of the sounds of the race. Even though the race sounds could obviously be heard by the crowd in the stands, and the sounds of the crowd would be audible to the jockeys, we hear only one sound at a time—the sound associated with the image visible on screen. The sound seems tied not to a location, but to an image.

Shortly, along with a medium shot of one particular spectator, his lips moving, we hear a shouted “Come on Hank!” just before reading an intertitle that simply repeats those same words. When the race ends, this process is repeated. As we watch a spectator shown in medium shot, we hear a shouted “Hank wins!” But this time, instead of simply replicating the spectator’s audible comment, the title asserts that “Hank Armstrong wins.” Why was this mismatch allowed to subsist? And what lessons do this film’s intertitles hold for us?

The handling of the very next scene raises further questions about the relationship between the film’s sound and its intertitles. When Hank and his mare Sloe Eyes take a victory walk through the bar, the sound design suddenly becomes unexpectedly complex. The music replicates the opening credit strategy, appropriately matching the lyrics of the familiar drinking song, “How Dry I Am,” with the joyful bar atmosphere. At the same time a display of beer glasses is accompanied by the tinkling of glass. But when we focus on Mayor Robbins, whose moving lips identify him as the speaker of a congratulatory speech directed at Hank, we are not allowed to hear the Mayor’s words, but only to read them in the form of an intertitle. Why do we hear an anonymous race spectator calling out “Come on Hank!” and “Hank wins!” and yet not hear the Mayor pronounce his speech? For the time being, it suffices to note that not all sounds or dialogue are deemed worthy of Vitaphone treatment. The rest of The First Auto will help us understand why this should be.

The next scene places us in the middle of what has become a cliché of American filmmaking. A young woman and a young man are seated together in a public place, sipping sodas and making small talk. Once again, we are not allowed to hear what they have to say, but must instead make do with intertitles. They are interrupted by an unpleasant rival suitor, who takes pleasure in showing his lapel pins to the seated couple. To the young lady, he reveals a button proclaiming “I’m for you, Oh you kid,” while to his rival he shows “Go way back and sit down.” Displayed in close-up images, these visual verbal messages are the first of many in the film. We see several different newspaper clippings. We laugh at a lettered storefront identifying the undertaker as one D.P. Graves. Along with Squire Stebbins, we read a letter from his insurance company and then consult the “General Instructions” explaining the modus operandi of his brand new horseless carriage. We read Bob’s note to his father Hank. For habitués of mature silent film, these photographed verbal inserts are familiar indeed, for the filmmakers of the 1920s went out of their way to naturalize their intertitles by photographing diegetically present written or printed texts. It becomes increasingly clear that early sound filmmakers’ attitudes towards language and sound were heavily inflected by their silent film experiences.

Whereas comedy scenes in so-called silent films were often accompanied by up-tempo pieces connected to the narrative by their lyrics, dramatic scenes were typically accompanied by music matched to the narrative by its emotional and tonal qualities. When his mare Sloe Eyes dies in Hank’s teary-eyed presence, our commiseration is thus encouraged by a Glazunov meditation and a Rimsky-Korsakov romance. Those responsible for The First Auto’s sound are in this case clearly doing nothing more than applying familiar silent film standards to the new technology. But what happens next derives from no obvious pre-existing model.

Why are some words sounded and others not? Why, when Hank calls out to his sleeping son Bob, do we hear him call “Bob!”? And why, once Bob awakens, is none of the subsequent dialogue sounded? What notion of sound directs the sound engineers’ choices? What cultural construction of sound justifies the sounding of some words, but not others? What prejudice about the way the new sound technology should be used governs the soundtrack of The First Auto? A rapid census of the parts of the film that are sounded offers an entirely unexpected answer to these questions. In addition to the opening scene’s “Come on Hank!” and “Hank wins!” we twice hear Hank call out to his son: “Bob!” Later, to start a race between Hank’s horse and a horseless carriage, we hear Mayor Robbins call out “Go!” In reaction to Barney Oldfield’s exploit, pushing his Ford to a mile a minute, one onlooker shouts “Hey!”

What do all these synch sounds have in common? They are certainly all speech events, but we can hardly call them dialogue. Every time human speech is chosen for synch sound treatment, the speech in question is short and loud, always calling for an exclamation point. Indeed, a careful inventory of the film’s sound effects reveals that they too are selected on the basis of their volume. Whereas relatively quiet sounds are rarely afforded synch sound treatment, every loud sound makes its mark on the soundtrack: crowds shouting, horses racing, automobiles speeding, whips cracking, plus shotgun blasts, car horns, cranks and backfires, as well as the impact of a stone breaking a windshield. If we were to write rules describing the treatment of sound in The First Auto, we would certainly have to include a rule governing volume and another governing duration. Rule number one would clearly state that only loud sounds are considered worthy of treatment by the newfangled Vitaphone system. Rule number two would identify punctual sounds—i.e. sounds with a virtually instantaneous attack and a rapid decay—as especially suitable for Vitaphoning.

Why this prejudice towards loud, punctual sounds? In order to understand this preference, we need to look no farther than the most common and culturally most important use of sound technology during the 1920s, as well as the labels attached to the most visible portion of contemporary sound systems. In fact, we still retain part of that terminology today. Sound is transmitted to our ears through devices known then and still often designated as LOUD speakers. This is hardly surprising, given the extent to which a substantial portion of Vitaphone technology derives from the microphones, amplifiers and loud-speakers designed by Bell Labs for their nationally famous post-war public address systems.

Contemporary understanding of new technologies depends heavily on existing practices. The technology may be new, but it is regularly used according to tried and true principles and practices. In the case of The First Auto it is quite clear that the sound crew understood the new system as an amplification device, i.e. as a technology with decided loud-speaking capabilities. Instead of attending to sounds of all sorts (dialogue, footfalls, bodily movement, and the like), the sound engineers of The First Auto thus used the new technology exclusively for what we might term “megaphone sounds,” sounds produced at a high volume and destined to be heard at a substantial distance. Just as early synch sound systems, from Gaumont’s Chronophone to Edison’s Kinetophone, were all aimed at reproducing vaudeville acts and the phonograph records derived from them, so The First Auto borrows its understanding of sound technology from familiar previous uses of that technology.

Additional understanding of The First Auto’s construction of sound technology is to be had from close listening to the delightful sequence featuring Squire Stebbins’ wild horseless carriage ride. Throughout Squire Stebbins’ wild ride, The First Auto continues the silent film musical strategy heard earlier with “In My Merry Oldsmobile.” For comic scenes we regularly hear an up-tempo popular song whose lyrics fit the situation especially well. In this case it is the 1913 hit song entitled “He’d Have To Get Under—Get Out And Get Under.” The lyrics say it all.

He’d have to get under—get out and get under—to fix his little machine

He was just dying to cuddle his queen

But ev’ry minute

When he’d begin it

He’d have to get under—get out and get under—then he’d get back at the wheel

A dozen times they’d start to hug and kiss

And then the darned old engine, it would miss

And then he’d have to get under—get out and get under—and fix up his automobile.

Indeed, the darned old engine does a lot of missing. In addition to the sounds of the creaky automobile body itself, we regularly hear backfires, semi-synch sounds designed to take full advantage of the new technology. Full advantage? Well, yes and no. Throughout this wild ride, every time we see Squire Stebbins’ car we hear the automobile’s characteristic sounds. But every time the car goes off-screen the car sounds disappear. Over the course of only eighty seconds the same pattern is repeated no fewer than eight times. As long as we see the car, we hear the car. But as soon as the car becomes invisible it becomes inaudible as well.

Throughout The First Auto, this rule is followed: if it’s not visible on the screen, then it doesn’t make a noise that deserves to be heard by the movie’s audience. Another particularly clear example may be found in the race between Hank’s mare and a horseless carriage. As long as the horse and sulky (or its driver) remain on the screen, we hear horses’ hooves. But when they disappear from the screen, they also disappear from the soundtrack. Similarly, the automobile is heard when it is visible, but it remains absent from the soundtrack when it is no longer in the image. Only once, in a long shot featuring both modes of locomotion simultaneously, do we hear both sounds at the same time. It becomes increasingly clear that sound is ineluctably tied to the image—cued by the image, we might say. In this film, sound doesn’t have or create its own space, because it exists only to the extent that the image calls it into being.

Before we leave The First Auto, it will be helpful to review the understanding of the new Vitaphone sound technology that it reflects.

  • Influenced by silent film accompaniment strategies, The First Auto regularly uses music as it might have been used by a pit orchestra just a few months before: dramatic moments are accompanied by wordless light classical music that is emotionally matched to the on-screen action, whereas comic moments are regularly accompanied by popular music whose lyrics offer commentary on the action at hand.

  • Influenced by the vaudeville sound effect strategies that had already been adopted by silent film drummers, The First Auto eschews direct recording of sound in favour of theatrical production of sound effects.

  • Influenced by the amplification associated with electronic technology, The First Auto systematically saves its sound system for megaphone sounds—loud, punctual sounds that fully justify the term LOUD speaker.

  • Influenced by the theatre, where actors most often abandon their right to be heard when they go off-stage, The First Auto never uses sound to create coherent and continuous sound space. Instead, it regularly restricts its sound to on-screen sources, only rarely acknowledging the existence of off-screen space.

The First Auto offers extremely useful insight into the construction of sound as Hollywood adopted new sound technology. Borrowing from a wide range of existing sound practices, The First Auto features a soundtrack that changes with virtually every scene. Variety is everywhere in this fascinating film, but continuity is nowhere to be found. In the years to come, some of The First Auto’s sound strategies would be adopted for the new medium, while others would be permanently pushed aside. In the space remaining, I will concentrate on a practice that would change considerably in the years after The First Auto.

As I have suggested, The First Auto does little to create space through the creative use of sound. Working in close harmony, the film’s image editor and sound mixer produced a soundtrack that is coherent in its insistence on image/sound matching, but that matching is produced at the cost of continuity. The film employs what we might call a shot-by-shot approach to sound, which would last for many years before being replaced by a scene-by-scene strategy. Consider, for example, this scene from the middle of Raoul Walsh’s The Big Trail, a late 1930 super-production employing the proprietary wide-screen process that William Fox called Grandeur. Led by John Wayne, the wagon train pushes its way west, crossing parched deserts, deep forests and raging rivers. One of the longest and most striking scenes details the pioneers’ descent to the base of an impressively tall cliff. One by one, the wagons and animals are lowered with ropes, sometimes with disastrous consequences.

The sound in this scene obeys an implicit rule just like the one governing The First Auto’s sound. When a wagon is on-screen, we hear the wagon. When characters engaged in dialogue occupy the screen, we hear their dialogue. What we don’t hear is whatever dialogue might be taking place while we see the wagons, or whatever sounds the wagons might be making while we’re hearing the dialogue. Look at the centre of this wide-screen image and you will be systematically satisfied by the image/sound connections, because we virtually always hear whatever sounds are associated with the screen’s “sweet spot.” Concentrate on the edges of the screen, however, and you’re likely to be frustrated. Even though sound-producing activity may be taking place on the margins, we never hear it.

The final twenty seconds of this scene offer an unexpectedly clear demonstration of this practice. The final shot of the sequence shows mounted riders crossing a river right next to the wagons and characters we have just been watching and hearing. Now, for the first time, the soundtrack features water sounds. A river? What? Right next to the scenes we’ve just been witnessing? Why didn’t we hear the river before? Viewers with eagle eyes may have spotted the river in the distance of the opening master shot. But not until the final shot of the scene do we actually hear the river. It is as if the microphone were subservient to the camera: point the camera at an object or character and you have a good chance of hearing that object or character. But any item not foregrounded by the camera stands little chance of being heard.

It is important to reflect on the logic implicit in this approach. I have noted before the extent to which this strategy operates on a shot-by-shot basis. Each shot has its own logic, independent of all other shots, either calling for sound or not. When a sound-making object goes off-screen, it disappears not only from the image, but also from the soundtrack. In the real world, we often hear things that we can’t see. In fact, that is one of sound’s great powers—the ability to represent the unseen. Indeed, our sense of place regularly depends on our ability to hear things that we can’t see. We don’t live on a shot-by-shot basis; we regularly hear things that are not immediately available to our eyes. Not in The First Auto, however, and not in The Big Trail, and for that matter not for the first few years of Hollywood’s sound film era.

To understand just what is at stake here, we need to consider how—through what sensory information—we manage to make sense of the space presented in this scene. We are never totally lost in this scene, because we begin it with an establishing shot that provides a context for subsequent larger scale shots. Whether we are looking at Wayne and his friend or focusing on the cliffs and wagons, we easily fit the objects of our gaze into a framework that has been provided by an opening shot that is so long and so wide that all necessary spatial connections are guaranteed. But there is no parallel treatment of the sound. The river sounds might have been featured from the start, thereby providing a sense of sound space that could serve throughout the scene. But that’s not the way sound worked at this point in film history. Early synch sound was characterized by a shot-by-shot strategy in which the image—and never the sound—would be responsible for anchoring us within the frame.

But this situation would not last forever. Though films made during the early thirties typically continued to handle sound in a shot-by-shot fashion, changes were on the way. During the early 1930s, a new approach was introduced that substantially modified the role of sound in constructing space. Perhaps the most representative film of the period, in terms of sound’s contribution to the creation of space, was Frank Capra’s It Happened One Night (1934). With sound overseen by Capra’s long-time soundman Ed Bernds,[1] this 1934 film offers an approach to sound that varies substantially from the approach used in The First Auto, in The Big Trail, and in most other films from Hollywood’s early sound years. It Happened One Night is the story of Ellie Andrews, the rich heiress played by Claudette Colbert, who runs away from her father and, eventually, into the arms of rebellious reporter Peter Warne, played by Clark Gable. Shortly after swimming away from her father’s yacht, Ellie prepares to sneak out of Miami on a bus, giving rise to the first of several scenes featuring a bus station.

We’ve all been taught that the archetypal Hollywood scene is presented through a familiar series of shots. First we get a long shot or even an extremely long shot, which because of its ability to locate objects and characters in relation to each other is typically called a “master shot” or an “establishing shot.” Thanks to the overall spatial knowledge secured by the master shot, our sense of space is not undermined by the subsequent series of tighter shots, even though they may be discontinuous. For example, we never have the least difficulty understanding the spatial relationship between the characters shown in a shot/reverse-shot sequence, because the establishing shot has already clearly defined their spatial relationship. This, as we have all learned, is how Hollywood handles—indeed masters—space. But it certainly is not the way Capra and Bernds configure space in this scene from It Happened One Night.

The first bus station scene begins with a close-up of a bus announcement, then continues through a series of spatially unrelated medium shots and medium close-ups featuring a bus announcer, Ellie Andrews, two goons tracking Ms. Andrews, an anonymous older lady buying a bus ticket, a series of men gathered around a phone booth, and eventually Peter Warne himself speaking on the phone, plus the editor on the other end of the line. Outside of a few short pans and tracks the images offer us precious little information regarding the spatial relationship between the various locations evoked. Where is the bus announcer in relation to Ellie? Where is the ticket window in relation to the telephone? Where is the telephone in relation to Ellie? Accustomed to being comforted by Hollywood’s master shot practices, we would be nothing short of discomfited by this bus station sequence if it weren’t for a new spatial strategy that has little to do with camera location and shot scale.

The images in this scene offer limited information about the spatial relationships that obtain between the sequence’s various close-up images. But the sound is a different story. From the opening image to the end of the scene, the sound provides non-stop continuity between what are otherwise discontinuous images. At every point along the way, we are comforted by the background bus station sound. We know how each image relates to the next not because we have been shown an establishing shot tying the various locations together, but because Capra and Bernds have furnished us with what we might reasonably call “establishing sound.” This establishing sound guarantees spatial continuity even when the image fails to relate one space to another.

Whereas the scenes from The First Auto and The Big Trail systematically handle sound in a shot-by-shot manner, It Happened One Night regularly treats sound according to an approach that we might call “scene-by-scene.” Each scene offers a soundtrack that is double. Up front we have discontinuous dialogue and sound effects, while the background features continuous semi-synch atmospheric sounds. When we cut to Gable’s editor, for example, the soundtrack informs us that we are right next door to a large room full of typewriters. Even though we never see that room, the continuous background sound tells us it is there. Just as a visual establishing shot guarantees the existence of space that is no longer fully visible in the subsequent tighter shots, so establishing sound animates space even when that space is currently invisible.

When Gable enters the bus we are treated to yet another aspect of this new approach to sound space. Before we can see him, a pillow salesman begins his patter off-screen: “A thousand miles is a long trip. Make yourself comfortable with a pleasant pillow.” After passing behind Gable the pillow salesman continues to be heard after he has left the image. A hallmark of this new scene-by-scene approach, the use of off-screen sound expands space, activating areas that the image fails to represent.

The First Auto’s proclivity for megaphone sounds, whether in dialogue or as sound effect, ensures that the film’s sound will all be up-front, produced by a foreground character or object and characterized by high-volume, low-reverb sounds. It Happened One Night operates in an entirely different manner. In this film there are always at least two sound planes. The foreground is dedicated to dialogue and narratively important sound effects, while the background provides atmospheric sound. Systematically, the foreground thus uses intermittent, live sound, characterized by a lack of reverb and close synchronization, while the background sound is continuous and regularly endowed with enough reverb to convince us that it emanates from an off-screen source. One further characteristic of this background sound is its tendency to be at best semi-synch in nature. That is, the sound is typically matched to its source in a general manner only. Instead of tight synch, background sound offers only generic synch, as when we hear the sound of a crowd or traffic noises. We know that the sound is coming from the crowd, or from the passing cars, but we are unable to match up specific sounds with particular sources.

It is important to notice one other essential difference between the shot-by-shot and scene-by-scene approaches to sound. When sound is handled in a shot-by-shot manner, with all sound foregrounded, the only existing space is the space located in the on-screen image. Treated scene-by-scene, however, according to the characteristic bi-level foreground/background approach, invisible space is regularly activated. The importance of this difference for diegesis creation and reinforcement would be hard to overestimate. Whereas the sound strategies used in The First Auto offer little support for a sense of diegetic coherence, the approach taken in It Happened One Night provides a non-stop guarantee of the existence and extent of the diegesis.

Throughout It Happened One Night, each successive scene offers a new set of background sounds appropriate to its specific location. This scene-by-scene strategy regularly deploys background semi-synch sound with substantial reverb, in order to guarantee sonic continuity between spaces with limited visual continuity. From the very start of each scene, the film offers sufficient establishing sound to carry viewers (who are also listeners) from one shot to another without their ever sensing any discomfort. Sometimes the sound used to establish a coherent sound space involves crowd noise. At other times it is the bus sound that assures continuity. At the auto court, rain serves a similar purpose. Later, continuous sound space is guaranteed by the sounds of a stream, followed by the buzz of night insects. Thanks to regular deployment of establishing sound, the audience is never left to depend solely on the image to assure spatial continuity. Each new scene calls forth a new establishing sound, whose continuity throughout major portions of the scene lends unity and clarity.

It Happened One Night is by no means the first film to employ establishing sound. Several early 1930s films had already experimented with establishing sound, including Howard Hawks’ Scarface (1932), Alfred E. Green’s Baby Face (1933), Ernst Lubitsch’s Design for Living (1933) and Cecil B. DeMille’s Four Frightened People (1934). The Columbia films featuring the collaboration of Frank Capra and Ed Bernds are especially rich in the use of establishing sound, including Rain or Shine (1930), Platinum Blonde (1931), Forbidden (1932), The Bitter Tea of General Yen (1933) and Lady for a Day (1933).

In order to understand the structures and techniques that characterize the treatment of sound during the years following the introduction of synch sound into Hollywood production we need an appropriate range of analytical tools. Perhaps the notion of “establishing sound” will prove capable of contributing usefully to the tool kit that can be deployed to make sense of film sound. Similarly, the twin concepts of “shot-by-shot” and “scene-by-scene” treatment of sound offer further opportunities to analyze and describe the development of standard sound practices.