Sound and Electronic Literature: Locating the Text in the Act of Listening

by John F. Barber

Author's Note

This essay evolved from my presentation at the Electronic Literature Organization 2013 conference, hosted by the Laboratoire Paragraphe and the EnsAD (Ecole nationale supérieure des Arts Décoratifs), 24-27 September 2013, Paris, France. My presentation was titled "Internet Radio and Electronic Literature: Locating the Text in the Act of Listening." In May 2014, an expanded version of my presentation was published in Electronic Book Review (ebr). In that essay, I positioned Internet radio and social audio networks as change agents for electronic literature. This present essay carries forward the core elements of my original presentation, but sets aside the expanded discussion of Internet radio and social audio networks included in the original ebr publication, focusing instead on the central nature of sound(s) to literature and literary endeavors, and thus by extension, to electronic literature. My thanks to ebr for granting me permission to use portions of my essay for this collection. My thanks also to Dr. Dene Grigar, her staff, colleagues, and associated researchers at her Electronic Literature Laboratory at Washington State University Vancouver for collecting and providing these conference proceedings.


Responding to the conference theme of locating the text, this essay suggests sound(s) can provide a valid literary experience and might be considered, like reading and writing, fundamental in the digital narratives of electronic literature. Specifically, sound is positioned as an essential element of electronic literature, as providing a fundamental basis for narrative, the heart of every literary experience, and thus locating the text not (solely?) in the acts of reading and writing, but also in the act of listening.


"Beyond what fascinates your ear today is something else, incessantly and obdurately present, although you cannot or do not hear it yet—but whoever hears it first has a good chance of inheriting the future" (R. Murray Schafer 39).

Electronic literature is defined by the Electronic Literature Organization (ELO) as "works with important literary aspects that take advantage of the capabilities and contexts provided by the stand-alone or networked computer." Such works might be "born digital" (created explicitly for and only able to be experienced in a computer-mediated context) or remediated from print to pixel. This "confrontation with technology," and the process-intensive aspects of the artifacts, is what distinguishes electronic literature from the migration of print to various digitized versions by authors seeking to "go digital" (ELO).

Given this definition, one might assume that electronic literature broadly incorporates digital media. With regard to graphics (images, video, and animation) this might be true. With sound, the story is different. As Dene Grigar notes, the majority of sound(s) included in works of electronic literature provide only background, context, or affirmation of interaction with the text (Grigar 2006).

A search of the ELO website seems to bear out Grigar's findings. Seven pages of results were returned in response to my query for "sound," including, from the first two pages alone, "sound bites," "phonetic sound," "animation/film/Flash/image(s)/poetic fragments/prose narrative and sound," "layers of sound," "alphabetic letter sounds," "continual sound," "response sound(s)," "graphic narrative along with sound," "background sound," "sound effects," and "voice and sound."

These responses seem to continue a complex interplay between sound and visuals in screen art so as to maintain the illusion and/or reality of a three-dimensional visual space where the spectators' gaze might be focused on interacting (reading the visual signs) with text (the use of visual signs to represent complex or abstract ideas).
Again from the ELO website, "electronic literature often intersects with conceptual and sound arts, but reading and writing remain central to the literary arts. These activities, unbound from pages and the printed book, now move freely" through a number of different venues. As a result, "electronic literature does not reside in any single medium or institution." In short, the ELO seems to argue text is located in the acts of reading and writing. Sound merely augments these literary acts.
Why is this the case? In a 1992 Modern Language Association conference paper, Charles Bernstein suggests an answer. He says a focus on one particular aspect within any frame of reference diverts attention from others. He calls this situation "frame lock," based on Erving Goffman's "frame analysis."
Bernstein says Goffman calls the overlooked features the "disattend track" and notes, "within text-bound literary studies, the disattend track may include such features as the visual representation of the language as well as its acoustic structure" (Bernstein).
I suggest that we might substitute sound for Bernstein's term "acoustic structure."
Kenneth Sherwood, in a presentation delivered at the 2008 ELO conference in Vancouver, Washington, entitled "From Audio Black to Artful Noises: Looking at Sound in Electronic Literature," suggests several disattend tracks within the various forms of electronic literature then archived by the ELO. These include "the meditation on listening and indeterminacy of Stuart Moulthrop's Radio Salience and [Reiner] Strasser and [Alan] Sondheim's 'Dawn'; the foregrounding of sound-track in Young-Hae Chang's pseudo-filmic flash poems, the adoption of 'edit to the beat' techniques of MTV and television commercials in [Giselle] Beiguelman's Code Movie 1; the privileging of audio in the remix rhythms in Babel [Chris Joseph] and Esha's Urbanalities; the witty, instrumental score for the kinetic word ballet of [Robert] Kendall's Faith; the user-driven audio collages of [Maria] Mencia's Birds Singing Other Birds' Songs and [Jim] Andrew's Nio; the triggered, synthetic sound of [Damien Everett and Melinda] Rackham's carrier (becoming symborg); and the ambient drone and crackle accompanying [Jenny Weight] Geniwate's [and Brian Kim Stefan's] Generative Poetry" (Sherwood 2008).
As Sherwood notes, sound is an important component of several works of electronic literature. And, to be fair, future works of electronic literature may use sound(s) as a central narrative element. But, currently, while digital media technologies have increased forms and opportunities for electronic literature, sound is, arguably, frequently overlooked for the visual appeal of generative text. This essay suggests sound(s) can provide a valid literary experience and might be considered, like reading and writing, fundamental in the digital narratives of electronic literature. Specifically, sound (from a variety of sources, environmental, mechanical, soundscapes, and human vocalization) provides the basis for narrative, the heart of every literary experience. Sound(s) might form the basis for new works of electronic literature.
In discussing these points, I first explore sound as the basis of literary experience, with speech being the oldest of mediums and subsumed as the content of later writing and (through printing) reading. So, although the ELO definition of electronic literature seems to privilege reading and writing, we are, through these literary activities, channeling sound(s) that provide narrative frameworks. Next, I discuss locating the text in sound. In conclusion, answering the conference theme (Chercher le Texte: Locating the text in electronic literature), I suggest that we can locate the text not (solely?) in the acts of reading and writing, but also in the act of listening.

Sound as the Basis of Literary Experience

N. Katherine Hayles, in Writing Machines, notes the examination of book culture has focused primarily on the experience of reading, rarely on the physicality of the artifact being read, or the culture(s) of reading itself. In support of this contention, even a basic perusal of literary history and/or theory suggests that following the advent of mass publication technologies, literature evolved primarily as a silent, solitary, visual experience, a personal relationship between the reader and the immersive, virtual reality of an imaginary world evoked by words printed and preserved on the pages of books. The literary experience is characterized predominately by reading and writing, replacing the aural with the visual as the primary sensory input. What has been lost of the literary experience by shunting the aural to the sidelines?
Beginning with publication of The Mechanical Bride in 1951 and continuing to his death in 1980, Canadian communications theorist Marshall McLuhan developed an intricate taxonomy of media and their effects, reaching back to humankind's origins for comparisons between pre-literate and electric communications, always calling attention to the fact that the medium matters to our experience of the message.
For example, McLuhan, with his son and collaborator, Eric, described two spaces, acoustic and visual, in which humankind has contextualized itself with different results. "Acoustic space . . . is spherical, discontinuous, non-homogeneous, resonant, and dynamic. Visual space is structured as static, abstract figure minus a ground. Acoustic space is a flux in which figure and ground rub against and transform each other" (McLuhan and McLuhan Laws of Media, 33).
The McLuhans expanded the terms figure and ground, both coined by psychologist and phenomenologist Edgar Rubin in 1915, to explore visual perception. By figure, the McLuhans mean any object rising from or receding into ground. Ground is surface, configurational and composed of all available figures (McLuhan and McLuhan Laws of Media, 5). As Thomas MacFarlane explains, ground is subliminal, always beyond perception except through analysis of emerging and receding figures (MacFarlane 62).
With acoustic space, McLuhan suggests expansive, unseen possibilities, a world awash in sounds. Pre-literate humankind, the only ever to live in acoustic space, relied on sound as their predominant sensory input. Sound formed the basis for humankind's explanations of and interactions with the surrounding physical world. As summarized by Paul Levinson, with aural information emerging from all directions, and with no opportunity to shut off or organize the constant stream of sound, pre-literate humankind perceived its world as both surrounding and inclusive, a permeable extension of itself, and they of it (Levinson 1999, 5-6).
In acoustic space, filled with environmental sounds, the emergence of speech technology allowed the communication of abstract thought. Storytellers wove the sounds of speech into narratives that helped explain the presence and purpose of humankind, their situation and agency. Speech provided a means to preserve and share cultural histories and memories.
McLuhan argued that alphabets and writing visualized, preserved, and extended the aural nature of speech. With writing, the speaker's voice became visible, replaced with symbols representing sounds. With printing and distribution of texts, humankind was encouraged to see and read (literally and figuratively) the world as a series of discrete pieces, strung like beads on a linear continuum running from the past, through the present, toward the future.
McLuhan said evolving forms of electric media (primarily television, as computer technologies were then nascent) extended the human nervous system, abolished time and space, and imploded divisions between formerly diverse peoples and cultural issues.
As a result, the world, McLuhan said, shrank to village size. He saw possibilities for far-flung citizens, through electric interdependence, to live once again, as in earlier oral contexts, in a global village (McLuhan The Gutenberg Galaxy, 31). Within this global village, issues and peoples are no longer separate, or unrelated. In a global village, people share information simultaneously. The global village is "a brand-new world of allatonceness [all-at-once-ness; everything happens at the same time] . . . a simultaneous happening. We have begun again to structure the primordial feeling, the tribal emotions from which a few centuries of literacy divorced us" (McLuhan The Medium Is the Massage, 63).
Unfortunately, the rise of television promoted an increased focus on visualization, not the return of community orality. The introduction of the Music Television Network (MTV) in August 1981 solidified this point when the first music video broadcast was the 1979 debut song "Video Killed the Radio Star" by The Buggles, a new wave band from England. One heard the song lyrics, of course, but mostly as they augmented the visual spectacle of the band's performance.
An upshot, as argued by Leigh Eric Schmidt, is "a hierarchy of the senses, with sight vastly ennobled and hearing sharply diminished" (Schmidt 48) "deeply ingrained in Western religious and philosophical traditions" (Schmidt 43). This results in "a marked dichotomy between eye and ear cultures that has commonly drawn on radicalized constructions of Western rationality and ecstatic primitivism" (Schmidt 48)—most notably the work of Walter Ong and Marshall McLuhan.
Later, with the advent of digital media, one might have thought the opportunities for digitizing and then combining, remixing, and remediating all forms of content, including sound, would promote a returned emphasis on sound.
Unfortunately, this does not seem the case. According to Hayles, the first generation of electronic literature, texts by George Landow, Jay David Bolter, Michael Joyce, and others, focused primarily on the hyperlinks between chunks/screens/lexia of text. These early applications of hypertext theory and the Storyspace interface, she says, despite providing multiple reading paths, preserved a basic print-centric conception by locating the text—with its subsumed voice(s)—in a series of screen views (Hayles 27).
Second generation electronic literature, with a rich diversity of interfaces and programming languages, experimented with linking narrative with concepts like perspective, access, determinability, transience, dynamics, and user functions. The result, says Hayles, was the emergence of two camps, hypertext and cybertext—so named from emphasis on computational nature and combinatorial strategies—each seeming more concerned with arguing theoretical stances than exploring the materiality of the literary artifacts produced (Hayles 28).

Locating the Text in Sound

For McLuhan, the "content" of any medium is always another, older medium (McLuhan Understanding Media, 23-24). Extrapolating on this point, Levinson says speech, the oldest medium and the most prevalent form of human communication, with its origins in abstract thought, claims a presence in almost all media that follow (Levinson 1981). As James O'Donnell notes, "the manuscript was first conceived to be no more than a prompt-script for the spoken word, a place to look to find out what to say. . . . to produce the audible word" (O'Donnell 54).
Beyond the audible word are other sounds that comprise the soundscape, a term coined by R. Murray Schafer to denote the auditory terrain in its entirety of overlapping noises, sounds, and human melodies (Schafer 1977, 1993).
A soundscape, Schafer says, is not a flat terrain that can be mapped, but rather a fluid field changed with the introduction of each new sound. Sound provides a place in which embodied social and cultural traces can be carried, often without the awareness of their bearers.
Michael Vincent says we can hear literary, even musical events in soundscapes. For example, the overlapping vocalizations, mechanical, and environmental sounds in a restaurant can be heard as "spoken word choral performances." Likewise, the hushed tones of conversation prior to the start of a movie are "akin to the tuning of an orchestra before an evening performance" (Vincent 59).
Soundscapes are, therefore, stages for human invention and/or interaction, according to Steven Feld. With soundscapes, he says,
sound both emanates from and penetrates bodies; this reciprocity of reflection and absorption is a creative means of orientation—one that tunes bodies to places and times through their sounding potential. Hearing and producing sound are thus embodied competencies that situate actors and their agency in particular historical worlds. These competencies contribute to their distinct and shared ways of being human; they contribute to possibilities for and realizations of authority, understanding, reflexivity, compassion, and identity (Feld 226).
Building on this idea, Don Ihde says, "In the most general terms, auditory imagination as a whole displays the same generic possibilities as the full imaginative mode of experience. Within the active imaginative mode of experience lies the full range from sedimented memories to wildest fantasy. . . . Within the range of the imaginative, auditory imagination may accompany other dimensional presentifications [sic]" (Ihde 61-64).
Ihde suggests that between the imaginative and perceptual modes of experience there are "distances and perceptions" of "copresence," a dual polyphony of perceived and imagined sound. There is, in auditory imagination, "the possibility of a synthesis of imagined and perceived sound." These distances and perceptions can create the sense of an "echo" between, or because of the alternation between, perceived and imaginative sounds (Ihde 61-64).
Two examples illustrate Ihde's point. First, the CD-ROM game Myst (1993) achieves much of its immersive power through sophisticated sound design. Each level and/or world is characterized by specific ambient sound(s): wind through the trees, lapping waves, machinery, and more. These sounds accentuate and reinforce the reality of the illusionary experience, and they promote sound-based exploration and/or way-finding in the various worlds of Myst.
We have further experience with this affordance of sound in other game play contexts. Janet Murray points to objects producing specific sounds when manipulated correctly or not, and music tracks responding to mouse movements as examples (Murray 53).

Why Sound?

For McLuhan, each new medium incorporates what it replaces and/or extends (McLuhan Understanding Media, 23-24). Speech, the expression of thoughts and feelings by articulate sounds, for example, incorporates abstract thought, and extends its ability to explain and/or characterize human agency and situation. Examples include spoken narrative, storytelling, drama, and literature. Each, at their basis, signifies, to depart from Shakespeare, something significant.
Joseph Campbell documented the reenactment of myths in the form of ritualistic participatory drama, often involving narrative, music, and/or other sound sources, by cultures around the world. Playwright David Mamet positions drama as an essential endeavor of humanity when he argues drama is the nature of human perception, "and it is a human need to construct, or have constructed for us, narratives," three-act dramas [thesis, antithesis, and synthesis 66] about our lives that "order the universe into a comprehensible form" (Mamet 8).
Writing (and printing and reading) incorporated (replaced) speech and extended its reach beyond the transmission range of the human voice. So, at the basis of writing and reading, we find speech (sounds) and can retrieve some of the prominence of oral myth, ritual, and participatory drama.
The key to such narratives is listening. Sound artist Francisco López suggests "profound listening" to denote listening without constraints in order to explore and affirm all the information inside any sound (López 82-83).
Michael Bull and Les Back promote "deep listening" as a way of attuning our ears to the multiple layers of meaning potentially embedded in any sound. Deep listening, they say, also involves "practices of dialogue and procedures for investigation, transposition and interpretation" (Bull and Back 3-4).
Specifically, Bull and Back argue that by moving "into sound," by actively and deeply listening to the sounds of the world in which we live, we open new ways of thinking about and appreciating the social experience, memory, time, and place (the auditory culture) of sound (Bull and Back 16).
Bruce R. Smith says sound, as an object of study, is important, for knowing the world through sound is fundamentally different from knowing the world through vision (Smith 129). As noted previously, Steven Feld positions sound "as a modality of knowing and being in the world" (Feld 226).


We understand from Marshall McLuhan that the artist and/or creative person is always to be found on the forefront of any new technology, experimenting with and enhancing awareness and participation with its affordances, and thus extending the capabilities and understanding of humankind (McLuhan Understanding Media).
McLuhan foresaw a transition, fostered by electric media, into a post-literate present where the individual viewpoint created by phonetic literacy and print was obsolesced by the collaborative and collective identity of a global village.
In the now-presence foreseen by McLuhan, the opportunities afforded by digital media for combining, remixing, and remediating all forms of content, including sound, may predict a return to an acoustic space characterized by what Edmund Carpenter calls the verbal, musical, and poetic traces and fragments of oral culture (Carpenter 1970).
This acoustic space provides both a model and a context where, understanding the primacy of sound in human narrative, we may reconsider sound as a basis for engagement with emerging forms of electronic literature.
In conclusion, I promote the idea that sound(s), whether environmental, mechanical, soundscapes, or human vocalization, provide the basis for narrative spaces and experiences. Rather than serving as augmentation, sound(s), carefully considered and utilized for their ability to convey cultural information, can be the basis for new works of electronic literature that provide imaginative, thought-provoking, and participatory experiences.
And, as suggested by Michael Bull and Les Back, understanding the primacy of sound(s) in human narrative, and engaging with sound(s) through active, careful, or deep listening can also promote new ways of engaging with the world in which we live.
The desired upshot is to promote deep, rich, engaging, and immersive sound-based literary experiences that locate the text not (solely?) in the acts of reading and writing, but also in the act of listening.

