Dueling Senses: Hearing and Vision and How They Work Together

Brenton J. Burke

Comm 141 Term Project

April 19, 2001


Introduction: A Learned Connection?

     A gunshot rings out. A loud thud shakes the ground. Metal crunches into metal. The first thing we do when we hear any of these sounds is look in the direction of the sound. There is a definite connection between auditory sensations and visual sensations and we cannot seem to shake this connection.

     It would seem that the tendency to follow up aural cues with a visual sweep is innate. When we want to get the attention of a newborn, we talk to it. We snap our fingers or touch it while saying something to grab its attention. In what seems to be a natural movement, the newborn looks at the source of the sound to find an interested, babbling adult. Only when the newborn is looking at us do we know that we have its attention. Or do we?

     One could also say that babies learn from the earliest age that, in order to interact, they must be looking in the direction of the adult. They receive the most positive reinforcement when they smile or coo in the general direction of the adult talking to them. This could be explained as a learned reaction to sounds rather than the result of understanding that someone is trying to communicate with them. However, it seems that when the child ages and begins to talk, they already know that they should look at the person they wish to talk to in order to get their attention. The original connection may be learned but that connection between sight and sound begins to allow for the transference of feelings and needs.

     All this is important because when the child matures further, this aspect of looking to the focus of attention becomes automatic. To keep attention focused, it is preferable to look at the source of auditory information. Take, for instance, the classroom. Children seem to learn best when they are focusing on the teacher visually. If something is drawing their eyes to the front of the room, chances are they will listen more closely to the sounds coming from that part of the room.

     Whatever the cause, the link between visual and aural stimulation does exist. This should be understood when discussing the visual nature of a particular culture. When theorists speak of a “visual culture,” they do not mean that there is a complete lack of audio but rather a cultural emphasis on the visual. If today’s global culture is truly visual as Daniel Chandler discusses in his essay “Biases of the Ear and Eye: 'Great Divide' Theories, Phonocentrism, Graphocentrism & Logocentrism,” this only means that people have sought to separate the visual from the audio and describe the preference of the pervading culture as “visual.” However, one cannot cleanly separate the senses; as Chandler mentions, “it's only a categorical convention that we have five senses.” Sight and sound are linked in that they are both forms of information our environment sends to us. They do not lend themselves to separation easily without permanently disabling either sense.

Noisy Modern Life

     So having somewhat established that vision and hearing are intertwined in the way they help us experience our world, we now turn to the effect that the world has on our perception. It is no secret that every day we are surrounded by noise and inundated with busy images. Visual and audio sensations are plentiful everywhere we go.

     How does one deal with such noise? Vision offers a simple solution: look away. Sound, however, is much more difficult to deal with. No matter how hard we try, we cannot “hear away” from a sound. Sound will travel to us from every direction and we will hear it. To make matters worse (or better), we cannot process and interpret every one of these sounds. What happens is a process where sound is psychologically and selectively filtered out. The barrage of sound passes through a type of floodgate in the brain and our perception of our environment is simplified so that we may go about our day with the least possible amount of effortful processing.

     Hearing is not the only sense to use this process. Receptors on the skin and in the nose also become accustomed to a certain level of stimulation and readjust so that we can discriminate between, for example, noises in a noisy environment or smells in a pungent environment. One could also make a case for sight since vision relies heavily on the interpretation of signals in the occipital lobe of the brain. When the eyes send a signal to the brain, the brain often combines the last signals it received with the new signals. This could explain why when you look directly at a light and then look away, you still see the form of a light floating in mid-air. This could also explain why movement grabs our attention whereas in a familiar space we can forget that some sign or poster is present even though it is hanging right in front of us.

     The process isn’t as simple as that, though. As mentioned before, vision and hearing (and the other senses) are intertwined. Both vision and hearing are our best senses for experiencing objects over distances and both work together to give us an accurate picture of the world around us. It seems natural that sound should accentuate visual stimulation, making what we see a richer experience. Can it also be said that looking at what we are hearing makes our audio experience clearer and more meaningful?

     Certainly: we can see this in experiments designed to test the interaction between sound and sight in our perception of different things.

Experiments in Seeing and Hearing

     Although psychologists do not study perception with the same theoretical goals as an audio artist might, their efforts can help us determine just how intertwined vision and hearing are as they work together to help us make sense of the loud and busy world around us.

     One such experiment investigated an interesting audio illusion called the McGurk Effect. Psychologists have found that when they show people a silent video of a person saying the syllable “ga” while the audio track simultaneously outputs the same person’s voice saying “ba,” the participants in the experiment will usually believe that the video showed a person saying the syllable “da” when no such syllable was presented (MacDonald et al., 2000). This illusory effect was found with several other syllable pairs and shows us the relationship between vision and recognizing speech. The relationship is so strong, in fact, that we will perceive a different sound than was actually said if the mouth movements do not match. Even more compelling is the fact that MacDonald et al. (2000) ran another experiment where the video image was blurred to different degrees while the audio track remained clear. Participants continued to succumb to the McGurk Effect until the visual image was almost unrecognizable.

     What this means is that visual information is unconsciously and automatically used to interpret our aural experiences. We cannot control the processing of certain elements in our environment. Our sight will impose some characteristics on our aural experiences, especially our speech experiences. As put by Prof. Lawrence Rosenblum on his web site (http://www.psych.ucr.edu/avspeech/VSMcGurk.html), “integration of the discrepant audiovisual speech syllables is effortless and mandatory [as seen in the McGurk Effect].”

     Much research has been done on the effect of memorizing word lists in different modalities (i.e. senses). Turner et al. (1992) found that when presented with word lists in audio and visual formats, participants more easily recalled the audio lists in experimental situations with same-modality interference. Simply put, this means that when a person was presented with an audio list to remember while extra audio information was presented to them simultaneously, they remembered the audio list better than when a visual list was presented with extra visual information.

     This finding agrees with Chion’s hypothesis about the nature of audio: that it is temporal (p.10). We more easily remember audio lists regardless of interference (Glenberg et al., 1987) because we listen with a sense of expectation. In Turner et al.’s experiment, there was a fixed time between each presentation of a word on the audio list. Chion would say that each participant expected the next word to be presented and processed it regardless of the interference because we are more easily able to take advantage of the rhythm of the presentation with hearing.

     Turner et al.’s experiment also showed another interesting result. About halfway through the list, participants remembered the contents of the list better when presented aurally with audio interference than when no interference was present. No such effect was found for visual lists. Chion would also agree with these findings because, according to him, “[v]isual and auditory perception are more disparate natures than one might think. The reason we are only dimly aware of this is that these two mutually influence one another…lending each other their respective properties by contamination and projection” (p.9). Turner et al. originally set out to dispel the theory that there are two separate areas in the brain for visual and auditory processing of information into memory. However, Turner et al.’s results could not dispel this theory and in fact, by finding this strange mid-list effect, have helped to strengthen it.

     It seems that even though the McGurk Effect is so strong that people will hear a syllable that is not uttered, vision and hearing are processed separately by the brain. Turner et al. and Chion are both representative of the current paradigm, which asserts that vision and hearing are more “disparate than most would think.” However, the existence of the McGurk Effect suggests that vision and hearing may be linked in processing before even the most basic attempt is made by the brain to attach meaning to what is heard. It is possible in the complex human brain that the initial processing of all basic sensory information is located in the same region of the brain and that processing that data for long-term storage is achieved through different processes in different parts of the brain.

 

Conclusion

     So where does this leave us? We’ve established that some processing of visual and auditory information must occur simultaneously and with “contamination” (Chion p.9) from the other. However, we’ve also established that it is likely that semantic processing of perceptual information occurs in different places in the brain or that the processes that the brain uses to store perceptual information are so different that auditory and visual memory have different properties in recall.

     The implications for noisy modern life are complex. If vision and hearing are innately connected then it seems that we are doomed to a kind of perceptual tunnel vision. Noises surround us in such a way that we cannot listen outside of our field of sight without facing confusion and disorientation. We must exploit our vision as a welcome guide to our focused listening in order to dampen all the sounds around us.

     If the processing of perceptual information is separate then we may have difficulty coping with a multimedia world where sound and vision often try to work together to present new ideas to us. The differing weaknesses of vision and hearing may make it difficult to learn something that requires both hearing and vision for proper processing into memory. For instance, people remember vision or sound, but rarely both together, from a movie or a television program. A person may remember lines from a movie and remember what the scene looks like when those lines are recited, but the visual element and the aural element seem to come from different parts of the brain. Remembering exactly what is said does not require recalling the visual aspects of the setting where the lines were recited. If the same problem were applied to a computer program or a classroom project, the divergent quality of perceptual memory may affect our ability to learn this new type of information from these multimedia sources.

     This is just a guess, though. In everyday life, hearing and vision work well together to give us an accurate picture of those things which are not within reach. Without one, the other is not as strong. The exact nature of the connection may be complex and different for every deeper level of processing sensory information goes through, but the connection is obvious. Those who might say that one or the other is superior are on the wrong track. Hearing affects what we look at while sight can influence what we hear and thus they will always work together to make our environment simpler and less confusing.

Bibliography

Chandler, Daniel. “Biases of the Ear and Eye” – http://www.aber.ac.uk/media/Documents/litoral/litoral1.html

 

Chion, Michel. Audio-Vision: Sound on Screen. Columbia University Press, NY. 1994.

 

Glenberg, A., Eberhardt, K., and Belden, T. (1987). The role of visual interference in producing the long-term modality effect. Memory & Cognition. 15(6), 504-510.

 

MacDonald, J., Andersen, S., and Bachmann, T. (2000). Hearing by eye: How much spatial degradation can be tolerated? Perception. 29(10), 1155-1168.

 

Rosenblum, Lawrence – Web Page on the McGurk Effect (with demo) http://www.psych.ucr.edu/avspeech/VSMcGurk.html

 

Turner, M., Johnson, S., McNamara, D., and Engle, R. (1992). “Effects of same modality interference on immediate serial recall of auditory and visual information.” The Journal of General Psychology. 119(3), 247-263.