Test of Audio Transcription via Dragon Naturally Speaking by Diane Chapman
(SPHS)
Background
We need to look at ways of making our online courses accessible to people
with disabilities. Much of the content in our online courses consists of
PowerPoint presentations synched with streaming audio (aka tutorials). As a
result, we needed to test the ability of a voice recognition program to
transcribe previously recorded audio. The results of one such test follow.
Procedure
I had recorded a short (18 slide) PowerPoint presentation earlier this semester
and had it made into a tutorial. To perform this test I needed a copy of the
audio of this presentation. Since the audio is cut up during the editing process
to synch the slides, the only audio of the full presentation I found was a Real
media version of the unedited recording. This is the recording that was used
to perform this test.
I used Dragon Naturally Speaking as the voice recognition program. The specific
product is Naturally Speaking Preferred, version 5.
Preparation of the Audio
Dragon Naturally Speaking (DNS) has a function for transcribing audio, but
the audio must be either a DNS or .wav file. Since the only file I had was
in Real media (RM), I had to play the RM file and re-record it as a .wav file.
The result was a .wav file of 24,497 KB.
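Before handing a re-recorded file to the transcription function, it can help to confirm that it really is a readable .wav file and to note its basic properties. The following is a minimal sketch using Python's standard wave module, which reads uncompressed PCM .wav files. The function name describe_wav and the file name in the usage note are hypothetical, and nothing here reflects any specific audio requirement of DNS.

```python
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed PCM .wav file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the number of frames divided by frames per second.
            "duration_seconds": frames / rate,
        }
```

For example, describe_wav("presentation.wav") (a hypothetical file name) would return the channel count, sample width, sample rate, and length of the recording, or raise wave.Error if the file is not a .wav file the module can read.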
Training the Voice Recognition Program
In order to transcribe an audio file, DNS must be trained to recognize the
particulars of a person's voice. DNS provides a series of specific readings
that must be read into the program via a microphone. The more readings that
are done, the better DNS should be at distinguishing words correctly.
The process for this test was to train DNS four times and to assess the
correctness of the transcription after each training. Each training is supposed
to improve the accuracy of DNS. Training DNS for my voice took time. The
time for each training was as follows (also noted are the specific texts
that were read at each training):
Training 1: 10 minutes (Talking to Your Computer)
Training 2: 15 minutes (Alice's Adventures in Wonderland)
Training 3: 5 minutes (Talking to Your Computer)
Training 4: 7 minutes (What's Plain English)
Transcription of the Audio
Because the audio recording was unedited, there are several places where
slides were redone and extra audio was captured. All of the test results
have been edited (using correction tracking in MS Word) so that you can
see exactly where the problems are.
I set myself up as a user and trained my voice for the first time (taking
about 10 minutes). The results of this test are saved in DDC_test1_edited.doc.
I trained my voice for the second time (taking about 15 minutes). The results
of this test are saved in DDC_test2.doc.
I trained my voice for the third time (taking about 5 minutes). The results
of this test are saved in DDC_test3.doc.
I trained my voice for the fourth time (taking about 7 minutes). The results
of this test are saved in DDC_test4.doc.
Test Results
The edited transcripts from all four tests were reviewed and the number
of errors counted. The results are as follows:
Test 1: 75 errors (See DDC_test1_edited.doc)
Test 2: 87 errors (See DDC_test2_edited.doc)
Test 3: 75 errors (See DDC_test3_edited.doc)
Test 4: 92 errors (See DDC_test4_edited.doc)
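The figures above can be lined up against the training times to see whether additional training reduced the error count. The sketch below uses only the numbers reported in this document; it assumes that training is cumulative (each test was run after all the trainings up to that point), and the function name summarize is introduced here for illustration.

```python
# Figures copied from the training and test results sections of this report.
training_minutes = [10, 15, 5, 7]
errors = [75, 87, 75, 92]


def summarize(training_minutes, errors):
    """Pair cumulative training time with the error count of each test."""
    rows = []
    cumulative = 0
    for test, (minutes, count) in enumerate(zip(training_minutes, errors), start=1):
        cumulative += minutes  # assumption: each training adds to the earlier ones
        change = count - errors[0]  # difference from the first test
        rows.append((test, cumulative, count, change))
    return rows


rows = summarize(training_minutes, errors)
for test, cumulative, count, change in rows:
    print(f"Test {test}: {cumulative} min of training, {count} errors ({change:+d} vs. Test 1)")
```

Under that assumption, the total training time was 37 minutes, and the error count after the fourth training (92) was higher than after the first (75).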
Special Problems & Issues
Performing this test brought several important problems and limitations to
light.
As a result of these limitations, I do not think that the transcription function
in Dragon Naturally Speaking will meet our needs for audio transcripts.
It is a very labor-intensive process and requires skilled staff who are already
in short supply. However, it may be useful for presenters to use it to prepare
their own scripts before recording in the audio booth. They could give their
lecture at their desktop, edit the resulting text, and give us a copy along
with their PowerPoint presentation.