Evolution of Audio Recording in Field Surveys: Abstract

Use of CARI on several national surveys has provided production experience to bolster laboratory tests. This article reviews the progress of CARI technology in the years since it was introduced, with an emphasis on feasibility for routine use with field surveys.

Key Words: survey technology; audio recording; computer audio-recorded interviewing (CARI); sound file; quality assurance; performance management; field interview; in-person interview

From the marketing of the Dictaphone in 1907 (Nuance Communications, 2005) to the availability of miniature recorders embedded in portable electronic devices today (Dwyer et al., 1998), people have been discovering ways to use audio recording tools to capture voices for later review. While early acoustic recorders proved helpful for journalistic interviews, they were not usable for large-scale research surveys; the introduction of cassette tapes made recording more convenient for interviewing (Stockdale, 2002).

Many audio file formats have been developed over the years, and their sheer variety may baffle a new observer. Recent attention has focused on the mp3 format (MPEG-1 Audio Layer 3, named for the Moving Picture Experts Group), but many other file formats exist as well. A few of the common formats are listed in Figure 3.

Microsoft Windows operating systems include Sound Recorder software, which writes to the wave file format, and Windows Media Player, which can play back wave files and a number of other non-proprietary formats. The PCM (pulse code modulation) digital recording algorithm is used in various encoders, including Sound Recorder, and records uncompressed sound with no licensing required.

Figure 3. Common audio file formats

Name                      File Extension   Use
Wave                      .wav             Windows uncompressed
MP3                       .mp3             Compressed audio
RealMedia                 .rm              Compressed audio
RealAudio                 .ra              Compressed, for streaming audio
AIFF                      .aiff            Macintosh default uncompressed
CD Audio                  .cda             Music CD tracks
Active Streaming Format   .asf             Streaming audio

Wave files are not especially efficient at storage, but the recording process places little demand on the computer. The size of a particular wave file depends on the recording parameters selected when it is created. For each available audio file format, there is a choice of sampling rate, bandwidth, number of channels and other parameters. For RTI's current CARI system, the standard configuration is 16 bit bandwidth and an 11.25 KHz sampling rate.

CODECs (compression–decompression techniques) were developed for use with audio recording to reduce the size of sound files. Audio recording can combine digitizing and compression in a single step. For use in surveys, the system designer can choose among simple recording with no compression, simultaneous recording and compression, or recording followed by compression. Section 11 discusses these approaches in a comparison of post-recording compression with simultaneous recording and compression.

7. Integrating Audio Recording with Survey Software

A variety of technologies have been used to implement survey instruments, such as Blaise (Statistics Netherlands), CASES (University of California, Berkeley) and web-based technologies like ASP.NET (Microsoft). Audio recording components have been successfully incorporated in all these environments.

One of the challenges of incorporating audio recording in a survey instrument is to make the process unnoticeable to the interviewer. The recording process must not slow the system or provide any visual or audible clue as to when it starts and stops.

Audio recording can be added to Blaise instruments by either of two programming approaches. One approach uses a Blaise procedure which in turn invokes an external application to start and stop the recorder. This approach requires complex programming within Blaise at every place the recording application needs to be invoked, to keep track of whether recording is already in progress or needs to be started or stopped (Thissen and Rodriguez, 2004).

The second approach uses the Blaise alien router. Starting with version 4.6, Blaise introduced the alien router as part of the Blaise component pack. The alien
router technology allows the invocation of an external component before and after every survey item. Use of the alien router externalizes the complexities of tracking the recorder state. It also opens up the possibility of maintaining a text list of items to be recorded, external to the instrument. This reduces the complexity of instrument programming and allows easy modification of the list of items to be recorded, without any need to modify the data model or recompile the instrument (Thissen and Sattaluri, 2006b).

For CASES instruments, recording can be integrated by spawning a separate application to start and stop an external recorder (Wrenn-Yorker and Thissen, 2005).

When a survey is offered in multiple modes by using a web-based instrument, field interviewing may take place through a website running on the laptop without a continuous connection to the internet. In that case, audio recording can be implemented by installing a client-side Java applet and JavaScript, similar to the way in which CARI can be implemented for internet-based surveys (Suresh, 2005).

Once a survey instrument has been enabled with CARI technology, survey information systems (Thissen, 2004) must also be expanded to handle the audio data files. From a case management and data security perspective, CARI files are no more than response data stored in a different format. The issues and concerns for files containing audio response data are the same as for files of textual responses. File protection on the laptop, transmission to a central site, central storage, access by authorized researchers and eventual deletion must all be planned with the same security and confidentiality used for traditional response files.

9. Transmission

There are several options for transferring audio files from the field laptop to a central management system. The files can be sent by dialup transmission, by broadband, or on removable media like flash drives shipped by secure delivery methods. For small surveys, it may be practical to leave audio files on the laptops until the end of data collection. With the pervasiveness of home broadband access through cable modem or DSL (digital subscriber line telephone service), the capacity for transmitting large files has greatly increased. Still, researchers must plan for transmission when using CARI, since audio files can be large.

The choice of transmission option may depend on the size of the files being transmitted. We have found that uncompressed audio recording consumes about one megabyte of disk space for each minute of recorded dialog. (See Section 11 below for a comparison of recording parameters and file sizes.) If an instrument were programmed to collect three one-minute recordings, each compressed to 100KB, the case management system would have 300KB to transmit for every case. If the interviewer transmits one case each day, these files can be sent over a dialup connection. Broadband allows a larger number of files, or larger files, to be transmitted at a faster rate.

The third option, using removable external media and shipment, can be used when entire interviews or lengthy sections are recorded. However, security concerns, the effort of handling external media and the possibility of loss make this approach less desirable than automatic transmission via dialup or broadband. Still, it may prove useful when recording interviews in their entirety or when other forms of file transfer are not available.

Audio recordings may contain personal identifying information, whether by intention or by accident, so it is important to protect these files with encryption tools while they reside in any location accessible to unauthorized individuals. In addition, if files are transferred over the internet, the Secure Sockets Layer (SSL) protocol can be used to encrypt the data stream during transmission.

10. CARI Monitoring

After audio files are received at a central location, the monitoring process may be as simple as opening the files with a free player tool like Windows Media Player or RealPlayer. However, since manual case management is impractical for all but the smallest surveys, it is best to build a system that provides an interface for reviewing the files and a database for recording evaluations.

The monitoring system might be a client-server application or a browser-based application located on an internal or external network. Client-server applications limit access to users on an organization's internal network, because database connections perform poorly over long distances. A web-based approach has the advantage of being available from any workstation with access to the network, supporting organizations with review staff distributed nationally or even internationally (Thissen and Sattaluri, 2006a).

Regardless of the implementation, the system should provide role-based access to protect the security of the information stored in the audio files. For example, three levels of access might be designed into the system:
• CARI monitoring staff, who listen to and evaluate audio files
• Supervisory staff, who designate monitors, manage caseloads and track review-completion status
• System administrators, who configure new surveys and create new logins and passwords

For large surveys, the system may also include an algorithm to select a specified percentage of each interviewer's files for review. Ideally, it would offer the flexibility to adjust review rates for any field interviewer on any active survey, so that quality assurance personnel can increase monitoring of any interviewer suspected of improper data collection practices (Hartman et al., 2006).

11. Audio and Operational Results

In this section, we present some results of RTI's experience with CARI technology. The data given below were obtained from lab tests, field tests and production survey use of CARI processes.

A comparison of recording alternatives is shown in Figure 4, with an indication of the resulting playback sound quality. The column labeled "MB Per Min" lists the megabytes of storage required for one minute of sound in the uncompressed wave file format. Similar patterns of relative file size can be found for other file formats.

Figure 4. Recording parameters

Bandwidth   Sampling    Channels   Sound Quality   MB Per Min
8 bit       11.25 KHz   1          Low             0.66
16 bit      11.25 KHz   1          Medium          1.31
8 bit       22.5 KHz    1          Medium          1.79
16 bit      22.5 KHz    1          High            1.19
16 bit      44.1 KHz    1          Very High       5.25
16 bit      44.1 KHz    2          Very High       12.3

We have looked at alternative processes for compressing existing audio files. A wave file was compressed as a separate step after recording, using a specific CODEC and selected recording parameters. In a CARI system, this process might be performed by the case management system after the interview was completed but before transmission. Using this approach, compression ratios ranged from 2:1 to 75:1. In general, a very high fidelity stereo recording produced a very large original file that compressed greatly, while lowering the recording quality produced a smaller original file but proportionally less compression.

At RTI, files are recorded with the native Windows Sound Recorder software, called from Blaise or CASES, resulting in file sizes of about one MB per minute uncompressed. Use of the LAME (The LAME Project) open source MP3 encoder with appropriate parameters yields an average compression ratio of approximately 11:1 without noticeable loss of audio quality, resulting in files of about 100KB for one minute of audio.

Figure 5. File sizes obtained by concurrent recording and compression

CODEC       Input Sound Quality   Number of Files Tested   Average MB/Min
MPEGRec     Low                   4                        0.98
MPEGRec     Mod                   3                        1.68
MPEGRec     Very High, Mono       2                        0.96
MPEGRec     Very High, Stereo     1                        1.80
RealMedia   Low                   24                       0.34
RealMedia   Mod                   3                        0.51
RealMedia   Very High, Mono       2                        0.34
RealMedia   Very High, Stereo     1                        0.47

In another experiment, we recorded sound directly to a compressed format, without intermediate storage as a wave file. In a CARI system, this requires the instrument to call a specific recording application and CODEC, such as MPEGRec (mp3), producing a compact file that is ready to encrypt and transmit. The simplicity of this approach was attractive because compression was immediate and effective, as shown in Figure 5. On the downside, simultaneous compression and recording tax the computer's processing power. This reduces system performance and produces lag and visible signs of the recording process, which limits the approach's usefulness.

Figure 6. Loudness Effect on File Size

File Format   Sound Level   Averaged Over # of Files   MB Per Minute
Wave          Silent        6                          1.30
Wave          Quiet voice   9                          1.31
Wave          Voice         6                          1.32
MP3           Silent        6                          0.97
MP3           Quiet voice   8                          0.96
MP3           Voice         6                          0.97
RM            Silent        6                          0.34
RM            Quiet voice   8                          0.34
RM            Voice         6                          0.34
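
The storage figures in this section, and the 300KB-per-case estimate in Section 9, follow from simple arithmetic. The sketch below is a minimal illustration under stated assumptions, not part of RTI's CARI system: the function names are hypothetical, sizes use decimal units (1 MB = 1,000,000 bytes), file headers and protocol overhead are ignored, and what the text calls bandwidth is treated as bits per sample. It assumes the standard PCM sampling rate of 11,025 Hz in place of the 11.25 KHz quoted above, and a nominal 56 kbit/s dialup line; real dialup throughput is usually lower, and measured files can differ from these nominal values.

```python
"""Back-of-the-envelope storage and transfer estimates for CARI audio files.

Hypothetical sketch: function names and the example link speed are
illustrative assumptions, not part of any production CARI system.
"""

BYTES_PER_MB = 1_000_000  # decimal megabytes; binary MiB would be 1_048_576
BYTES_PER_KB = 1_000      # decimal kilobytes


def pcm_mb_per_minute(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Nominal size of one minute of uncompressed PCM audio (file header ignored)."""
    if bits_per_sample % 8 != 0:
        raise ValueError("bits_per_sample must be a multiple of 8")
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 60 / BYTES_PER_MB


def compressed_kb_per_minute(uncompressed_mb_per_min: float, ratio: float) -> float:
    """Size per minute after compression at a given ratio (ratio=11 means 11:1)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return uncompressed_mb_per_min * BYTES_PER_MB / BYTES_PER_KB / ratio


def payload_per_case_kb(recordings: int, minutes_each: float, kb_per_minute: float) -> float:
    """Total kilobytes to transmit for one case."""
    return recordings * minutes_each * kb_per_minute


def transfer_seconds(payload_kb: float, link_kbit_per_s: float) -> float:
    """Idealized transfer time, ignoring protocol overhead and line quality."""
    return payload_kb * BYTES_PER_KB * 8 / (link_kbit_per_s * 1_000)


if __name__ == "__main__":
    # 16-bit mono at 11,025 Hz (assumed standard rate in place of 11.25 KHz)
    wave = pcm_mb_per_minute(11_025, 16, 1)
    # about 1 MB per minute compressed at the 11:1 ratio reported for LAME
    mp3 = compressed_kb_per_minute(1.0, 11)
    # three one-minute recordings of about 100KB each, as in Section 9
    case_kb = payload_per_case_kb(3, 1.0, 100.0)
    print(f"Uncompressed 16-bit mono: {wave:.2f} MB/min")
    print(f"Compressed at 11:1: {mp3:.0f} KB/min")
    print(f"Per-case payload: {case_kb:.0f} KB")
    print(f"Nominal 56 kbit/s dialup: {transfer_seconds(case_kb, 56):.0f} s")
```

For these example values the sketch gives roughly 1.32 MB per minute uncompressed, about 91KB per minute at 11:1, 300KB per case, and about 43 seconds per case at the nominal dialup rate. Using binary units (1 MiB = 1,048,576 bytes) would lower each size figure by about 5%.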

We tested whether loudness had any effect on the size of the recorded output file by comparing the sound level in audio files with file size, for files that were all recorded under identical configuration settings on the same laptop. Figure 6 shows the results of the comparison, demonstrating that there was no apparent effect of loudness on audio file size.

The quality of the sound files from the field is of interest as an indicator of the feasibility of gathering information for large numbers of interviews. Figure 7 shows results from reviewing a sample of 11% of the first 1500 completed interviews from a survey. The asterisk (*) indicates that the default rating was chosen, as opposed to an explicitly assigned score; entering a file quality rating in the monitoring interface was optional if the quality was acceptable for review (Hartman et al., 2006).

Figure 7. CARI sound file quality distribution

Sound Quality     Number of Interviews
1 – Poor          4
2 – Passable      5
3 – Adequate      21
* – Acceptable    48
4 – Good          49
5 – Excellent     37

Problems noted with audio files included background noise, static, faint voices, key tapping, hum and other recording problems which interfered with detection of vocal content. Audio files were considered adequate if voices could be plainly heard and understood, regardless of other noises. This definition of quality differs from those commonly used to rate audio recordings for other purposes, such as musical entertainment, but it is appropriate for survey evaluation.

Figure 8. Field performance problems detected through CARI

Count   % of Cases   Problem Definition
13      0.2          Authenticity Questionable
217     3.9          Reading - Minor Deviation
72      1.3          Reading - Major Deviation
73      1.3          Recording Errors
44      0.8          Unprofessional Behavior
86      1.5          Inappropriate Probing
79      1.4          Feedback not Neutral
1       0.01         Incorrect Incentive Provided

We have also gathered operational information on field staff performance from production use of CARI. Figure 8 shows the distribution of field performance problems found in one study after review of approximately 5600 CARI interviews. A single case might be assigned multiple problem codes, so the problem count total is greater than the number of affected cases (Wrenn-Yorker and Thissen, 2005).

In general, field interviewers and respondents have accepted the technology. In a feedback study, 82% of interviewers felt neutral or positive about use of CARI, and a post-interview survey of 283 respondents found that 70% had no reaction one way or the other, 15% liked the idea and 13% disliked it (Herget et al., 2005). As noted above, assent to CARI by respondents ranged from around 83% in one survey to 93% in another. This assent was independent of consent to conduct the interview (Wrenn-Yorker and Thissen, 2005).

A small experiment was conducted to determine the minimum number of CARI audio files required for consistent monitoring evaluations, that is, how many audio files must be reviewed before listening to additional files for an interview no longer changes the determination. This work suggested that three audio files of 30 seconds each may be adequate for verification purposes: after reviewing three files, CARI monitors reached 97% agreement with the ratings obtained by reviewing five files.

It is difficult to compare the costs of CARI operations precisely with those of more traditional re-interview or verification processes, because the traditional systems tend to be well established while CARI systems are still evolving. A theoretical cost-analysis model was created to compare the expected costs of operating both systems at the same "steady state" in which all systems had been implemented. Analysis of that model suggests that the steady-state cost of verification is lower with CARI than with the traditional approach, but actual data were not available for that comparison.

12. Visions of the Future

Looking forward, we see expanded use of CARI in field surveys, both for monitoring survey quality and as an integral part of data collection. Advances in digital signal processing may support automation of activities now performed by CARI monitors or coders.

With regard to data quality monitoring, it may be possible one day to screen a large portion of the audio files automatically for evidence of falsification. For example, software may be able to distinguish between
audio files with and without voices and to identify the number of different voices within a single recording. This technology could be employed for a population census or large survey that requires many interviews to be screened very quickly for falsification. Audio processing software may also be able to determine respondent characteristics, such as whether a voice is male or female, or to match the interviewer's spoken words with the predefined question text, to evaluate how well the interviewer followed protocol.

CARI can also be used as a data collection tool. A number of surveys tape-record respondents' answers for subsequent coding, and CARI offers a convenient, unobtrusive alternative for collecting these recordings. Matching audio responses to a dictionary of expected words might allow automated coding of open-ended items or of the "other-specify" option of multiple-choice items.

Farther in the future, recordings may be transcribed automatically to text which can be parsed and analyzed. Current commercial software often requires "training" the package to recognize the user's voice, which limits its usefulness in the field. However, research is underway on speech-to-text conversion in uncontrolled or "noisy" surroundings (Ming et al., 2006), which may broaden its applicability to home environments.

Acknowledgements

The authors would like to acknowledge the work of Albert Bethke, Phil Cooley and R. Suresh in the invention of CARI, and the contributions of Frank Mierzwa in cost modeling and of Pauline Robinson in file compression studies. Finally, we would like to recognize the contributions of the U.S. Census Bureau to the field of audio-recorded interviewing.

References

Biemer, P.P., Herget, D., Morton, J. and Willis, W. (2000), "The Feasibility of Monitoring Field Interview Performance Using Computer Audio Recorded Interviewing (CARI)", Proceedings of the American Statistical Association's Section on Survey Research Methods, pp. 1068-1073.

Dwyer, J.J., Godin, D.K., Colon, R.S., Sr., Rothschild, S., Pawlowski, J.J. and Vaughan, J.C. (1998), "Voice file management in portable digital audio recorder", United States Patent 6671567.

Hartman, P., Wrenn-Yorker, C., Sattaluri, S. and Thissen, M.R. (2006), "Research and Development in Audio-Recorded Interviewing", Presented at the Federal Computer Assisted Survey Information Collection (FedCASIC) Conference.

Herget, D., Biemer, P.P., Morton, J. and Sand, K. (2005), "Computer Audio Recorded Interviewing (CARI): Additional Feasibility Efforts of Monitoring Field Interview Performance", Presented at the Federal Conference on Statistical Methods.

Kowal, S., O'Connell, D.C. and Sabin, E.J. (1975), "Development of Temporal Patterning and Vocal Hesitation in Spontaneous Narratives", Journal of Psycholinguistic Research, Vol. 4, pp. 195-207.

The LAME Project, LAME Compression Software, http://lame.sourceforge.net/index.php

Ming, J., Hazen, T.J. and Glass, J.R. (2006), "Speaker Verification Over Handheld Devices with Realistic Noisy Speech Data", Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. I-637 to I-640.

Nuance Communications, Inc. (2005), "About Dictaphone", http://www.dictaphone.com/aboutus/history.asp

O'Connell, D.C. and Kowal, S. (1983), "Pausology", in Computers in Language Research, Sedelow, W.A. Jr. and Sedelow, S.Y. (eds), Berlin-New York: Walter de Gruyter & Co., pp. 221-301.

Statistics Netherlands, Statistical Informatics Department, P.O. Box 4000, 2270 JM Voorburg, The Netherlands.

Stockdale, A. (2002), "Tools for digital audio recording in qualitative research", Social Research Update, pp. 1-4.

Suresh, R. (2005), "Web-Based Computer Audio Recorded Interview (Web-CARI)", Presented at the International Field Directors and Technologies Conference 2005, Atlanta, GA.

Thissen, M.R. and Rodriguez, G. (2004), "Recording Interview Sound Bites Through Blaise Instruments", Proceedings of the International Blaise Users' Conference, pp. 411-423.

Thissen, M.R. and Sattaluri, S. (2006a), "Computer Audio-Recorded Interviewing (CARI)", Presented at the International Field Directors and Technologies Conference, Montreal, Canada.

Thissen, M.R. and Sattaluri, S. (2006b), "Research and Development in Audio-Recorded Interviewing, Part II", Presented at the International Field Directors and Technologies Conference, Montreal, Canada.

University of California, Berkeley, Software Support Services, "Computer-Assisted Survey Execution System (CASES)", CSM Program, 358 Barrows Hall #3820, Berkeley, CA 94720.

Wrenn-Yorker, C. and Thissen, M.R. (2005), "Computer Audio Recorded Interviewing (CARI) Technology", Presented at the Federal Computer-Assisted Survey Information Collection (FedCASIC) Conference.