Development of Real-Time Visual Feedback Assistance in Singing Training: A Review
Peter Desain
Radboud University Nijmegen
Review article
Abstract Four real-time visual feedback computer tools for singing lessons (SINGAD, ALBERT, SING &
SEE, and WinSINGAD), and the research carried out to evaluate the usefulness of these
systems are reviewed in this article. We report on the development of user-functions and the
usability of these computer-assisted learning tools. Both quantitative and qualitative studies
confirm the efficiency of real-time visual feedback in improving singing abilities. Having
addressed these findings, we suggest further quantitative investigations of (1) the detailed
effect of visual feedback on performance accuracy and on the learning process, and (2) the
interactions between improvement of musical performance and the type of visual feedback
and the amount of information it presents, the skill level of the user and the teacher’s role.
© 2006 The Authors. Journal compilation © 2006 Blackwell Publishing Ltd. Journal of Computer Assisted Learning 22, pp. 308–316
310 D. Hoppe et al.
Fig 3 Example of ATARI SINGAD assessment screen for the fourth trial of the set illustrated at the foot of the figure. (The figure has
been adapted from Howard et al. 2003).
training, while the other participant received classical voice training and VFB with the following three parameter conditions: (1) CQ; (2) spectral ratio; and (3) a combination of CQ and spectral ratio. Each condition lasted for two lessons successively. For each condition, the effect of the presented parameter on the level of both parameters (CQ, spectral ratio) in responses was measured. An increase in the levels of these parameters indicates the presence of a pronounced singer’s formant. In the single parameter conditions (1 and 2), the level of the parameter presented as VFB increased, while the level of the other parameter, which was not presented as VFB, remained unchanged. For the third condition in which both CQ and spectral ratio were presented, the measured level of both these parameters increased. Clearly, the VFB was contributing to an improvement in voice quality. Interestingly, the changes in response accuracy showed the greatest effect when participants used the VFB for the first time (e.g. lesson 1, 3, and 5). Subsequent use only yielded a small effect. This effect was even more pronounced for the combined parameter condition, as subsequent use actually decreased response accuracy.

SING & SEE

Recently, the SING & SEE project was introduced at the Conference of Interdisciplinary Musicology (Callaghan et al. 2004). The project aimed at developing new VFB technology for the singing studio. The main features of research were the investigation of acoustic analysis techniques, methods of displaying VFB in a meaningful way and the pedagogical approaches for implementing VFB technology into practice. Three parameters were distinguished as relevant for usage in the singing studio: pitch (F0 against time), vowel identity (R1, R2), and timbre (spectrogram). The major difference from former studies was that not only quantitative, but also qualitative data were of interest in this project.

Four singing teachers and 21 of their students (seven beginning, 11 progressed, three advanced)
Fig 4 (a) Display screen A. SING & SEE pitch grid, F0 against time. Target notes are indicated as dark bars. The pitch trace is indicated by the line; (b) display screen B. Target notes are indicated as light grey keys. Correct responses turn the target key bright red. (The figure has been adapted from Wilson et al. 2005).
participated in the study. The study followed a within-subject pre–post-test design in which all students received the VFB during a 2-week intervention period. Because of different skill levels, the parameters used during the intervention differed between students. Moreover, the parameters used during the intervention also differed within groups because the teachers had their own teaching style. Each participant was recorded during the baseline, intervention, and follow-up periods. Both acoustic and perceptual measures were taken to quantify the changes in singing performance. After the follow-up period, all participants were interviewed on how the VFB had been integrated into the singing lessons, whether the software used had particular strengths or weaknesses and how to make possible improvements.

The paper reported an analysis of the interview data. All teachers were positive about the use of the pitch feedback. The spectrogram appeared to be more useful working with more experienced students than with beginners. More experienced and advanced students used the spectrogram for training timbre, dynamics, and the singer’s formant, whereas beginners only used it for note onset and offsets. The vowel quadrilateral appeared not to be appropriate for a high-pitched voice, being only suitable for an adult male voice. Importantly, the teachers suggested that the VFB displays should be more musically relevant. They also warned against the use of VFB without supervision, as students may not fully understand what they see. Also, teachers commented that the VFB is probably most efficient during specific stages in development.

As for the students, 90% were positive about the use of VFB in the singing lessons. They felt that the VFB improved their understanding of the desired outcome of the target model that they were required to imitate, as the feedback was immediate and unambiguous. Also, the use of the spectrogram received positive feedback. The vowel quadrilateral, on the other hand, met with the same criticism as from the teachers.

Recently, another study was conducted to examine a novel form of VFB and test its impact on singing using the pitch display from the SING & SEE software (Wilson et al. 2005). Besides testing the VFB’s impact on learning, this study also addressed the question of whether the amount of information presented by the visual display has an effect on singing performance. Fifty-six participants took part in this study, with skill level ranging from non-singer to trained singers. They were assigned to three groups. Groups A and B received VFB, while group C served as a control. Within each group, participants were divided into three subgroups according to their skill level. The distribution of skill level between the three groups was kept equal as far as was possible.

Group A trained with the pitch display from the SING & SEE program. This form of visual feedback presented the participants with a pitch trace, giving the
Fig 5 WinSINGAD. (a) Panel showing a spectrogram (upper), spectrum for the final vowel (centre) and ratio (lower); (b) dual-panel screenshot. Example of a vocal tract area display for a sung /u:/ sound. Both the singer and the tutor can monitor posture at the same time (camera shot). ((a) has been adapted from Welch et al. 2004; (b) has been adapted from Howard et al. 2004).
user constant feedback on pitch history and target location (Fig 4a). Group B trained with a keyboard display (Fig 4b). Only when a note was correctly sung did the matching key light up. This form of VFB does not provide pitch history. It merely gives the user specific right/wrong feedback. Group C was presented with the same display as group B, but without the pitch response feedback. In both a pre-test and a post-test, all participants were assessed on five test patterns of five interval sequences in upward semitone increments. Before the test recordings began, the pitch range of the test patterns was adjusted to the participant’s pitch range.

In addition to the effects of VFB on the learning process, the effect of VFB during the training period was analysed in this study. Comparisons were made between pre- and post-test as well as between pre-test and intervention. Surprisingly, participants in the experimental groups tended to worsen their performance when they received the VFB. Nevertheless, the comparison between the pre- and the post-test revealed that the experimental groups significantly improved their performance after the intervention compared with the control group (group C). Comparisons between the groups using VFB also indicated that the impact of the pitch display was remarkably larger for the non-trained singers, whereas the impact of the keyboard display was of greater use to the trained singers. Apparently, different skill levels require different forms of VFB.

VOXed: WinSINGAD

Besides the SING & SEE project, the VOXed project (Welch et al. 2004) was presented during the same Conference of Interdisciplinary Musicology. The project also incorporated real-time VFB for singing education, suggesting that there is growing interest in computer assistance in musical education. While SING & SEE places emphasis on maximizing VFB technology itself, VOXed aimed at maximizing the collaboration between different fields. Psychologists, voice scientists, singing teachers, and singing students joined to form an interdisciplinary research team searching for a better insight on the impact of VFB on the learning experience. Importantly, VOXed sought to work with participants as active agents rather than just passive recipients. The goal of the project was to investigate possible useful forms of VFB with the use of commercially available visual feedback software. The Windows-compatible software tool WinSINGAD was developed, which is the successor of the early SINGAD systems (Howard et al. 2003, 2004). Because changes in vocal output over time are of primary interest to singers, the majority of displays are plotted against time. The parameters available were as follows: input waveform; F0 against time; short-term spectrum; narrow-band spectrogram; spectral ratio against time; vocal tract (VT) area; and mean/min VT area against time. Any parameter-window could be displayed on the screen in combination with another.
Also, a side-view Web cam could be selected for direct postural feedback (see Fig 5).

Two teachers agreed on using the VFB in their teaching lessons. Each teacher had four students participating (eight in total), two of whom were taught with the VFB and the other two serving as controls. The teachers were given full control over the way to use these parameters during their teaching lessons. Recordings were made during two sample lessons. In the first lesson, VFB was not implemented and served as a baseline. The second lesson fully established the use of the VOXed technology. Of each lesson, real-time observations as well as video recordings were made. Outside the lessons, both teachers and students gave semi-structured interviews. Thus far, this study has reported on quantitative analyses of sample lesson observational data, supplemented by qualitative commentary (Welch et al. 2005).

The results showed an overall positive appreciation of the use of VFB in the singing studio. Teachers were so keen on using the VFB that they even started to use it with other students as well, in one case, even with those who were initially assigned as controls. Concerning the implementation of the technology in the singing lessons, the teachers reported that they found the system to be user-friendly and non-obstructive to the normal course of teaching events. Of all displays, the spectrogram was most fully exploited by both teachers. Also, the facility to play back the sung response of the student had great advantages, as both teacher and student could now attend to the display and discuss its contents. Furthermore, the time-course analysis of the lessons confirmed the different teaching strategies between the two teachers. The amount of time that each teacher made reference to the VFB also differed. Overall, the introduction of VFB into the singing lesson was met with great enthusiasm.

Discussion

Four real-time VFB computer tools for professional voice development were reviewed in this paper. Over time, the original designs have been developed to provide the user with more information. SINGAD made use of a single parameter: fundamental frequency (F0). ALBERT maximally exploited memory capacity offered by the rapid development of computer hardware in the mid 1990s. A unique function of this system was that up to three of the six primary parameters could be combined to form new parameters. In this way, the VFB could be altered so that it was most appropriate to the context of the task, be it singing training, pronunciation training, or voice therapy. SING & SEE also focused specifically on singer-related parameters: pitch (F0 against time), vowel identity (R1, R2), and timbre (spectrogram). The last project we reviewed, VOXed, introduced the successor of the early SINGAD systems, WinSINGAD, and included the widest range of singer-oriented parameters: input waveform; F0 against time; short-term spectrum; narrow-band spectrogram; spectral ratio against time; VT area; and mean/min VT area against time. A side-view Web cam could be selected for direct information about the student’s posture. In general, VFB features have become more multifaceted over time. This has allowed the programs to be accessible and useful to a wide range of singers. Although SINGAD was designed only for the child’s voice development, ALBERT aimed at a broader application that was not restricted to just singing training. Both SING & SEE and VOXed were specifically designed for singers of all ages and skill levels. We can conclude that the usability of these systems has improved over time by the addition of new functions.

Interestingly, the effectiveness of VFB on the learning process seems to depend on the amount of the student’s musical experience (as initially postulated by Welch 1985). In a controlled experiment, Wilson et al. (2005) investigated the effect of the amount of information presented by the visual display on the singing performance of both trained and non-trained singers. The keyboard display, which presented specific right/wrong feedback, was of greater use to the trained singers. The pitch display, which presented more detailed and contextualized information, was found to have a remarkably larger impact on singing in tune for the non-trained singers. However, Callaghan et al. (2004) found that the greatest use was made of the spectrogram, especially with progressed students: the spectrogram appears to be a hard visual display to interpret because it presents much information at once. Although these two studies showed that the amount of information displayed in the VFB interacts with the skill level of the student, exactly how these two factors relate is not yet clear, and therefore requires further investigation.
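
The difference in information content between the two pitch displays compared by Wilson et al. (2005) can be illustrated with a minimal sketch. It is not the SING & SEE implementation: the function names, the A4 = 440 Hz tuning reference, and the 50-cent tolerance for a "correct" note are all assumptions made for illustration. The keyboard-style function returns only a right/wrong value, whereas the trace-style function returns the full history of deviations from the target.

```python
import math

A4_HZ = 440.0    # assumed tuning reference (not stated in the reviewed studies)
A4_MIDI = 69     # MIDI note number conventionally assigned to A4


def hz_to_midi(f0_hz: float) -> float:
    """Convert a fundamental frequency in Hz to a fractional MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def cents_off(f0_hz: float, target_midi: int) -> float:
    """Signed deviation of the sung pitch from the target note, in cents."""
    return 100.0 * (hz_to_midi(f0_hz) - target_midi)


def keyboard_feedback(f0_hz: float, target_midi: int,
                      tolerance_cents: float = 50.0) -> bool:
    """Keyboard-style display: True (key lights up) only when the sung
    pitch lies within the assumed tolerance of the target note."""
    return abs(cents_off(f0_hz, target_midi)) <= tolerance_cents


def pitch_trace(f0_frames, target_midi: int):
    """Trace-style display: one deviation value per analysis frame,
    so the singer sees pitch history. Unvoiced frames (None) stay None."""
    return [None if f0 is None else cents_off(f0, target_midi)
            for f0 in f0_frames]
```

For a target of A4, `keyboard_feedback(466.16, 69)` reports only that the note is wrong, while `pitch_trace` additionally shows that the singer is about a semitone sharp, which is the kind of contextual information the pitch display adds.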
In designing ALBERT, careful consideration was made in constraining the amount of information that was shown as VFB. To make sure that the VFB showed information relevant to the task, users were given the possibility to enter new parameters by combining any of the six primary parameters. Rossiter et al. (1996) investigated the impact of this feature on the singing process during training lessons. Single and combined parameter display conditions were evaluated. The combined parameter condition showed the greatest improvement in performance accuracy during initial use. However, during subsequent use, the same condition yielded a negative effect. Also, greater appreciation of the feedback was found during initial use of all feedback conditions, in contrast to a general decline in performance accuracy during subsequent use. Apparently, quality of performance does not always increase during the use of VFB. In fact, Wilson et al. (2005) reported that participants tend to worsen their performance during the use of VFB, although an increase in singing accuracy from pre-test to post-test was observed. These findings are counterintuitive, as one would expect a linear improvement in performance accuracy during the use of VFB. Thus, the actual impact of VFB on the singing process and its relation to the learning process still have to be understood.

A difficulty in evaluating educational tools is often to balance the fundamental research and its application to real situations. In most studies that we reviewed, preserving natural singing class conditions seems to overrule controlling experimental factors. For example, in their study, Welch et al. (1989) trained the experimental groups to sing single notes, whereas controls had to sing songs, just as in a common primary school singing lesson. As the pre- and post-assessment procedures used a task similar to the experimental training procedure, the experimental groups had a clear advantage over the controls during the final assessment. It might be the case that the nature of the training task can explain the observed results of this study. Another point is that the pattern of social interaction differed between the experimental groups and the controls. While the experimental groups worked in pairs or threes, the controls had to sing together as a group. In such group-singing activity, it is very likely that a phenomenon such as ‘social loafing’ might occur: the tendency to exert less effort on a task when working as a part of a co-operative group than when working on one’s own (Latane et al. 1979). The observed results of this study might therefore be influenced by the pattern of social interaction. In contrast, for example, Wilson et al. (2005) conducted a well-controlled experiment with many participants. Based on the large number of observations, they have shown that VFB does, in fact, significantly enhance learning to sing in tune. However, the number of such controlled experiments is rather small. Many of the valuable qualitative findings should therefore still be evaluated quantitatively.

Another finding from the SINGAD experiment was that improvement takes place in pitching ability when VFB is used even without supervision. The program could therefore serve as a replacement of singing teachers instead of just being used as a helpful tool to singing teachers. However, in observational studies, greater improvement was found when teachers assisted with VFB. Furthermore, Callaghan et al. (2004) reported that their participant teachers were actually against the use of VFB without supervision, as students may not understand fully what they see. In their study, the spectrogram appeared to work very well and was very informative, but only with the teacher’s assistance. This real-life application in the singing studio suggests that VFB may be more effective when it is properly understood, and for this, the teacher’s assistance is helpful.

As in other voice development research domains, such as pronunciation training (Neri et al. 2002) and second-language acquisition (Dowd et al. 1998; Hirata 2004), we addressed several studies showing that VFB helps in learning to sing. It is noteworthy that VFB serves well as a tool for assistance, rather than a replacement of the singing teacher. Accordingly, VFB technology has been met with great enthusiasm from professional singing teachers. For future research, further quantitative research on the detailed effect of VFB on performance accuracy and on the learning process is necessary, as well as a closer investigation of its interactions with the type of VFB and the amount of information it presents, the skill level of the user and the presence/absence of a teacher.

Of valuable insight to the field might be the research on feedback and motor skill learning with respect to ‘focus of attention’. The effect of the learner’s ‘focus of attention’ on the learning process was reviewed by Wulf and Prinz (2001). They showed that an internal focus of attention, which is directed to ‘one’s own movements’,
appears to be less beneficial to the learning process than an external focus of attention, which is directed to ‘the effects of one’s movements’. Accordingly, VFB on singing performance that is directed to one’s own movements (e.g. the vocal tract) may be less effective than VFB on the acoustical output (e.g. real-time spectral information). Indeed, taking the phenomenon of attentional focus into consideration accounts for some previous findings, such as the interview data from Welch et al. (2005), where singing teachers preferred the spectrogram to all other feedback options. Furthermore, the option for the ALBERT user to define new parameters that are most relevant to the task can also be interpreted in terms of an ‘external focus of attention’, as the goal of a singing task is to reach ‘a desired effect from the movement of the vocal apparatus (e.g. the desired acoustic output)’. A further understanding of internal and external focus of attention in relation to real-time VFB learning would therefore enrich the field.

Acknowledgements

This research was supported by the Technology Foundation STW, applied science division of NWO and the technology program of the Ministry of Economic Affairs (NNN6301). We would like to thank Alex Brandmeyer for English editing.

References

Arends N. & Povel D.J. (1991) An evaluation of the visual speech apparatus. Speech Communication 10, 405–414.
Callaghan J., Thorpe W. & Van Doorn J. (2004) The science of singing and seeing. Proceedings of the Conference on Interdisciplinary Musicology (CIM04), Graz, Austria, April 15–18.
Dowd A., Smith J. & Wolfe J. (1998) Learning to pronounce vowel sounds in a foreign language using acoustic measurements of the vocal tract as feedback in real-time. Language and Speech 41, 1–20.
Hirata Y. (2004) Computer assisted pronunciation training for native English speakers learning Japanese pitch and duration contrasts. Computer Assisted Language Learning 17, 357–376.
Howard D.M. & Welch G.F. (1989) Microcomputer-based singing ability assessment and development. Applied Acoustics 27, 89–102.
Howard D.M. & Welch G.F. (1993) Visual displays for the assessment of vocal pitch matching development. Applied Acoustics 39, 235–252.
Howard D.M., Welch G.F., Brereton J. & Himonides E. (2003) Towards a novel real-time visual display for singing training, http://www.sonustech.com/voxed.
Howard D.M., Welch G.F., Brereton J., Himonides E., DeCosta M., Williams J. & Howard A.W. (2004) WinSingad: a real-time display for the singing studio. Logopedics Phoniatrics Vocology 29, 135–144.
Latane B., Williams K. & Harkins S. (1979) Many hands make light the work: the cause and consequences of social loafing. Journal of Personality and Social Psychology 37, 822–832.
Neri A., Cucchiarini C., Strik H. & Boves L. (2002) The pedagogy–technology interface in computer assisted pronunciation training. Computer Assisted Language Learning 15, 441–467.
Rossiter D. & Howard D.M. (1996) ALBERT: real-time visual feedback computer tool for professional vocal development. Journal of Voice 10, 321–336.
Rossiter D., Howard D.M. & DeCosta M. (1996) Voice development under training with and without the influence of real-time visually presented biofeedback. Journal of the Acoustical Society of America 99, 3253–3256.
Welch G.F. (1985) A schema theory of how children learn to sing in tune. Psychology of Music 13, 3–18.
Welch G.F., Rush C. & Howard D.M. (1989) Real-time visual feedback in the development of vocal pitch accuracy in singing. Psychology of Music 17, 146–157.
Welch G.F., Himonides E., Howard D.M. & Brereton J. (2004) VOXed: Technology as a meaningful teaching aid in the singing studio. Proceedings of the Conference on Interdisciplinary Musicology (CIM04), Graz, Austria, April 15–18.
Welch G.F., Howard D.M., Himonides E. & Brereton J. (2005) Real-time feedback in the singing studio: an innovatory action-research project using new voice technology. Music Education Research 7, 225–249.
Wilson P.H., Thorpe C.W. & Callaghan J. (2005) Looking at singing: does real-time visual feedback improve the way we learn to sing? Second APSCOM Conference: Asia-Pacific Society for the Cognitive Sciences of Music, Seoul, South Korea, August 4–6.
Wulf G. & Prinz W. (2001) Directing attention to movement effects enhances learning: a review. Psychonomic Bulletin and Review 8, 648–660.