Computer Sound Transformation
Trevor Wishart
Introduction
For the past thirty years I have been involved in developing and using sound
transformation procedures in the studio, initially working on analogue tape, and
then through various types of computer platforms as computer music came of
age. Over these years I've developed a very large number of procedures for
manipulating sounds. Being a composer, I refer to these processes as musical
instruments and they are developed as part of my musical work. However, I have
not been prominent in publishing this work in academic journals as I'm primarily
a working artist. Nevertheless, the processes (and source code) have all been
available to others with the facilities to use (or develop) them through a
composers' cooperative organisation based in the UK, the Composers' Desktop
Project. As there has been a recent surge of interest in the Phase Vocoder [1] as
a musical resource, I've been advised by friends in the academic
community to put my contribution to these developments on record.
Origins
The earliest successful transformations I developed can be heard in the piece Red
Bird (1973-77) [2]. The musical structure of the piece was conceived in terms of
such transformations between sound types, but techniques for achieving this had
to be developed on an ad hoc basis - through discovering what was practicable
with the facilities available in the local analogue studio. The transformations, all
from the voice to other sounds, include 'lis' (from the word 'listen') to birdsong,
'rea' (from the word 'reason') to animal-sounds, 'reasonabl-' to water, and
various machine-like events constructed from vocal subunits. They were achieved
by combining the elementary studio facilities available (tape editing, mixing,
mixer eq) with extended vocal techniques (developed while working as a free
improvising vocal performer [3]). A discussion of the approaches
used in Red Bird, and the concept of Sound Landscape, can be found in On Sonic
Art F4" name=FF4>4. A more detailed description of the composition of this piece
can be found in Red Bird, A Document F5" name=FF5>5.
Returning from IRCAM to the UK, musicians were faced with an entirely different
development environment. There was no independent national research centre
for music – music research was confined to University music departments. Most
of these were small and very poorly funded – they were seen as primarily sites of
humanities research and hence could not attract the money required for
advanced computing equipment, which at that time was very expensive. A
number of departments had PDP-11 computers accessible to a few research
students and staff – updating this equipment was a constant financial worry.
Immediately after the IRCAM project in 1986, working in the CDP environment, I
developed a large number of other spectral transformation tools using the Phase
Vocoder data as a starting point. Subsequently, I also created a number of
original time-domain instruments (e.g. waveset manipulation, grain
manipulation, sound shredding) and extensions of existing instruments (e.g.
brassage F11" name=FF11>11). In 1994, a complete description of all the
spectral, time-domain and textural transformation possibilities available on the
CDP system was published in the book Audible Design [12]. The
book has been used subsequently as a source by other software developers (for
example by Mike Norris who implemented many of my waveset manipulation
procedures on the Macintosh, now available from Sound Magic), some of whom
may well have had access to the CDP code.
I would stress that the work of researchers and developers at IRCAM (notably
Steve McAdams who introduced me to contemporary psycho-acoustic research on
the 1981 induction course and, later, Miller Puckette), and at the G.R.M. [13] –
where I attended the composition course in 1993 – were an
important source of knowledge, ideas and inspiration for my work. However,
when the sound morphing and spectral stretching instruments for Vox 5 were
originally developed as part of the public domain system shared by IRCAM,
Stanford and other major sites, IRCAM's research priorities were focused
elsewhere. The instruments did make their way to the USA via the University of
Santa Barbara and Dan Timis (the resident computer wizard at IRCAM when I was
working there). Later IRCAM did decide to pursue Phase Vocoder based
transformation and the Super Phase Vocoder (SVP) group was established (the
basis of the later AudioSculpt). During the development phase of SVP, when the
CDP spectral transformation suite was already quite large, I was a visitor at
IRCAM, and discussed possible transformational approaches with some of the
team working on the program.
The pioneering development work of the CDP has remained largely unknown or
forgotten about as the vast majority of the Computer Music community eventually
opted for the Macintosh as the machine of choice. Furthermore, being developed
primarily by a group without official financial support from within the University
infrastructure, the project was always short of resources. An initial grant from the
Gulbenkian Foundation helped propel us forward in the first 18 months, but this
was exceptional. Nevertheless the CDP continued to make both its instruments
and its code available to interested users and developers. I am indebted to the
work of many other developers (including in particular Richard Orton and
Richard Dobson) and to the C.D.P. Administrator, Archer Endrich, for continuing
to promote, support and develop the system, and make it more accessible to
users, despite the lack of financial rewards. The system has tended to be adopted
by independent composers or small educational institutions with limited budgets.
However, the source code has been available at a number of UK (and other)
University sites at different times, even after these moved to a primarily Mac-
based studio system. And some institutions, notably the Institute of Electronic
Music in Vienna, developed sophisticated graphic interfaces of their own.
There is not enough space to describe all the C.D.P. procedures in this article, so
I will describe only the more interesting ones, or those not available elsewhere.
Full descriptions of all these processes can be found in Audible Design.
From a musical point of view, the most innovative of the early developments were
spectral banding (a rather complicated 'filter' which enabled the spectrum to be
divided into bands, and various simple amplitude-varying, and in fact frequency-
shifting, processes to be applied to those bands), spectral tracing and spectral
blurring.
Spectral tracing simply retains the N channels with the loudest (highest
amplitude) data on a window-by-window basis. If N is set to c. 1/8th the number
of channels used in the PVOC analysis, this can sometimes function as an
effective noise reduction procedure (the value of N which works best depends on
the signal). When N is much smaller than this, and a complex signal is processed,
a different result transpires. The small number of PVOC channels selected by the
process will vary from window to window. Individual partials will drop out, or
suddenly appear, in this elect set. As a result, the output sound will present
complex weaving melodies produced by the preserved partials as they enter (or
leave) the elect set. This procedure is used in Tongues of Fire [14].
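By way of illustration, here is a minimal sketch of the idea in Python/NumPy (not
the CDP code itself), assuming the analysis data is held as a two-dimensional
array of channel amplitudes, one row per window; the function name and
arguments are my own for this example:

    import numpy as np

    def spectral_trace(magnitudes, n_keep):
        """Keep only the n_keep loudest channels in each analysis window.

        magnitudes: array of shape (num_windows, num_channels).
        Returns an array of the same shape with all other channels zeroed.
        """
        traced = np.zeros_like(magnitudes)
        for w, frame in enumerate(magnitudes):
            # indices of the n_keep largest-amplitude channels in this window
            keep = np.argsort(frame)[-n_keep:]
            traced[w, keep] = frame[keep]
        return traced

    # e.g. keep roughly 1/8th of the channels of a 1024-channel analysis:
    # cleaned = spectral_trace(mags, 1024 // 8)

With a very small n_keep the set of surviving channels changes from window to
window, which is what produces the weaving melodies described above.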
Tuning the spectrum was introduced a little later. Tune spectrum works by
selecting channel data lying close to the partials of a specified set of pitches, and
moving the frequency of that data to (or towards) the desired partial frequency.
The spectrum can also be traced (see above) before doing this. Choose partials
selects channels which should contain frequencies close to those of a specified set
of partial frequencies (harmonics of, odd harmonics of, octaves above, linear
frequency steps away from, or a linear frequency displacement from harmonics of
a given fundamental). As analysis channels above the 21st are sufficiently narrow
to focus on a semitone band of frequency or less, the channel number itself is
sufficient to grab the desired partials.
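A hedged sketch of the tuning step, again in Python/NumPy under my own naming:
for one analysis window it moves each channel frequency to (or towards) the
nearest partial of a given pitch set, with a 'pull' factor controlling how far
the data is moved:

    import numpy as np

    def tune_window(freqs, pitch_set, n_partials=8, pull=1.0):
        """Move per-channel frequencies towards the nearest target partial.

        freqs     : channel frequencies (Hz) for one analysis window.
        pitch_set : fundamental frequencies (Hz) of the desired pitches.
        pull      : 1.0 snaps fully onto the partial, smaller values move
                    the data only part of the way towards it.
        """
        targets = np.array([p * h for p in pitch_set
                            for h in range(1, n_partials + 1)])
        freqs = np.asarray(freqs, dtype=float)
        tuned = freqs.copy()
        for i, f in enumerate(freqs):
            nearest = targets[np.argmin(np.abs(targets - f))]
            tuned[i] = f + pull * (nearest - f)
        return tuned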
After discussing possible algorithms with the SVP developers, I implemented
some of their ideas for spectral filters (defining filters in a more conventional way
than the banding procedure described above), producing various types of
low pass, high pass, band pass, notch and graphic e.q. spectral filters,
together with a chorusing procedure suggested by Steve McAdams' work
(introducing jitter into the partials data).
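The chorusing idea can be suggested in a few lines: add a small random deviation
to every channel frequency in every window (the depth value and names here are
illustrative, not the CDP parameters):

    import numpy as np

    def chorus_partials(freqs, jitter_depth=0.01, seed=None):
        """Add random 'jitter' to channel frequency data.

        freqs        : array (num_windows, num_channels) of frequencies in Hz.
        jitter_depth : maximum deviation as a fraction of each frequency,
                       e.g. 0.01 allows up to 1% detuning per window.
        """
        rng = np.random.default_rng(seed)
        jitter = rng.uniform(-jitter_depth, jitter_depth, size=freqs.shape)
        return freqs * (1.0 + jitter)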
After discussions with Miller Puckette about his work on tracking the pitch
produced by instrumentalists performing in real time, procedures to extract the
pitch of PVOC data were finally developed into a useful form, together with
instruments to correct the data, to transform the pitch data (quantise, shift,
vibrato, approximate, or randomise the pitch, and exaggerate, invert or smooth
the pitch contour), and to apply the pitch to other sounds.
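As one example of these pitch-data transformations, quantising an extracted
pitch contour to the nearest tempered semitone might look like this (a sketch,
assuming the contour is an array with one frequency value per window and
non-positive values marking unpitched windows):

    import numpy as np

    def quantise_pitch(pitch_hz, ref=440.0):
        """Snap a pitch contour (Hz per window) to the nearest equal-tempered
        semitone; unpitched windows (values <= 0) are left untouched."""
        pitch_hz = np.asarray(pitch_hz, dtype=float)
        out = pitch_hz.copy()
        voiced = pitch_hz > 0
        semitones = 12.0 * np.log2(pitch_hz[voiced] / ref)
        out[voiced] = ref * 2.0 ** (np.round(semitones) / 12.0)
        return out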
At the same time, the extraction of formants from the PVOC data was
implemented satisfactorily for the first time within the CDP environment. This
enabled the inner glissando procedure to be developed. Here, the process
retains the time-varying spectral envelope (the formant envelope) of the sound,
but replaces the signal itself by an endlessly glissandoing Shepard Tone signal
F15" name=FF15>15.
Waveset distortion was developed for the CDP while composing Tongues of
Fire. I defined a waveset as the signal between any pair of zero-crossings. With a
simple sine-wave the waveset corresponds to the waveform. But even with a
harmonic tone with very strong partials, the waveform may cross the zero more
than twice in a complete cycle. In this case the wavesets are shorter than the
waveform. With complex signals (e.g. speech) containing noise elements, the
definition of the waveset produces many varieties of technically arbitrary, but
potentially musically interesting, artefacts. A whole suite of procedures was
developed to manipulate wavesets. I have used three at prominent moments in
compositions.
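One reading of that definition, sketched in Python/NumPy: take a waveset to be
the span between successive upward zero-crossings, so that a pure sine yields
one waveset per cycle (the details of how the CDP code handles edge cases will
differ):

    import numpy as np

    def split_wavesets(signal):
        """Split a mono signal into wavesets, taken here as the spans between
        successive upward (negative to non-negative) zero-crossings."""
        signal = np.asarray(signal, dtype=float)
        crossings = np.where((signal[:-1] < 0) & (signal[1:] >= 0))[0] + 1
        return [signal[a:b] for a, b in zip(crossings[:-1], crossings[1:])]

On speech or other noisy material these spans are very irregular in length,
which is precisely the source of the 'technically arbitrary, but potentially
musically interesting' artefacts.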
The second, waveset averaging, involves extracting the shape of each waveset,
and then averaging this shape over a group of N adjacent wavesets. Again, this
produces an extreme modification of the source (usually a relatively harsh sound
and often a transformation so distant from the source that little audible
connection is apparent!) and is used in the 'fireworks' transformation immediately
after the rhythmic climax of Tongues of Fire. The article Sonic Composition in
'Tongues of Fire' [17] discusses this in more detail.
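A sketch of the averaging step, under the same assumptions as the waveset
splitting sketch above: each group of N wavesets is resampled onto a common
grid, averaged, and the averaged shape is then written back at each waveset's
original length and peak level (my own simplification, not the CDP algorithm
verbatim):

    import numpy as np

    def average_wavesets(wavesets, group_size):
        """Replace each group of wavesets by copies of their averaged shape."""
        out = []
        for i in range(0, len(wavesets), group_size):
            group = wavesets[i:i + group_size]
            ref_len = max(len(w) for w in group)
            grid = np.linspace(0.0, 1.0, ref_len)
            # average the group's shapes on a common time grid
            shape = np.mean([np.interp(grid, np.linspace(0.0, 1.0, len(w)), w)
                             for w in group], axis=0)
            for w in group:
                x = np.linspace(0.0, 1.0, len(w))
                resampled = np.interp(x, grid, shape)
                peak = np.max(np.abs(resampled))
                scale = np.max(np.abs(w)) / peak if peak > 0 else 0.0
                out.append(resampled * scale)
        return out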
This approach also allows one to reverse an iterative sound. Most sounds have
an asymmetric form with a (relatively) loud initiating event at the beginning, and
a tailing away to zero at the end (these features themselves can have a vast
number of forms). Playing a sound backwards therefore rarely results in a sound
that we recognise as being a close relative of the original. Only sounds of (on
average) steady amplitude, whose attack and decay are inverses of one another
(e.g. a slow fade-in matching a slow fade-out), will appear similar when we
reverse them. Iterative sounds are particularly difficult in this respect as every
attack within them gets reversed. If we extract the grains and then sequence
them in the reverse order, without reversing the grains themselves, we achieve a
convincing sense of retrograding the sound without change of source recognition.
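A minimal sketch of this grain-order reversal, assuming the grain start points
have already been detected by whatever means (the detection itself is the harder
problem and is not shown here):

    import numpy as np

    def reverse_grain_order(signal, grain_starts):
        """Re-sequence grains in reverse order without reversing the samples
        inside each grain. grain_starts: sorted sample indices of grain onsets."""
        signal = np.asarray(signal)
        edges = list(grain_starts)
        if not edges or edges[0] != 0:
            edges = [0] + edges
        edges.append(len(signal))
        grains = [signal[a:b] for a, b in zip(edges[:-1], edges[1:])]
        return np.concatenate(grains[::-1])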
Texture Generation was (and is) able to use an arbitrarily large number of input
sounds, to generate a stream of events where all the following parameters can
themselves vary through time:
• the average time between event repetitions (the density of events) or the
specification of a sequence of event times
• the scatter (or randomisation) of event timings (which means the
instrument can generate anything from dance-music-like regularity to
complete arhythmicity)
• a quantisation grid for times (or none)
• a specification of which range of input sounds are to be used
• the range and range-limits of pitch-transposition of the events
• the range and range-limits of event amplitudes
• the range and range-limits of durations of the individual events in the
texture
• the spatial centre of the texture on the stereo stage, and its motion
• the spatial bandwidth of the texture on the stereo stage
• the disposition of events over a harmonic field (not necessarily tempered),
which can itself change through time
• the clustering of events into groups of specified or random pitch-shape
• the formation of events from a line with arbitrary or specified decorating
patterns (which themselves have properties with independent parameters of their
own).
The texture generation instruments have been used extensively in all my sonic
art pieces since Vox 5.
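The core of such an event-stream generator can be suggested in a few lines. In
this sketch the density, scatter, pitch range and amplitude range are fixed
numbers for brevity, whereas in the CDP instrument they can all vary through
time; all names are illustrative:

    import numpy as np

    def texture_events(duration, density, scatter, pitch_range, amp_range, seed=None):
        """Generate (time, semitone_transposition, amplitude) events.

        density : average time in seconds between events.
        scatter : 0 gives a regular pulse, 1 fully randomises each event
                  within its time slot.
        """
        rng = np.random.default_rng(seed)
        events = []
        t = 0.0
        while t < duration:
            offset = rng.uniform(-0.5, 0.5) * scatter * density
            time = max(0.0, t + offset)
            events.append((time, rng.uniform(*pitch_range), rng.uniform(*amp_range)))
            t += density
        return sorted(events)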
The CDP mixing facilities are also worth noting. First of all, there is no limit
to the number of 'tracks' used (apart from the
memory space of the computer). Any number of sounds can be superimposed.
Secondly, global operations on the mix are available, from simple features like
doubling (or multiplying by any number) the distance between event onsets, or
randomising them (very slightly or radically), to randomly swapping around the
sound sources in the mix, automatically generating particular timing-sequences
for event entry (from regular pulses, to logarithmic sequences etc.), or
redistributing the mix output in the stereo space in a new, user-defined way.
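One of the simpler global operations, scaling the gaps between event onsets, can
be sketched as follows (treating a mix as a list of onset-time/source pairs; the
real CDP mixfile format carries more fields than this):

    def scale_onset_gaps(mix_entries, factor):
        """Multiply the time between successive event onsets by `factor`.

        mix_entries: list of (onset_seconds, source_name) pairs sorted by onset;
        factor 2.0 doubles every gap, 0.5 halves it.
        """
        if not mix_entries:
            return []
        out = [mix_entries[0]]
        for (prev_t, _), (t, name) in zip(mix_entries, mix_entries[1:]):
            out.append((out[-1][0] + (t - prev_t) * factor, name))
        return out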
There are no original filter algorithms in the CDP, but some powerful filter design
frameworks are available. In particular filter varibank allows one to define a
filter over a set of pitches which itself varies in time, where each pitch element
has an associated amplitude (which can go to zero so that pitches, or moving-
pitch-lines, can be 'faded out' or cut). The number of harmonics of those pitches
(and their relative level) can be specified (these serve to define further individual
filter frequencies), and the filter Q can also vary through time. This filter-building
algorithm was developed and used during the composition of Fabulous Paris [22].
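The filter-definition side of this can be sketched as follows: at any moment, a
set of (pitch, amplitude) pairs plus a harmonic count expands into the
individual centre frequencies and levels of the bank (the actual filtering, and
the time-varying interpolation of the pitch set and Q, are not shown; the names
are my own):

    import numpy as np

    def varibank_bands(pitches_hz, amps, n_harmonics, rolloff=0.5):
        """Expand (pitch, amplitude) pairs into filter centre frequencies/levels.

        Each pitch contributes n_harmonics bands; each successive harmonic is
        scaled down by `rolloff`. Pitches with zero amplitude are dropped
        (faded-out lines). Returns (freqs, levels) arrays."""
        freqs, levels = [], []
        for pitch, amp in zip(pitches_hz, amps):
            if amp <= 0.0:
                continue
            for h in range(1, n_harmonics + 1):
                freqs.append(pitch * h)
                levels.append(amp * rolloff ** (h - 1))
        return np.array(freqs), np.array(levels)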
Over the years I have also developed a large number of utilities which I find
indispensable as a composer, starting with an instrument which searches a tape
of source recordings and extracts significant segments from surrounding
silences or clicks, using gating and selection parameters specified by the user.
Next there are facilities to compare sounds, or compare the channels of a
single sound (are they the same, or almost the same to within specified
limits?), to balance the level of sources, or the channels of a source, to invert
(or narrow) spatial orientation, and to invert phase (which, apart from
anything else, can be used to gain more headroom in a mix).
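The segment-extraction utility can be suggested by a simple gating sketch: find
stretches whose level exceeds a threshold, separated by at least a minimum
length of silence (the threshold, gap and length parameters here are
illustrative, and the CDP version offers many more selection controls):

    import numpy as np

    def extract_segments(signal, sr, threshold=0.01, min_gap=0.25, min_len=0.1):
        """Return (start, end) sample pairs for segments louder than `threshold`,
        separated by at least min_gap seconds and at least min_len seconds long."""
        loud = np.abs(signal) > threshold
        segments, start, quiet = [], None, 0
        for i, on in enumerate(loud):
            if on:
                if start is None:
                    start = i
                quiet = 0
            elif start is not None:
                quiet += 1
                if quiet > min_gap * sr:
                    end = i - quiet + 1
                    if end - start >= min_len * sr:
                        segments.append((start, end))
                    start, quiet = None, 0
        if start is not None and len(signal) - start >= min_len * sr:
            segments.append((start, len(signal)))
        return segments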
In the various instruments described in this article, almost all parameters can
vary through time. Data for this is provided in simple textfiles containing
time+value pairs. To aid in working with such data, hundreds of automatic data-
creation and data-modification processes have been implemented, and are made
available in the Table Editor, now also driven from the graphic interface. I have
used it to design and modify complex filter specifications, to generate 'random
funk' accentuation patterns as envelopes over an existing stream of events
(Birthrite A Fleeting Opera [23]) – and even to do my tax
returns(!). As an additional aid, a Music Calculator allows easy conversion
between a great variety of musical and technical units.
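Reading such a time+value breakpoint file and sampling it at an arbitrary time
is straightforward; a sketch (assuming one whitespace-separated pair per line,
with linear interpolation between breakpoints):

    import numpy as np

    def read_breakpoints(path):
        """Read a textfile of 'time value' pairs into two arrays."""
        times, values = [], []
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 2:
                    times.append(float(fields[0]))
                    values.append(float(fields[1]))
        return np.array(times), np.array(values)

    def value_at(times, values, t):
        """Linearly interpolate the breakpoint data at time t (seconds)."""
        return np.interp(t, times, values)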
The future...
Footnotes