NAVIGATING THE MIX-SPACE: THEORETICAL AND PRACTICAL
LEVEL-BALANCING TECHNIQUE IN MULTITRACK MUSIC MIXTURES
ABSTRACT

The mixing of audio signals has been at the foundation of audio production since the advent of electrical recording in the 1920s, yet the mathematical and psychological bases for this activity are relatively under-studied. This paper investigates how the process of mixing music is conducted. We introduce a method of transformation from a "gain-space" to a "mix-space", using a novel representation of the individual track gains. An experiment is conducted in order to obtain time-series data of mix engineers' exploration of this space as they adjust levels within a multitrack session to create their desired mixture. It is observed that, while the exploration of the space is influenced by the initial configuration of track gains, there is agreement between individuals on the appropriate gain settings required to create a balanced mixture. Implications for the design of intelligent music production systems are discussed.

1. INTRODUCTION

The task of the mix engineer can be seen as one of solving an optimisation problem [1], with potentially thousands of variables once one considers the individual level, pan position, equalisation, dynamic range processing, reverberation and other parameters, applied in any order, to many individual audio components.

The objective function to be optimised varies depending on implementation. Conceptually, one should maximise 'Quality', an often-debated concept in the case of music production. In this context, borrowing from ISO 9000 [2], we can consider 'Quality' to be the degree to which the inherent characteristics of a mix fulfil certain requirements. These requirements may be defined by the mix engineer, the artist, the producer or some other interested party. In a commercial sense, we consider the requirement to be that the mix is enjoyed by a large number of people.

This paper considers how the mix process could be represented in a highly simplified case, investigates how high-quality outcomes are achieved by human mixers and offers insights into how such results could be achieved by intelligent music production systems.

Copyright: © 2015 Alex Wilson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

2. BACKGROUND

For many decades the mixing console has retained a recognisable form, based on a number of replicated channel strips. Audio signals are routed to individual channels where typical processing includes volume control, pan control and basic equalisation. Channels can be grouped together so that the entire group can be processed further, allowing for complex cross-channel interactions.

One of the most fundamental and important tasks in music mixing is the choice of relative volume levels of instruments, known as level-balancing. Due to its ubiquity and relative simplicity, level-balancing using fader control is a common approach to the study of mixing. It has been indicated that balance preferences can be specific to genre [3] and, for expert mixers, can be highly consistent [4].

As research in the area has continued, a variety of assumptions regarding mixing behaviours have been put forward and tested. A number of automated fader control systems have used the assumption that equal perceptual loudness of tracks leads to greater inter-channel intelligibility [5, 6]. This particular practice was investigated in a study of "best-practice" concepts [7], which included panning bass-heavy content centrally, setting the vocal level slightly louder than the rest of the music and the use of certain instrument-specific reverberation parameters. A number of these practices were tested using subjective evaluation and the equal-loudness condition did not necessarily lead to preferred mixes [7].

Many of these "best-practice" techniques may be anecdotal, based on the experience of a small number of professionals who have each produced a large number of mixes (see [8, 9] for reviews). Due to the proliferation of the Digital Audio Workstation (DAW) and the sharing of software and audio via the internet, it has now become possible to reverse this paradigm and study the actions of a large number of mixers on a small number of music productions. This allows both quantitative and qualitative study of mixing practice, meaning the dimensions of mixing and the variation along these dimensions can be investigated.

To date, there have been few quantitative studies of complete mixing behaviour, as a lack of suitable datasets can be problematic. One such study focussed on how a collection of students mixed a number of multitrack audio sessions [10]. It was shown that, among low-level features of the resultant audio mixes, most features exhibited less variance across mixers than across songs.
3. THEORY

When considering a realistic mixing task the number of variables becomes very large. An equaliser alone may have dozens of parameters, such as the centre frequency, gain, bandwidth and filter type of a number of independent bands, leading to a large number of combinations. There are methods to reduce the number of variables in these situations. In [11], the combination of track gains and simple equalisation variables was reduced to a 2D map by means of a self-organising map, where the simple equalisation parameter was the first principal component of a larger EQ system, showing further dimensionality reduction. While these approaches can create approximations of the mix-space, the true representation is difficult to conceive for all but the most simple mixing tasks.

Figure 1: The point represents a balance of two instruments, controlled by gains g1 and g2. Any other point on the line at angle φ would represent the same balance of instruments, thus r is a scaling factor.

The mix-space coordinates (r, φ_1, ..., φ_{n−1}) are obtained from the track gains in the manner of hyperspherical coordinates; the final angle accounts for the sign of g_n:

    φ_{n−1} = arccos( g_{n−1} / √(g_n² + g_{n−1}²) ),        g_n ≥ 0
    φ_{n−1} = 2π − arccos( g_{n−1} / √(g_n² + g_{n−1}²) ),   g_n < 0    (1e)

Figure 2: Track 1: VOCALS; Track 2: GUITARS; Track 3: BASS; Track 4: DRUMS.

Consider a system of four tracks, as shown in Fig. 2. Here, φ3 denotes the balance of the drum and bass tracks, to form the rhythmic foundation of the mix. φ2 describes the projection of this balance onto the guitar dimension, and φ1 the balance between the vocals and the remaining backing tracks. The track gains are recovered from the mix-space coordinates as:

    g_1 = r cos(φ_1)                                         (2a)
    g_2 = r sin(φ_1) cos(φ_2)                                (2b)
    g_3 = r sin(φ_1) sin(φ_2) cos(φ_3)                       (2c)
    ...
    g_{n−1} = r sin(φ_1) ··· sin(φ_{n−2}) cos(φ_{n−1})       (2d)
    g_n = r sin(φ_1) ··· sin(φ_{n−2}) sin(φ_{n−1})           (2e)

3.2 Characteristics of the mix-space

With a mix-space having been defined, what characteristics does the space have? How does the act of mixing explore this space? We now discuss three scenarios: beginning at a 'source', exploring the 'mix-space' and arriving at a 'sink'.

3.2.1 The 'source'

In a real-world context, when a mixer downloads a multitrack session and first loads the files into a DAW, each mixer will initially hear the same mix, a linear sum of the raw tracks¹. While each of these raw tracks can be presented in various ways, if we presume each track is recorded with a high signal-to-noise ratio (as would have been more important when using analogue equipment) then, with all faders set to 0 dB, the perceived loudness of those tracks with reduced dynamic range (such as synthesisers, electric bass and distorted electric guitars) would be higher than that of more dynamic instruments.

Much like the final mixes, this initial 'mix' can be represented as a point in some high-dimensional, or feature-reduced, space. It is rather unlikely that a mixer would open the session, hear this mix and consider it ideal; therefore, changes will most likely be made in order to move away from this location in the space. For this reason, this position in the mix-space is referred to as a 'source'.

In practice, the session, as it has been received by the mix engineer, may be an "unmixed sum" or may be a rough mix, as assembled by the producer or recording engineer. In a real-world scenario, the work may be received as a DAW session, where tracks have been roughly mixed. Alternatively, where multitrack content is made available online, such as in mix competitions, the unprocessed audio tracks are usually provided without a DAW session file. The latter approach is assumed in this study, in order for mix engineers to have full creative control over the mixing process. If mixers were to make unique changes to the initial configuration then that source can be considered to be radiating omni-directionally in the mix-space. However, it is possible that, for a given session, there may be some changes which will seem apparent to most mixers, for example, a single instrument which is louder than all others requiring attenuation. For such sessions, the source may be unidirectional, or, if a number of likely outcomes exist, there may exist a number of paths from the source.

3.2.2 Navigating the mix-space

The path from the source to the final mix could be represented as a series of vectors in the mix-space, henceforth named 'mix-velocity', and defined in Eqn. 3, for the three dimensions shown in Fig. 2.

    u_t = φ_{1,t} − φ_{1,t−1}    (3a)
    v_t = φ_{2,t} − φ_{2,t−1}    (3b)
    w_t = φ_{3,t} − φ_{3,t−1}    (3c)

If all mixers begin at the same source then a number of questions can be raised in relation to movement through the mix-space.

• Moving away from the source, at what point do mix engineers diverge, if at all?

• How do mix engineers arrive at their final mixes? What paths through the mix-space do they take?

• Do mix engineers eventually converge towards an ideal mix?

3.2.3 The 'sink'

Complementary to the concept of a source in the mix-space, a 'sink' would represent a configuration of the input tracks which produces a high-quality mix that is apparent to a sizeable portion of mix engineers and towards which they would mix. As the concept of quality in mixes is still relatively unknown, there are a number of open questions in the field which can be addressed using this framework.

• Is there a single sink, i.e. one ideal mix for each multitrack session? In this case the highest mix-quality would be achieved at this point.

• Are there multiple sinks, i.e. given enough available mixes, are these mixes clustered such that one can observe a number of possible alternate mixes of a given multitrack session? These multiple sinks would represent mixes that are all of high mix-quality but audibly different.

4. EXPERIMENT

To the authors' knowledge, there is a lack of appropriate data available to directly test the theory presented in Section 3. In order to examine how mix engineers navigate the mix-space a simple experiment was conducted. In this instance the mixing exercise is to balance the level of four tracks, using only a volume fader for each track. Importantly, the participants all begin with a predetermined balance, in order to examine the source directivity. This experiment aims to answer the following research questions:

Q1. Can the source be considered omni-directional or are there distinct paths away from the source?

Q2. Is there an ideal balance (single sink)?

Q3. Are there a number of optimal balances (multiple sinks)?

Q4. What are the ideal level balances between instruments?

Previous studies have indicated that perceptions of quality and preference in music mixtures are related to subjective and objective measures of the signal, with distortion, punch, clarity, harshness and fullness being particularly important [12, 13]. By using only track gain and no panning, equalisation or dynamics processing, most of these parameters can be controlled.

¹ Here it is significant that a DAW typically defaults to faders at 0 dB, while a separate mixing console may default to all faders at −∞ dB. This allows an experimenter to ensure that all mixers begin by hearing the same 'mix'. This has been referred to in previous studies as an 'unmixed sum' or a 'linear sum'. While the term 'unmixed' can be misleading, it does reflect the fact that the artistic process of mixing has not yet begun.
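As a concrete illustration, the gain-to-mix-space transformation (Eqn. 1), its inverse (Eqn. 2) and the mix-velocity of Eqn. 3 can be sketched in Python. This is a minimal sketch for illustration only: it assumes non-zero track gains (as guaranteed here by restricting the angles to π/8 to 3π/8) and the function names are our own, not those of the analysis toolchain used for the paper.

```python
import math

def gains_to_mix(g):
    """Gain-space -> mix-space (Eqn. 1): returns (r, [phi_1..phi_{n-1}]).
    r is an overall scaling; the angles encode the balance of tracks."""
    r = math.sqrt(sum(x * x for x in g))
    phi = [math.acos(g[k] / math.sqrt(sum(x * x for x in g[k:])))
           for k in range(len(g) - 2)]
    # Final angle uses the quadrant-aware form of Eqn. 1e
    last = math.acos(g[-2] / math.hypot(g[-2], g[-1]))
    phi.append(last if g[-1] >= 0 else 2 * math.pi - last)
    return r, phi

def mix_to_gains(r, phi):
    """Mix-space -> gain-space (Eqn. 2)."""
    g, prod = [], r
    for p in phi:
        g.append(prod * math.cos(p))   # g_k = r sin(phi_1)..cos(phi_k)
        prod *= math.sin(p)
    g.append(prod)                     # g_n = r sin(phi_1)..sin(phi_{n-1})
    return g

def mix_velocity(phis):
    """Mix-velocity (Eqn. 3): first differences of each angle over time.
    phis is a time-series of [phi_1, phi_2, phi_3] triples."""
    return [[b - a for a, b in zip(p0, p1)]
            for p0, p1 in zip(phis, phis[1:])]

# Round trip: equal gains on four tracks give an equal balance of tracks
r, phi = gains_to_mix([0.5, 0.5, 0.5, 0.5])
g = mix_to_gains(r, phi)  # recovers [0.5, 0.5, 0.5, 0.5]
```

For the four-track case of Fig. 2, r carries the overall level while (φ1, φ2, φ3) carry the vocal/backing, guitar/rhythm and bass/drums balances, which is why the analysis below can discard r entirely.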
4.1 Stimuli

The multitrack audio sessions used in this experiment have been made available under a Creative Commons license² ³. These files are also indexed in a number of databases of multitrack audio content⁴ ⁵. Three songs were used for this experiment, which consisted of vocals, guitar, bass and drums, as per Fig. 2, and as such the interpretations of φn from here on are those in Fig. 2.

The four tracks used from "Borrowed Heart" are raw tracks, where no additional processing has been performed apart from that which was applied when the tracks were recorded⁶. The tracks from "Sister Cities" also represent the four main instruments but were processed using equalisation and dynamic range compression. These can be referred to as 'stems', as the 11 drum tracks have been mixed down, the two bass tracks (a DI signal and an amplifier signal) have been mixed together, the guitar track is a blend of close and distant microphone signals and the vocal has undergone parallel compression, equalisation and subtle amounts of modulation and delay. In the case of "Heartbeats", the tracks used are complete 'mix stems', in that the song was mixed and bounced down to four tracks consisting of 'all vocals', 'all music' (guitars and synthesisers), 'all bass' and 'all drums'. For testing, the audio was further prepared as follows:

• 30-second sections were chosen, so that participants would be able to create a static mix, where the desired final gains for each track are not time-varying.

• Within each song, each 30-second track was normalised according to loudness. In this case, loudness is defined by BS.1770-3, with modifications to increase the measurement's suitability to single instruments, rather than full-bandwidth mixes [14]. This allows the relative loudness of instruments to be determined directly from the mix-space coordinates.

• For each song, two source positions were selected. The φ terms were selected using a random number generator, with two constraints: to ensure the two sources are sufficiently different, the pair of sources must be separated by unit Euclidean distance in the mix-space; and to ensure the sources are not mixes where any track is muted, the values were chosen from the range π/8 to 3π/8 (see Fig. 2).

Figure 3: GUI of mixing test. The faders are unmarked and all begin at the same central value, which prevents participants from relying on fader position to dictate their mix.

4.2 Test panel

In total, 8 participants (2 female, 6 male) took part in the mixing experiment. As staff and students within Acoustics, Digital Media and Audio Engineering at the University of Salford, each of these participants had prior experience of mixing audio signals. The mean age of participants was 25 years and none reported hearing difficulties.

4.3 Procedure

Rather than use loudspeakers in a typical control room, the test set-up used a more neutral reproduction. The experiment was conducted in a semi-anechoic chamber at the University of Salford, where the background noise level was negligible. Audio was reproduced using a pair of Sennheiser HD800 headphones, connected to the test computer by a Focusrite 2i4 USB interface. Due to the nature of the task, each participant adjusted the playback volume as required. Reproduction was monaural, presented equally to both ears. While the choice between loudspeakers and headphones is often debated [15], in this case, particularly as reproduction was mono, headphones were considered to be the choice with greater potential for reproducibility.

The experimental interface was designed using Pure Data, an open-source visual programming language. The GUI used by participants is shown in Fig. 3. Each participant listens to the audio clip in full at least once, then the audio is looped while mixing takes place and fader movement is recorded. The participant then clicks 'stop mix' and the next session is loaded. For each session the user is asked to create their preferred mix by adjusting the faders.

An initial trial was provided in order for participants to become familiar with the test procedure, after which the six conditions (3 songs, 2 sources each) were presented in randomised order. The mean test duration was 14.2 minutes, ranging from 11 to 17 minutes. The real-time audio output during mixing was recorded to .wav file at a sampling rate of 44,100 Hz and a resolution of 16 bits. Fader positions were also recorded to .wav files using the same format. As shown in Fig. 3, the true instrument levels were hidden from participants by displaying arbitrary fader controls. The range of the faders was limited to ±20 dB from the source, to prevent solo-ing any instrument, due to the uniqueness of the mix-space breaking down at boundaries.

Figure 4: Normalised gain levels (relative loudness, LU) of each track, evaluated over all final mix positions.

² http://weathervanemusic.org/shakingthrough
³ http://www.cambridge-mt.com/ms-mtk.htm
⁴ http://multitrack.eecs.qmul.ac.uk/
⁵ http://medleydb.weebly.com/
⁶ https://s3.amazonaws.com/tracksheets/Hezekiah+Jones+-+Tracksheet.xlsx
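The source-selection procedure described in Section 4.1 amounts to rejection sampling under the two stated constraints. The sketch below is an illustrative reconstruction: the paper does not specify the exact sampling scheme, so the tolerance `tol` and the retry limit are assumptions.

```python
import math
import random

def random_source():
    """Draw one source position: three angles in [pi/8, 3*pi/8],
    so that no track is muted (Section 4.1)."""
    return [random.uniform(math.pi / 8, 3 * math.pi / 8) for _ in range(3)]

def source_pair(tol=1e-3, max_tries=100000):
    """Draw pairs of sources until they are (approximately) separated
    by unit Euclidean distance in the mix-space, ensuring the two
    starting balances are sufficiently different."""
    for _ in range(max_tries):
        a, b = random_source(), random_source()
        if abs(math.dist(a, b) - 1.0) < tol:
            return a, b
    raise RuntimeError("no suitable pair found")
```

Since each angle spans only π/4, the maximum separation is √3·π/4 ≈ 1.36, so unit-distance pairs exist but are near the extremes of the range, which is consistent with the sources A and B appearing in opposite corners of the panels in Fig. 6.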
5. RESULTS AND DISCUSSION

For each participant, song and source, the recorded time-series data was downsampled to an interval of 0.1 seconds, then transformed from the gain to the mix domain using Eqn. 1. From this data the vectors representing mix-velocity, described in Section 3.2.2, were obtained using Eqn. 3.

5.1 Instrument levels

Since the experiment is concerned with relative loudness levels between instruments and not the absolute gain values which were recorded, normalised gains can be calculated from Eqn. 2, with r = 1. When all songs, sources and participants are considered, the distribution of normalised gains at the final mix positions is shown in Fig. 4, expressed in LU. In Figs. 4 and 5 the boxplots show the median at the central position and the box covers the interquartile range. The whiskers extend to extreme points not considered outliers and outliers are marked with a cross. Two medians are significantly different at the 5% level if their notched intervals do not overlap. Fig. 4 shows good agreement with previous studies, particularly a level of ≈ −3 LU for vocals [7, 10] and ≈ −10 LU for bass (see Fig. 1 of [10]). Fig. 6 also shows the final positions of all mixes of each song, where mix '1A' is the mix produced by mixer 1, starting at source A, etc. This indicates a clustering of mixes based on the source position.

Fig. 5d shows the box-plot of each φ value when data for all songs, sources and participants is combined. Since the audio tracks were loudness-normalised, the median value can be used to determine the preferred balance of tracks in terms of relative loudness, using Eqn. 4:

    vocals/backing = 20 × log10( cos(φ_1) / sin(φ_1) )    (4a)
    guitar/rhythm  = 20 × log10( cos(φ_2) / sin(φ_2) )    (4b)
    bass/drums     = 20 × log10( cos(φ_3) / sin(φ_3) )    (4c)

The results are shown in Table 1. Had the experiment been performed in a more conventional control room with studio monitors, less variance might have been observed [15].

Figure 5: Boxplots showing the distribution of φ terms at final mix positions, for (a) Song 1, (b) Song 2, (c) Song 3 and (d) all songs and sources. While balances vary with song, vocal/backing balance and guitar/rhythm balance are more consistent than the bass/drums balance.

    Balance         | Song 1 | Song 2 | Song 3 | All
    vocals/backing  | -0.95  | -0.23  | +1.98  | +0.54
    guitar/rhythm   | -5.15  | -2.04  | -1.78  | -2.38
    bass/drums      | +2.27  | -0.83  | -3.35  | -1.12

Table 1: Median level-balances (in loudness units) from Fig. 5, between sets of instruments defined by Fig. 2.

5.2 Source-directivity

Movement away from the source is characterised by the first non-zero element of the mix-velocity triple (u, v, w) (see Eqn. 3). The displacement and direction of this move is used to investigate the source directivity.
Figure 6: Positions of sources and final mixes in the mix-space. Source-directivity is indicated by added vectors. (a) Song 1 - the central cluster of mixes contains mixes originating at both sources. (b) Song 2 - 7A is the only mix in this study which has more nearest neighbours from the other source. (c) Song 3 - distinct cluster of mixes formed of those which started from source A.
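The characterisation of the first move away from a source (Section 5.2) can be sketched as a scan for the first non-zero mix-velocity triple. The threshold `eps` is an assumption of ours, since "non-zero" on quantised fader data requires some tolerance:

```python
def first_move(velocities, eps=1e-6):
    """Return the first mix-velocity triple (u, v, w) that is non-zero,
    i.e. the direction and step size of the first change to the mix,
    or None if the mixer never moved."""
    for v in velocities:
        if any(abs(x) > eps for x in v):
            return v
    return None
```

Aggregating these first moves over mixers, per source, gives the directivity vectors plotted in Fig. 6.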
Fig. 6 shows the source positions within the mix-space, marked 'A' and 'B'. The initial vectors are also shown, indicating the direction and step size of the first changes to the mix. None of the sources can be considered omnidirectional, as certain mix-decisions are more likely than others. This directivity indicates that the source position has an immediate influence on mixing decisions.

5.3 Mix-space navigation

Fig. 7 shows the probability density function (PDF) of φ_{n,t} when averaged over the eight mixers depicted in Fig. 6. The function is estimated using kernel density estimation, using 100 points between the lower and upper bounds of each variable.
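The density estimate described above can be sketched as a plain Gaussian KDE evaluated on a 100-point grid. The bandwidth below is an arbitrary illustrative choice, as the kernel width used for the paper is not stated:

```python
import math

def kde(samples, grid_points=100, lo=0.0, hi=math.pi / 2, bandwidth=0.05):
    """Gaussian kernel density estimate of an angle's time-series,
    evaluated at grid_points positions between its bounds (Section 5.3).
    Returns (grid, densities)."""
    xs = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    dens = [norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                       for s in samples) for x in xs]
    return xs, dens
```

Applied to the recorded φ time-series, peaks in the density mark the balances a mixer dwelt on, which is how the multi-modal shapes in Fig. 7 arise.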
Figure 7: Estimated probability density functions of φ terms, for each of the three songs, averaged over all mixers. Source positions are highlighted with A and B. As the functions often differ it can be seen that exploration of the mix-space is dependent on initial conditions.
This plot displays the mix configurations which the participants spent most time listening to and it is seen that all distributions are multi-modal. There are peaks close to the initial positions, the final positions and other interim positions that were evaluated during the mixing process. There are a number of different approaches to multitrack mixing of pop and rock music, one of which is to start with one instrument (such as drums or vocals) and build the mix around this by introducing additional elements. Some participants were observed mixing in this fashion, shown in Fig. 7, where peaks at extreme values of φn show that instruments were attenuated as much as the constraints of the experiment would allow.

For Song 1, φ1 is well balanced and centred close to π/4. This indicates that mixers tended to listen in states where the relative loudness of the vocal and backing track were similar. A similar pattern is observed for Song 2, where φ3 shows that the levels of drums and bass tend to be adjusted such that the tracks have similar loudness (Table 1 shows the median loudness difference within final mixes was <1 dB). The distributions of φ2 indicate that the guitar was often set to be of lower loudness than the rhythm section, as also shown in Table 1.

There are notable differences due to the source. The distributions for Song 2 suggest that exploration depended on the initial source configuration, with Source A leading to louder vocals and louder guitar than Source B. However, the distributions of the φ terms are similar in shape for both source positions, simply offset. This suggests that, while different regions of the mix-space were explored, they were explored in a similar fashion.

Overall, for Song 3, the distributions in Fig. 7, the median balances in Fig. 5c and the clustering of final positions shown in Fig. 6c indicate that mixers were more consistent with this song than with the others. This may be due to the tracks representing processed stems of a full mix, where the inter-channel balances in these stems, subject to dynamic range compression as well as the relative level of reverberation and other effects, may have provided clues as to how the groups were balanced in the final mix from which the stems were obtained. This further suggests that the more prior work that has been put into the mix, the less likely subsequent mixers are to explore the entire mix-space.

Since this experiment gathered data for only three songs, the results should be considered specific rather than general. It is not known at this time how many songs would need to be studied in order to generalise to mixing as a whole; however, these three songs are considered to be typical, due to their conventional instrumentation.

5.4 Application of results

In automatic fader control, rather than aiming for equal loudness across all instruments, the preferred balances between semantic pairings of instruments, shown in Fig. 5d, could be used as the target for optimisation. This would require the unsupervised clustering of audio tracks into semantically-linked instrument groups, a task which is currently an active area of research [16–18].

Intelligent mixing systems aim to generate audio mixtures based on some desired criteria, ideally 'Quality'. With a defined mix-space it is possible to utilise a number of dynamic techniques in generating mixes. The results of the experiment outlined in this paper could be used to train an intelligent mixing system to produce a number of alternate mixes which the user could select from, in order to further train the system. Further information regarding mixing style can be found in the data. For example, the probability density function of mix-velocity could differentiate between mixers who mixed using either careful adjustment of the faders towards a clear goal or by alternating large displacements with fine-tuning. Knowing the distribution of step sizes used by human mixers will aid optimisation of search strategies in intelligent mixing systems.

6. CONCLUSIONS

For a level-balancing task, a mix-space has been defined using the gains of each track. A number of features of the space have been presented and an experiment was performed in order to investigate how mix engineers explore this space for a four-track mixture of modern popular music. From these early results it has been observed that each source has a directivity that is not equal in all directions, i.e. that not all possible first decisions in the mix process are equally likely. For each song there are varying degrees of clustering of final mixes and it is seen that the final mix is dependent on the initial conditions. The exploration of the space is also dependent on the initial conditions. This experiment has indicated a certain level of agreement between participants regarding the ideal balances between groups of instruments, although this varies according to the song in question.

Ultimately, the theory presented here could be expanded to include other mix parameters. Since panning, equalisation and dynamic range compression/expansion are each an extension to the track gain (either channel-dependent, frequency-dependent or signal-dependent), it should be possible to add these parameters to the existing framework.

7. REFERENCES

[1] M. Terrell, A. Simpson, and M. Sandler, "The Mathematics of Mixing," Journal of the Audio Engineering Society, vol. 62, no. 1, 2014.

[2] "ISO 9000:2005 Quality management systems – Fundamentals and vocabulary," 2009, http://www.iso.org/iso/catalogue_detail?csnumber=42180.

[3] R. King, B. Leonard, and G. Sikora, "Consistency of balance preferences in three musical genres," in Audio Engineering Society Convention 133, San Francisco, USA, October 2012.

[4] ——, "Variance in level preference of balance engineers: A study of mixing preference and variance over time," in Audio Engineering Society Convention 129, San Francisco, USA, November 2010.

[5] E. Perez-Gonzalez and J. Reiss, "Automatic gain and fader control for live mixing," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA'09), 2009, pp. 1–4.

[6] S. Mansbridge, S. Finn, and J. D. Reiss, "Implementation and evaluation of autonomous multi-track fader control," in Audio Engineering Society Convention 132, Budapest, Hungary, April 2012.

[7] P. Pestana and J. D. Reiss, "Intelligent Audio Production Strategies Informed by Best Practices," in AES 53rd International Conference: Semantic Audio, London, UK, January 2014, pp. 1–9.

[8] J. Reiss and B. De Man, "A semantic approach to autonomous mixing," Journal on the Art of Record Production, no. 8, December 2013.

[9] E. Deruty, F. Pachet, and P. Roy, "Human-Made Rock Mixes Feature Tight Relations Between Spectrum and Loudness," Journal of the Audio Engineering Society, vol. 62, no. 10, pp. 643–653, 2014.

[10] B. De Man, B. Leonard, R. King, and J. Reiss, "An analysis and evaluation of audio features for multitrack music mixtures," in ISMIR, Taipei, Taiwan, October 2014, pp. 137–142.

[11] M. Cartwright, B. Pardo, and J. Reiss, "Mixploration: rethinking the audio mixer interface," in International Conference on Intelligent User Interfaces, Haifa, Israel, February 2014.

[12] A. Wilson and B. Fazenda, "Perception & evaluation of audio quality in music production," in Proc. of the 16th Int. Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland, 2013, pp. 1–6.

[13] ——, "Characterisation of distortion profiles in relation to audio quality," in Proc. of the 17th Int. Conference on Digital Audio Effects (DAFx-14), Erlangen, Germany, 2014, pp. 1–6.

[14] P. D. Pestana, J. D. Reiss, and A. Barbosa, "Loudness measurement of multitrack audio content using modifications of ITU-R BS.1770," in Audio Engineering Society Convention 134, Rome, Italy, May 2013.

[15] R. L. King, B. Leonard, and G. Sikora, "Loudspeakers and headphones: The effects of playback systems on listening test subjects," in Proc. of the 2013 Int. Congress on Acoustics, Montréal, Canada, June 2013.

[16] S. Essid, G. Richard, and B. David, "Musical instrument recognition by pairwise classification strategies," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1401–1412, 2006.

[17] V. Arora and L. Behera, "Musical source clustering and identification in polyphonic audio," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 6, pp. 1003–1012, June 2014.

[18] J. Scott and Y. E. Kim, "Instrument identification informed multi-track mixing," in ISMIR, Curitiba, Brazil, October 2013, pp. 305–310.