
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/234136449

The Reverse Engineering of a Mix

Thesis · January 2009


1 author:

Daniele Barchiesi
University College London

All content following this page was uploaded by Daniele Barchiesi on 18 August 2014.



PAPERS

Reverse Engineering of a Mix*

DANIELE BARCHIESI AND JOSHUA REISS

Centre for Digital Music, Queen Mary University of London, London, UK

It is shown how to reverse engineer the parameters that, starting from a multitrack recording,
can produce a given mix. Linear effects and dynamic processors, which comprise all the effects
commonly used in the mixing and mastering stages, are considered. Two different techniques
based on least-squares optimization are described. Starting from a multitrack recording and a
target mix, which is obtained by applying effects to each of its channels, impulse responses and
gain envelopes are calculated, which can be used to estimate gains, delays, filters, panning
settings, and combinations of these processors; or to estimate time-varying gain envelopes
produced by dynamic effects, such as compressors and expanders. Theoretical and experimental
results show that, given some assumptions about the nature of the processing originally applied,
the proposed techniques are able to precisely and efficiently retrieve the mixing parameters.

*Manuscript received 2009 October 26; revised 2010 May 10.
J. Audio Eng. Soc., Vol. 58, No. 7/8, 2010 July/August 563

0 INTRODUCTION

0.1 Problem Definition and Applications

In a music production the released product is often far more than the recording of a music performance. It is the result of the work of musicians and of mixing and mastering engineers. Although mixing and mastering often overlap, the basic difference between them is that the former acts on single channels or instruments and is performed before the latter, which acts on groups of channels or instruments or on whole mixes. Both mixing and mastering involve the use of many audio effects in order to process the recorded signals for technical or artistic purposes [1], [2].

A basic classification of audio effects distinguishes between linear and nonlinear signal processors. Gains, delays, stereo panners, and filters (which are, indeed, a combination of gains and delays) belong to the former category, whereas dynamic effects, such as compressors or expanders, and other particular effects, such as distortion modules, belong to the latter.

In this paper we will consider linear effects and dynamic processors, which comprise all the effects commonly used in the mixing and mastering stages, and we will show how to reverse engineer the parameters that, starting from a multitrack recording, can produce a given mix. In particular we will describe two different techniques, which can be used to estimate gains, delays, filters, panning settings (and combinations of these processors) or time-varying gain envelopes produced by dynamic effects.

Nowadays digital audio and enhanced recording and signal processing techniques make it possible to produce recordings with a higher fidelity than in the past, and this, of course, is reflected in a consumer’s improved listening experience. For this reason many old analog recordings are being converted into digital formats and remastered, where the remastering usually consists of improving their quality by means of signal processing techniques (for instance, denoising, click and hum removal, and so on) and performing again the mixing and/or mastering processes. At this stage it would be very helpful to have information about what effects have been applied originally and how, in order to produce a remastered edition that sounds better, and not different from the original record. In other words, it would be useful to have a recording of the engineer’s performance. Unfortunately this is often not possible because, before people started using computers and digital effects, there was not an effective way to store information about the mixing and mastering parameters. The techniques presented in this paper may contribute to filling this gap, allowing, to some extent, the reverse engineering of a mix.

Besides remastering, an interesting new trend in the music industry is the release of records along with their raw multitrack recordings. This allows users to create their own mixes and to perform custom processing on the audio material (a couple of examples, among the most famous artists, can be found by visiting the Web sites of Nine Inch Nails and Radiohead, http://remix.nin.com and http://www.radioheadremix.com/). An MPEG format called spatial audio object coding (SAOC) [3] is currently under standardization and will govern the storage and transmission of multiple objects (instruments and/or speech tracks) that can be spatially placed and remixed during reproduction. In this scenario reverse engineering of the mixed version is a valuable learning tool that can show how professional engineers have mixed and mastered the record.

0.2 Background

There exist very few scientific publications regarding the reverse engineering of a mix. A quite wide literature

deals with related topics, such as the estimation of effects parameters [4] or the automatic adjustment of the parameters of an effect based on a target [5], [6]. There are several commercial products [7], [8] that implement this second principle, suggesting an equalization curve that matches a target frequency response, which is a problem that has been extensively addressed in the fields of adaptive filtering and systems equalization [9]. Although these tools can be useful in some situations, they do not really tackle the problem of finding parameters applied to a mix, because they act on single tracks or whole sessions and, therefore, are not able to distinguish different parameters applied to different channels or instruments.

The ADRess algorithm [10], although mainly designed for source separation, is able to retrieve the panning parameters of a mix and is a good example of parameter estimation acting on a multitrack recording. However, to the best of the authors’ knowledge the only scientific publication dealing explicitly with the reverse engineering problem is a paper by Kolasinski [11]. The goal of his algorithm is to find the gains originally applied to a multitrack recording using a genetic optimization. This is a very powerful technique, which combines a random search with rules inspired by evolutionary processes and has been employed successfully to solve difficult optimization problems. However, this method requires a huge computational cost and produces less accurate results as the number of tracks increases.

In Section 1 we will describe an alternative approach, which allows us to retrieve gain parameters exactly, even for a large number of tracks, and which requires much less computational time. This technique will then be extended to the estimation of any linear time-invariant effect applied to each channel of the multitrack recording. Section 2 will describe the estimation of dynamic effects, and Section 3 will deal with the evaluation of the proposed techniques. Finally we will draw our conclusions in Section 4, along with plans for further research. The appendixes will show some theoretical results of the estimation of linear time-invariant systems.

1 LINEAR TIME-INVARIANT SYSTEM ESTIMATION

1.1 Least-Squares Solution

The basic principle behind all the techniques described in this paper is to represent the multitrack recording and the final mix (which, from now on, will be referred to as the target mix) as vectors in a high-dimensional Hilbert space. This will allow us to view and solve the estimation of mixing parameters using geometric methods.

In the simplest mixing scenario we can assume that the target $t(n)$ is generated applying different gains $a_k$ to the various channels $x_k(n)$ of the multitrack recording,

$$t(n) = \sum_{k=1}^{K} a_k x_k(n). \tag{1}$$

This linear combination can be written in matrix notation as

$$t = Xa$$

where $X$ is the matrix whose columns contain the tracks $x_k$. If we consider the column space of $X$, which is the subspace generated by any linear combination of the input tracks, then we can project the vector representing the target mix into that space and find a set of optimal coefficients $\hat{a}_k$ that minimize the Euclidean distance $\hat{a} = \arg\min_a \|t - Xa\|$ between target and estimated mix [12]. This can be done using the least-squares formula

$$\hat{a} = (X^T X)^{-1} X^T t. \tag{2}$$

It follows from Eq. (1) that the target mix belongs to the column space of $X$. Therefore the least-squares solution is able to retrieve exactly the original gains as long as the tracks $x_k$ are linearly independent, which is an assumption that will be discussed in Section 1.2. This technique is much less computationally expensive than a heuristic optimization procedure such as the genetic algorithm. Moreover it produces exact results even with a large number of input tracks.

A more complex mixing model can be designed if we allow each input track to be processed by a linear time-invariant (LTI) system. One of the basic principles of digital signal processing states that every LTI system is uniquely defined by its impulse response and that its output $y$ can be calculated as the convolution of the input signal $x$ with the impulse response $h$,

$$y(n) = (x * h)(n).$$

If we express this relation in matrix notation, considering a $P$th-order LTI system, then we obtain the following:

$$y = X_P h$$

where $X_P$ is the matrix whose columns contain shifted versions of the input signal up to the time index $P$. Since we are considering a multitrack recording, the new mixing model can be written as

$$t = X_{K,P}\, a \tag{3}$$

where the matrix $X_{K,P}$ contains shifted versions of all the input tracks and the vector $a$ contains the coefficients of different $P$th-order LTI systems applied to each channel. Once again, the optimization of $a$ can be solved projecting the target mix into the space generated by any linear combination of the shifted input tracks using the least-squares formula, Eq. (2).

This technique can be used to estimate the impulse response produced by all the audio effects that fall into the category of LTI systems, which includes gains, delays, stereo panners, and equalization filters. However, we cannot make prior assumptions on the nature of the processing originally applied and, in particular, about the length of the impulse responses that generated the target mix. If the multitrack recording was mixed using FIR systems, then it is possible to increase the estimation order $P$ until the target belongs to the column space of $X_{K,P}$, obtaining a theoretically exact solution. On the other hand, if any IIR filter has been employed, then we would need to estimate an infinite number of coefficients in order to


obtain an exact solution. Appendix 1 describes the relation between the original impulse responses and an upper bound of the estimation error, providing a sufficient condition for the success of the least-squares method.

It is practical to implement the least-squares problem described by Eq. (3) on a frame-by-frame basis, since this leads to a smaller computational load. Moreover, by restricting the estimation to small windows, the assumption on the time invariance of the filters used during the mixing stage is no longer a strict requirement, since it is needed only during the time interval defined by each window. As a practical example, if we consider an eight-track recording and an estimation order $P = 500$, the minimum window size that results in a determined system of equations is $8 \times 500 = 4000$ samples, which corresponds to about 90 ms at the standard CD sample rate of 44.1 kHz. More details on this topic will be discussed in Section 2.1, where we will treat the gain envelope produced by dynamic effects as a time-varying system to be estimated with a frame-by-frame technique.

In theory the least-squares method could be employed to estimate the impulse response of convolutional reverberators, since these effects belong to the category of FIR linear processors. However, the duration of a realistic impulse response can easily reach several seconds, which would lead to an optimization problem that is too large for the processing power of the current computers.

1.2 Linear Independence of Input Tracks and Underdetermined Systems

The linear mixing model described in Section 1.1 assumes that the input tracks contained in the multitrack recording (and their delayed versions employed in the estimation of equalization filters) are linearly independent. This means that none of those signals can be expressed as a linear combination of the others or, more intuitively, that the tracks do not contain submixes of the various channels (or of their filtered versions). Even when two instruments are harmonically or rhythmically correlated, as is the case for backing vocal tracks, the resulting signals will be linearly independent. If some of the tracks contain leakage from different sources, which often happens in a multichannel recording of a drum kit, the tracks will still be linearly independent, as discussed in Appendix 3.

Among the common practices employed during the mixing process, we can mention the use of auxiliary buses and the additional processing of the master channels [1]. In the former case one or several tracks are routed to an auxiliary bus, where they are transformed by audio effects and then added to the final mix, while in the latter additional processing is applied to the left and right master channels after the tracks have been mixed. In those cases the estimation of mixing parameters is not unique.

This fact is immediately obvious if we consider a simple example, which can be extended to more complex processing chains. Suppose that one of the tracks is routed to an auxiliary bus and multiplied by a gain $g_{\mathrm{aux}}$ before being added to the final mix that contains the same signal amplified by the gain $g_{\mathrm{chn}}$ applied in its channel strip. Then there is an infinite choice of settings that will produce the target mix, that is, the set of parameters for which $g_{\mathrm{chn}} + g_{\mathrm{aux}}$ is constant. Whenever we are given the auxiliary buses or the master channels as part of the multitrack recording, we can still use the proposed algorithm, but the least-squares estimation will be applied to a set of tracks that are no longer linearly independent. The resulting system of equations will be underdetermined and, consistently, there will be an infinite set of valid parameters that can produce the target mix. Still, the least-squares approach leads to a desirable solution in that it will choose the parameters with the smallest $\ell_2$ norm [13], avoiding unrealistic gains or filter coefficients.

1.3 From Impulse Responses to Mixing Parameters

Fig. 1 is the flowchart of a basic stereo mixing console. The input tracks $x_1, \ldots, x_K$ are processed through the gains $g_1, \ldots, g_K$ and the delays $d_1, \ldots, d_K$. The resulting signals are then equalized and placed in a particular position of the stereo field through the panning gains $p_{L1}, p_{R1}, \ldots, p_{LK}, p_{RK}$. The composition of all the processors contained in the dashed boxes in the image can be treated as a single linear time-invariant system and estimated independently for left and right channels with the technique described in Section 1.1. The resulting impulse responses can then be applied to each track in order to reproduce the left and right channels of the target mix.

Fig. 1. Flowchart of a basic mixing console.

However, if there is the need to distinguish between equalization parameters, delays, and gains, it is possible to separate from the estimated impulse response the contribution of the equalization filter, including some assumptions on the filter itself. Regarding the delays, we can assume that the equalization introduces the minimum possible delay, and therefore discard all the initial zero coefficients of the


estimated impulse. Gain and equalization can be distinguished assuming that the filter does not affect the norm of the signal, and thus the gain $a_k$ of a given track will be

$$a_k = \frac{\|y_k\|}{\|x_k\|} \tag{4}$$

where $y$ represents the output of the system and the norm can be the Euclidean norm $\|x\|_2$ or the infinity norm $\|x\|_\infty = \max_n |x(n)|$ if we assume that the equalization does not affect the peak value of the input signal.

As can be noted in Fig. 1, every track in the left or right channel is processed by two different gains. The first, identified by the symbol $g_k$, is used to balance the contribution of a particular instrument in the mix, whereas the second, $p_{Lk}$ or $p_{Rk}$, is used for the panning. Consider now the parameter estimation of an arbitrary track and omit the indexes for clarity of notation. The gains $a_L$ and $a_R$ derived from Eq. (4) are actually the product of $g$ with the panning gains $p_L$ or $p_R$. It is possible to separate these two components, assuming a particular panning law.

One of the simplest and most used laws is the equal power panning law, which constrains the panning gains to follow the relations

$$p_L^2 + p_R^2 = 1$$
$$p_L = \cos(\theta), \quad p_R = \sin(\theta)$$

where $\theta \in [0, \pi/2]$ is the angle in the stereo field. Therefore for a given track we can determine the first gain $g$ by computing the following:

$$\sqrt{a_L^2 + a_R^2} = \sqrt{g^2 p_L^2 + g^2 p_R^2} = \sqrt{g^2 (p_L^2 + p_R^2)} = g.$$

We can then divide the gains $a_L$ and $a_R$ by this value and retrieve the panning angle $\theta$ by inverting the panning law,

$$\theta = \arccos(p_L) = \arcsin(p_R), \quad \theta \in [0, \pi/2].$$

Clearly the mixing model described in this section is a basic one, and it is possible to consider more complex processing chains including, for example, auxiliary buses or additional effects in the master channels. In this case the algorithm will return one valid solution for each channel, as discussed in Section 1.2. Whether to apply the solution directly or divide the global impulse responses into smaller subsystems, which reflect a particular mixing model, is a task that is outside the scope of this work.

2 DYNAMIC EFFECTS ESTIMATION

Dynamic effects form a category of nonlinear signal processors whose objective is to modify the dynamic range of the input signal. Compressors, limiters, expanders, and noise gates are the most common effects that belong to this category and are widely used for technical or artistic reasons [1]. The shared aspect of all dynamic effects is that they apply a time-varying gain to the input signal based on a measurement of the signal level.

Fig. 2 shows a basic model of a dynamic effect. The input signal $x(n)$ is fed into a level measurement module whose output goes to a gain computer. Based on the type and parameters of the effect, the gain computer outputs an envelope function. This signal is then filtered to produce the time-varying gain $e(n)$, which is multiplied by the input signal to produce the output $y(n)$.

Fig. 2. Model of a dynamic effect.

Although dynamic effects do not belong to the category of LTI processors, it is possible to tackle the estimation of time-varying gain envelopes $e_k(n)$ for each input track, using the technique described in Section 1.1 on a frame-by-frame basis.

2.1 Frame-Based Polynomial Gain Estimation

One first attempt at the estimation of gain envelopes was to perform the least-squares optimization of the gain parameters described by Eq. (2) on small windows, assuming that the gains were constant within those regions. Unfortunately this approach does not work because the error introduced where the actual envelopes are not constant leads to a noisy and unreliable estimation.

For this reason a new model has been designed where the envelopes are allowed to follow polynomial trajectories within each window, which is described by

$$t(n) = \sum_{k=1}^{K} \left[ \sum_{p=0}^{P} a_{k,p}\, l(n)^p \right] x_k(n). \tag{5}$$

Here the target mix $t$ in each small window is expressed by the sum of the input tracks $x_k$ multiplied by a polynomial envelope of order $P$, which is the function contained in the square brackets. $l(n)$ represents a linear function that goes from 0 to 1 during the time interval defined by each window.

Once again, we can express this linear model in matrix notation, obtaining a mixing model that is similar to the one defined by Eq. (3), except that now the matrix $X_{K,P}$ does not contain shifted versions of the input tracks but the products of the input channels $x_k(n)$ and the polynomial functions $l(n)^p$. At this point we can estimate


the optimal polynomial coefficients $\hat{a}_{k,p}$ using the least-squares formula, Eq. (2), and define different gain envelopes $e_k$ for each track $x_k$,

$$e_k(n) = \sum_{p=0}^{P} \hat{a}_{k,p}\, l(n)^p.$$

2.2 Polynomial Estimation and Envelope Smoothness

The two critical parameters that must be controlled for the envelope estimation are the polynomial order and the length of the window used in the frame-based algorithm. Adjusting these two variables involves a tradeoff between model complexity and number of variables to be estimated. On one hand we would like to choose short windows because the envelopes are in general low-frequency functions, which are likely to be correctly approximated by polynomial functions in small regions. On the other hand we would like to choose an order $P$ that is high enough to be able to describe complex trajectories. Unfortunately these two objectives are contradictory in that, as we increase the estimation order $P$, we also need to increase the window length to ensure that the least-squares algorithm solves an (over)determined system of equations. As a result the estimation will be successful if the target envelopes are smooth enough to be correctly described by polynomial functions of a given order within each window.

There is not a simple way of describing the smoothness of the envelope produced by a dynamic processor, because this will depend on the particular implementation of the effect. As will be shown in Section 3.2, we empirically found that a polynomial order between 2 and 6, with a window length chosen to be four times the total number of estimation variables, provides good results, considering different dynamic effect models.

3 EVALUATION

3.1 Evaluation of LTI System Estimation

In order to evaluate the LTI algorithm, we first mixed a four-track test recording using different LTI processors for each channel. We then compared their impulse responses with the ones estimated using our method.

Table 1. Mixing parameters for four-track test recording.

Track        Gain (dB)   Delay (samples)   Equalization
Drums        6           0                 Free-hand FIR 3
Guitar       0           30                Low-pass IIR 4
Bass         6           50                None
Percussion   0           0                 None

Fig. 3. Drums equalization filter. (a) Frequency response. (b) Impulse response.

Table 1 shows the mixing parameters applied to each track. The recording is sampled at the standard CD quality (44.1 kHz/16 bit) and is 30 seconds long. Figs. 3 and 4


show the frequency and impulse response of the filters used in the test. The equalization applied to the drums track is obtained using a 50th-order FIR filter whose frequency response is an interpolation between fixed gains defined at 100, 1000, and 10 000 Hz. The second filter used for the equalization of the guitar track is a second-order low-pass IIR filter with cutoff frequency at 1 kHz.

Fig. 4. Impulse response of guitar equalization filter.

Fig. 5 depicts the impulse responses retrieved for each track choosing an estimation order $P = 100$. As can be seen, the impulse responses of the tracks that had been equalized are scaled and shifted versions of the original responses, where the scaling depends on the gain applied to the track and the shift depends on the delay. The impulse responses estimated for the channels that had not been equalized are a delta function, scaled and delayed appropriately.

Fig. 5. Multitrack estimation of linear time-invariant systems.

3.2 Compression Envelope Estimation

The estimation of dynamic effects envelopes has been evaluated mixing an eight-track test recording with a


dynamic compressor applied to each channel. The specifics of the recording are the same as for the one used in the previous evaluation.

A dynamic compressor generates a gain envelope based on a measurement of the rms value of the input signal, which is defined as

$$\mathrm{rms}(n) = \sqrt{\sum_{m=-M/2}^{M/2-1} \frac{x^2(n-m)}{M}}. \tag{6}$$

However, in the implementation of real-world effects this measurement is often approximated by filtering the squared input signal with a first-order low-pass IIR filter and taking the square root of the output [14],

$$\mathrm{rms}^2(n) = a\,x^2(n) + (1-a)\,\mathrm{rms}^2(n-1) \tag{7}$$

where $a$ is the filter coefficient, which is determined by the time constant of the filter. The same first-order IIR low-pass filter is then applied to the time-varying gain produced by the gain computer (see Fig. 2) as a smoothing filter, but this time using two different time constants during the attack and the release portions of the input $x(n)$.

According to the literature [4], [14], [15] there are various ways of choosing the relation between the user-defined time constants of the effect (which can be specified in ms) and the coefficient $a$. Moreover different implementations may use only one of the two filters placed before and after the gain computer, or may use the time-varying filter with attack and release time constants for both the rms measurement and the gain envelope smoothing.

Depending on all these variables, the resulting compression envelope may be more or less smooth, and therefore its estimation can be more or less successful, as described in Section 2.2. For this reason we decided to test three different compressor models whose four standard parameters (threshold, ratio, attack, and release) were set randomly for each track, choosing within ranges of typical values. All models follow the scheme depicted in Fig. 2, but each one has a different rms level measurement.

• Model A The rms is computed using Eq. (6) on a frame-by-frame basis. The windows are overlapping by 50% of their length, and linear interpolation is used between consecutive frames.
• Model B The rms is approximated by Eq. (7). The parameter $a$ is fixed and computed using a time constant of 100 ms.
• Model C The rms is computed filtering the square of the input signal with the same time-variant IIR used to process the compression envelope after the gain computer. The filter is followed by a square-root calculation.

Once the tracks had been mixed, we proceeded with the estimation of the compression envelopes choosing a polynomial order $P = 4$ and a window whose length was chosen to be four times the total number of estimation parameters: $4 \times 4 \times 8 = 128$ samples.

Figs. 6, 7, and 8 show the results for the three different models. The black and gray lines represent the true and estimated envelopes, respectively.

As can be seen, the algorithm is able to retrieve the correct envelopes in most of the regions where only the black lines are visible. There are some areas in tracks 6, 7, and 8 where the estimation is wrong, but this is due to the fact that the channels do not contain any signal in those regions, and therefore this does not affect the accuracy of the method. However, Fig. 7 shows some small estimation errors in most tracks. This is because the compression model B produces the least smooth envelopes, and the fourth-order polynomial used in the estimation is not able to follow accurately the gain trajectories in each window. These errors may be reduced by trying other values of polynomial order and window length or processing the estimated envelopes with a smoothing filter.
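
The frame-based polynomial estimation of Eq. (5) used in this evaluation can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the demonstration software described in this paper. The function name `estimate_envelopes`, the use of non-overlapping frames, and the choice to leave a trailing partial frame at zero are assumptions made for the example. The test signal is synthetic noise with envelopes that are exactly polynomial, so that the model assumption holds.

```python
import numpy as np

def estimate_envelopes(tracks, target, order, win_len):
    """Frame-based polynomial gain estimation, after Eq. (5).

    tracks : (K, N) array of input tracks x_k(n)
    target : (N,) array, the target mix t(n)
    Returns a (K, N) array of estimated gain envelopes e_k(n).
    Assumption: non-overlapping frames; a trailing partial frame stays at 0.
    """
    tracks = np.asarray(tracks, dtype=float)
    target = np.asarray(target, dtype=float)
    K, N = tracks.shape
    n_coef = order + 1                                    # p = 0 .. P
    ramp = np.linspace(0.0, 1.0, win_len)                 # l(n): 0 -> 1 in a frame
    powers = ramp[None, :] ** np.arange(n_coef)[:, None]  # (P+1, W), rows l(n)^p
    env = np.zeros((K, N))
    for start in range(0, N - win_len + 1, win_len):
        frame = slice(start, start + win_len)
        # Design matrix: one column per product x_k(n) * l(n)^p
        cols = tracks[:, None, frame] * powers[None, :, :]   # (K, P+1, W)
        A = cols.reshape(K * n_coef, win_len).T              # (W, K*(P+1))
        coef, *_ = np.linalg.lstsq(A, target[frame], rcond=None)
        # Rebuild e_k(n) = sum_p a_{k,p} l(n)^p for this frame
        env[:, frame] = coef.reshape(K, n_coef) @ powers
    return env

# Synthetic check (hypothetical data): two noise tracks with smooth envelopes.
rng = np.random.default_rng(0)
K, N, P = 2, 480, 2
x = rng.standard_normal((K, N))
n = np.arange(N) / N
true_env = np.vstack([0.5 + 0.3 * n, 1.0 - 0.4 * n ** 2])
t = (true_env * x).sum(axis=0)
est = estimate_envelopes(x, t, order=P, win_len=4 * K * (P + 1))
```

The window length in the usage lines is set to four times the number of unknowns per frame, counted here as K(P + 1). With real compressor envelopes the fit is only approximate, as the discussion of model B above indicates.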

Fig. 6. Estimation of compression envelopes, model A.
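
The level measurements that distinguish models A and B can be illustrated with a short sketch of Eqs. (6) and (7). This is an illustrative Python/NumPy version under stated assumptions, not the implementation used to produce the figures. The function names are hypothetical, the windowed version zero-pads the signal edges, and the 50% frame overlap and interpolation of model A are omitted. The mapping from a time constant to the coefficient $a$, here $a = 1 - e^{-1/(\tau f_s)}$, is only one of the several conventions that the literature allows.

```python
import numpy as np

def rms_windowed(x, M):
    """Eq. (6): rms over the M samples x(n - m), m = -M/2 .. M/2 - 1.

    Assumption: M is even and the signal is zero-padded at both ends.
    """
    x2 = np.asarray(x, dtype=float) ** 2
    half = M // 2
    # The window for index n spans samples n - M/2 + 1 .. n + M/2.
    padded = np.pad(x2, (half - 1, half))
    mean_sq = np.convolve(padded, np.full(M, 1.0 / M), mode="valid")
    return np.sqrt(mean_sq)

def coeff_from_time_constant(tau_s, fs):
    """Hypothetical mapping from a time constant (seconds) to the coefficient a."""
    return 1.0 - np.exp(-1.0 / (tau_s * fs))

def rms_recursive(x, a):
    """Eq. (7): rms^2(n) = a x^2(n) + (1 - a) rms^2(n - 1), with rms^2(-1) = 0."""
    out = np.empty(len(x))
    mean_sq = 0.0
    for n, sample in enumerate(np.asarray(x, dtype=float)):
        mean_sq = a * sample * sample + (1.0 - a) * mean_sq
        out[n] = np.sqrt(mean_sq)
    return out

# Example: coefficient for the 100 ms time constant quoted for model B.
a_model_b = coeff_from_time_constant(0.1, 44100)
```

For a constant input both detectors settle at the absolute value of the input. The recursive form approaches it exponentially, which is one reason the smoothness of the resulting envelope depends on the chosen coefficient.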


Fig. 7. Estimation of compression envelopes, model B.

Fig. 8. Estimation of compression envelopes, model C.

3.3 Reverse Engineering Demonstration Software

We have developed demonstration software that can be used to test our proposed algorithms and which can be downloaded freely from http://www.isophonics.net/content/reverse-engineering-mix.

Fig. 9 shows the main GUI of the program. The main panel in the upper part of the interface is a basic stereo mixer. The user can load up to eight channels of a multitrack recording and create a custom mix adding effects such as delay, equalization filters, compressors, gains, and pan controls. The equalization is obtained using 128th-order FIR filters designed to have a free-hand frequency response similar to the one shown in Fig. 3. The dynamic compressors are implemented using model C, described in Section 3.2. After having set all the mixing parameters it is possible to create the target mix using the button in the lower left panel. Alternatively the target mix can be loaded from an external file choosing the mode option in the same panel.

The lower right portion of the GUI contains the controls for the estimation of mixing parameters. Choosing from the mode option and typing the estimation order, it is possible to test both the LTI systems and the dynamic effects estimation. Once the optimization is completed, the user can view the target and estimated equalization pressing the Show EQ button in each channel of the mixer or the target and estimated envelopes pressing the Show


ENV buttons. More details on the functionalities of the software can be found on the Web, along with a short video, which shows how to use it to reproduce results analogous to the ones presented in this paper.

Fig. 9. Main GUI of reverse engineering demonstration software.

3.3.1 Real-World Example

In order to test the proposed estimation algorithms in a real-world situation, we mixed a six-track recording using the Apple Logic Pro software. The signals are sampled at the standard CD quality and are available as part of the downloadable demonstrative application. Fig. 10 depicts a screen shot of the mixer panel. Once the mixed version had been exported, we proceeded with the parameter estimation loading the mix into the reverse engineering demonstration software. We set the estimation order to 512 and ran the LTI systems algorithm. When the algorithm terminates, the mean normalized error for the left and right channels is shown in the GUI,

$$e = \frac{1}{2}\left(\frac{\|t_L - e_L\|}{\|t_L\|} + \frac{\|t_R - e_R\|}{\|t_R\|}\right)$$

where $t_L$ and $t_R$ are the target mixes in the left and right channel, respectively, and $e_L$ and $e_R$ are the estimated mixes. In our experiment the error resulted in $e \approx 5.42 \times 10^{-4}$.

Figs. 11 and 12 show the estimated frequency response of the drums and guitar equalizers, which had been processed with the Channel EQ in Logic to produce the target mix. As can be seen, the frequency response of the two filters has been correctly identified. There is an offset in the global gain of the equalizers, which is due to the difference between the gain set in the Logic channel strips, the attenuation caused by the panning, and the parameters retrieved by the reverse engineering demonstration software. However, this does not affect the accuracy of the solution since adding the various attenuations results in the same global gain.

A noisy estimation can be observed in correspondence with the very high frequencies of the retrieved equalizer responses. We believe that this is due to the quantization noise introduced when exporting the mix. An intuitive explanation of this fact is that the least-squares algorithm tries to adjust the high-frequency components of the signals in order to match the quantization noise.

4 CONCLUSIONS AND FURTHER RESEARCH

4.1 Current Achievements

In this paper we proposed two algorithms based on a least-squares optimization that can be used for reverse engineering a mix. The evaluation of our techniques shows that, given the raw multitrack recording and the final or target mix, it is possible to estimate the parameters of a wide range of different effects, including linear time-invariant processors (gains, delays, stereo panners, and filters) and dynamic effects.

The theory behind the optimization process is based on the definition of linear mixing models and on the simple principle of projection in a vector space. Therefore the


estimation requires a very small computational cost if compared with heuristic optimization algorithms. Moreover, the retrieved functions are impulse responses and gain envelopes, which are general parameters that do not require any knowledge about the implementation of the original effects.

4.2 Further Research
The proposed system allows one to retrieve the impulse responses of linear effects or the gain envelopes produced by dynamic processors if only one of the two categories of effects has been used in the mix. In order to tackle this problem, one approach is to exploit the fact that the number of parameters of most dynamic effects is very small if compared to the number of variables required for the estimation of the envelopes. (For instance, the typical parameters of a compressor are threshold, ratio, attack, and release.) If we consider a particular compressor model it is possible to define the envelope as a function of the

Fig. 10. Screenshot of mixer panel used to generate target mix.

Fig. 11. Estimated frequency response of drums equalizer.

Fig. 12. Estimated frequency response of guitar equalizer (right and left channels).


parameters mentioned and perform a joint optimization of linear effects and compression over large windows of the signal. However, since dynamic effects are nonlinear, this optimization cannot be performed using a simple least-squares approach. Preliminary results show that, even considering a single track and one of the compressor models described in Section 3.2, it is still an open problem how to retrieve the compression parameters.

Another strategy that can be investigated is to perform a time–frequency analysis of the target mix. Since dynamic effects and filters are used to modify the signals in the time and frequency domains, it may be possible to separate their contributions and perform two separate estimations.

Another direction for further research regards the improvement of the proposed algorithms and, in particular, of the LTI systems estimation. As described in Appendix 1, the convergence of the algorithm depends on the target impulse response and on the time delays considered in the optimization. The present technique takes into account the first P coefficients of each FIR filter, which leads to an optimal solution only if the original filters are minimum phase. It may be possible to improve the robustness of the algorithm by finding an optimal set of delays using a matching pursuit type algorithm [16].

Finally, one of the main disadvantages of the simple least-squares approach is that it is sensitive to noise. We observed this problem in the LTI estimation described in Section 3.3.1, where the signal was corrupted by very low quantization noise. This may be solved by employing a regularized least-squares method that enforces a constraint on the smoothness of the estimated frequency responses.

REFERENCES

[1] R. Izhaki, Mixing Audio: Concepts, Practices and Tools, 1st ed. (Focal Press, Oxford, UK, 2008).
[2] B. Katz, Mastering Audio: The Art and the Science (Focal Press, 2002).
[3] J. Breebaart, J. Engdegård, C. Falch, O. Hellmuth, J. Hilpert, A. Hoelzer, J. Koppens, W. Oomen, B. Resch, E. Schuijers, and L. Terentiev, “Spatial Audio Object Coding (SAOC) - The Upcoming MPEG Standard on Parametric Object Based Audio Coding,” presented at the 124th Convention of the Audio Engineering Society, (Abstracts) www.aes.org/events/124/124thWrapUp.pdf, (2008 May), convention paper 7377.
[4] U. Simmer, D. Schmidt, and J. Bitzer, “Parameter Estimation of Dynamic Range Compressors: Models, Procedures, and Test Signals,” presented at the 120th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 54, p. 736 (2006 July/Aug.), convention paper 6849.
[5] S. Heise, M. Hlathy, and J. Loviscach, “Automatic Adjustment of Off-the-Shelf Reverberation Effects,” presented at the 126th Convention of the Audio Engineering Society, (Abstracts) www.aes.org/events/126/126thWrapUp.pdf, (2009 May), convention paper 7758.
[6] D. Reed, “A Perceptual Assistant to Do Sound Equalization,” Proc. 5th Int. Conf. on Intelligent User Interfaces (New Orleans, LA, 2000 Jan.), pp. 212–218.
[7] S. T. Pope and A. Kouznetsov, “Expert Mastering Assistant,” Tech. Document. (University of California and FASTLab, Inc., Santa Barbara, CA, 2008 Sept.), p. 11.
[8] TC Electronic Inc., Assimilator Manual (2002); http://www.tcelectronic.com/assimilatordownloads.asp.
[9] O. Kirkeby and P. A. Nelson, “Digital Filter Design for Inversion Problems in Sound Reproduction,” J. Audio Eng. Soc., vol. 47, pp. 583–595 (1999 July/Aug.).
[10] D. Barry, B. Lawlor, and E. Coyle, “Real-Time Sound Source Separation: Azimuth Discrimination and Resynthesis,” presented at the 117th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 53, p. 104 (2005 Jan./Feb.), convention paper 6258.
[11] B. Kolasinski, “A Framework for Automatic Mixing Using Timbral Similarity Measures and Genetic Optimization,” presented at the 124th Convention of the Audio Engineering Society, (Abstracts) www.aes.org/events/124/124thWrapUp.pdf, (2008 May), convention paper 7496.
[12] D. Barchiesi and J. Reiss, “Automatic Target Mixing Using Least-Squares Optimization of Gain and Equalization Settings,” in Proc. 12th Int. Conf. on Digital Audio Effects (DAFx ’09), vol. 1 (2009 Sept.), pp. 7–14.
[13] S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge University Press, New York, 2004).
[14] U. Zolzer, DAFx: Digital Audio Effects (John Wiley & Sons, Chichester, UK, 2002).
[15] G. W. McNally, “Dynamic Range Control of Digital Audio Signals,” J. Audio Eng. Soc., vol. 32, pp. 316–327 (1984 May).
[16] P. S. K. Y. C. Pati and R. Rezaiifar, “Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition,” in Conf. Rec., 27th Asilomar Conference on Signals, Systems and Computers, vol. 1 (1993 Nov.), pp. 40–44.

APPENDIX 1
CONVERGENCE OF THE LTI SYSTEMS ESTIMATION

Let $t(n) \in \mathbb{R}^N$ be the target mix and

$$\mathcal{X} = \mathrm{span}\{x_k(n - p)\}, \quad k = 1, \ldots, K, \quad p \in \mathcal{L}$$

be the subspace generated by any linear combination of the input tracks $x_k$ delayed by $p$ samples (where $\mathcal{L}$ represents an arbitrary set of delays considered during the estimation).

In general the target mix $t$ can be expressed as a linear combination of infinite elements,

$$t(n) = \sum_{k=1}^{K} \sum_{p=-\infty}^{+\infty} a_{k,p}\, x_k(n - p) = \sum_{k=1}^{K} \left[ \sum_{p \in \mathcal{L}} a_{k,p}\, x_k(n - p) + \sum_{p \notin \mathcal{L}} a_{k,p}\, x_k(n - p) \right].$$

Every vector $v \in \mathbb{R}^N$ can be written as the sum of its projection on the subspace $P_{\mathcal{X}}(v)$ and a component


orthogonal to the subspace $P^{\perp}_{\mathcal{X}}(v)$. Therefore the target mix can be written as

$$t = P_{\mathcal{X}}(t) + P^{\perp}_{\mathcal{X}}(t).$$

The estimation error $J$ is the squared norm of the difference between target and estimated mix,

$$J = \| t - P_{\mathcal{X}}(t) \|^2 = \| P^{\perp}_{\mathcal{X}}(t) \|^2. \qquad (8)$$

The orthogonal component of the target mix is

$$P^{\perp}_{\mathcal{X}}(t) = P^{\perp}_{\mathcal{X}} \left\{ \sum_{k=1}^{K} \left[ \sum_{p \in \mathcal{L}} a_{k,p}\, x_k(n - p) + \sum_{p \notin \mathcal{L}} a_{k,p}\, x_k(n - p) \right] \right\}$$

$$= \sum_{k=1}^{K} \left\{ \sum_{p \in \mathcal{L}} a_{k,p}\, P^{\perp}_{\mathcal{X}}[x_k(n - p)] + \sum_{p \notin \mathcal{L}} a_{k,p}\, P^{\perp}_{\mathcal{X}}[x_k(n - p)] \right\}.$$

The orthogonal component of vectors belonging to the subspace $\mathcal{X}$ is zero, so the previous equation reduces to

$$P^{\perp}_{\mathcal{X}}(t) = \sum_{k=1}^{K} \sum_{p \notin \mathcal{L}} a_{k,p}\, P^{\perp}_{\mathcal{X}}[x_k(n - p)].$$

Substituting into Eq. (8) leads to

$$J = \left\| \sum_{k=1}^{K} \sum_{p \notin \mathcal{L}} a_{k,p}\, P^{\perp}_{\mathcal{X}}[x_k(n - p)] \right\|^2 \le \sum_{k=1}^{K} \sum_{p \notin \mathcal{L}} a_{k,p}^2\, \| P^{\perp}_{\mathcal{X}}[x_k(n - p)] \|^2. \qquad (9)$$

The squared norm in Eq. (9) is bounded by

$$B = \max_{k,p} \| P^{\perp}_{\mathcal{X}}[x_k(n - p)] \|^2.$$

Therefore the total error $J$ will be

$$J \le B \sum_{k=1}^{K} \sum_{p \notin \mathcal{L}} a_{k,p}^2.$$

This result shows that the estimation error is bounded by the energy of the impulse response in the region that is not taken into account during the optimization.

For example, if the subspace $\mathcal{X}$ is generated by the set $\{x_k(n - p)\}$, where $p = 0, \ldots, P$, the estimation will produce a small error only if the energy of the impulse responses applied in the target mix drops to zero after the $P$th sample.

APPENDIX 2
EQUIVALENCE OF LEAST-SQUARES SOLUTION IN THE TIME AND FREQUENCY DOMAINS

The method described so far estimates LTI systems by finding the optimal impulse responses in the time domain. However, if our goal is to retrieve the equalization curve that has been used to process a given input channel, considering the distance in the frequency domain is a much more meaningful metric for the optimization algorithm. In fact this is not an issue because the least-squares solution is identical with orthogonal transforms.

Let $F$ be the matrix whose rows contain the Fourier basis. Consider now the mixing model [Eq. (3)] in the Fourier domain,

$$F t = F(X a)$$

where we omitted the subscripts $K$, $P$ for clarity of notation. The least-squares solution of this model can be written as

$$\hat{a} = [(FX)^H (FX)]^{-1} (FX)^H F t = (X^T F^H F X)^{-1} X^T F^H F t$$

where the operator $(\cdot)^H$ indicates the conjugate transpose (Hermitian) of its argument. Since $F$ is an orthogonal matrix, the last equation reduces to

$$\hat{a} = (X^T X)^{-1} X^T t$$

which is the least-squares solution in the time domain.

APPENDIX 3
LINEAR INDEPENDENCE OF MICROPHONE RECORDINGS IN THE PRESENCE OF INTERFERING SOURCES

Consider the drum kit depicted in Fig. 13, which consists of $J$ sources $\{s_j\}_{j=1}^{J}$ recorded using $J$ microphones, producing the tracks $\{x_k\}_{k=1}^{J}$. Each of the signals captured by the microphones will contain a linear combination of the sources,

$$x_k(n) = \sum_{j=1}^{J} g_{jk}\, s_j(n - \Delta_{jk})$$

where $\Delta_{jk}$ and $g_{jk}$ are the delay and the attenuation due to the distance between the $j$th source and the $k$th microphone.

In matrix form, stacking the (appropriately delayed) sources in the columns of matrix $S$, the previous equation can be written as

$$X = SG \qquad (10)$$

where $X$ contains the microphone signals in each of its columns and $G$ will be referred to as the mixing matrix. The matrix $S$ contains linearly independent columns for the reason explained in Section 1.2. Therefore its rank will be equal to $J^2$. In order to prove the linear independence of the recorded tracks $x_k$, we must show that the matrix $X$ has rank $J$. For the properties of ranks we have that a sufficient condition for the linear independence of the observed signals $x_k$ is given by the mixing matrix $G$ being full rank.

Let us denote $s_{jk} = s_j(n - \Delta_{jk})$ as the signal produced by the $j$th source and delayed according to the distance


between $s_j$ and the microphone $x_k$. Then we can explicitly write Eq. (10) as

$$
\begin{bmatrix}
| & | & & | & & | & | & & | \\
s_{00} & s_{10} & \cdots & s_{J0} & \cdots & s_{0J} & s_{1J} & \cdots & s_{JJ} \\
| & | & & | & & | & | & & |
\end{bmatrix}
\begin{bmatrix}
g_{00} & \cdots & 0 \\
g_{10} & & 0 \\
\vdots & & \vdots \\
g_{J0} & & 0 \\
\vdots & \ddots & \vdots \\
0 & & g_{0J} \\
0 & & g_{1J} \\
\vdots & & \vdots \\
0 & \cdots & g_{JJ}
\end{bmatrix}
=
\begin{bmatrix}
| & & | \\
x_1 & \cdots & x_J \\
| & & |
\end{bmatrix}.
$$

Denoting each column of the matrix $G$ by $g_k$, we can observe that those vectors are sparse with disjoint support. As a consequence the inner product $\langle g_j, g_k \rangle$ is zero for all $j \neq k$, and the columns of the mixing matrix are mutually orthogonal. This ensures that matrix $G$ is full rank, and that the observations $x_k$ are linearly independent.
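The argument of Appendix 3 can be checked numerically with a small sketch. The sources below are white-noise placeholders, and the delays, gains, sizes, and helper names (`delayed`, `build_mixing_model`) are hypothetical choices made for illustration only; they are not taken from the recordings or software described in this paper. Indices run over J sources and J microphones.

```python
import numpy as np

def delayed(signal, delay):
    """Return `signal` delayed by `delay` samples, zero-padded at the start."""
    out = np.zeros_like(signal)
    if delay < len(signal):
        out[delay:] = signal[:len(signal) - delay]
    return out

def build_mixing_model(sources, delays, gains):
    """Build S (N x J^2) and G (J^2 x J) so that X = S @ G, as in Eq. (10).

    Column k*J + j of S holds source j delayed by delays[j, k];
    column k of G is nonzero only in rows k*J ... k*J + J - 1.
    """
    J, N = sources.shape
    S = np.zeros((N, J * J))
    G = np.zeros((J * J, J))
    for k in range(J):          # microphone index
        for j in range(J):      # source index
            col = k * J + j
            S[:, col] = delayed(sources[j], delays[j, k])
            G[col, k] = gains[j, k]
    return S, G

# Illustrative (assumed) setup: 4 noise sources, 4 microphones.
rng = np.random.default_rng(0)
J, N = 4, 2000
sources = rng.standard_normal((J, N))
# A distinct delay for every (source, microphone) pair keeps the columns of S distinct.
delays = rng.permutation(50)[:J * J].reshape(J, J)
gains = rng.uniform(0.1, 1.0, size=(J, J))

S, G = build_mixing_model(sources, delays, gains)
X = S @ G

gram = G.T @ G                                   # inner products <g_j, g_k>
off_diag = gram - np.diag(np.diag(gram))         # zero if columns are orthogonal
rank_S = np.linalg.matrix_rank(S)
rank_G = np.linalg.matrix_rank(G)
rank_X = np.linalg.matrix_rank(X)
```

Under these assumptions the columns of G have disjoint support, so `off_diag` vanishes and both G and X have rank J. Real close-microphone signals are not white noise, so this only illustrates the structure of the proof rather than validating it on recordings.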

Fig. 13. Typical setup of a multichannel drum recording.
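The equivalence stated in Appendix 2 can also be verified numerically. The sketch below assumes a DFT matrix with unitary scaling (so that the product of its conjugate transpose with itself is the identity, which is the sense in which F is called orthogonal in Appendix 2) and uses random placeholder data in place of real tracks; the sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 64, 5                                  # samples and unknown coefficients (assumed sizes)
X = rng.standard_normal((N, M))               # stand-in for the matrix of delayed input tracks
a_true = rng.standard_normal(M)
t = X @ a_true + 0.01 * rng.standard_normal(N)  # target mix with a little noise

# Least-squares solution in the time domain.
a_time, *_ = np.linalg.lstsq(X, t, rcond=None)

# Unitary DFT matrix: the 1/sqrt(N) scaling makes F^H F = I.
F = np.fft.fft(np.eye(N), norm="ortho")

# Least-squares solution of the same model in the Fourier domain.
a_freq, *_ = np.linalg.lstsq(F @ X, F @ t, rcond=None)

max_diff = np.max(np.abs(a_freq - a_time))    # expected to be at rounding-error level
```

With an unscaled DFT the product of F with its conjugate transpose is N times the identity; the factor cancels in the normal equations, so the solution is still the same.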

THE AUTHORS

D. Barchiesi J. Reiss


Daniele Barchiesi was born in Desenzano del Garda, Italy, in 1985. He joined the Centre for Digital Music at Queen Mary University of London, UK, in 2008, where he received an M.Sc. degree in electronic engineering in 2009. He is currently pursuing a Ph.D. degree, working on sparse representations for blind deconvolution problems. His main research interests include signal processing and optimization for audio applications. In his spare time he enjoys singing and playing the piano.

Josh Reiss received a Ph.D. degree in physics from the Georgia Institute of Technology, specializing in analysis of nonlinear systems.

He is presently a senior lecturer with the Centre for Digital Music at Queen Mary University of London, UK. He made the transition to audio and musical signal processing through his work on sigma–delta modulators, which led to patents and a nomination for a best paper award from the IEEE. He has investigated music retrieval systems, time scaling and pitch shifting techniques, polyphonic music transcription, loudspeaker design, automatic mixing for live sound, and digital audio effects. His primary focus of research, which ties together many of these topics, is on the use of state-of-the-art signal processing techniques for professional sound engineering.

Dr. Reiss has published over 80 scientific papers and serves on several steering and technical committees. As coordinator of the EASAIER project, he led an international consortium of seven partners working to improve access to sound archives in museums, libraries, and cultural heritage institutions. He is cochair of the AES Technical Committee on High-Resolution Audio. He was program chair of ISMIR2005. In 2007 he was general chair of the 31st AES Conference, “New Directions in High-Resolution Audio,” and in 2009 he was general secretary of the 35th AES International Conference, “Audio for Games.” He was also chair of the recent 128th AES Convention in London.
