Adam2006 Workshop PassAcoustDetection
www.elsevier.com/locate/apacoust
a Laboratoire Images, Signaux et Systèmes Intelligents, groupe Ingénierie des Signaux Neuro-Sensoriels,
Université Paris 12, France
b Defence R&D Canada Atlantic, P.O. Box 1012, Dartmouth, NS, Canada B2Y 3Z7
c Naval Undersea Warfare Center, Code 71, Bldg 1351, Newport, RI 02841, USA
d Song of the Whale Research Team, International Fund for Animal Welfare Charitable Trust, London, UK
e Sea Mammal Research Unit, Gatty Marine Laboratory, University of St. Andrews, Scotland, UK
f Marine Mammal S&T Program, Office of Naval Research, Arlington VA 22217-5660, USA
Received 25 May 2006; accepted 25 May 2006
Available online 31 July 2006
Abstract
The Second International Workshop on Detection and Localization of Marine Mammals Using
Passive Acoustics was held in Monaco, 16-18 November 2005, two years after the first workshop on
the same topic, held in Dartmouth, NS, Canada in November 2003. This paper is the overview of
this workshop.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Workshop; Detection; Localization; Marine mammals; Passive acoustics; Monaco; Oceanographic
musee
0003-682X/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.apacoust.2006.05.013
1. Introduction
The Second International Workshop on Detection and Localization of Marine Mammals
Using Passive Acoustics was held in Monaco, 16-18 November 2005, two years after the
first workshop on the same topic, held in Dartmouth, NS, Canada in November 2003.
Sperm whales are deep-diving odontocetes with maximum recorded dive depths well in
excess of 1000 m. Long dive durations of 30-45 min [13] make them particularly difficult to
detect visually. While submerged, sperm whales produce loud regular clicks with measured
on-axis source levels of up to 236 dB re: 1 µPa rms [8]. These clicks make the sperm whale
an excellent candidate for passive acoustic detection and location.
Researchers need to detect sperm whale clicks for a number of reasons. These include
studies of social behavior (see for example [11]) and population surveys (e.g. [2,5]).
Increasing concerns about the effects of loud anthropogenic sounds on marine mammals
also require us to be better able to detect and localize marine mammals.
The purpose of this workshop was to present the current research on the detection and
localization of marine mammals using passive acoustics. Researchers were invited to present their scientific work, to detail the advantages and drawbacks of the methods they have
used and to show their recent results through a practical approach. The last half-day was
reserved for the comparison of all scientific methods presented using the same database of
recorded odontocete signals (sperm whale), provided by NUWC (US).
The workshop encouraged presentations from different fields. The researchers
were specialists in biology, acoustics, signal processing, mathematics, electronics, and computer science. The workshop was open to Masters and PhD students.
At this workshop, the 45 oral presentations in the 12 oral sessions were attended by
150 participants from 17 different countries: France, USA, UK, Spain, Canada, Denmark,
Italy, Germany, Switzerland, Brazil, Greece, Holland, Luxembourg, Monaco, Poland, Russia, and Slovenia.
In this overview, we present a brief comparison of detection and localization results,
together with a summary of the discussions, including the discussion about a third workshop.
2. General summary
The workshop was divided into sessions on Detection, Localization and Signal Processing, Bearing, Material, Tools, and Applications Using Passive Acoustics.
During the session dedicated to Detection, one of the goals was to detect sounds emitted
by marine mammals either in continuous recordings or in real-time applications. It is extremely difficult and sometimes impossible to identify the sound source while recording cetacean underwater sounds with an omnidirectional hydrophone. Several individuals in a group
had the same problems with detection. For identification of the signals, different signal processing approaches for real-time detection of vocalizing marine mammals, both classical
and new, were submitted: binary thresholded Fast Fourier Transform, DTW algorithm
for classifying blue whale infrasonic calls, Schur Algorithm, Hilbert Huang Transform,
Sinewave Modeling and Bayesian Inference, Teager Kaiser Energy Operator, etc.
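As an illustration of one of the detectors listed above, the following is a minimal sketch of the discrete Teager-Kaiser energy operator in its standard form, psi[n] = x[n]^2 - x[n-1] x[n+1]. It is not the implementation presented at the workshop [4]; the test signal (a weak 500 Hz tone plus a damped 12 kHz burst standing in for a click) and all names are assumptions made for the example.

```python
import math

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1], for n = 1..len(x)-2."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

FS = 48000  # Hz, sample rate of the workshop recordings

# Hypothetical test signal: weak 500 Hz tone plus a short damped 12 kHz burst at sample 100.
signal = [0.01 * math.sin(2 * math.pi * 500 * n / FS) for n in range(200)]
for k in range(20):
    signal[100 + k] += math.exp(-k / 5.0) * math.sin(2 * math.pi * 12000 * k / FS)

energy = teager_kaiser(signal)
# energy[i] corresponds to sample i + 1 of the input, hence the +1 below.
peak_sample = max(range(len(energy)), key=lambda i: energy[i]) + 1
```

The operator responds to both amplitude and frequency, so the short high-frequency burst stands out against the low-frequency tone; `peak_sample` falls within the first few samples of the burst.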
Regarding the session on localization, several techniques were presented, including the
use of a single hydrophone, 2 hydrophones (taking into account reflections), or a network
of hydrophones. The authors demonstrated their performance results, in terms of the precision of the position. This approach is particularly important, especially to avoid exposure of marine mammals to dangerous sound levels (especially from high-power active
sonar).
One of the strongest moments of the workshop concerned the definition of a sound
generator model for sperm whales. Several oral presentations showed that it was possible
to complete the known model defined by Møhl. This model was confirmed by the
results obtained either by one hydrophone or by hull-mounted hydrophones.
We also reserved a session dedicated to materials and software: use of a large-aperture,
fiber-optic linked, vertical linear array for recordings of deep-diving cetaceans;
development of a satellite telecommunicating acoustic buoy network for real-time localization; acoustic tool development with XBAT, PAMGUARD, Ishmael, etc.
Finally, the workshop opened up the use of passive acoustics for goals
other than detection or localization. During one session, we were able to see how
hydrophones could give us other information on marine mammals: measuring whale hearing during echolocation, using acoustic localization techniques to study long-term reproductive ecology in an aquatic-mating pinniped (the bearded seal), sperm whale acoustic
size estimation, as well as estimating the number of marine mammals.
3. Dataset description
Two datasets of sperm whale (Physeter macrocephalus) vocalizations were provided by
NAVSEA Newport to the 2nd International Workshop on Detection and Localization of
Marine Mammals using Passive Acoustics, held in Monaco November 16-18, 2005. The
data were collected on March 25th and 30th, 2002 at the US Navy's Atlantic Undersea Test
and Evaluation Center (AUTEC), located off Andros Island, Bahamas in the Tongue of the
Ocean (TOTO). The AUTEC range consists of wideband, bottom-mounted hydrophones
located approximately 2 nmi apart, at a depth of roughly 1800 m. The sperm whale data
covered a bandwidth of approximately 50 Hz to 24 kHz. The data were recorded using synchronized Tascam 8-channel digital recorders with 16-bit samples at a sample rate of
48 kHz. The data were then extracted to wave files from the recorders. Additional information, such as relative hydrophone locations and depths, relevant AUTEC SVPs, observer
sighting notes, and notes regarding sounds present on the recordings were also provided.
The datasets were collected on two geographically separated groups of hydrophones
(Table 1). The first data set consisted of six hydrophone channels, labeled A through F.
The recordings, each 20 min long, were of multiple sperm whales vocalizing simultaneously. This dataset was collected between 22:40:49 and 23:00:49 UTC. During this period, observers from the Woods Hole Oceanographic Institution (WHOI) saw three sperm
whales in the vicinity of the hydrophones, though more may have been present. The second dataset was recorded on five hydrophones, G through K. These recordings were each
25 min long, recorded from 15:24:25 to 15:49:25 UTC, of a single vocalizing sperm whale
with reverberation. WHOI observers saw either one or two sperm whales during the period from 17:16:29 to 18:21:28 UTC in the vicinity of these hydrophones.
A time offset was found in both data sets between channels recorded on two separate
Tascam recorders. A time offset of 1.4987 s needs to either be subtracted from the times
for hydrophones A, B and C, or added to hydrophones D, E and F, for Data Set #1 to
be properly time-aligned. In order to properly time-align the data provided for Data Set
#2, an offset of 2.3395 s must be either subtracted from the times for hydrophones
G and H, or added to hydrophones I, J and K.
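The correction described above can be sketched as a small helper. The offsets, hydrophone labels and 48 kHz sample rate come from the text; the function and dictionary names are hypothetical, and the sketch arbitrarily applies the "subtract" option rather than the equivalent "add" option.

```python
SAMPLE_RATE_HZ = 48000  # sample rate stated for the recordings

# (offset in seconds, hydrophones whose times must have the offset subtracted)
OFFSETS = {
    "dataset1": (1.4987, {"A", "B", "C"}),
    "dataset2": (2.3395, {"G", "H"}),
}

def aligned_time(dataset, hydrophone, t_seconds):
    """Return a time stamp corrected so that all channels of a dataset share one clock."""
    offset, subtract_from = OFFSETS[dataset]
    return t_seconds - offset if hydrophone in subtract_from else t_seconds

def offset_in_samples(dataset):
    """Offset rounded to the nearest whole sample at 48 kHz."""
    return round(OFFSETS[dataset][0] * SAMPLE_RATE_HZ)

t_a = aligned_time("dataset1", "A", 10.0)   # channel A is shifted
t_d = aligned_time("dataset1", "D", 10.0)   # channel D is left unchanged
```

At 48 kHz the two offsets correspond to roughly 71938 and 112296 samples. The first is not a whole number of samples (71937.6), so sub-sample precision would need the correction to be applied to time stamps rather than by shifting sample indices.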
Table 1
Hydrophone relative locations and depths
Hydrophone    X (m)       Y (m)       Z (m)
A             18501.34    9494.38     1687.63
B             10447.48    4244.37     1677.27
C             14119.88    3034.43     1627.08
D             16179.48    6294.09     1672.47
E             12557.48    7471.71     1670.99
F             17691.59    1975.28     1633.92
G             10658.04    14953.63    1530.55
H             12788.99    11897.12    1556.14
I             14318.86    16189.18    1553.58
J             8672.59     18064.35    1361.93
K             12007.50    19238.87    1522.54
sounds from that individual animal in the recording. The archive currently contains some
14,000 sounds from 8 species of baleen whale.
5. Localization results
Three participants presented localization results from the workshop dataset. Dataset 1
was processed by only one participant (Ron Morrissey), and Dataset 2 was processed by
three groups: Giraudet and Glotin (Université du Sud Toulon-Var), Morrissey et al.
(Naval Undersea Warfare Center), and Nosal and Frazer (University of Hawaii at
Manoa). This section summarizes the techniques used by the different groups, and presents
their combined results. A full description of their techniques is available in their associated
workshop papers [3,9,10].
5.1. Giraudet and Glotin
In the Giraudet and Glotin technique [3], the signal is first passed through a full-wave
rectification and thresholding process to detect clicks from background noise. The signals
from the various hydrophones are subsampled and cross-correlated in 10-s chunks for an
initial estimate of the Time Differences Of Arrival (TDOAs). The five highest correlation
peaks are kept for each hydrophone pair. TDOAs that correspond to reflected paths
(echoes) are eliminated by identifying them with an autocorrelation of the signals over
one-minute periods for each hydrophone. If TDOAs differ by more than 50 ms from at
least 2 other TDOAs calculated within 100 s for a given hydrophone pair, they are eliminated. Over each 10-s window, TDOAs that correspond to a single source recorded on
four different hydrophones are identified by time-gating the TDOAs within a 2 ms error
window. For these four hydrophones, three independent TDOAs are selected (relative to a
single hydrophone), and recalculated precisely with the original time series (not the subsampled ones). These final TDOAs are used to calculate the source position by minimizing
the least mean square (LMS) error on the positions, using a constant sound speed of
1500 m/s.
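The first steps of such a pipeline (full-wave rectification, thresholding, and a cross-correlation estimate of the TDOA for one hydrophone pair) can be sketched as follows. This is an illustration on synthetic data, not the authors' code: it omits subsampling, the retention of five peaks, echo rejection and the final position fit, and the click shape, threshold and function names are assumptions.

```python
FS = 48000  # Hz, sample rate of the recordings

def rectify(x):
    """Full-wave rectification."""
    return [abs(v) for v in x]

def detect_clicks(x, threshold):
    """Sample indices where the rectified signal rises above the threshold."""
    r = rectify(x)
    return [n for n in range(len(r))
            if r[n] >= threshold and (n == 0 or r[n - 1] < threshold)]

def tdoa_samples(a, b, max_lag):
    """Lag of b relative to a (in samples) that maximizes the cross-correlation
    of the rectified signals; a positive lag means b receives the click later."""
    ra, rb = rectify(a), rectify(b)
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = sum(ra[n] * rb[n + lag]
                    for n in range(len(ra)) if 0 <= n + lag < len(rb))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag

def synth(onsets, length=300, click=(1.0, -0.6, 0.3)):
    """Hypothetical noise-free recording with a short click at each onset."""
    x = [0.0] * length
    for onset in onsets:
        for k, v in enumerate(click):
            x[onset + k] += v
    return x

chan_a = synth([50, 200])
chan_b = synth([57, 207])          # same clicks, 7 samples later
onsets_a = detect_clicks(chan_a, threshold=0.5)
lag = tdoa_samples(chan_a, chan_b, max_lag=20)
tdoa_seconds = lag / FS
```

With real data the brute-force lag search would normally be replaced by an FFT-based correlation, and the lag range would be bounded by the hydrophone spacing divided by the sound speed.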
5.2. Morrissey et al.
The detection algorithm of Morrissey et al. [9] was developed as part of the US Marine
Mammal Monitoring on Navy Ranges (M3R) program. The detection is a two-stage process utilizing a binary thresholded FFT as the first stage. The threshold is a function of
time and frequency, and is based on the time-averaged power within each frequency
bin. The FFT output is divided into two broad classes: clicks and other. Clicks are split
out of the data stream and sent to a data association algorithm called a scanning sieve.
This technique uses a master channel as a template, and correlates it across other channels
to estimate Time Differences Of Arrival (TDOAs). TDOAs are fed into standard 2-D and
3-D hyperbolic localization algorithms.
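A binary thresholded spectrum of this general kind can be sketched as below. This is not the M3R implementation: the exponential running average, the threshold factor, the noise floor and the "broadband means click" rule are all assumptions chosen for illustration, and a direct DFT is used so that the example needs only the standard library.

```python
import cmath
import math

def power_spectrum(frame):
    """Power in bins 0..N/2 of a real frame, via a direct DFT (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def binary_spectrogram(frames, factor=4.0, alpha=0.1, floor=1e-6):
    """Set a bin to True when its power exceeds `factor` times the running
    time-averaged power in that bin (hypothetical parameters)."""
    average = None
    rows = []
    for frame in frames:
        power = power_spectrum(frame)
        if average is None:
            average = list(power)
        rows.append([p > factor * max(a, floor) for p, a in zip(power, average)])
        average = [(1.0 - alpha) * a + alpha * p for a, p in zip(average, power)]
    return rows

def looks_like_click(row, min_fraction=0.5):
    """Assumed rule: a frame with most bins set is broadband, hence a click."""
    return sum(row) >= min_fraction * len(row)

N = 16
background = [0.1 * math.sin(2 * math.pi * 2 * t / N)
              + 0.05 * math.cos(2 * math.pi * 5 * t / N) for t in range(N)]
frames = [list(background) for _ in range(10)]
frames[6][3] += 5.0                      # impulsive, broadband event in frame 6
rows = binary_spectrogram(frames)
click_frames = [i for i, row in enumerate(rows) if looks_like_click(row)]
```

Because the threshold follows the per-bin average, steady tonal background does not trigger the detector, while the impulse in frame 6 sets every bin.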
5.3. Nosal and Frazer
The method of Nosal and Frazer [10] relies exclusively on the difference in arrival
time between direct and reflected paths (DRTDs). A depth-dependent sound-speed
profile, typical for the area and time of recording, is used. At each receiver (5 total in
the dataset), DRTDs are modeled via the ray-based model BELLHOP [12] for a list of
ranges and candidate source depths. The 25-min recordings are split into 20 s time intervals that overlap by 15 s. Each time interval and candidate source depth is processed
separately. For each receiver, the measured DRTD is estimated from the recorded regular sperm whale clicks by using a detection and classification scheme that picks out
associated pairs of direct and surface-reflected arrivals. The measured DRTD is compared to the modeled DRTDs, and the range that best agrees gives the estimated horizontal separation between the source and receiver. This defines a circle centered at the
receiver with a radius given by the estimated horizontal separation. The point closest
to all 5 receiver circles is chosen as the estimated whale position for the current depth.
"Closest" is determined by creating a surface for each receiver that has value 1 on the
circle and that decays to zero away from the circle (both inward and outward). A
Gaussian weighting is used for the decay. The 5 surfaces are added to create an ambiguity surface. Its maximum is declared the estimated whale position for the current
depth. Estimated positions are compared at all depths, and the one with the maximum
ambiguity value is the estimated whale position for the current time interval. The ambiguity value corresponding to this position is the performance value for the time interval.
Times with performance values below a given threshold are eliminated from the results,
as are times for which DRTDs cannot be estimated on at least 4 receivers. The process
is automated in MATLAB.
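The circle-intersection idea can be sketched in a few lines under strong simplifying assumptions. Here the DRTD is modeled with straight rays and a constant 1500 m/s sound speed (an image source mirrored about the surface) instead of BELLHOP with a depth-dependent profile, the receiver positions are hypothetical rather than those of Table 1, and the Gaussian width `sigma` and the search grid are arbitrary choices.

```python
import math

SOUND_SPEED = 1500.0  # m/s; isospeed ASSUMPTION, not the profile used by the authors

def drtd(r, zs, zr, c=SOUND_SPEED):
    """Delay (s) between direct and surface-reflected arrivals for horizontal
    range r, source depth zs and receiver depth zr (straight-ray geometry)."""
    direct = math.hypot(r, zr - zs)
    reflected = math.hypot(r, zr + zs)   # image source mirrored about the surface
    return (reflected - direct) / c

def range_from_drtd(measured, zs, zr, r_max=20000.0):
    """Invert drtd() for range by bisection; the delay shrinks as range grows."""
    lo, hi = 0.0, r_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if drtd(mid, zs, zr) > measured:
            lo = mid                     # modeled delay too large: source is farther
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ambiguity(x, y, receivers, ranges, sigma=50.0):
    """Sum of per-receiver surfaces: 1 on each range circle, Gaussian decay off it."""
    total = 0.0
    for (rx, ry, _), rng in zip(receivers, ranges):
        miss = math.hypot(x - rx, y - ry) - rng
        total += math.exp(-0.5 * (miss / sigma) ** 2)
    return total

def locate(receivers, delays, depths, grid):
    """Return (ambiguity, x, y, depth) with the highest ambiguity over all depths."""
    best = None
    for zs in depths:
        ranges = [range_from_drtd(d, zs, rz) for d, (_, _, rz) in zip(delays, receivers)]
        for x in grid:
            for y in grid:
                value = ambiguity(x, y, receivers, ranges)
                if best is None or value > best[0]:
                    best = (value, x, y, zs)
    return best

# Hypothetical receiver layout (x, y, depth in m) and a simulated source.
receivers = [(0.0, 0.0, 1500.0), (4000.0, 0.0, 1550.0), (0.0, 5000.0, 1520.0),
             (5000.0, 5000.0, 1400.0), (2500.0, -1000.0, 1530.0)]
true_x, true_y, true_z = 2000.0, 3000.0, 800.0
delays = [drtd(math.hypot(true_x - rx, true_y - ry), true_z, rz)
          for rx, ry, rz in receivers]
grid = [100.0 * i for i in range(51)]
best_value, best_x, best_y, best_depth = locate(receivers, delays, [600.0, 800.0, 1000.0], grid)
```

Each receiver contributes a range estimate from its own DRTD alone, so no cross-channel timing enters the calculation. With noise-free simulated delays the five circles meet at the true position and the ambiguity value there approaches 5, its maximum for five receivers.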
5.4. White et al.
The method of White et al. [12] is a variant on time difference of arrival algorithms.
Estimates of the time delays between sensor pairs are computed using a single reference
sensor (sensor number 2). These delays are not computed using the raw acoustic data
but using an optimal detection statistic. Surface-reflected clicks are identified on the basis
of their extended delay times, caused by surface reverberation. A histogram-based scheme
is employed to reject spurious peaks in the cross-correlation function. A delay is computed
for 30 s blocks and successive blocks are overlapped by 75%. The delays are then converted to estimates of source location by minimising a weighted mean squared error cost
function. The cost function reflects the difference between the modelled and estimated
delays. The weights in the cost function are computed directly from the delay data and
reflect the level of confidence in each of the delay estimates. The model used for the prediction of delay estimates is based on a linear sound speed profile, rather than an isospeed
model. The time delays for such a linear profile are computed analytically, resulting in an
efficient algorithm. The weighted least squares cost function is optimised using a deterministic optimisation algorithm (Nelder-Mead) which is initialised with a constellation of
points about the previous estimated source location. Infeasible solutions, e.g. below
the plane of the sensors, are avoided by augmenting the cost function using a quadratic
barrier penalty function.
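The shape of such a cost function can be sketched as follows. For brevity the sketch predicts delays with a constant sound speed, whereas the authors use an analytical linear-profile model; the sensor coordinates, weights, nominal sensor-plane depth and penalty scale are all hypothetical, and depth is taken as positive downwards.

```python
import math

def predicted_delays(source, sensors, ref, c=1500.0):
    """Modelled arrival-time delays relative to the reference sensor
    (isospeed ASSUMPTION; the original method uses a linear profile)."""
    times = [math.dist(source, s) / c for s in sensors]
    return [t - times[ref] for i, t in enumerate(times) if i != ref]

def cost(source, sensors, measured, weights, ref=0, plane_depth=1500.0, penalty=1e-6):
    """Weighted squared error between modelled and measured delays, plus a
    quadratic barrier for candidate sources deeper than the sensor plane."""
    model = predicted_delays(source, sensors, ref)
    error = sum(w * (m - d) ** 2 for w, m, d in zip(weights, model, measured))
    excess = max(0.0, source[2] - plane_depth)
    return error + penalty * excess ** 2

# Hypothetical sensor layout (x, y, depth in m) and simulated measurements.
sensors = [(0.0, 0.0, 1500.0), (4000.0, 0.0, 1500.0), (0.0, 4000.0, 1500.0),
           (4000.0, 4000.0, 1500.0), (2000.0, 6000.0, 1500.0)]
true_source = (1000.0, 2000.0, 500.0)
measured = predicted_delays(true_source, sensors, ref=0)
weights = [1.0] * len(measured)

cost_true = cost(true_source, sensors, measured, weights)
cost_shifted = cost((1500.0, 2000.0, 500.0), sensors, measured, weights)
deep = (1000.0, 2000.0, 1700.0)          # 200 m below the assumed sensor plane
barrier = cost(deep, sensors, measured, weights) - cost(deep, sensors, measured, weights, penalty=0.0)
```

A function of this form could be handed to any derivative-free minimiser; for example, `scipy.optimize.minimize` offers a `method="Nelder-Mead"` option, started from points around the previous estimate as the text describes.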
5.5. Combined results
The results from the four groups are shown in Fig. 1, which displays the Northing,
Easting and depth values as a function of time.
Fig. 1. Localization results (Northing, Easting, depth): Nosal and Frazer (+); Giraudet and Glotin (h);
Morrissey et al. (*), White et al. ().
It should be noted that the groups used different sound speeds. Nosal and Frazer
used the sound speed profile for the month of March from the Generalized Digital Environment Model (US Naval Oceanographic Office); Giraudet and Glotin used a constant
speed of 1500 m/s, and Morrissey et al. used the sound speed profile provided with the
workshop dataset. White et al. used a linear sound speed profile starting at 1470 m/s at
800 m and with a gradient of 0.017 s⁻¹.
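As a quick numerical illustration of how the linear profile differs from the constant 1500 m/s assumption, the sketch below evaluates c(z) = c_ref + g (z - z_ref) with the values quoted above. It assumes that the stated gradient applies at depths below 800 m and that speed increases with depth; the function name is hypothetical.

```python
def linear_sound_speed(depth_m, c_ref=1470.0, z_ref=800.0, gradient=0.017):
    """Sound speed (m/s) for a linear profile: c_ref at z_ref, changing by
    `gradient` m/s per metre of depth (assumed valid below z_ref)."""
    return c_ref + gradient * (depth_m - z_ref)

c_at_800 = linear_sound_speed(800.0)     # reference depth
c_at_1800 = linear_sound_speed(1800.0)   # approximate hydrophone depth
```

Under these assumptions the speed near the hydrophones is about 1487 m/s, slightly below the constant 1500 m/s used by Giraudet and Glotin.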
The results in the horizontal plane are very close for all four groups; the results of Morrissey et al. are shifted by approximately 100 m SSE compared to the others, which presumably can be explained by the different sound speed profile that they used. The depth
results are also very close for all four groups, and follow a similar dive pattern, with the results of Giraudet and Glotin being approximately 50-100 m deeper than the
others. The scatter in the individual results is also very small (within 100 m for all), which
is impressive considering the large distances involved in this problem.
The results shown here differ from the initial results presented at the workshop, except
for White et al. whose results were not presented at the workshop. The initial results from
Morrissey et al. did not converge well, which led them to find that two of the five
hydrophones had a time delay of 2.34 s that was initially unaccounted for. All authors
re-ran their respective algorithms to produce the results shown in this document. This
re-processing emphasized a specific strength of the Nosal and Frazer algorithm: it does
not require the individual hydrophones to be accurately time-synchronized to produce
good localizations, as long as the source does not transit a significant distance over the
duration of the clock delays. This comes from the fact that source ranges are estimated
independently for each individual hydrophone. In this case, the 2.34 s delay led to insignificant changes in their localization results, though it had a large impact on the results
of Morrissey et al. and Giraudet and Glotin.
6. Discussions on third workshop
At the end of the workshop, Dr Robert Gisiner led a discussion of the topic of a third
workshop, to find out whether there was interest in a third workshop, and what format
would be preferred. This section is a summary of these discussions written by Dr Gisiner.
Though the Office of Naval Research is considering co-sponsoring this next workshop, the
views below are those of the people present at the Second workshop, and should not be
interpreted as some sort of ONR preference.
6.1. Meeting timing
We talked about identifying dataset needs in April 2006, making the datasets available
around May-June 2006 and then giving people about a year to work on them, with a meeting
in the April-June 2007 time frame.
6.2. Meeting location
It was generally agreed that a site in the eastern US or Canada, or in western Europe
would create the least overall travel cost for participants. Since the 2nd meeting was in
Monaco, the general feeling was that the next meeting should be on the east coast of
the US or Canada.
6.3. Data distribution
If large data sets are used, it may be hard to transmit them by email. Alternatives are to
mail CDs or post datasets on the internet. There was a suggestion that the data sets, and
other information about the next workshop, be posted on Dave Mellinger's MobySound
website; others may also be interested in posting the materials on their sites.
6.4. In summary
Meeting:
Data sets compiled and made available by June 2006.
Meeting held in April-June 2007 on the east coast of North America.
The meeting should consist mostly of analyses of the provided data sets, with relatively
more time set aside for discussion than in past meetings, and perhaps more poster
presentations.
Data sets:
At minimum, four 5-min time series with multiple species, or one long time series of
20+ min.
Three (or more) sensors (to enable optional localization and tracking).
The data should be sampled at a minimum of 100 kHz. If no mysticetes are included in
the sample, some reduction in dataset size could possibly be obtained by high-pass filtering to eliminate data between DC and 5 kHz.
The verication and validation information will be provided by expert human review of
the samples, optionally supplemented by direct visual observations that were conducted
concurrently with the acoustic sampling.
Training datasets: provide a sample set of representative sounds for each species, containing between 20 and 100 samples each, none of the samples drawn from the test dataset, though ideally from the approximately same location and time of year.
Datasets should be posted on a host website along with other required or optional data:
Required additional data:
precise sensor locations in lat/long;
data about the sampling and data processing such as gain, frequency sensitivity, etc.;
time precision and accuracy, and clock drift information.
Optional:
SVP (real-time is best, archived historical averages second-best);
bathymetry and bottom properties.
A dataset containing an artificial source playing realistic sounds is desirable, but
optional.
Datasets containing added noise or clutter sounds are optional.
Analysis Products:
Performance of the automated detection algorithm expressed as ROC data (high probability of correct detection of signals of interest with low false alarms).
Classification to species level is ideal, but more general classification should be accepted.
Localization and tracking analyses optional.
Analysis of performance under differing noise regimes optional.
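For reference, ROC data of the kind requested above can be produced from per-event detector scores and ground-truth labels as in the following sketch. The scores and labels are made-up values, and the function name is hypothetical.

```python
def roc_points(scores, labels):
    """Return (false_alarm_rate, detection_rate) pairs obtained by sweeping the
    decision threshold from the highest score to the lowest.
    labels: 1 for a true signal of interest, 0 for noise or clutter."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    hits = false_alarms = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            hits += 1
        else:
            false_alarms += 1
        # Emit one point per distinct score so that tied scores share a threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((false_alarms / negatives, hits / positives))
    return points

# Made-up detector outputs: three true clicks and three noise events.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
```

A curve that rises quickly towards a detection rate of 1 while the false alarm rate stays near 0 corresponds to the "high probability of correct detection with low false alarms" criterion.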
7. Conclusions
This workshop proved to be a rich and fruitful exchange. We would like to thank the authors
for the calibre of their oral presentations and for allowing the community to fuel the
debate on the passive acoustic detection and localization of marine mammals. The 2007
conference is being planned, which shows the desire to pursue these exchanges and move
forward in the multidisciplinary research dedicated to marine mammals. Let's meet again
at the next rendezvous!
Acknowledgements
Association DIRAC (France); Office of Naval Research Global (ONRG), US; Fondation Albert 1er, Monaco; International Association of Geophysical Contractors (IAGC),
US; Industry Research Funders Coalition (IRFC), US; Avisoft Bioacoustics, Germany;
Cetacean Research Technology Nauta, Italy; Conseil Général des Alpes-Maritimes,
France and Ixsea, France.
The Ministère de l'Écologie (the French Ministry of Environment) and the Ministère de
la Recherche (the French Ministry of Research) endorsed the workshop.
References
[1] Adam O. The use of the Hilbert Huang transform to analyze transient signals emitted by sperm whales. Appl
Acoust, this volume, doi:10.1016/j.apacoust.2006.04.001.
[2] Barlow J, Taylor B. Estimates of sperm whale abundance in the northeastern temperate Pacific from a
combined acoustic and visual survey. Mar Mammal Sci 2005;21(3):429-45.
[3] Giraudet P, Glotin H. Real-time 3D tracking of whales by precise and echo-robust TDOAs of clicks
extracted from 5 bottom-mounted hydrophones records of the AUTEC. Appl Acoust, this volume,
doi:10.1016/j.apacoust.2006.05.003.
[4] Kandia V, Stylianou Y. Detection of sperm whale clicks based on the Teager Kaiser energy operator. Appl
Acoust, this volume, doi:10.1016/j.apacoust.2006.05.007.
[5] Leaper R, Gillespie D, Papastavrou V. Results of passive acoustic surveys for odontocetes in the Southern
Ocean. J Cetacean Res Manage 2000;2(3):187-96.
[6] Lopatka M, Adam O, Laplanche C, Motsch JF, Zarzycki J. Sperm whale clicks analysis using recursive time-variant lattice filter. Appl Acoust 2006, this volume, doi:10.1016/j.apacoust.2006.05.011.
[7] Mellinger D, Clark C. MobySound: a reference archive for studying automatic recognition of marine
mammal sounds. Appl Acoust 2006, this volume, doi:10.1016/j.apacoust.2006.06.002.
[8] Møhl B, Wahlberg M, Madsen PT, Heerfordt A, Lund A. The monopulsed nature of sperm whale clicks. J
Acoust Soc Am 2003;114(2):1143-54.
[9] Morrissey RP, Ward J, DiMarzio N, Jarvis S, Moretti DJ. Passive acoustic detection and localization of
sperm whales (Physeter macrocephalus) in the tongue of the ocean. Appl Acoust 2006, this volume,
doi:10.1016/j.apacoust.2006.05.014.
[10] Nosal E-M, Frazer LN. Delays between direct and reflected arrivals used to track a single sperm whale. Appl
Acoust 2006, this volume, doi:10.1016/j.apacoust.2006.05.005.
[11] Rendell L, Whitehead H. Vocal clans in sperm whales (Physeter macrocephalus). In: Proceedings of the
Royal Society: Biological Sciences. 2003. Published online. <doi:10.1098/rspb.2002.2239>.
[12] White PR, Leighton TG, Finfer DC, Prowles C, Baumann O. Localisation of sperm whales using bottom-mounted sensors. Appl Acoust 2006, this volume, doi:10.1016/j.apacoust.2006.05.002.
[13] Whitehead H. Sperm whales: social evolution in the ocean. Chicago, IL: University of Chicago Press; 2003.