
Towards Context-aware Automatic Haptic Effect Generation for Home Theatre Environments

Yaxuan Li∗ — Centre for Intelligent Machines & Centre for Interdisciplinary Research in Music, Media and Technology, McGill University, Montréal, Québec, Canada. yaxuan.li@mail.mcgill.ca
Yongjae Yoo∗ — Centre for Intelligent Machines & Centre for Interdisciplinary Research in Music, Media and Technology, McGill University, Montréal, Québec, Canada. yongjae.yoo@mcgill.ca
Antoine Weill-Duflos — Centre for Intelligent Machines & Centre for Interdisciplinary Research in Music, Media and Technology, McGill University, Montréal, Québec, Canada. antoine.weill-duflos@mcgill.ca
Jeremy R. Cooperstock — Centre for Intelligent Machines & Centre for Interdisciplinary Research in Music, Media and Technology, McGill University, Montréal, Québec, Canada. jer@cim.mcgill.ca

∗ Both authors contributed equally to this research.

ABSTRACT
The application of haptic technology in entertainment systems, such as Virtual Reality and 4D cinema, enables novel experiences for users and drives the demand for efficient haptic authoring systems. Here, we propose an automatic multimodal vibrotactile content creation pipeline that substantially improves the overall hapto-audiovisual (HAV) experience based on contextual audio and visual content from movies. Our algorithm is implemented on a low-cost system with nine actuators attached to a viewing chair and extracts significant features from video files to generate corresponding haptic stimuli. We implemented this pipeline and used the resulting system in a user study (n = 16), quantifying user experience according to the sense of immersion, preference, harmony, and discomfort. The results indicate that the haptic patterns generated by our algorithm complement the movie content and provide an immersive and enjoyable HAV user experience. This further suggests that the pipeline can facilitate the efficient creation of 4D effects and could therefore be applied to improve the viewing experience in home theatre environments.

CCS CONCEPTS
• Human-centered computing → Haptic devices; Mixed / augmented reality; User studies; Virtual reality.

KEYWORDS
Haptics, 4D effect generation, automatic haptic effect authoring, home theatre, immersive experience

ACM Reference Format:
Yaxuan Li, Yongjae Yoo, Antoine Weill-Duflos, and Jeremy R. Cooperstock. 2021. Towards Context-aware Automatic Haptic Effect Generation for Home Theatre Environments. In 27th ACM Symposium on Virtual Reality Software and Technology (VRST '21), December 8–10, 2021, Osaka, Japan. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3489849.3489887

1 INTRODUCTION
The advent of four-dimensional (4D) theatre technology has had a profound effect on viewer experience. Vestibular (motion) feedback, along with bursts of air, mist effects, thermal stimuli, and vibrotactile effects, are now commonly employed in immersive 4D movie-viewing environments. Such components enrich the movie watching experience through enhanced realism, increased enjoyment, and greater immersion [17]. There is a growing need for content that can take advantage of the 4D capabilities of movie theatres with such components, without necessitating undue production costs.

This demand has likely grown as a result of the COVID-19 pandemic, which has further pushed consumption of cinema content to the home theatre environment, supported by a variety of streaming services such as Netflix, Amazon Prime, and Disney+. For home theatre setups, an inexpensive, but highly limited, option is the use of haptic actuators installed in chairs to directly convert or filter the audio track to generate vibrotactile feedback. Although semi-automatic generation of richer haptic effects for 4D cinema is being explored in both academia and industry, this process generally remains mostly manual, dependent on skilled industry experts, time-consuming, and costly [38]. The typical process takes three designers approximately 16 days to produce the effects for a feature-length 4D film. This cost obviously impacts the creation and distribution of 4D movies, and motivates exploration of more efficient authoring methods.

Figure 1: Overview of the haptic effects generation algorithm. With movie content as input, the fusion of psychoacoustic
measurements indicating human perception of sound determines vibration intensity. In parallel, the saliency map suggests
the event location in the scene, and assigns weights to actuators for the generation of the spatiotemporal vibrotactile effects.

The contribution of this paper is our automatic haptic effect generation algorithm, designed primarily for the needs of the 4D home theatre content experience, along with a user study (n = 16) we conducted to evaluate the performance of the proposed algorithm on six movie clips from different genres including sci-fi, action, cartoon, family-comedy, and horror/thriller. Our algorithm, illustrated in Figure 1, generates haptic effects from an analysis of the audiovisual stream. The stream conveys information about the high-level context of the scene such as tonal properties (a composite of the mood, narrative, and specific genre features); location of events and characters; dynamics of the atmosphere; and intensity of action. We used metrics that reflect salient perceptual characteristics of the audio and visual modalities. These include four psychoacoustic parameters, used in the audio analysis to determine the intensity of vibration (Figure 1(b)), and estimates of visual saliency for every frame (Figure 1(c)), which are used to determine the spatiotemporal distribution of vibrotactile effects (Figure 1(d)). The resulting haptic effects are delivered through a vibrotactile chair with nine actuators (Figure 1(e)).

2 RELATED WORK

2.1 Haptics for Enhancing the Viewing Experience
Inclusion of multisensory information has been demonstrated to greatly enhance user experience in terms of immersion, flow, absorption, and engagement in a range of entertainment activities including virtual and augmented reality and gaming [12, 24, 35, 57, 77, 79]. There is growing interest in applying similar effects in 4D film, enhancing the cinematic viewing experience beyond stereoscopic (3D) cinema.

Today, numerous companies, including D-Box Technologies, MediaMation (MX4D), and CJ 4DPlex (4DX), manufacture 4D movie platforms for theatres and theme parks. Their hardware provides haptic effects including leg tickling, vestibular (motion) effects, air pressure, and thermal stimulation. With respect to the inclusion of haptic effects, Danieau et al. presented a perceptually organized review of HAV devices [17], describing the three types of haptic feedback: 1) tactile feedback with temperature stimuli [20, 51], vibration [2, 33, 36, 40, 51, 63, 67], and pressure [3, 4, 25, 60]; 2) kinesthetic feedback resulting from limb position movement [22, 34] and force [7, 14, 15, 34, 50]; and 3) proprioception or vestibular feedback arising from body motion [15, 56]. Gaw et al. and Delazio et al. demonstrated the feasibility of using force feedback to increase viewers' ability to understand the presented media content [18, 22], and Dionisio et al. highlighted the effectiveness of thermal stimuli in VR [20].

2.2 Authoring of Vibrotactile Stimuli
To achieve a high-quality overall experience, it is essential to create haptic effects that match the audiovisual content of the scene. At present, this relies predominantly on the intuition and experience of expert designers, working manually with effects authoring platforms; for example, Macaron [55], a web-based vibrotactile effects editor, Vibrotactile Score [39], based on the composition of patterns of musical scores, H-Studio [13], Immersion's Haptic Studio (https://www.immersion.com/technology/#hapticdesign-tools), HFX Studio [16], posVibEditor [53], and SMURF [32].

As noted previously, use of such haptic authoring tools tends to be costly and time-consuming. To address this challenge, automatic haptic effects generation methods have been developed, which typically produce haptics from audio properties. An early effort by Chafe tried to combine the vibrotactile sensation with musical instruments [8]. Similarly, Chang and O'Sullivan used low-frequency components, extracted from audio files, to provide vibration on a mobile phone using a multi-function transducer as an actuator [9]. For the creation of vibrotactile effects in games, Chi et al. found that simple mapping of the audio according to frequency bands resulted in excessive ill-timed vibrations, causing feelings of haptic numbness and annoyance [11]. To avoid this problem, they established key moments when vibrations should occur by identifying target sounds in real-time using an acoustic similarity measurement.

Contrary to other researchers, who employed frequency bands and filters to extract suitable audio features for haptic conversion [29, 47], Lee and Choi [37] translated auditory signals to vibrotactile feedback by bridging the perceptual characteristics of loudness and roughness.

Another direction for haptic authoring employs cues from the visual stream to drive the generation of haptic effects. For example, Rehman et al. took the graphical display of a soccer game, divided the field into five areas, and mapped each to an area-based vibration pattern to indicate the position of the ball [63]. Similarly, Lee et al. mapped the trace of a soccer ball to vibration effects on an array of 7 × 10 actuators mounted on the forearm, enabling a spatiotemporal vibrotactile mapping of the location of the ball in the game [36]. Moreover, the use of computer vision and physical motion modeling techniques has been investigated to extract key elements of the scene or estimate camera motion from first-person videos. These cues are applied to create corresponding haptic effects, including vestibular (motion) [38], force [15], and vibrotactile feedback [30, 31, 56]. More recent research has demonstrated impressive performance using Generative Adversarial Networks to produce tactile signals from images of different materials [42, 43, 64, 65].

3 AUTOMATIC HAPTIC EFFECT AUTHORING ALGORITHM
While the research described in the previous section endeavors to generate automatic vibrotactile effects based on a mapping from audio or visual contents, we propose instead a multimodal algorithm that integrates the contextual information from both the audio and visual streams. We also take into account the perceptual psychology of the expected audience to generate reasonable patterns that align automatically with the movie contents.

Our pipeline employs four acoustic measures (Section 3.1) and the visual saliency map (Section 3.2). For the former, we use the psychoacoustic parameters of sharpness, booming, low-frequency energy, and loudness to quantify the perceptual quality of the movie's audio track. The timing and amplitude of the vibrotactile actuation are determined from these auditory features. For example, gunshot sounds in a fight scene or booming sounds in racing or piloting scenes create an atmosphere of intense emotions and, ideally, increase the immersion of the audience. For the latter, we estimate the saliency map from the visual stimuli for each frame, which identifies the region of expected visual interest [48]. As shown in a previous study [30], the map can provide effective location and direction information for vibrotactile rendering; however, it cannot automatically determine the magnitude of the haptic parameters. Finally, we integrate the information from both audio and visual features to generate the vibrotactile effects. These are presented through an array of nine vibration motors installed in a chair. Although designed for spatiotemporal vibrotactile feedback in this study, our pipeline for automated haptic effects authoring could likewise be applied to other haptic modalities such as temperature, airflow, or impact.

3.1 Audio Modality Analysis
The sounds corresponding to an event are especially significant since the audio and haptic modalities share several common characteristics, such as the physical propagation principle and perceptual properties (roughness, consonance, etc.). In movie clips, sound often accompanies important events, such as the one depicted in Figure 2.

The importance of sound cues to human attention motivated the work of Evangelopoulos et al., who formulated measures of audiovisual saliency by modelling perceptual and computational attention to such cues [21]. Huang and Elhilali further demonstrated that auditory salience is also context dependent [26], since the salient components vary according to specific characteristics of the scene. Accordingly, to achieve an effective multimedia user experience, haptic effects created from the auditory modality should consider characteristics of the source content, such as the genre of the material [27, 56]. The generation of haptic effects that are congruent to the given audiovisual contents has been studied for decades. A common approach is the extraction of bass and treble components from the low-frequency component of a music signal [9] to map to different frequencies [27]. Such techniques based on audio attributes have been employed to create meaningful and enjoyable haptic effects in many apps and games, using tools such as Apple Core Haptics (https://developer.apple.com/documentation/corehaptics) and Lofelt Studio (https://lofelt.com/).

In this study, we emphasized achieving a high-level match between haptic effects and movie events to improve the experience of media content consumption. We attempted to achieve this through the use of four psychoacoustic measures: low-frequency energy, loudness, sharpness, and booming.

In terms of vibrotactile perception, humans are most sensitive between 160 and 250 Hz [66]. Following the approach of Hwang et al. [27], we consider frequencies up to 215 Hz as the borderline bass-band, as this corresponds approximately to the maximum frequency of the motor we use. We calculate the low-frequency energy value as the integral of the amplitudes of the frequency components up to 215 Hz.

Loudness is a well-known perceptual measure, indicating the perceived intensity of sound pressure. It is calculated on the decibel scale as a weighted integration of the spectral loudness N(f) for frequency f:

    L = \int_{0.3\,\mathrm{Bark}}^{24\,\mathrm{Bark}} N(f)\, df, \qquad N(f) = \frac{1}{W_k}\, 20 \log_{10} I(f),    (1)

where W_k is a weighting factor extracted from the equal-loudness contour of the International Organization for Standardization's standard on loudness (ISO 532) [1], and I(f) is the intensity of the sound component at frequency f.

Sharpness and booming are also weighted summations of spectral energy in the frequency domain, emphasizing high- and low-frequency components, respectively. For example, a glass-breaking sound would exhibit high sharpness and low booming, while the engine sound of a sports car would be the opposite. These values can be derived from the weighted integral of the spectral loudness N(f) divided by the loudness L.

Sharpness S is calculated as follows:

    S = 0.11 \cdot \frac{\int_{0.3\,\mathrm{Bark}}^{24\,\mathrm{Bark}} N(f)\, g(f)\, f\, df}{L},    (2)

where g(f) is the weighting function. Booming B is calculated as

    B = \frac{\int_{0.3\,\mathrm{Bark}}^{24\,\mathrm{Bark}} N(f)\, f\, df}{(0.49529\,\mathrm{Bark}(f) + 0.14176)\, L}.    (3)

The reader is referred to the reference of Zwicker and Fastl [80] for further details regarding these psychoacoustic variables and the associated calculations.

We used these metrics to analyze the attributes of sound effects in movie scenes, inferred the characteristics of the sound played, and then decided on the intensity of the haptic effects. Previous efforts on haptic effects generation from sound often utilized low-frequency components, since haptic sensation lies in the lower end of the frequency spectrum. Such approaches help to make the movie scenes more immersive, especially highly dynamic and violent ones including elements like explosions, gunfights, races, and chase sequences. However, these approaches often cannot filter out human voices, propagating incongruent vibration effects alongside the intended effects. Moreover, effects with high-frequency components, such as sci-fi (e.g., laser weapons or aircraft sounds) or magical effects, are often omitted since these scenes do not have enough low-frequency components. To overcome such problems, we sought other psychoacoustic parameters. From our observations, we picked sharpness and booming to represent effects with high and low frequency components, respectively. Loudness and low-frequency energy were selected to embody the absolute magnitude of sound intensity and the attributes of the sound effects. In general, the portion and magnitude of low-frequency components increase when the source object associated with the sound effect is large and located near the camera. Examples include the sound spectrum of guns firing, cannons, bombs, or other explosions.

For data processing, we used the 44.1 kHz recorded stereo sound of the movie clip. Calculation of the four psychoacoustic parameters was performed using the sound data stream sampled every 20 ms. We then normalized each parameter to the range [0, 1] by dividing by its maximum.

To determine the haptic output, we first select parameter-specific thresholds for use during the extraction of the target sounds from the movie scenes. For our study, we used post-normalization values of (sharpness, booming, low-frequency energy, loudness) = (0.5, 0.65, 0.8, 0.85). If a calculated acoustic parameter exceeds any of these thresholds, vibrotactile effects will be presented. The threshold values were determined empirically as suitable to reduce undesirable haptic effects caused by sounds that are not related to the event context, such as narration or background music.

Figure 2: An example of psychoacoustic analysis of a movie scene for sharpness, booming, low-frequency energy, and loudness, with threshold values of 0.5, 0.65, 0.8, and 0.85, respectively. (Note that parameter values are normalized to 0 to 1.) The fusion of the four parameters provides the values for vibrotactile rendering of the events presented in the movie scene.

Users can adjust these thresholds according to the movie contents with our haptic-effects-generation pipeline. The intensity of vibration is determined by the highest parameter value among the four parameters. For example, if the audio of the scene results in a parameter vector of (0.6, 0.4, 0.5, 0.9), the vibration intensity would be 0.9 times the maximum. The intensity of vibration was spatially mapped using the vibrotactile actuator array in the fusion phase, described in Section 4.
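To make the audio-analysis stage more concrete, the following minimal sketch (not the authors' released code) shows how the four per-frame measures and the threshold/intensity logic above could be approximated with NumPy and SciPy. The Bark conversion and the weighting functions standing in for W_k and g(f) are simplified placeholders rather than the exact ISO 532 / Zwicker and Fastl definitions used in the paper.

```python
# Sketch of the Section 3.1 audio analysis; weighting functions are illustrative stand-ins.
import numpy as np
from scipy.io import wavfile

FRAME_MS = 20            # analysis hop used in the paper
LOW_FREQ_CUTOFF = 215.0  # Hz, "borderline bass-band" limit
# Post-normalization thresholds from the paper: sharpness, booming, low-frequency energy, loudness
THRESHOLDS = np.array([0.5, 0.65, 0.8, 0.85])

def bark(f_hz):
    """Traunmüller approximation of the Bark scale (a stand-in for the ISO 532 tables)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def frame_features(mag, freqs):
    """Crude per-frame sharpness, booming, low-frequency energy, and loudness."""
    power_db = 20.0 * np.log10(mag + 1e-12)                # spectral level in dB
    level = np.clip(power_db - power_db.min(), 0, None)    # non-negative spectral-loudness proxy
    z = bark(freqs)
    loudness = np.trapz(level, z) + 1e-12
    g_sharp = np.where(z > 16.0, 0.066 * np.exp(0.171 * z), 1.0)  # high-band emphasis (assumed g(f))
    sharpness = 0.11 * np.trapz(level * g_sharp * z, z) / loudness
    g_boom = np.where(z < 6.0, 2.0, 0.3)                   # low-band emphasis (assumed weighting)
    booming = np.trapz(level * g_boom * z, z) / loudness
    low_energy = mag[freqs <= LOW_FREQ_CUTOFF].sum()       # integral of amplitudes up to 215 Hz
    return np.array([sharpness, booming, low_energy, loudness])

def analyze(path):
    sr, samples = wavfile.read(path)                       # e.g. the 44.1 kHz stereo track of the clip
    samples = samples.astype(np.float64)
    if samples.ndim == 2:
        samples = samples.mean(axis=1)                     # downmix to mono
    hop = int(sr * FRAME_MS / 1000)
    n_frames = len(samples) // hop
    freqs = np.fft.rfftfreq(hop, d=1.0 / sr)
    feats = np.empty((n_frames, 4))
    for i in range(n_frames):
        mag = np.abs(np.fft.rfft(samples[i * hop:(i + 1) * hop]))
        feats[i] = frame_features(mag, freqs)
    feats /= feats.max(axis=0) + 1e-12                     # normalize each parameter to [0, 1]
    active = (feats >= THRESHOLDS).any(axis=1)             # any parameter above its threshold?
    return np.where(active, feats.max(axis=1), 0.0)        # highest parameter drives the intensity
```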
3.2 Visual Modality Analysis
In parallel with the sound processing, we also analyzed the visual content of the movie source. As demonstrated by Kim et al. [30], the salient region is an effective indicator of the location of the events that occur on screen and can be associated with the spatial distribution of the actuators. Accordingly, we first predicted the saliency area of the frames using a convolutional neural network (CNN) model, choosing this framework because of its superior performance on 2D image tasks such as object detection [44, 76], recognition [6, 41, 58, 69], and segmentation [23, 45, 59] compared to other machine learning algorithms.

To implement the model for saliency detection, we followed the pyramid feature selective network presented in [75] (https://github.com/CaitinZhao/cvpr2019_Pyramid-Feature-Attention-Network-for-Saliency-detection). The overall architecture of the model is illustrated in Figure 3. The input image is first fed into convolutional layers based on VGG net [58], as CNNs have shown a strong ability to learn features of images.

Figure 3: Structure of the saliency detection model.


Several modules with attention mechanisms are then deployed, as attention is a natural fit for selecting features and detecting areas of interest. More specifically, the branch with the Conv 3-3, Conv 4-3, and Conv 5-3 blocks based on VGGnet [58], a context-aware pyramid feature extraction (CPFE) module, and a channel-wise attention (CA) module captures high-level features with the rich context of the images. The CPFE module adopts atrous convolution with dilation rates of 3, 5, and 7 to obtain context information from the multi-receptive field. The CA module assigns different weights to channels according to their response to salient objects, while another branch with Conv 1-2 and Conv 2-2 taken from VGGnet [58] extracts low-level features of the images. A spatial attention module follows, aiming to refine detailed information of the salient area and filter out background noise that may distract saliency mask generation.

We pre-trained the model on the DUTS-train dataset [68] with 10,553 training images. We then developed an interface for generating saliency maps for our dataset composed of movie frames. As shown in Figure 4, we ultimately obtained the salient area (shown in yellow) and background area (shown in dark purple) for each frame.

Figure 4: Saliency map generated by the CNN model: (a) raw input image, (b) predicted saliency mask from the model, and (c) result after applying a filter on the saliency mask to remove noisy salient edges.
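For illustration, the sketch below shows one way the per-frame masks could be extracted and cleaned before the fusion stage, corresponding to the filtering step in Figure 4(c). The `predict_saliency` argument is a placeholder for a wrapper around the pretrained pyramid feature attention network; the threshold and kernel size are illustrative choices, not values reported by the authors.

```python
# Sketch of the Section 3.2 post-processing: clean a predicted saliency mask per video frame.
import cv2
import numpy as np

def clean_mask(raw_mask, thresh=0.5, kernel=5):
    """Binarize a [0, 1] saliency mask and suppress noisy salient edges (cf. Figure 4(c))."""
    binary = (raw_mask >= thresh).astype(np.uint8) * 255
    binary = cv2.medianBlur(binary, kernel)                        # drop isolated speckles
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel, kernel))
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, se)          # remove thin, ragged edges
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    if n <= 1:
        return np.zeros_like(binary)                               # no salient region in this frame
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])           # keep the dominant salient blob
    return (labels == largest).astype(np.uint8) * 255

def iter_salient_masks(video_path, predict_saliency):
    """Yield one cleaned binary mask per frame; `predict_saliency` wraps the CNN model."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        raw = predict_saliency(frame)      # assumed to return an H x W float mask in [0, 1]
        yield clean_mask(raw)
    cap.release()
```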
4 INFORMATION FUSION AND IMPLEMENTATION ON HARDWARE
After analyzing the audio and visual modalities, we obtained the targeted frames with psychoacoustic measurements and the area of interest for each frame. The next step consisted of fusing the two streams of information and translating them to vibrotactile sensations rendered with hardware. For our haptic rendering platform, we used a chair with nine vibrotactile actuators integrated into the back, seat, and arms, as illustrated in Figure 5(a). The saliency map must be down-sampled, typically to a small grid, for example, of 3 × 3 [30] or 2 × 3 actuators [28], making contact with the user's back and hips for rendering moving tactile sensations. Amemiya et al. simulated vibration in the seat pan, presented in conjunction with a visual display of optical flow, to intensify the perception of forward velocity [5]. Inspired by phantom tactile sensations [54], we developed an algorithm that provides vibrotactile patterns with reasonably high spatiotemporal resolution. To provide an opportunity for improved immersion, we also placed actuators on the armrests in order to extend the rendering of effects beyond the user's back and seat.

Figure 5: Nine-actuator layout dividing the scene and distributing the vibrotactile intensity for rendering. The example of △12, with centroid C12 distributing weight to actuators A5, A6, and A9, is shown.

We determined the timing and intensity of vibration rendered by each actuator as follows. First, we mapped the locations of the nine actuators on the chair to on-screen vertices A1, A2, ..., A9 as shown in Figure 5. Considering the shape of the chair and the aim of covering the user's back, arms, and bottom, we distributed A3 and A7 on the armrests and A8 and A9 on the seat. We arranged A1–A2 and A4–A6 in a manner that provided coverage over the user's back and obtained a configuration of actuators with 14 split triangles of similar size, as needed for later steps of our rendering algorithm. This arrangement resulted from several iterations of self-tests and pilot tests conducted by the authors. We arranged the actuators so that A1A8 and A2A9 divide the frame into three equal horizontal ranges; A3A4, A4A5, A5A6, and A6A7 divide the frame into four equal horizontal ranges; and A3A7 bisects it vertically. We overlaid this actuator-position division mask on each frame's predicted saliency region generated by the CNN model. With connections drawn between the nearest neighbors, this actuator distribution divides the frame into fourteen triangles, as shown in Figure 5(b). In each triangle i, i ∈ {1, ..., 14}, we calculated the centroid C_i of the saliency shape and the distance from the centroid C_i to its three adjacent vertices. Figure 5(b) provides an example of C12 in △A5A6A9.

Second, we assigned weights that are inversely proportional to the distance between the centroid and the three adjacent vertices. More specifically, we defined α_{△i}^{A_n} as the reciprocal of the distance between C_i and the adjacent vertex A_n, i.e., |C_i A_n|^{-1}.

An example of △12, with actuators A5, A6, and A9, is shown in Figure 6. Next, we assigned a weight w_{△i}^{A_{n_k}} to A_{n_k} by

    w_{\triangle i}^{A_{n_k}} = \frac{\alpha_{\triangle i}^{A_{n_k}}}{\alpha_{\triangle i}^{A_{n_1}} + \alpha_{\triangle i}^{A_{n_2}} + \alpha_{\triangle i}^{A_{n_3}}}, \quad k \in \{1, 2, 3\},    (4)

where A_{n_k}, k ∈ {1, 2, 3}, are the three adjacent vertices of △i. Then, for each actuator A_n, we summed the weights w_{△i}^{A_n} from its adjacent triangles to obtain W_{A_n}. As shown in Figure 7, four weights contribute to W_{A6} of actuator A6: w_{△5}^{A6}, w_{△6}^{A6}, w_{△12}^{A6}, and w_{△13}^{A6}, which are obtained from Equation 4. We then normalized all the weights of each actuator and generated the rendering weights β_n, n ∈ {1, 2, ..., 9}.

Figure 6: Example of △12 with actuators A5, A6, and A9 as vertices and α_{△12}^{A5}, α_{△12}^{A6}, α_{△12}^{A9} distributed to the three vertices, respectively.

Figure 7: Example of four weights w_{△5}^{A6}, w_{△6}^{A6}, w_{△12}^{A6}, and w_{△13}^{A6} contributing to W_{A6} of actuator A6.

Finally, we obtained the overall vibration intensity for the current frame by linearly mapping the psychoacoustic measurement values described in Section 3.1 to the actuators' amplitudes, multiplied by the rendering weight β_n of each actuator. As a result, the location of vibration follows the salient region; when the user's visual attention rests on the left side of the movie scene, the corresponding vibrotactile effects, with intensity derived from the audio analysis and generated using the above mapping, will also be presented on the left side of the body, as seen in Figure 5.
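The fusion step can be sketched as follows, under the assumption that the nine on-screen vertices are given as normalized (x, y) coordinates and that a Delaunay triangulation stands in for the hand-designed fourteen-triangle layout described above (the coordinates below are illustrative, not the exact chair geometry, and the final normalization is one possible choice since the paper does not specify its exact scheme). Per triangle, the centroid of the salient pixels is computed, inverse-distance weights are assigned to the triangle's vertices as in Equation 4, accumulated per actuator, normalized, and scaled by the audio-derived intensity.

```python
# Sketch of the Section 4 fusion: per-triangle saliency centroids -> Eq. 4 vertex weights
# -> per-actuator amplitudes scaled by the audio intensity. Layout and triangulation are
# illustrative stand-ins for the authors' 9-vertex / 14-triangle design.
import numpy as np
from scipy.spatial import Delaunay

VERTICES = np.array([                     # assumed normalized (x, y) positions of A1..A9
    [1/6, 0.0], [5/6, 0.0],                                        # A1, A2
    [0.0, 0.5], [0.25, 0.5], [0.5, 0.5], [0.75, 0.5], [1.0, 0.5],  # A3..A7
    [1/6, 1.0], [5/6, 1.0],                                        # A8, A9
])
TRI = Delaunay(VERTICES)                  # stand-in triangulation of the frame

def actuator_amplitudes(saliency_mask, intensity):
    """saliency_mask: H x W binary array; intensity: scalar in [0, 1] from the audio analysis."""
    h, w = saliency_mask.shape
    ys, xs = np.nonzero(saliency_mask)
    if len(xs) == 0:
        return np.zeros(len(VERTICES))                    # no salient pixels -> no vibration
    pts = np.column_stack([xs / w, ys / h])               # salient pixels, normalized to [0, 1]
    tri_of_pt = TRI.find_simplex(pts)                     # triangle index of each salient pixel
    weights = np.zeros(len(VERTICES))
    for t, verts in enumerate(TRI.simplices):
        inside = pts[tri_of_pt == t]
        if len(inside) == 0:
            continue
        centroid = inside.mean(axis=0)                    # C_i: centroid of the shape in triangle i
        alpha = 1.0 / (np.linalg.norm(VERTICES[verts] - centroid, axis=1) + 1e-9)  # |C_i A_n|^-1
        weights[verts] += alpha / alpha.sum()             # Eq. 4: triangle i's share per vertex
    if weights.max() > 0:
        weights /= weights.max()                          # one possible normalization to beta_n
    return intensity * weights                            # amplitude commands for A1..A9
```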
We built the hardware system with nine eccentric rotating mass (ERM) vibrotactile actuators (Seeed Studio; RB-See-403, ϕ = 10 mm) controlled by a Teensy 3.2 microcontroller connected to the media player computer. The actuators are embedded in the chair as illustrated in Figure 8. Vibration amplitude ranges of 1.77–5.31 g on the armrests of the chair, 2.18–5.63 g on the seat, and 2.41–7.99 g on the backrest were measured by a triaxial accelerometer (PCB Electronics; Model 356A01) and a data acquisition card (National Instruments; Model USB-4431). The frequency of vibration ranged from 110 to 230 Hz, and the rise time of the ERM motors was approximately 50 ms. Although this response time is non-negligible, it is also not excessive for our purposes. First, as reported by Lee and Choi [37], in terms of the delay of driving actuation, real-time haptic applications on mobile devices incur approximately 20–30 ms of delay, but participants perceive such haptic effects as "simultaneous". Second, the movie industry standardized on 24 frames per second, whereas television productions typically use 25 (PAL) or 30 (NTSC) frames per second. Although higher frame-rate recording is commonly used in digital productions, television broadcasts are nevertheless mostly confined to the lower rates, for which each frame therefore represents 33–42 ms. Accordingly, the 50 ms rise time of the actuator constitutes only a minor difference, which is difficult to perceive as asynchronous. As a movie starts playing, the microcontroller receives vibration amplitudes synchronously from the computer and triggers the ERMs to present vibration effects every 100 ms.

Figure 8: Hardware implementation and experiment setup.
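As an example of the playback loop on the computer side, the sketch below streams nine amplitude values to the microcontroller every 100 ms over a serial link using pyserial. The one-byte-per-actuator message format, the 0xFF frame header, and the device path are assumptions made for illustration; the paper only states that amplitudes are sent synchronously and rendered every 100 ms.

```python
# Sketch of the playback loop: push nine PWM amplitudes to the Teensy every 100 ms.
import time
import numpy as np
import serial   # pyserial

PORT = "/dev/ttyACM0"     # hypothetical Teensy device path
FRAME_PERIOD = 0.1        # 100 ms update interval used in the paper

def stream(amplitude_frames):
    """amplitude_frames: iterable of length-9 arrays in [0, 1], one per 100 ms step."""
    with serial.Serial(PORT, baudrate=115200, timeout=1) as link:
        next_tick = time.monotonic()
        for frame in amplitude_frames:
            pwm = np.clip(np.asarray(frame) * 255, 0, 255).astype(np.uint8)
            link.write(b"\xff" + pwm.tobytes())   # 0xFF header + 9 amplitude bytes (assumed protocol)
            next_tick += FRAME_PERIOD
            time.sleep(max(0.0, next_tick - time.monotonic()))
```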
5 USER STUDY
This section describes the user study for evaluating the effectiveness of our haptic effects generation algorithm, employing the hardware implementation described in Section 4. In the user study, we compared participants' subjective ratings of haptic effects generated under different conditions with several movie clips.

5.1 Participants
We recruited 16 participants for the experiment (20–34 years old, 9 male, 7 female). No participants reported any known sensory disorders that could affect their auditory, visual, or haptic perception. They were also asked to wear thin T-shirts to allow for better perception of vibrotactile stimuli on their torso. Each participant was compensated approximately 8 USD for their participation.

5.2 Stimuli
We tested four different haptic rendering conditions, described in Table 1. Six short (approximately one-minute) movie clips were selected, including scenes with fighting, shooting, chasing, or exploding. They were trimmed from four different movies: Alita: Battle Angel (2019), The Boss Baby (2017), I Am Legend (2007), and Escape Room (2019). Table 2 summarizes the movie clips, their lengths, and the components in each scene. In total, 24 combinations of the six movie clips and four rendering conditions were provided to each participant. Each of the six movie clips was considered as a block, and within each block, the presentation order of haptic effects was randomized.

Table 1: Four conditions of haptic effect generation.

Random     | Vibration effects were generated randomly and presented throughout the video clips.
Audio      | Vibration effects were generated based on the audio parameters described in Section 3.1 only, and presented on actuators A1, A2, and A5.
Visual     | Vibration effects were presented in the salient area as described in Section 3.2, without considering the audio variables. All vibrotactile actuators corresponding to the salient area on the screen vibrated.
Multimodal | The full pipeline described in Sections 3 and 4 was used for generating the vibration effects.

Table 2: Six movie clips for the user study, with their lengths and the components in each scene.

# | Movie clip                          | Length | Components in the scene
1 | Alita: Battle Angel (2019) [Clip A] | 58 s   | Grappling, Melee fighting
2 | Alita: Battle Angel (2019) [Clip B] | 62 s   | Brawling, Crashing, Tasering
3 | The Boss Baby (2017)                | 60 s   | Chasing, Fighting, Laughing, Giggling, Crunching
4 | I Am Legend (2007) [Clip A]         | 62 s   | Gunshots, Slamming, Crashing, Screaming
5 | I Am Legend (2007) [Clip B]         | 73 s   | Gunshots, Barking, Snarling, Growling
6 | Escape Room (2019)                  | 74 s   | Whooshing, Exploding, Crawling
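As an illustration of how the four conditions in Table 1 relate to the pipeline, the sketch below derives each condition's actuator amplitudes from the same two outputs: the Section 3.1 audio intensity and the Section 4 per-actuator saliency weights. The random-condition parameters (burst probability and amplitude) are assumptions, as the paper does not specify how the random effects were drawn.

```python
# Sketch of the four rendering conditions in Table 1, built from the same pipeline outputs.
import numpy as np

N_ACTUATORS = 9
AUDIO_ONLY = np.zeros(N_ACTUATORS)
AUDIO_ONLY[[0, 1, 4]] = 1.0              # "Audio" condition: actuators A1, A2, and A5 only

def condition_amplitudes(condition, audio_intensity, saliency_weights, rng=np.random):
    if condition == "Random":
        # Arbitrary sparse bursts; probability and amplitude are assumed values.
        return rng.uniform(0.0, 1.0) * (rng.random(N_ACTUATORS) < 0.3)
    if condition == "Audio":
        return audio_intensity * AUDIO_ONLY
    if condition == "Visual":
        return (saliency_weights > 0).astype(float)   # salient-area actuators, audio ignored
    if condition == "Multimodal":
        return audio_intensity * saliency_weights     # full pipeline (Sections 3 and 4)
    raise ValueError(condition)
```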
5.3 Methods and Procedure
The user study was conducted in a recording studio at the authors' institution, depicted in Figure 8. Participants sat approximately 45–55 cm in front of a 21 inch monitor and wore headphones (Sony WH-H900N) to hear the movie clip sounds and to prevent them from hearing the faint noises produced by the vibration motors.

The participants were first asked to read through the consent form and listen to the explanation of the experiment. All the participants agreed to and signed the consent form, and then they were asked to sit on the vibrotactile chair. Next, the experimenter confirmed that the participant could easily detect vibrations from each of the actuators. The experimenter also confirmed that the intensity of the vibration was not excessive, as this might cause discomfort throughout the experiment.

A short training session was given to the participants. They experienced a 20-second-long movie clip cropped from Alita: Battle Angel. The clip was different from the movie clips used in the main session introduced in Table 2. The four haptic conditions described in Table 1 were presented along with the movie.

In every trial of the main session, the participants watched the movie while perceiving the haptic effects. They were then given a questionnaire (Table 3) to evaluate their experience with the haptic rendering in terms of Immersion, Preference, Harmony, and Discomfort, using Likert scales ranging from 0 (strongly disagree) to 10 (strongly agree). After completing the questionnaire, the participants took a short break of at least one minute to avoid fatigue and adaptation.

Table 3: Questionnaire for assessing the haptic-audio-visual experience.

Immersion  | The haptic effects make me more immersed in the movie.
Preference | I liked the experience of vibration sensations while watching this movie clip.
Harmony    | The haptic effects are consistent with the content of the scene.
Discomfort | The haptic effects are uncomfortable for me.

The entire procedure took approximately one hour per participant, and was carried out under approval of the McGill University Research Ethics Board, REB # 21-02-023, in compliance with regional COVID-19 restrictions. This included paying attention to sanitary measures, wiping down the apparatus after each use, and participants and the experimenter wearing face masks while maintaining a 2 m distance.

5.4 Data Analysis
We initially averaged the 16 participants' ratings for each of the 24 combinations for a simple comparison by plots (Figure 9). For the statistical analysis of the results, we conducted a two-way ANOVA, one-way ANOVAs, and Tukey's HSD post-hoc tests for each of the four subjective ratings to examine the significance of the effects.
experiment.
6.1 ANOVA analysis
A short training session was given to the participants. They
experienced a 20-second-long movie clip cropped from Alita: Battle We performed a two-way analysis of variance (ANOVA) test for
Angel. The clip was different from the movie clips used in the main immersion, preference, harmony, and discomfort to understand the
session introduced in Table 2. Four different haptic conditions, as effects of haptic rendering methods and movie content. Table 4
described in Table 1, were presented along with the movie. summarizes the statistical analyses. For all the variables, the hap-
In every trial in the main session, the participants watched the tic rendering method largely affected the user’s evaluation (effect
movie while perceiving the haptic effects. They were then given a size η 2 near 0.5). The effects of the movie clip’s content were also
questionnaire (Table 3) to evaluate their experience with the haptic significant in immersion, preference, and harmony scores, despite
rendering in terms of Immersion, Preference, Harmony, and Discom- having small effect sizes. Interestingly, for discomfort rating, the
fort, using Likert scales ranging from 0 (strongly disagree) to 10 effect size of the haptic rendering factor drastically decreased, and
(strongly agree). After completing the questionnaire, the partici- the movie’s effects turned out to be insignificant. The interaction
pants took a short break with at least one minute to avoid fatigue terms were not significant, meaning the effects of different haptic
and adaptation. rendering methods were not affected by different movie contents.
The entire procedure took approximately one hour per partici-
pant, and was carried out under approval of the McGill University 6.2 In-depth Analysis
Research Ethics Board, REB # 21-02-023, in compliance with re- To understand the results better, we conducted a more in-depth
gional COVID-19 restrictions. This included paying attention to analysis on the rendering methods in each of the movie clips. Fig-
sanitary measures, wiping down of the apparatus after each use, ure 9 shows the average scores of the four subjective scores. Note

Figure 9: Results of experiment with six movie clips to evaluate the users’ haptic-audio-visual experience by a questionnaire
in terms of immersion, preference, harmony, and discomfort. The error bars represent standard errors. The conditions with
the same letters above the bar indicate there are no significant differences among them.

Note that the scale was inverted for the discomfort ratings, i.e., a lower score indicates better performance (less discomfort). We performed one-way analysis of variance (ANOVA) tests to assess the performance of the four haptic conditions for each movie clip. In terms of the subjective evaluation metrics immersion, preference, and harmony, the haptic rendering condition had a statistically significant effect (p < 0.01 or p < 0.05) on user experience for all six movie clips. However, in terms of discomfort, the haptic rendering conditions only had a statistically significant effect on user experience while viewing Alita: Battle Angel [Clip A], The Boss Baby, and Escape Room.

Table 5: Results of the one-way ANOVAs for each movie clip in terms of immersion, preference, harmony, and discomfort. Each cell reports the F(3, 45) value and its p value; p < 0.01 or p < 0.05 indicates a significant effect.

Movie                        | Immersion        | Preference       | Harmony          | Discomfort
Alita: Battle Angel [Clip A] | 29.495, p < .001 | 33.021, p < .001 | 27.962, p < .001 | 3.141, p = .032
Alita: Battle Angel [Clip B] | 23.932, p < .001 | 20.126, p < .001 | 37.443, p < .001 | 2.586, p = .061
The Boss Baby                | 26.263, p < .001 | 21.293, p < .001 | 21.36, p < .001  | 6.400, p = .001
I Am Legend [Clip A]         | 25.405, p < .001 | 17.010, p < .001 | 30.233, p < .001 | 1.762, p = .164
I Am Legend [Clip B]         | 18.370, p < .001 | 18.115, p < .001 | 24.377, p < .001 | 1.152, p = .336
Escape Room                  | 21.986, p < .001 | 20.562, p < .001 | 22.922, p < .001 | 5.072, p = .003

The detailed statistical results are presented in Table 5. For each of the one-way ANOVAs, we conducted a post-hoc analysis with Tukey's HSD test to understand which comparisons are statistically meaningful. Post-hoc grouping labels are presented above the bars in Figure 9; different letter codings indicate a significant difference between items.

The observations can be summarized as follows:

• The haptic rendering method has a clear significant effect on all the subjective ratings, showing that the multimodal haptic rendering condition performed best.
• A general trend of Random < Visual < Audio < Multimodal can be observed throughout the plots. Though there is only a slight difference between the audio and visual rendering scores, we can infer that time-synchronicity is more important than presenting the location of the event.
• For the discomfort rating, the content of the movie seems to affect the user's evaluation. For example, discomfort ratings show a significant difference in the Alita [Clip A], Boss Baby, and Escape Room clips, but the others show no differences.

7 DISCUSSION
We can observe that the multimodal haptic effect condition delivers the best user experience with regard to immersion, preference, and harmony, with the lowest level of discomfort. The results solidly demonstrate the effectiveness of our algorithm. Furthermore, using both visual and audio features in the haptic effects generation increased the subjective rating scores.

In all the experimental conditions, multimodal rendering received ratings over 8 out of 10. We can conclude that the rendering algorithm effectively communicated the context of the scene. The two features, location and amplitude, were well mapped and easily distinguishable by the participants, and helped them in their understanding of the scene. Moreover, this was achieved without the use of high-performance actuators such as voice coils.

A point for further discussion is user discomfort, which includes the effects of fatigue caused by an immoderate number of vibrations, or excessive cognitive load due to rapidly varying stimuli. As illustrated in Figure 9, we observed that multimodal rendering achieves the lowest level of discomfort. Furthermore, the large standard deviation indicates a relatively substantial individual variance in the "comfortable range" of vibration perception.

We found that the score of the psychoacoustic-parameter-based rendering is slightly better than that of the saliency-based rendering; however, the difference was not statistically significant. This implies that time synchronicity is more crucial than location matching for vibrotactile rendering. However, we picked mostly dynamic scenes for this study, which are accompanied by clear, distinctive sound effects. Scenes that are not dynamic and violent (e.g., slow camera motion, speaking, romance scenes, etc.) need further investigation.

7.1 Extensibility of the Algorithm
The algorithm's flexibility lends itself to scaling and expansion to other applications such as Virtual Reality (VR), Augmented Reality (AR), and games. The pipeline also allows for retroactively introducing 4D effects to previous movies made without 4D in mind. We provide a UI for our program that allows users to adjust the thresholds for each of the psychoacoustic parameters of sharpness, booming, low-frequency energy, and loudness. Once a video is selected to be processed, users can choose and tune the psychoacoustic measurements and their thresholds. The program then generates a text file for driving the actuators and a video for haptic effects visualization. Haptics practitioners can use our tool to create vibrotactile effects for their videos, whether for VR or gaming content.

Moreover, since the feature extraction part is detached from the calculation of the rendering parameters, we can separately add or remove audiovisual parameters in that stage. For example, color map extraction [71] and camera motion estimation [61, 74] could be easily integrated with the current system. Higher-level contextual information can be obtained by using machine learning techniques, such as violent scene detection [10, 19, 46, 70] or semantic segmentation [49, 72], to make the haptics fit the storyline more comprehensively.

In terms of haptic actuation mechanisms, different modalities such as thermal or airflow sensations could easily be attached. For example, a related experimental multi-haptic armrest design [52] includes two different types of vibration, thermal sensation, airflow, and poking mechanisms. These mechanisms could be mapped to multiple different audiovisual features extracted from the scene and would be expected to improve the watching experience. The challenge for such conversion is to determine the appropriate mapping between the movie content or detected events and the associated type of mechanism or actuator to drive in response, along with the selection and tuning of its parameters. We are currently investigating these issues. Regarding application scenarios, we initially considered a home theatre watching experience, where the algorithm is easily attached to streaming services to provide 4D haptic effects. The commercialization of this and similar work could have an impact on the future of at-home content consumption.

7.2 Limitations
Despite offering automatic haptic authoring and promising application possibilities, our pipeline suffers from a number of limitations. In Section 3.2, we presented our visual saliency detection algorithm for distributing vibrotactile effects to produce an immersive experience. However, we observed a small number of scenes with strong sound but no visual objects on screen, such as ambush attacks from a ghost or monster in the dark, an electricity blackout, or heavy wind blowing out a fire in the scene. In such cases, where sound components exist in the absence of corresponding visual objects, we fixed the saliency area in the middle of the frame and assigned A5 to take responsibility for rendering the vibrotactile actuation. Furthermore, movies contain many blurred frames to guarantee the fluidity of characters' actions, which poses a challenge for the saliency detection model and results in imprecise edge detection of objects.

These limitations may be overcome by more powerful, accurate, or movie-specific detection or segmentation machine learning models, incorporating, for example, methods of stereo sound localization [62], attention [73], and depth extraction [78], and equally, taking advantage of large-scale movie datasets such as IMDb and MovieLens. We emphasize that the proposed technique does not aim to serve as a replacement for manual design by haptics experts. The quality of the haptic patterns from our pipeline is inferior to those created by laborious manual authoring. However, we believe that our pipeline provides useful insights, and offers a preliminary version of high-quality effects that can be implemented at an early stage of content authoring.

8 CONCLUSION
In this study, we proposed an automatic multimodal haptic rendering algorithm for movie content, which extracts audiovisual features to infer basic contextual information of the scene and renders haptic effects. We especially targeted ways to improve the home theatre experience. Using a haptic chair equipped with an array of vibration motors, we conducted a user study to evaluate the effectiveness of the algorithm in terms of user experience by measuring subjective ratings of immersion, preference, harmony, and discomfort. The results demonstrate that the proposed multimodal rendering noticeably improved the viewing experience. Our future work will expand the algorithm and include additional types of haptic modalities.

ACKNOWLEDGMENTS
We thank Sri Gannavarapu, David Marino, Linnéa Kirby, and David Ireland for their valuable feedback. We also appreciate the study participants for their participation and the reviewers for their comments.

REFERENCES
[1] 2017. ISO 532-1:2017. https://www.iso.org/standard/63077.html
[2] Md. Abdur Rahman, Abdulmajeed Alkhaldi, Jongeun Cha, and Abdulmotaleb El Saddik. 2010. Adding Haptic Feature to YouTube. In Proceedings of the 18th ACM International Conference on Multimedia (Firenze, Italy) (MM '10). Association for Computing Machinery, New York, NY, USA, 1643–1646. https://doi.org/10.1145/1873951.1874310
[3] Damien Ablart, Carlos Velasco, and Marianna Obrist. 2017. Integrating Mid-air Haptics into Movie Experiences. In Proceedings of the 2017 ACM International Conference on Interactive Experiences for TV and Online Video. 77–84.
[4] Jason Alexander, Mark T. Marshall, and Sriram Subramanian. 2011. Adding Haptic Feedback to Mobile TV. In CHI '11 Extended Abstracts on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI EA '11). Association for Computing Machinery, New York, NY, USA, 1975–1980. https://doi.org/10.1145/1979742.1979899
[5] Tomohiro Amemiya, Koichi Hirota, and Yasushi Ikei. 2016. Tactile Apparent Motion on The Torso Modulates Perceived Forward Self-motion Velocity. IEEE Transactions on Haptics 9, 4 (2016), 474–482.
[6] Maryam Asadi-Aghbolaghi, Albert Clapes, Marco Bellantonio, Hugo Jair Escalante, Víctor Ponce-López, Xavier Baró, Isabelle Guyon, Shohreh Kasaei, and Sergio Escalera. 2017. A Survey on Deep Learning Based Approaches for Action and Gesture Recognition in Image Sequences. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). IEEE, 476–483.
[7] Jongeun Cha, Mohamad Eid, and Abdulmotaleb El Saddik. 2009. Touchable 3D Video System. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 5, 4 (2009), 1–25.
[8] Chris Chafe. 1993. Tactile Audio Feedback. In Proceedings of the International Computer Music Conference. International Computer Music Association, 76–76.
[9] Angela Chang and Conor O'Sullivan. 2005. Audio-Haptic Feedback in Mobile Phones. In CHI '05 Extended Abstracts on Human Factors in Computing Systems (Portland, OR, USA) (CHI EA '05). Association for Computing Machinery, New York, NY, USA, 1264–1267. https://doi.org/10.1145/1056808.1056892
[10] Liang-Hua Chen, Hsi-Wen Hsu, Li-Yun Wang, and Chih-Wen Su. 2011. Violence Detection in Movies. In 2011 Eighth International Conference Computer Graphics, Imaging and Visualization. IEEE, 119–124.
[11] Dongju Chi, Donghyun Cho, Sungjin Oh, Kyunngkoo Jun, Yonghee You, Hwanmun Lee, and Meeyoung Sung. 2008. Sound-Specific Vibration Interface using Digital Signal Processing. In 2008 International Conference on Computer Science and Software Engineering, Vol. 4. 114–117. https://doi.org/10.1109/CSSE.2008.931
[12] Alexandra Covaci, Longhao Zou, Irina Tal, Gabriel-Miro Muntean, and Gheorghita Ghinea. 2018. Is Multimedia Multisensorial? - A Review of Mulsemedia Systems. ACM Computing Surveys (CSUR) 51, 5, Article 91 (Sept. 2018), 35 pages. https://doi.org/10.1145/3233774
[13] Fabien Danieau, Jérémie Bernon, Julien Fleureau, Philippe Guillotel, Nicolas Mollet, Marc Christie, and Anatole Lécuyer. 2013. H-Studio: An Authoring Tool for Adding Haptic and Motion Effects to Audiovisual Content. In Proceedings of the Adjunct Publication of the 26th Annual ACM Symposium on User Interface Software and Technology (St. Andrews, Scotland, United Kingdom) (UIST '13 Adjunct). Association for Computing Machinery, New York, NY, USA, 83–84. https://doi.org/10.1145/2508468.2514721
[14] F. Danieau, J. Fleureau, A. Cabec, P. Kerbiriou, P. Guillotel, N. Mollet, M. Christie, and A. Lécuyer. 2012. Framework for Enhancing Video Viewing Experience with Haptic Effects of Motion. In 2012 IEEE Haptics Symposium (HAPTICS). 541–546. https://doi.org/10.1109/HAPTIC.2012.6183844
[15] Fabien Danieau, Julien Fleureau, Philippe Guillotel, Nicolas Mollet, Anatole Lécuyer, and Marc Christie. 2012. HapSeat: Producing Motion Sensation with Multiple Force-Feedback Devices Embedded in a Seat. In Proceedings of the 18th ACM Symposium on Virtual Reality Software and Technology (Toronto, Ontario, Canada) (VRST '12). Association for Computing Machinery, New York, NY, USA, 69–76. https://doi.org/10.1145/2407336.2407350
[16] Fabien Danieau, Philippe Guillotel, Olivier Dumas, Thomas Lopez, Bertrand Leroy, and Nicolas Mollet. 2018. HFX Studio: Haptic Editor for Full-Body Immersive Experiences. In Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology (Tokyo, Japan) (VRST '18). Association for Computing Machinery, New York, NY, USA, Article 37, 9 pages. https://doi.org/10.1145/3281505.3281518
[17] Fabien Danieau, Anatole Lécuyer, Philippe Guillotel, Julien Fleureau, Nicolas Mollet, and Marc Christie. 2012. Enhancing Audiovisual Experience with Haptic Feedback: A Survey on HAV. IEEE Transactions on Haptics 6, 2 (2012), 193–205.
[18] Alexandra Delazio, Ken Nakagaki, Roberta L Klatzky, Scott E Hudson, Jill Fain Lehman, and Alanson P Sample. 2018. Force Jacket: Pneumatically-actuated Jacket for Embodied Haptic Experiences. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–12.
[19] Claire-Hélène Demarty, Cédric Penet, Guillaume Gravier, and Mohammad Soleymani. 2011. The Mediaeval 2011 Affect Task: Violent Scene Detection in Hollywood Movies. In MediaEval 2011 Workshop.
[20] J. Dionisio. 1997. Virtual hell: a trip through the flames. IEEE Computer Graphics and Applications 17, 3 (1997), 11–14. https://doi.org/10.1109/38.586012
[21] Georgios Evangelopoulos, Konstantinos Rapantzikos, Alexandros Potamianos, Petros Maragos, A Zlatintsi, and Yannis Avrithis. 2008. Movie Summarization Based on Audiovisual Saliency Detection. In 2008 15th IEEE International Conference on Image Processing. IEEE, 2528–2531.
[22] D. Gaw, D. Morris, and K. Salisbury. 2006. Haptically Annotated Movies: Reaching Out and Touching the Silver Screen. In The 14th Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (HAPTICS'06). 287–288. https://doi.org/10.1109/HAPTIC.2006.1627106
[23] Qichuan Geng, Zhong Zhou, and Xiaochun Cao. 2018. Survey of Recent Progress in Semantic Image Segmentation with CNNs. Science China Information Sciences 61, 5 (2018), 051101.
[24] Gheorghita Ghinea, Christian Timmerer, Weisi Lin, and Stephen R. Gulliver. 2014. Mulsemedia: State of the Art, Perspectives, and Challenges. ACM Trans. Multimedia Comput. Commun. Appl. 11, 1s, Article 17 (Oct. 2014), 23 pages. https://doi.org/10.1145/2617994
[25] T. Hoshi, M. Takahashi, T. Iwamoto, and H. Shinoda. 2010. Noncontact Tactile Display Based on Radiation Pressure of Airborne Ultrasound. IEEE Transactions on Haptics 3, 3 (2010), 155–165. https://doi.org/10.1109/TOH.2010.4
[26] Nicholas Huang and Mounya Elhilali. 2017. Auditory Salience using Natural Soundscapes. The Journal of the Acoustical Society of America 141, 3 (2017), 2163–2176.
[27] Inwook Hwang, Hyeseon Lee, and Seungmoon Choi. 2013. Real-Time Dual-Band Haptic Music Player for Mobile Devices. IEEE Transactions on Haptics 6, 3 (2013), 340–351. https://doi.org/10.1109/TOH.2013.7
[28] Ali Israr and Ivan Poupyrev. 2011. Tactile Brush: Drawing on Skin with a Tactile Grid Display. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2019–2028.
[29] Maria Karam, Frank A. Russo, and Deborah I. Fels. 2009. Designing the Model Human Cochlea: An Ambient Crossmodal Audio-Tactile Display. IEEE Transactions on Haptics 2, 3 (2009), 160–169. https://doi.org/10.1109/TOH.2009.32
[30] Myongchan Kim, Sungkil Lee, and Seungmoon Choi. 2012. Saliency-Driven Tactile Effect Authoring for Real-Time Visuotactile Feedback. In Proceedings of the 2012 International Conference on Haptics: Perception, Devices, Mobility, and Communication - Volume Part I (Tampere, Finland) (EuroHaptics'12). Springer-Verlag, Berlin, Heidelberg, 258–269. https://doi.org/10.1007/978-3-642-31401-8_24
[31] Myongchan Kim, Sungkil Lee, and Seungmoon Choi. 2014. Saliency-Driven Real-Time Video-to-Tactile Translation. IEEE Transactions on Haptics 7, 3 (2014), 394–404. https://doi.org/10.1109/TOH.2013.58
[32] Sang-Kyun Kim. 2013. Authoring Multisensorial Content. Signal Processing: Image Communication 28, 2 (2013), 162–167.
[33] Yeongmi Kim, Jongeun Cha, Jeha Ryu, and Ian Oakley. 2010. A Tactile Glove Design and Authoring System for Immersive Multimedia. IEEE MultiMedia 17, 3 (2010), 34–45.
[34] Y. Kim, Sunyoung Park, Hyungon Kim, Hyerin Jeong, and J. Ryu. 2011. Effects of Different Haptic Modalities on Students' Understanding of Physical Phenomena. In 2011 IEEE World Haptics Conference. 379–384. https://doi.org/10.1109/WHC.2011.5945516
[35] Ernst Kruijff, Alexander Marquardt, Christina Trepkowski, Jonas Schild, and André Hinkenjann. 2017. Designed Emotions: Challenges and Potential Methodologies for Improving Multisensory Cues to Enhance User Engagement in Immersive Systems. The Visual Computer 33, 4 (2017), 471–488.
[36] Beom-Chan Lee, Junhun Lee, Jongeun Cha, Changhoon Seo, and Jeha Ryu. 2005. Immersive Live Sports Experience with Vibrotactile Sensation. In Proceedings of the 2005 IFIP TC13 International Conference on Human-Computer Interaction (Rome, Italy) (INTERACT'05). Springer-Verlag, Berlin, Heidelberg, 1042–1045. https://doi.org/10.1007/11555261_100
[37] Jaebong Lee and Seungmoon Choi. 2013. Real-time Perception-level Translation from Audio Signals to Vibrotactile Effects. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2567–2576.
[38] Jaebong Lee, Bohyung Han, and Seungmoon Choi. 2015. Motion Effects Synthesis for 4D Films. IEEE Transactions on Visualization and Computer Graphics 22, 10 (2015), 2300–2314.
[39] Jaebong Lee, Jonghyun Ryu, and Seungmoon Choi. 2009. Vibrotactile Score: A Score Metaphor for Designing Vibrotactile Patterns. In World Haptics 2009 - Third Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems. 302–307. https://doi.org/10.1109/WHC.2009.4810816
[40] P. Lemmens, F. Crompvoets, D. Brokken, J. van den Eerenbeemd, and G. de Vries. 2009. A Body-conforming Tactile Jacket to Enrich Movie Viewing. In World Haptics 2009 - Third Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems. 7–12. https://doi.org/10.1109/WHC.2009.4810832
[41] Shan Li and Weihong Deng. 2020. Deep Facial Expression Recognition: A Survey. IEEE Transactions on Affective Computing (2020).
[42] Yaoyao Li, Huailin Zhao, Huaping Liu, Shan Lu, and Yueyang Hou. 2021. Research on Visual-tactile Cross-modality Based on Generative Adversarial Network. Cognitive Computation and Systems (2021).
[43] Huaping Liu, Di Guo, Xinyu Zhang, Wenlin Zhu, Bin Fang, and Fuchun Sun. 2020. Toward Image-to-tactile Cross-modal Perception for Visually Impaired People. IEEE Transactions on Automation Science and Engineering 18, 2 (2020), 521–529.
[44] Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, and Matti Pietikäinen. 2020. Deep Learning for Generic Object Detection: A Survey. International Journal of Computer Vision 128, 2 (2020), 261–318.
[45] Shervin Minaee, Yuri Y Boykov, Fatih Porikli, Antonio J Plaza, Nasser Kehtarnavaz, and Demetri Terzopoulos. 2021. Image Segmentation Using Deep Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).
[46] Guankun Mu, Haibing Cao, and Qin Jin. 2016. Violent Scene Detection Using Convolutional Neural Networks and Deep Audio Features. In Chinese Conference on Pattern Recognition. Springer, 451–463.
[47] Suranga Nanayakkara, Elizabeth Taylor, Lonce Wyse, and S. H. Ong. 2009. An Enhanced Musical Experience for the Deaf: Design and Evaluation of a Music Display and a Haptic Chair. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, MA, USA) (CHI ’09). Association for Computing Machinery, New York, NY, USA, 337–346. https://doi.org/10.1145/1518701.1518756
[48] Ernst Niebur. 2007. Saliency Map. Scholarpedia 2, 8 (2007), 2675.
[49] Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. 2015. Learning Deconvolution Network for Semantic Segmentation. In Proceedings of the IEEE International Conference on Computer Vision. 1520–1528.
[50] S. O’Modhrain and I. Oakley. 2004. Adding Interactivity: Active Touch in Broadcast Media. In Proceedings of 12th International Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (HAPTICS ’04). 293–294. https://doi.org/10.1109/HAPTIC.2004.1287211
[51] Saurabh Palan, Ruoyao Wang, Nathaniel Naukam, Li Edward, and Katherine J Kuchenbecker. 2010. Tactile Gaming Vest (TGV).
[52] Nathan J. A. Pollet, Emanuel Uzan, Patricia Batista Ruivo, Tal Abravanel, Aishwari Talhan, Yongjae Yoo, and Jeremy R. Cooperstock. 2021. Multimodal Haptic Armrest for Immersive 4D Experiences. In IEEE World Haptics Conference, Work in Progress.
[53] Jonghyun Ryu and Seungmoon Choi. 2008. posVibEditor: Graphical Authoring Tool of Vibrotactile Patterns. In 2008 IEEE International Workshop on Haptic Audio visual Environments and Games (HAVE’08). 120–125. https://doi.org/10.1109/HAVE.2008.4685310
[54] Oliver S. Schneider, Ali Israr, and Karon E. MacLean. 2015. Tactile Animation by Direct Manipulation of Grid Displays. In Proceedings of the 28th Annual ACM Symposium on User Interface Software and Technology (Charlotte, NC, USA) (UIST ’15). Association for Computing Machinery, New York, NY, USA, 21–30. https://doi.org/10.1145/2807442.2807470
[55] Oliver S. Schneider and Karon E. MacLean. 2016. Studying Design Process and Example Use with Macaron, A Web-based Vibrotactile Effect Editor. In 2016 IEEE Haptics Symposium (HAPTICS). IEEE, 52–58.
[56] Jongman Seo, Sunung Mun, Jaebong Lee, and Seungmoon Choi. 2018. Substituting Motion Effects with Vibrotactile Effects for 4D Experiences. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–6. https://doi.org/10.1145/3173574.3174002
[57] Donghee Shin. 2019. How Does Immersion Work in Augmented Reality Games? A User-centric View of Immersion and Engagement. Information, Communication & Society 22, 9 (2019), 1212–1229.
[58] Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014).
[59] Farhana Sultana, Abu Sufian, and Paramartha Dutta. 2020. Evolution of Image Segmentation Using Deep Convolutional Neural Network: A Survey. Knowledge-Based Systems 201 (2020), 106062.
[60] Y. Suzuki and M. Kobayashi. 2005. Air Jet Driven Force Feedback in Virtual Reality. IEEE Computer Graphics and Applications 25, 1 (2005), 44–47. https://doi.org/10.1109/MCG.2005.1
[61] Jiexiong Tang, John Folkesson, and Patric Jensfelt. 2018. Geometric Correspondence Network for Camera Motion Estimation. IEEE Robotics and Automation Letters 3, 2 (2018), 1010–1017.
[62] Antigoni Tsiami, Petros Koutras, and Petros Maragos. 2020. STAViS: Spatio-temporal Audiovisual Saliency Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4766–4776.
[63] S. u. Rehman, J. Sun, L. Liu, and H. Li. 2008. Turn Your Mobile Into the Ball: Rendering Live Football Game Using Vibration. IEEE Transactions on Multimedia 10, 6 (2008), 1022–1033. https://doi.org/10.1109/TMM.2008.2001352
[64] Yusuke Ujitoko and Yuki Ban. 2018. Vibrotactile Signal Generation from Texture Images or Attributes Using Generative Adversarial Network. In International Conference on Human Haptic Sensing and Touch Enabled Computer Applications. Springer, 25–36.
[65] Yusuke Ujitoko, Yuki Ban, and Koichi Hirota. 2020. GAN-based Fine-tuning of Vibrotactile Signals to Render Material Surfaces. IEEE Access 8 (2020), 16656–16661.
[66] Ronald T Verrillo. 1966. Vibrotactile Sensitivity and The Frequency Response of The Pacinian Corpuscle. Psychonomic Science 4, 1 (1966), 135–136.
[67] Markus Waltl. 2010. Enriching Multimedia with Sensory Effects: Annotation and Simulation Tools for The Representation of Sensory Effects. VDM Verlag.
[68] Lijun Wang, Huchuan Lu, Yifan Wang, Mengyang Feng, Dong Wang, Baocai Yin, and Xiang Ruan. 2017. Learning to Detect Salient Objects with Image-level Supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 136–145.
[69] Pichao Wang, Wanqing Li, Philip Ogunbona, Jun Wan, and Sergio Escalera. 2018. RGB-D-based Human Motion Recognition with Deep Learning: A Survey. Computer Vision and Image Understanding 171 (2018), 118–139.
[70] Jing Yu, Wei Song, Guozhu Zhou, and Jian-jun Hou. 2019. Violent Scene Detection Algorithm Based on Kernel Extreme Learning Machine and Three-dimensional Histograms of Gradient Orientation. Multimedia Tools and Applications 78, 7 (2019), 8497–8512.
[71] Lin-Ping Yuan, Wei Zeng, Siwei Fu, Zhiliang Zeng, Haotian Li, Chi-Wing Fu, and Huamin Qu. 2021. Deep Colormap Extraction from Visualizations. arXiv preprint arXiv:2103.00741 (2021).
[72] Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. 2018. Context Encoding for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7151–7160.
[73] Jing Zhang, Xin Yu, Aixuan Li, Peipei Song, Bowen Liu, and Yuchao Dai. 2020. Weakly-supervised Salient Object Detection via Scribble Annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12546–12555.
[74] Tong Zhang and Carlo Tomasi. 1999. Fast, Robust, and Consistent Camera Motion Estimation. In Proceedings of 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), Vol. 1. IEEE, 164–170.
[75] Ting Zhao and Xiangqian Wu. 2019. Pyramid Feature Selective Network for Saliency Detection. CoRR abs/1903.00179 (2019). arXiv:1903.00179 http://arxiv.org/abs/1903.00179
[76] Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, and Xindong Wu. 2019. Object Detection with Deep Learning: A Review. IEEE Transactions on Neural Networks and Learning Systems 30, 11 (2019), 3212–3232.
[77] Zhiying Zhou, A. D. Cheok, Wei Liu, Xiangdong Chen, Farzam Farbiz, Xubo Yang, and M. Haller. 2004. Multisensory Musical Entertainment Systems. IEEE MultiMedia 11, 3 (2004), 88–101. https://doi.org/10.1109/MMUL.2004.13
[78] Tao Zhou, Deng-Ping Fan, Ming-Ming Cheng, Jianbing Shen, and Ling Shao. 2021. RGB-D Salient Object Detection: A Survey. Computational Visual Media (2021), 1–33.
[79] Longhao Zou, Irina Tal, Alexandra Covaci, Eva Ibarrola, Gheorghita Ghinea, and Gabriel-Miro Muntean. 2017. Can Multisensorial Media Improve Learner Experience?. In Proceedings of the 8th ACM on Multimedia Systems Conference (Taipei, Taiwan) (MMSys’17). Association for Computing Machinery, New York, NY, USA, 315–320. https://doi.org/10.1145/3083187.3084014
[80] Eberhard Zwicker and Hugo Fastl. 2013. Psychoacoustics: Facts and Models. Vol. 22. Springer Science & Business Media.