… signaling additional information in the discourse (similar to the function of parentheses in written language). In addition, head cant also correlates with prosody and with discourse markers, and this correlation is in turn affected by speaker gender. [Chinese]
KEYWORDS: Embodiment, computer vision, multimodality, head cant, body positioning, prosody, gender, interaction
1. INTRODUCTION

Language is a multilayered, multimodal system; in spoken talk, meanings – and particularly social meanings – are conveyed not only by phonetics, syntax, and pragmatics, but also by facial expression, gesture, and movement. A growing body of research takes seriously the consequences of this fact by addressing the central issue of embodiment: the complex ways in which the meaning-making capacity of language is tied to the physical bodies of those who use language. In the view from the cognitive sciences, this implies that the full sensory experience of any event is deeply intertwined with, and even to an extent may inescapably constitute, the mental representations of that event (Glenberg and Kaschak 2003; Matlock, Ramscar and Boroditsky 2003; Barsalou 2008). In linguistics, this line of analysis takes form in the concept of multimodality, whereby the production of meaning is always in progress and can recruit resources from diverse semiotic modes including but not necessarily privileging spoken language (Kress and Van Leeuwen 2001). Multimodality in linguistics has a long history, even if the term is relatively new. At least as early as Birdwhistell’s kinesics (1952, 1970), linguistic anthropologists have recognized the rich communicative capacity of the body, and later researchers such as McNeill (1992, 2008) and Kendon (1995, 2004) argued for the integration of gesture and spoken language as two parts of one system.

Though numerous experimental studies have provided convincing evidence for the claim that speech and bodily movements and postures are tightly connected (see, for instance, Mendoza-Denton and Jannedy 2011; Loehr 2012; Voigt, Podesva and Jurafsky 2014), our understanding of how they interact moment to moment and coalesce into meaningful signs is based primarily on observational study. Scholars of conversation analysis (CA) in particular have explored such moment-to-moment multimodality. The first article in this Series (Mondada 2016) employs just such a CA approach to consider embodiment and interactional multimodality as related to the ‘ecology of the activity’ taking place in interaction, looking at full-body physical positioning as a crucial resource for meaning-making. Indeed, as Mondada (2016: 341) notes, in linguistic communication, ‘potentially every detail can be turned into a resource for social interaction.’

But with this unlimited potential comes a set of daunting analytical challenges. Kendon (1994) provides an early review of observational studies,
recruitable for meaning-making and second, that variation may reflect large-scale social and ideological structures, then a broad view of the possibilities of computational methodologies is inevitably a step forward. Any interactional feature that can be recorded and defined cleanly is potentially available for computational modeling, and such modeling allows us to put such features under the microscope and uncover something about how these features combine to produce social meaning.

We demonstrate the possibilities of such an analysis in this paper by analyzing one such interactional variable – head cant (colloquially, side-to-side tilt of the head) – in a multimodal dataset of 65 different speakers. We use computational tools to examine how visual, textual, and acoustic properties combine in interaction, and how these interactions correlate with social and interactional factors. Of course, a statistical association does not directly reveal social meaning, but indicates that meaning may be at work at the local level. Thus, we allow our statistical analysis to guide us in our choice of specific examples for qualitative analysis. Our analysis confirms head cant’s role as an interactional variable, its robust connection to prosodic variation, and its participation in communicative and social meanings having to do with floor management and with a frame of shared understanding between the speaker and interlocutor.

In section 2, we explain our methodology in detail, as well as the dataset to which we apply it, which includes 65 speakers across two distinct interactional contexts: YouTube video blog (henceforth ‘vlog’) monologues; and experimentally-collected laboratory dialogues. We take advantage of a computer vision algorithm to calculate head cant annotations automatically and use these annotations to both generate statistical results and guide a qualitative analysis, exploring the interactional functions of head cant in three stages. In section 3, we consider the simple question of the distribution of head canting: is cant more prevalent when an interlocutor is physically present? Is head cant a listening gesture? In section 4, we explore high-level statistical connections between head canting and prosodic features indicative of conversational engagement. Then, in section 5, we draw upon those connections to engage in a quantitatively-guided qualitative analysis of head cant. This involves identifying particular functions of head cant, discussing them in context, and providing statistical support for these where possible.

1.1 Head movement and posture as interactional variables

Language researchers have long known that movements of the head can participate in a diverse field of meanings. McClave (2000) provides a comprehensive review, cataloguing an extensive list of functions of head movement: as signals for turn-taking; as semantic and syntactic boundary markers; to locate discourse referents; or to communicate meanings like
Thus, head cant’s meaning-making potentials are not by any means limited to associations with gender and sexuality.

In this work we propose that many of the gendered associations of head cant may stem from a deeper relationship between head cant and what Tannen and Wallat (1987), building on the work of Goffman (1974), call the ‘interactive frame’ – or the definition of what is taking place at a given interactional moment – as well as the entailed alignment or orientation to one’s interlocutor, or what Goffman (1981) calls ‘footing.’ In particular, head cant appears to participate in communicating orientation towards the interlocutor and a sense of shared understanding, in some cases even serving a relatively explicit ‘bracketing’ function which speakers use to create parentheticals, asides, and confessions.

2. CASE STUDY METHODOLOGY

In this study we investigate head cant as an interactional feature and a semiotic resource. In this section we describe the selection of data, preprocessing to prepare the data for analysis, and our computational methodology for extracting head cant measurements.

2.1 Data

We compare two interactional contexts: two-person dialogues between friends recorded in a laboratory setting; and video blog monologues on YouTube with no apparent physically present interlocutor. We refer to these settings throughout the paper as ‘Lab’ and ‘Vlog,’ respectively. The two settings allow us to compare speakers who are anticipating and getting immediate feedback from an interlocutor with those who are not. Our dataset in total from these sources includes more than 18 hours of speech from 65 speakers.

Laboratory dialogues. The first interactional context is dyadic interactions between familiars recorded in the Interactional Sociophonetics Laboratory at Stanford University in California. The lab has the acoustical specifications of a sound-proof recording booth to ensure high quality audio recordings, but is staged as a living room to facilitate less self-conscious interactions. In addition to being audio recorded via lavalier wireless microphones, interactants were videorecorded by concealed video cameras (though their presence was known to all participants) positioned to capture head-on images. As many computer vision algorithms have been developed for video blog data, it was imperative that speakers not be positioned at a significant angle to the camera lens.

Participants engaged in two conversational tasks. First, speakers discussed their answers to a variety of ‘would you rather...’ questions, such as ‘Would you rather always be overdressed, or always be underdressed?’ This task, which lasted approximately five minutes, gave participants an opportunity to
relax into the recording environment and enabled the researcher to adjust audio recording levels as needed. For the remainder of the approximately 30-minute recording session, speakers asked each other a variety of questions presented on a large rolodex on a coffee table positioned between the interactants. Questions, like ‘How has the way you dress changed since high school?’, were chosen to encourage speakers to reflect on identity construction without asking them about it explicitly. Participants were informed that they could use questions as prompts as desired, but that their conversation did not need to stick to the prompts at all. Following the recording session, participants filled in online surveys designed to collect demographic information as well as assessments of the interaction.

Data for 33 speakers are considered here. Of these, 22 were women, and 11 men. The great majority of the dyads were between friends or close friends (according to participant characterizations of the relationship), with a handful between romantic partners or family members. The majority of speakers were undergraduates aged 18–22; the remainder of speakers were mostly in their mid to late twenties. Although the results below focus on gender, the corpus was reasonably diverse with respect to several other variables. Speakers represented a range of racial groups. The majority self-identified as white, a sizeable minority (of nine) as multiracial, and the remainder as African American, Asian American, or Latinx. The majority of speakers were from the West Coast of the U.S.A., though a significant group (of eight) were from the South; the remainder were from the Northeast and Midwest.

Data were recorded directly onto a Mac Pro located in a room outside the living room space. Each speaker was recorded onto separate audio and video tracks. Each audio track was orthographically transcribed in Elan (Lausberg and Sloetjes 2009) and force-aligned using FAVE to automatically determine the timing for each word in the transcript based on its alignment with the audio file (Rosenfelder et al. 2011).

Video blog monologues. Video blogs (‘vlogs’) are a form of computer-mediated communication in which people record videos of themselves discussing their lives or other topics of interest, to be shared with close friends or the public at large. For this study, we manually collected a dataset of 32 vlogs from different speakers. Since vlogs can be about a wide variety of topics, for the greatest comparability with our laboratory data we focused on vlogs about three emotive topics tied up in identity: high school students discussing their first day of school; students discussing their experiences studying for and taking the MCATs; and pregnancy vlogs in which pregnant women discuss various stages and milestones of their pregnancies. Vlogs on such topics by women are far more prevalent than those by men; therefore, in this study our Vlog dataset is composed entirely of women. The dataset consists of mostly white speakers (with a handful of Asian American speakers and one African American speaker) ranging in age from mid-teens to approximately 40 years old.
boundaries for each spoken word from each speaker. We then used a transcript-based method to extract phrases, defining a phrase as any continuous set of words such that no word is more than 100 milliseconds apart from the words surrounding it.
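To make the 100 millisecond criterion concrete, here is a minimal sketch, in Python, of how forced-alignment output could be grouped into phrases under that rule. It is an illustration under our own assumptions, not the code used in the study; the Word class and the toy timings are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    text: str
    start: float  # forced-alignment start time, in seconds
    end: float    # forced-alignment end time, in seconds

def group_into_phrases(words: List[Word], max_gap: float = 0.100) -> List[List[Word]]:
    """Group words into phrases: a word joins the current phrase when the gap
    to the preceding word is at most max_gap (100 ms); otherwise it starts a
    new phrase."""
    phrases: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.start):
        if phrases and word.start - phrases[-1][-1].end <= max_gap:
            phrases[-1].append(word)
        else:
            phrases.append([word])
    return phrases

# Toy example: a 50 ms gap keeps words together; a 250 ms gap opens a new phrase.
words = [Word("my", 0.00, 0.20), Word("school", 0.25, 0.60), Word("was", 0.85, 1.00)]
print([[w.text for w in p] for p in group_into_phrases(words)])  # [['my', 'school'], ['was']]
```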
We did not, however, need manual transcripts to carry out many of the analyses we were interested in. Since we did not have manual transcripts for the Vlog data, we used an automatic heuristic based on the silence detection function in Praat (Boersma and Weenink 2015) to extract phrases. We generated phrases by running silence detection on the audio channel of each video, defining sounding portions as phrases. The more accurate phrases in our Lab data, extracted by the forced alignment method above, had an average length of 1.50 seconds. We approximated this in the Vlog data by setting the same 100 millisecond minimum boundary between sounding portions used above and starting with a silence threshold of -25 dB. We iteratively ran silence detection, increasing or decreasing the silence threshold by 1 dB and re-running, until the average phrase length was as close as possible to 1.50 seconds.

While this procedure may have smoothed over some individual variation in phrasal pacing, our primary need was for consistent units of analysis, which we defined using phonetic rather than intonational, syntactic, or discursive criteria for delineating phrase boundaries. In the analyses to follow we used the transcript-based phrases for the Lab data and the silence-detection-based phrases for the Vlog data; however, the results presented in the following sections held even if we also used silence-detection-based units for the Lab data, further suggesting that these units of analysis are roughly equivalent.
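The silence-detection procedure for the Vlog data can be sketched as follows. This is an illustration rather than the study’s actual script: it assumes the parselmouth Python bindings to Praat’s ‘To TextGrid (silences)’ command, and it approximates the iterative 1 dB adjustment as a scan over nearby thresholds, keeping the threshold whose mean phrase length lands closest to the 1.50 second target.

```python
import parselmouth
from parselmouth.praat import call

def sounding_intervals(path, threshold_db, min_dur=0.100):
    """Run Praat's silence detection and return (start, end) times of the
    'sounding' intervals, which we treat as phrases."""
    sound = parselmouth.Sound(path)
    tg = call(sound, "To TextGrid (silences)", 100, 0.0, threshold_db,
              min_dur, min_dur, "silent", "sounding")
    n = int(call(tg, "Get number of intervals", 1))
    spans = []
    for i in range(1, n + 1):
        if call(tg, "Get label of interval", 1, i) == "sounding":
            spans.append((call(tg, "Get starting point", 1, i),
                          call(tg, "Get end point", 1, i)))
    return spans

def tune_threshold(path, start_db=-25.0, steps=15):
    """Scan thresholds in 1 dB steps around the starting value and keep the
    one whose mean phrase length is closest to 1.50 seconds."""
    def error(db):
        spans = sounding_intervals(path, db)
        if not spans:
            return float("inf")
        mean_len = sum(e - s for s, e in spans) / len(spans)
        return abs(mean_len - 1.50)
    return min((start_db + k for k in range(-steps, steps + 1)), key=error)
```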
2.3 Head cant feature extraction

We calculated head tilt by adapting the shape-fitting algorithm of Kazemi and Sullivan (2014), as implemented in the open-source machine learning library dlib (King 2009). This algorithm is relatively computationally efficient and robust to differences in video quality, lighting, and occlusion, which made it feasible for the contextual diversity of our data (Figure 1).

For each frame of video in the dataset, we first used the standard face-detection implementation in dlib to find the speaker’s face. We then used the aforementioned shape-fitting algorithm on the detected face, with a model pre-trained on the facial landmark data from the 300 Videos in the Wild dataset (Shen et al. 2015), which outputs locations of 68 facial landmark points per frame.

We could then calculate head cant using the points for the far corner of the left and right eyes by triangulation (Figure 1). Assuming (as we do in this dataset) a speaker roughly facing the camera, the cant angle is the arctangent of the vertical displacement of these eye corner points over their horizontal distance. We took the absolute value of these measurements, as in this work we
were interested in head cant primarily as displacement from an upright posture.

Figure 1: Left, shape-fitting output from Kazemi and Sullivan (2014), showing robustness to occlusion. Right, visualization of our adaptation for the calculation of head cant angle on a vlog from our dataset, calculated by first fitting a shape model of the face to find landmark points as on the left, and then triangulating cant angle from the corners of the eyes.
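As an illustration of the per-frame computation just described, the sketch below uses dlib’s Python bindings. It is not the authors’ code: where the study used a shape model pre-trained on the 300 Videos in the Wild data, the example assumes the widely distributed shape_predictor_68_face_landmarks.dat model file, and OpenCV is used only to convert a video frame to grayscale.

```python
import math
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Assumed model path; the 68-point predictor file must be downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def head_cant_degrees(frame_bgr):
    """Return the absolute head cant angle (degrees) for the first detected
    face in a video frame, or None if detection or shape fitting fails."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if len(faces) == 0:
        return None  # counted as a classification failure for this frame
    shape = predictor(gray, faces[0])
    # Points 36 and 45 in the 68-point scheme are the outer (far) corners of
    # the two eyes.
    p_left, p_right = shape.part(36), shape.part(45)
    dy = p_right.y - p_left.y   # vertical displacement of the eye corners
    dx = p_right.x - p_left.x   # horizontal distance between them
    return abs(math.degrees(math.atan2(dy, dx)))
```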
This method allowed us to generate a continuous estimate of head cant throughout all the videos in our dataset, analogous to measures of acoustic prosody like pitch and loudness, albeit at a coarser resolution of once per frame (30 Hz for a video at 30 frames per second). This method inevitably suffers from some limitations, since by the nature of large-scale automatic modeling we expect the model to introduce noise. At moments of severe occlusion – such as if a speaker turns fully away from the camera – or due to peculiarities of the algorithm’s classification process, we may have failed to detect a face in a given frame or failed to accurately fit the shape model. We handled this by simply keeping track of these failures, and found that they occurred in approximately six percent of frames in the dataset. In the statistical analyses correlating head cant and prosody in section 4, we removed phrases from the analysis where more than half of the video frames that occurred during the phrase constitute classification failures of this type and as such have no accurate measurement.

A related limitation lies in the fact that head cant is naturally implicated in other bodily movements and postures, and our measurements may have been affected by this. Body cant, in particular, where the speaker’s entire body is tilted and thus necessarily the head as well, presents an interesting difficulty in this regard. Nevertheless, in our qualitative analyses of the data we found this phenomenon to be relatively rare, and indeed this challenge is perhaps inherent to the study of embodiment. Even if we were hand-labeling the entire dataset, it is unclear whether a body cant of 20 degrees with a relative head
cant of 0 degrees should be labeled as a head cant of 0 or 20, since the head is straight relative to the body but at a 20 degree angle relative to the floor.

This difficulty becomes even more stark when we consider the potential for perceptual entanglements. If one speaker’s head is canted, their spatial coordinates are necessarily rotated, so should their interlocutor’s head cant best be conceived of relative to that rotated perception, or relative to some ‘objective’ standard like the floor or other contextual grounding? All of the above likely constitutes a direction for future research in its own right, so in this work we sidestepped the issue by taking our computational method at face value.

3. HEAD CANT IN AND OUT OF INTERACTION

In framing the importance of head cant as an object of study in section 1, we postulated it to be an ‘interactional variable,’ playing a role in functions such as turn management between interlocutors. For example, head cant could function as a listening posture, signaling the listener role, or it could also signal interest in what the interlocutor is saying. We expected that neither of these functions would be present in the Vlogs, which have no explicit interlocutor, but that either or both could be present in the Lab data.

To explore this potential difference between datasets, we randomly sampled 5,000 individual frames of video from each speaker in the dataset, and determined whether the head cant measured in that frame occurred during a spoken phrase or not. As shown in Figure 2, speakers in the laboratory setting used more head cant overall than those in vlogs, with a mean cant of 6.4 degrees as compared to vloggers’ mean cant of 4.5 degrees (two-sided t-test, t = -105.4, df = 323,430, p < 0.001).

We observed no statistical difference between speech and non-speech segments in the vlogs, while the Lab participants used more head cant while not speaking than while speaking (two-sided t-test, t = -21.425, df = 135,860, p < 0.001). Moreover, we saw gender effects within the laboratory data. While men and women appeared to use nearly the same mean head cant of around six degrees during speech segments, an ANOVA analysis revealed a significant interaction effect with gender: men in our dataset used more head cant while not speaking than did women (F = 192.5, p < 0.001).
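For readers who want to see the shape of this comparison, the snippet below runs a two-sided Welch t-test with scipy. It is purely illustrative: the arrays are random placeholders whose means merely echo the values reported above, not the study’s data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder per-frame absolute cant angles (degrees) for the two settings.
lab_cant = np.abs(rng.normal(loc=6.4, scale=5.0, size=165_000))
vlog_cant = np.abs(rng.normal(loc=4.5, scale=4.0, size=160_000))

t_stat, p_value = stats.ttest_ind(lab_cant, vlog_cant, equal_var=False)
print(f"t = {t_stat:.1f}, p = {p_value:.3g}")
```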
The relatively low amount of cant in the vlogs suggests that the movements that people make while speaking and listening in the lab dialogues have an interactive signaling effect. It supports an association between listening and head cant, and it may suggest that cant is playing a role in floor management. It could also, though, reflect the importance of an interlocutor in supporting whatever other functions cant is playing.

Our results provide an interesting contrast with the results of Hadar et al. (1983), who used a polarized-light goniometer technique to measure head movements during conversation, finding evidence for constant movement during talk, while listening was marked by the absence of head movement.
Figure 2: Distribution of head cant by gender and interactional context, distinguishing between speech context, that is, whether the speaker is currently speaking or not. Error bars represent 95 percent confidence intervals; these intervals are small since the number of observations is very high.
Together these results suggest that listening may be marked more by static but perhaps meaningful postures (such as head cant) while speaking may be marked by dynamic movements.

These findings also begin to challenge the gendered associations of head cant mentioned previously. In our data men use more head cant overall, an effect driven by their use during non-speech portions of the interaction. Given this, we raise the question of whether women and men are doing more or less of the same thing, or whether they are actually using cant differently. We will explore these questions in the following sections.

4. HEAD CANT AND PROSODY

In the previous section we established a relationship between head cant and the simple fact of speaking, showing that this relationship is affected by
Figure 3: Marginal effects of pitch and loudness on head cant across genders and interactional contexts holding other factors constant. All variables are z-scored by speaker, and observations in the model are silence-bounded phrases. Ribbons represent estimated 95 percent confidence intervals around the trend line.
findings are in accord with prior work comparing the degree to which prosodic variables like pitch and loudness are ‘socially loaded’ differentially by gender: for example, in a study of speed dates, McFarland, Jurafsky and Rawlings (2013) found that perceived friendliness is marked by pitch maximum and variability in women as compared to loudness variability in men. Women in Vlogs displayed a strong negative relationship between loudness and head cant: for speakers in this category a decrease in loudness of one standard
the leaders in change, and looking close up at those leaders for commonalities in their social characteristics.

Since we found women to combine greater head cant with high pitch and low intensity, and men to combine head cant with high pitch and high intensity, we extracted phrases high in head cant co-occurring with high pitch and low intensity on the one hand, and with high pitch and high intensity on the other. We defined ‘high’ and ‘low’ as the top and bottom 30 percent, respectively, and considered five categories: high pitch alone; high intensity alone; low intensity alone; high pitch with high intensity; and high pitch with low intensity. For each category we randomly sampled and qualitatively examined at least 100 of these central exemplars.
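The selection of central exemplars can be sketched with pandas as follows. The table, column names, and sample size are illustrative assumptions rather than the study’s actual pipeline; the placeholder data simply stand in for per-phrase cant, pitch, and intensity values z-scored by speaker.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Placeholder stand-in for the real per-phrase table (z-scored by speaker).
phrases = pd.DataFrame(rng.standard_normal((9038, 3)),
                       columns=["cant", "pitch", "intensity"])

def top(s): return s >= s.quantile(0.70)   # top 30 percent
def low(s): return s <= s.quantile(0.30)   # bottom 30 percent

high_cant = phrases[top(phrases["cant"])]
categories = {
    "high pitch":                 top(high_cant["pitch"]),
    "high intensity":             top(high_cant["intensity"]),
    "low intensity":              low(high_cant["intensity"]),
    "high pitch, high intensity": top(high_cant["pitch"]) & top(high_cant["intensity"]),
    "high pitch, low intensity":  top(high_cant["pitch"]) & low(high_cant["intensity"]),
}
# Sample up to 100 exemplars per category for qualitative inspection.
samples = {name: high_cant[mask].sample(n=min(100, int(mask.sum())), random_state=0)
           for name, mask in categories.items()}
```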
One of the strongest trends we observed in this examination was that head cant is implicated both in floor management and in processes of signaling shared understanding – and that the two cannot be easily separated. Head cant appears to frequently be called on to establish that the speaker and the interlocutor share (or ought to share) some pre-existing knowledge about the discourse at hand. In this way we can view head cant as participating in shifts in footing, in the sense of Goffman (1981); that is, head cant may subtly modify the alignment or ‘interactional frame’ (Tannen and Wallat 1987) taken up by a speaker in a given utterance.
5.1 The framing of shared understanding

The interactional frame of ‘shared understanding’ in which head cant participates can take many forms. It appears to carry overtones of friendship, a sense of obviousness, or a taking into confidence, and can appear in the context of repetition or restating. In turn it may be used for many purposes: to induce the interlocutor to interpret a claim as properly belonging to a shared understanding; to propose a presupposition of such understanding that softens an utterance for stylistic purposes; or to indicate dismissiveness of the obvious thing.

For example, in the Vlog context, consider a YouTuber named Nat in a video entitled ‘5 Weeks Pregnancy Vlog.’ Nat’s vlog records the journey of her pregnancy, discussing physical and emotional changes throughout and chronicling milestones along the way. Considering the audience design which might influence her linguistic choices, it’s worth noting that Nat’s channel is surprisingly popular, with over 40,000 subscribers, and as of this writing the video in question has had more than 28,000 views; however, the video in question is only the third published by her channel, so perhaps it had far less viewership when it was made.

In Example 1, below, Nat is ten minutes into the video, and is talking about telling her two best friends about her new pregnancy, both of whom have children of their own, as well as her husband Weston telling his friends. This moment follows a long and detailed account of telling her parents, and their
excited reactions. In contrast, she gives the story of telling those friends in a few brief sentences, ending with:

Example 1
1  Nat:  and they’re of course very excited
2        and very supportive and Weston told his two best friends

During this segment, Nat uses head cant in alternating directions with reduced loudness and variable pitch (Figure 4). The overall effect is to create a sense of obviousness but gratefulness in describing the reactions of her friends to hearing of her new pregnancy, which is strengthened by co-occurrence with the explicit ‘of course.’ Given the excited reactions of her parents she just described in detail, and the knowledge Nat expects to share with her imagined interlocutor that friends are generally excited about pregnancies, these head cants contribute to framing the content of her utterance as almost going without saying. We note that cant here is combined with semi-closed eyes and a smile. While we cannot comment authoritatively on eye and mouth features since we do not have equivalent data on them, it may be these features that contribute intimacy and positive affect. We note that these features can also be measured automatically, and ultimately an understanding of body movement is going to require careful analysis of multiple and co-occurring gestures, or constructions.

Moments later, Nat uses head cant again as she reiterates a point made earlier in the discourse: the pregnancy is still meant to be kept a secret from everyone but the couple’s parents and very best friends. Earlier Nat has mentioned this fact several times, but tags it on with a clearly conspiratorial stylistic move generated by not only her words but the near-whispered tone, head cant combined with forward tilt, sly smile, wide open eyes, and a finger to the lips (Figure 5). We note that unlike the earlier uses of cant, here it participates in a combination of gestures constituting a highly enregistered or conventionally ‘iconic’ sign. One question we might ask is whether complex enregistered signs like this one occur more frequently in monologues than in face-to-face interaction, which would support the hypothesis that the lack of an interlocutor calls for less subtle gestures.

Figure 4: Nat’s two head cants from 10:43–10:50 in Example 1 (https://youtu.be/fLS8RFnCcII?t=10m43s). Stills captioned ‘and very supportive’ and ‘told his two best friends’.

Figure 5: Nat’s cant from 11:01–11:05 (https://youtu.be/fLS8RFnCcII?t=11m1s). Still captioned ‘knowing they’re supposed to be quiet’.
Nat is invoking a set of shared beliefs about the social bonds associated with pregnancy. Example 2 from the Lab setting is a little more risky, as two interlocutors jointly confirm shared knowledge that might be face-threatening. Two friends, a White female (speaker A) and a Hispanic male (speaker B), are discussing the question of whether they have ever been mistaken for a person of another race. The conversation has turned to talking about racial diversity in AP (‘advanced placement’ or college-level) classes, as the Hispanic male describes being mistaken for Asian by virtue of being in those classes, and in other circumstances being mistaken for ‘every race except White.’ After a brief joking digression about how the White female speaker could never be mistaken for Black, she responds to the issue by bringing up the case at her high school:

Example 2
1  A:  it was really weird at our school, cause, like
2      my school was like,
3  B:  mostly...
4  A:  a hundred percent White pretty much
5  B:  White... yeah.

Immediately after the first ‘like’ in line 1, above, the speaker makes a shift to a head-canted posture, and simultaneously her loudness decreases, her speech rate increases, and her voice gets very creaky. These conditions hold through the end of line 4, and her head holds in the canted posture as well. Her cant marks a particular type of almost conspiratorial side comment, as if she is making an overly obvious confession, the content of which her interlocutor already knows (indeed, he produces simultaneous speech conveying the same proposition), intensified by the exaggerated ‘a hundred percent’ (Figure 6).
Figure 6: A head cant and its overlapping, softer-spoken response as both speakers discuss the ethnic diversity of their high schools in Example 2. Stills captioned ‘our school, cause, like’, ‘a hundred percent white’, ‘mostly...’, and ‘White... yeah.’
At the same time, the information in the canted clause is highly relevant to the following discourse and is by no means obvious. After this comment, the speaker goes on to discuss more detailed specifics about the diversity of her high school and childhood community overall before returning to the topic of AP classes, suggesting this is information of which her interlocutor was not previously aware even though they are friends.

This example illustrates how head cant is used to establish footing for an interactional frame of shared understanding, as opposed to an indication of actual common ground in the sense of Clark and Brennan (1991). The speaker is drawing upon head cant as an interactional resource to frame the revelation of the lack of diversity at her high school in a particular way: as obvious, expected, and perhaps even somewhat embarrassing.
Table 1: Odds ratios for discourse particles appearing in the top 30 percent of phrases with the highest head cant as compared to the bottom 70 percent. Values higher than 1.0 indicate a positive association with canted phrases, while those lower than 1.0 indicate association with less canted phrases. P-values from Fisher’s exact test are given in parentheses; values for women are significantly different while for men there is no association

Discourse particle                         Women           Men
Shared understanding: I mean, you know     1.58 (p=0.03)   1.30 (p=0.29)
Hesitation/floor: um, uh, like             0.83 (p=0.01)   0.99 (p=1.00)
Across all 9,038 phrases in the Lab data, Fisher’s exact test (Fisher 1922) shows (Table 1) that women are significantly more likely to use you know and I mean in phrases with high head cant, and less likely to use um, uh, and like in those phrases. Like, in particular, is strongly associated with phrases with low head cant. Andersen (1998) compiles an extensive review of research on like, finding that overall it acts as a ‘loose talk marker’ from a relevance-theoretic perspective – that is, the speaker is opting to signal a pragmatic ‘discrepancy between the propositional form of the utterance and the thought it represents.’
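The association in Table 1 rests on a standard two-by-two Fisher’s exact test (particle present or absent, by high-cant versus other phrases). The sketch below uses scipy with hypothetical counts; only the 9,038 total echoes the corpus, and none of the cell values are the study’s.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table for one particle class in women's speech:
# rows = phrase contains the particle (yes / no),
# columns = phrase is in the top 30 percent of head cant (yes / no).
table = [[120, 180],
         [2500, 6238]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```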
To look at a particular example, the following section of speech occurs during a discussion of finding one’s ‘true passions,’ where the speaker is expressing her surprise at finding that a set of activities in high school she originally participated in to pad her resume turned into a more sincere passion. During this extended turn (Example 3), the speaker starts with a relatively low cant, initiating a slight cant at the first ‘actually’; however, she moves to a large head cant directly upon the phrase containing you know, spoken with a somewhat heightened pitch. This phrase marks the beginning of a conversational aside not constituting the content of her own story, but rather more as an attempt to gain ‘meta-knowledge,’ in Schiffrin’s terms, that her interlocutor is also familiar with the background against which she was making her decisions.

Example 3
1  A:  I kinda felt like I was doing it for the resume
2      like in high school to be honest
3      like, but then like I actually really liked it and then like –
4      like you know how like when –
5      when you wanna like fill your transcript up with like –
6      I mean resume up with like a bunch of like activities.
7      Like you get to choose what activities you want
8  B:  mmhm
9  A:  and all the activities I chose were
Both the speaker and her interlocutor are students at an elite undergraduate institution: the speaker’s use of you know helps to signal that she has made the very reasonable assumption that her interlocutor, too, knows about needing to bolster one’s resume in high school. Her head cant pointedly marks the sentence and a half that follow as an almost redundant aside, helping to put this decision-making context into a frame of shared understanding that will allow her interlocutor to empathize with the experiences to follow (Figure 7). The speaker continues to cant her head back and forth lightly during those phrases, and as she finishes saying ‘choose what activities you want’ (line 7) her interlocutor responds to the frame by smiling, nodding her head up and down, and backchanneling ‘mmhm.’ Precisely as the speaker returns to talking about her own experiences (line 9) her head cant returns to a neutral upright position, suggesting the bracketing function of the cant has come to an end.
5.3 Conversational acknowledgements

In the preceding section, we found a statistical relation confirming the link between head cant and other-oriented discourse markers such as you know and I mean, but these results held only for women in our dataset. One crucial difference across genders in our prosodic findings from section 4 was that men’s increased head cant is associated with increased loudness, precisely the opposite relationship from that found in women.

In our analysis of phrases in the data matching the statistical trend – in this case, louder phrases spoken by men with high head cant – we found that this may accompany men’s backchannels, acknowledgements, and affirmative
Figure 7: A speaker canting as she makes an aside about a shared experience of resume-padding in Example 3. Stills captioned ‘actually really liked it’, ‘you know how like when’, and ‘resume up with’.
Figure 8: Two speakers trade cants and conversational acknowledgements in an extended moment of sarcastic aside (Example 4). Stills captioned ‘to think of ideas’, ‘for you’, ‘yeah that’s’, ‘yeah... no, I’, ‘supported {laughter}’, and ‘supported, yeah’.
We also found that, in the dialogue but not the monologue context, head cant was more prevalent during times when the interactant was not speaking, suggesting an association with listening. This could be a simple signal that one is listening, yielding the floor, or it could communicate the listener’s orientation in relation to the content. We found high-level statistical correlations between elements of engaged prosody and head cant. There was an overall positive relationship between increased cant in a phrase and increased pitch, and a complex relation between cant and loudness.

All of these correlations showed important gender effects. Men canted while listening more than women, suggesting that the traditional gendered
associations that link head cant to hegemonic femininity are likely not telling the whole story. Increased cant correlated robustly with higher pitch among women, but appeared only as a trend among men. Finally, while men’s loudness increased with cant, women’s decreased, particularly in the Vlog setting. The latter points to a qualitative gender difference, in which cant appears to be playing a more important role in floor management for men than for women.

This appears to be supported by the relation between cant and discourse particles in the Lab data. We found that women’s phrases with high head cant were associated with discourse particles having to do with shared understanding like you know and I mean. This did not hold for men. Conversely, for men but not for women, phrases with high head cant were associated with conversational acknowledgements like mmhm and yeah, suggesting more of a floor management function.

The set of gender differences we uncovered at every stage – across speaking contexts, in prosodic correlations, and in particular lexical items – suggests that the distribution of the communicative uses of head cant is gendered to some extent. However, the relation of this feature to gender is neither simple nor direct. We note that binary gender is low-hanging fruit, as very little information is required to assign speakers to the male or female category. Our attention to gender in this study emerged initially from the previous literature, but it is possible that equally interesting patterns may emerge with other macro-social categorization schemes, such as class, ethnicity, or age. Ultimately, the meaning of cant is not ‘male’ or ‘female,’ but qualities and orientations that differentiate among and between the binary gender categories.

More broadly, we have shown that head cant is an interactional resource, and in this capacity it interacts with both sound and text on the one hand and other body movements on the other, to build higher-level structures, or interactional signs. Much work is needed to uncover the nature of gestural signs and their combinations, a challenge that is shared by current work on variation in speech (e.g. Eckert 2016). Ultimately, this adds an entirely new medium to the study of variation, and challenges us to integrate body movement into our theories of variation.
6.1 Moving forward

Through an extended exploration of head cant, we hope this paper has illustrated the value of taking a computational approach to embodiment. Computational methods facilitate the analysis of larger datasets than are typically employed in research examining the role of the body in interaction. While micro-analyses of interaction have been and continue to be instrumental to understanding the complex orchestration of multimodal interactional resources in communication, large-scale analyses enable researchers to consider other types of questions.
REFERENCES

Andersen, Gisle. 1998. The pragmatic marker like from a relevance-theoretic perspective. In Andreas H. Jucker and Yael Ziv (eds.) Discourse Markers: Descriptions and Theory. Amsterdam, The Netherlands: John Benjamins. 147–170.
Androutsopoulos, Jannis. 2010. Localizing the global on the participatory web. In Nikolas Coupland (ed.) The Handbook of Language and Globalization. Chichester, U.K.: John Wiley and Sons. 203–231.
Barsalou, Lawrence W. 2008. Grounded cognition. Annual Review of Psychology 59: 617–645.
Bates, Douglas, Martin Maechler, Ben Bolker and Steve Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67: 1–48.
Bee, Nikolaus, Stefan Franke and Elisabeth André. 2009. Relations between facial display, eye gaze and head tilt: Dominance perception variations of virtual agents. Paper presented at the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, IEEE, 10–12 September, De Rode Hoed, Amsterdam, The Netherlands.
Biel, Joan-Isaac and Daniel Gatica-Perez. 2013. The YouTube lens: Crowd-sourced personality impressions and audiovisual analysis of vlogs. IEEE Transactions on Multimedia 15: 41–55.
Birdwhistell, Ray L. 1952. Introduction to Kinesics: An Annotation System for Analysis of Body Motion and Gesture. Louisville, Kentucky: University of Louisville.
Birdwhistell, Ray L. 1970. Kinesics and Context. Philadelphia, Pennsylvania: University of Pennsylvania Press.
Boersma, Paul and David Weenink. 2015. Praat: Doing phonetics by computer [computer program]. Version 6.0.08. Available at http://www.praat.org/
Burgess, Jean and Joshua Green. 2013. YouTube: Online Video and Participatory Culture. Chichester, U.K.: John Wiley & Sons.
Cieri, Christopher. 2014. Challenges and opportunities in sociolinguistic data and metadata sharing. Language and Linguistics Compass 8: 472–485.
Clark, Herbert H. and Susan E. Brennan. 1991. Grounding in communication. Perspectives on Socially Shared Cognition 13: 127–149.
Clark, Herbert H. and Jean E. Fox Tree. 2002. Using uh and um in spontaneous speaking. Cognition 84: 73–111.
Costa, Marco, Marzia Menzani and Pio Enrico Ricci Bitti. 2001. Head canting in paintings: An historical study. Journal of Nonverbal Behavior 25: 63–73.
Cvejic, Erin, Jeesun Kim and Chris Davis. 2010. Prosody off the top of the head: Prosodic contrasts can be discriminated by head motion. Speech Communication 52: 555–564.
Dhall, Abhinav, Roland Goecke, Jyoti Joshi, Karan Sikka and Tom Gedeon. 2014. Emotion recognition in the wild challenge 2014: Baseline, data and protocol. In Proceedings of the 16th International Conference on Multimodal Interaction. New York: Association for Computing Machinery. 461–466.
D’Onofrio, Annette, Katherine Hilton and Teresa Pratt. 2013. Creaky voice across discourse contexts: Identifying the locus of style for creak. Paper presented at New Ways of Analyzing Variation 42, 10–14 October, Carnegie Mellon University, Pittsburgh, Pennsylvania.
Duman, Steve and Miriam A. Locher. 2008. ‘So let’s talk. Let’s chat. Let’s start a dialog’: An analysis of the conversation metaphor employed in Clinton’s and Obama’s YouTube campaign clips. Multilingua – Journal of Cross-Cultural and Interlanguage Communication 27: 193–230.
Eckert, Penelope. 2016. Variation, meaning and social change. In Nikolas Coupland (ed.) Sociolinguistics: Theoretical Debates. Cambridge, U.K.: Cambridge University Press. 68–85.
Fisher, Ronald A. 1922. On the interpretation of χ2 from contingency tables, and the calculation of P. Journal of the Royal Statistical Society 85: 87–94.
Fox Tree, Jean E. 2007. Folk notions of um and uh, you know, and like. Text & Talk – An Interdisciplinary Journal of Language, Discourse Communication Studies 27: 297–314.
Frobenius, Maximiliane. 2014. Audience design in monologues: How vloggers involve their viewers. Journal of Pragmatics 72: 59–72.
Girshick, Ross, Jeff Donahue, Trevor Darrell and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Computer Society Conference Publishing Services. 580–587.
Glenberg, Arthur M. and Michael P. Kaschak. 2003. The body’s contribution to language. Psychology of Learning and Motivation 43: 93–126.
Goffman, Erving. 1974. Frame Analysis: An Essay on the Organization of Experience. Cambridge, Massachusetts: Harvard University Press.
Goffman, Erving. 1979. Gender Advertisements. New York: Harper & Row.
Goffman, Erving. 1981. Forms of Talk. Philadelphia, Pennsylvania: University of Pennsylvania Press.
Grammer, Karl. 1990. Strangers meet: Laughter and nonverbal signs of interest in opposite-sex encounters. Journal of Nonverbal Behavior 14: 209–236.
Griffith, Maggie and Zizi Papacharissi. 2009. Looking for you: An analysis of video blogs. First Monday 15.
Hadar, Uri, T. J. Steiner, E. C. Grant and F. Clifford Rose. 1983. Kinematics of head movements accompanying speech during conversation. Human Movement Science 2: 35–46.
Harley, Dave and Geraldine Fitzpatrick. 2009. Creating a conversational context through video blogging: A case study of Geriatric1927. Computers in Human Behavior 25: 679–689.
Jefferson, Gail. 1984. Notes on a systematic deployment of the acknowledgement tokens ‘yeah’ and ‘mm hm’. Papers in Linguistics 17: 197–216.
Jeon, Je Hun, Rui Xia and Yang Liu. 2010. Level of interest sensing in spoken dialog using multi-level fusion of acoustic and lexical evidence. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). Makuhari, Japan: International Speech Communication Association. 2806–2809.
Kang, Mee-Eun. 1997. The portrayal of women’s images in magazine advertisements: Goffman’s gender analysis revisited. Sex Roles 37: 979–996.
Kazemi, Vahid and Josephine Sullivan. 2014. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Kendon, Adam. 1994. Do gestures communicate? A review. Research on Language and Social Interaction 27: 175–200.
Kendon, Adam. 1995. Gestures as illocutionary and discourse structure markers in Southern Italian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam. 2002. Some uses of the head shake. Gesture 2: 147–182.
Kendon, Adam. 2004. Gesture: Visible Action as Utterance. Cambridge, U.K.: Cambridge University Press.
Kim, Minyoung, Sanjiv Kumar, Vladimir Pavlovic and Henry Rowley. 2008. Face tracking and recognition with visual constraints in real-world videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008). New York: IEEE Conference Publications. 1–8.
King, Davis E. 2009. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research 10: 1755–1758.
Kollock, Peter and Marc Smith (eds.). 2002. Communities in Cyberspace. London/New York: Routledge.
Krahmer, Emiel and Marc Swerts. 2005. How children and adults produce and perceive uncertainty in audiovisual speech. Language and Speech 48: 29–53.
Kress, Gunther R. and Theo Van Leeuwen. 2001. Multimodal Discourse: The Modes and Media of Contemporary Communication. New York: Oxford University Press.
Krizhevsky, Alex, Ilya Sutskever and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25: 1106–1114.
Kuhnke, Elizabeth. 2012. Body Language for Dummies. Chichester, U.K.: John Wiley & Sons.
Kuznetsova, Alexandra, Per Bruun Brockhoff and Rune Haubo Bojesen Christensen. 2013. lmerTest: Tests for random and fixed effects for linear mixed effect models (lmer objects of lme4 package). R package version 2(6).
Labov, William. 2001. Principles of Linguistic Change, II: Social Factors. Malden, Massachusetts: Blackwell.
Lambertz, Kathrin. 2011. Back-channelling: The use of yeah and mm to portray engaged listenership. Griffith Working Papers in Pragmatics and Intercultural Communication 4: 11–18.
Lausberg, Hedda and Han Sloetjes. 2009. Coding gestural behavior with the NEUROGES-ELAN system. Behavior Research Methods 41: 841–849.
Lee, Sinae. 2015. Creaky voice as a phonational device marking parenthetical segments in talk. Journal of Sociolinguistics 19: 275–302.
Loehr, Daniel P. 2012. Temporal, structural, and pragmatic synchrony between intonation and gesture. Laboratory Phonology 3: 71–89.
Matlock, Teenie, Michael Ramscar and Lera Boroditsky. 2003. The experiential basis of meaning. In Richard Alterman and David Kirsh (eds.) Proceedings of the Twenty-fifth Annual Conference of the Cognitive Science Society. Mahwah, New Jersey: Lawrence Erlbaum. 792–797.
McClave, Evelyn Z. 2000. Linguistic functions of head movements in the context of speech. Journal of Pragmatics 32: 855–878.
McFarland, Daniel A., Dan Jurafsky and Craig Rawlings. 2013. Making the connection: Social bonding in courtship situations. American Journal of Sociology 118: 1596–1649.
McKenna, Stephen J., Sumer Jabri, Zoran Duric, Azriel Rosenfeld and Harry Wechsler. 2000. Tracking groups of people. Computer Vision and Image Understanding 80: 42–56.
McNeill, David. 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago, Illinois: University of Chicago Press.
McNeill, David. 2008. Gesture and Thought. Chicago, Illinois: University of Chicago Press.
Mendoza-Denton, Norma and Stefanie Jannedy. 2011. Semiotic layering through gesture and intonation: A case study of complementary and supplementary multimodality in political speech. Journal of English Linguistics 39: 265–299.
Mignault, Alain and Avi Chaudhuri. 2003. The many faces of a neutral face: Head tilt and perception of dominance and emotion. Journal of Nonverbal Behavior 27: 111–132.
Mills, Janet. 1984. Self-posed behaviors of females and males in photographs. Sex Roles 10: 633–637.
Mondada, Lorenza. 2014. Bodies in action: Multimodal analysis of walking and talking. Language and Dialogue 4: 357–403.
Mondada, Lorenza. 2016. Challenges of multimodality: Language and the body in social interaction. Journal of Sociolinguistics 20: 336–366.
Murphy-Chutorian, Erik and Mohan Manubhai Trivedi. 2009. Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 31: 607–626.
Nevile, Maurice. 2015. The embodied turn in research on language and social interaction. Research on Language and Social Interaction 48: 121–151.
Ochs, Elinor. 1992. Indexing gender. In Alessandro Duranti and Charles Goodwin (eds.) Rethinking Context: Language as an Interactive Phenomenon. Cambridge, U.K.: Cambridge University Press. 335–358.
Pellegrini, Stefano, Andreas Ess and Luc Van Gool. 2010. Improving data association by joint modeling of pedestrian trajectories and groupings. In European Conference on Computer Vision. Heidelberg, Germany: Springer Berlin Heidelberg. 452–465.
Poria, Soujanya, Erik Cambria, Newton Howard, Guang-Bin Huang and Amir Hussain. 2016. Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing 174: 50–59.
Rautaray, Siddharth S. and Anupam Agrawal. 2015. Vision based hand gesture recognition for human computer interaction: A survey. Artificial Intelligence Review 43: 1–54.
Rosenfelder, Ingrid, Joe Fruehwald, Keelan Evanini and Jiahong Yuan. 2011. FAVE (Forced Alignment and Vowel Extraction) Program Suite. Available at http://fave.ling.upenn.edu
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma and Zhiheng Huang. 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115: 211–252.
Scherer, Klaus R. 2003. Vocal communication of emotion: A review of research paradigms. Speech Communication 40: 227–256.
Schiffrin, Deborah. 1987. Discourse Markers. Cambridge, U.K.: Cambridge University Press.
Schuller, Björn, Stefan Steidl, Anton Batliner, Felix Burkhardt, Laurence Devillers, Christian A. Müller and Shrikanth S. Narayanan. 2010. The INTERSPEECH 2010 paralinguistic challenge. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). Makuhari, Japan: International Speech Communication Association. 2795–2798.
Shan, Caifeng. 2012. Smile detection by boosting pixel differences. IEEE Transactions on Image Processing 21: 431–436.
Sharma, Devyani. 2016. Series introduction. Journal of Sociolinguistics 20: 335.
Shen, Jie, Stefanos Zafeiriou, Grigorios G. Chrysos, Jean Kossaifi, Georgios Tzimiropoulos and Maja Pantic. 2015. The first facial landmark tracking in-the-wild challenge: Benchmark and results. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 1003–1011.
Suarez, Jesus and Robin R. Murphy. 2012. Hand gesture recognition with depth images: A review. In 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. 411–417.
Tang, Siyu, Mykhaylo Andriluka and Bernt Schiele. 2014. Detection and tracking of occluded people. International Journal of Computer Vision 110: 58–69.
Tannen, Deborah and Cynthia Wallat. 1987. Interactive frames and knowledge schemas in interaction: Examples from a medical examination/interview. Social Psychology Quarterly 1: 205–216.
Trouvain, Jürgen and William J. Barry. 2000. The prosody of excitement in horse race commentaries. In R. Cowie, E. Douglas-Cowie and M. Schröder (eds.) Proceedings of the International Speech Communication Association Workshop on Speech and Emotion. Belfast, Ireland: Textflow. 86–91.
Viola, Paul and Michael Jones. 2001. Rapid object detection using a boosted cascade of simple features. In CVPR 2001: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. 511–518.
Voigt, Rob, Robert J. Podesva and Dan Jurafsky. 2014. Speaker movement correlates with prosodic indicators of engagement. Speech Prosody 7.
Wang, William Yang and Julia Hirschberg. 2011. Detecting levels of interest from spoken dialog with multistream prediction feedback and similarity based hierarchical fusion learning. In Proceedings of the SIGDIAL 2011 Conference. Stroudsburg, Pennsylvania: Association for Computational Linguistics. 152–161.
Wöllmer, Martin, Felix Weninger, Tobias Knaup, Björn Schuller, Congkai Sun, Kenji Sagae and Louis-Philippe Morency. 2013. YouTube movie reviews: Sentiment analysis in an audio-visual context. IEEE Intelligent Systems 28: 46–53.
Zimman, Lal. 2015. Creak as disengagement: Gender, affect, and the iconization of voice quality. Paper presented at New Ways of Analyzing Variation 44, 22–25 October, Toronto, Canada.
Address correspondence to:

Rob Voigt
Stanford University – Linguistics Department
Margaret Jacks Hall
Building 460
Stanford, CA 94305
U.S.A.
robvoigt@stanford.edu