Longform Transcription Conventions en US..su

This document outlines transcription conventions for long-form audio transcription, including guidelines for handling various speech types, punctuation, and formatting. It emphasizes the importance of high-quality transcription that adheres to specific rules regarding spelling, grammar, and speaker identification. The document also details the use of special symbols and markup for indicating different audio features and provides examples for clarity.

[en_US] Transcribe Long-Form Transcription Conventions

CONFIDENTIAL

Convention Version: 3.2


Segmentation Guidelines Version: 3.2
Release Date: 2021.11.15 (Change log)

1. Introduction
2. General Instructions
3. Transcription Conventions
3.1. Characters and Special Symbols
3.2. Spelling and Grammar
3.2.1. Mispronounced Words
3.2.2. Dialectal Pronunciations
3.2.3. Non-Standard Usage
3.3. Capitalization
3.4. Abbreviations, Acronym Words, Initialisms, Non-initialism Letter Sequences
3.4.1. Abbreviations
3.4.2. Acronym Words
3.4.3. Initialisms
3.4.4. Non-initialism Letter Sequences
3.4.5. Additional Examples
3.5. Contractions
3.6. Interjections
3.7. Numbers
3.8. Proper Nouns
3.8.1. Generic Human Names
3.8.2. Branded Names or Specific Names
3.9. Punctuation
3.9.1. Sentence-level Punctuation
3.9.2. Word-level Punctuation
3.10. Disfluent Speech
3.10.1. Stumbled Speech, Repetitions, and Truncated Words
3.10.2. Filler Words
3.11. Multiple Speakers
3.11.1. Non-Overlapping Interfering Speech
3.11.2. Overlapping Interfering Speech
3.12. Unintelligible Speech
3.13. Non-Target Languages
3.14. Non-Speech
3.14.1. Human vocal noises
3.14.2. Non-speech noises
3.14.3. Silence/Pauses
4. Appendix A: The Complete Set of Non-Speech Tags and Other Markup Tags
4.1. Markup Tags
4.2. Tags for Noise, Silence, and Non-overlapping Interfering Speech

1. Introduction
Transcription is the commitment of an audio signal to textual representation. This can include representing speech data as well as other sound types
such as phones ringing or music. In order to train machine intelligence transcription systems, the training data must be of high quality. In this case,
"high quality" means transcribing in a consistent and accurate manner, in careful concert with the parameters outlined in the guidelines.

2. General Instructions

Confidential. Page 1
All transcriptions must:

Meet the quality requirements set out in the Transcribe Data Quality and Delivery Requirements document;
Be in .json format and adhere to the format and structure of the schema set out in the Transcribe Multi-Segment Transcription JSON
Schema document;
Contain timestamped segments, created following the Transcribe Long-Form Transcription Segmentation Guidelines;
Contain Speech segments, with each Speech segment transcribing the speech of one and only one speaker (see the Multiple Speakers
section on how to handle Speech segments with multiple speakers);
Use UTF-8 encoding;
Adhere to the transcription conventions set out in Section 3 below.

3. Transcription Conventions
3.1. Characters and Special Symbols
Transcription should include only upper and lowercase letters, apostrophes, commas, exclamation points, hyphens, periods, question marks, spaces,
and a limited set of special mark-up symbols.

Don't use numerals (e.g., 1, IV) or special symbols (e.g., $, +, @) to transcribe spoken words. See Section 3.8.2 for details on handling invalid
characters and symbols in stylized brand names.

"I have like zero dollars" --> I have like zero dollars. (NOT: $0)
"it was great slash weird" --> It was great slash weird. (NOT: great/weird)
"six plus six equals twelve" --> Six plus six equals twelve. (NOT: 6 + 6 = 12)
"my email is m-golden@gmail.com" --> My email is M dash golden at Gmail dot com. (NOT: m-golden@gmail.com)

Below is the set of special mark-up symbols used in the transcription to indicate certain features or events within an audio file (e.g., unintelligible
speech, code-mixing). Do not use these symbols for any reason other than as mark-up language.

Symbol(s) | Name | Use
<> | Angle brackets | Around opening and closing tags, e.g., <initial>.
: | Colon | In conjunction with angle brackets and slash for the non-target language tag, e.g., <lang:Foreign></lang:Foreign>.
(()) | Double parentheses | Around unintelligible speech or overlapping speech of three or more speakers.
# | Hashtag | In front of filler words (a.k.a. filled pauses).
/ | Slash | In conjunction with angle brackets for closing markup tags, e.g., </initial>.
[] | Square brackets | Around non-speech tags such as [cough].
~ | Tilde | To indicate truncated speech.
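
The character rules above lend themselves to an automated check. The following is a minimal sketch, not part of the conventions themselves: the names ALLOWED_TEXT, MARKUP, and invalid_characters are hypothetical, and the sketch assumes plain ASCII letters and the straight apostrophe ('), so it would also flag typographic apostrophes and any non-target-language text inside <lang:...> tags.

```python
import string

# Characters allowed in ordinary transcription text (Section 3.1).
# Assumption: ASCII letters only, straight apostrophe only.
ALLOWED_TEXT = set(string.ascii_letters) | set(" ',!-.?")

# Characters reserved for mark-up: <> : (()) # / [] ~
MARKUP = set("<>:()#/[]~")


def invalid_characters(text):
    """Return the distinct characters in text that are neither allowed
    transcription characters nor mark-up symbols, in sorted order."""
    return sorted({ch for ch in text if ch not in ALLOWED_TEXT and ch not in MARKUP})
```

A transcription such as "I have like $0." would be reported as containing "$" and "0", while "I have like zero dollars." passes. The check only looks at individual characters; it does not verify that mark-up symbols are used in valid tags.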

3.2. Spelling and Grammar


Use standard orthography rather than phonetic spelling to transcribe what the speaker says.

Spell-check all transcription files after transcription is complete. When in doubt about the spelling of a word or name, consult the American Heritage
Dictionary: https://ahdictionary.com/. To reference the names of song titles, movies, TV shows, brands, etc. use http://.com/ or, if necessary,
http://google.com/.

3.2.1. Mispronounced Words


Transcribe mispronunciations using the standard spelling. For example, "San Jose" mispronounced as [san-joe-say] by a non-native speaker should
still be transcribed as "San Jose".

"call your representive" --> Call your representative.

"to protect the anonimty of our client" --> To protect the anonymity of our client.

3.2.2. Dialectal Pronunciations


Transcribe dialectal pronunciations using the spellings of the "standard" forms, unless such dialectal pronunciations are codified in an accepted written
version of the dialect.

"give me that red dress (pronounced [rid-dris])" --> Give me that red dress. (NOT: that rid dris)
"we wish you a Merry (pronounced [mur-rey]) Christmas" --> We wish you a Merry Christmas. (NOT: Murray)
"fool (pronounced as [foo]) me once" --> Fool me once. (NOT: Foo me once)
"make a left on twelfth street (pronounced [shtreet])" --> Make a left on twelfth street. (NOT: shtreet)
"s'all well n' good darlin'" --> It's all well and good darling. (NOT: 'tis, n', darlin')

3.2.3. Non-Standard Usage


Transcribe a speaker's utterances verbatim, even in cases when the speaker's utterances do not conform to the standard grammar of the language.
Do not correct grammatical "mistakes" or variations made by the speaker.

"he been done work" --> He been done work.


"we be playing basketball after work" --> We be playing basketball after work.

The same goes for non-standard or unexpected word choice. Transcribe the words as they are spoken, not as what is expected.

"the volcano said I lava you" --> The volcano said I lava you. (NOT: I love you)
"we let the bag out of the cat" --> We let the bag out of the cat. (NOT: let the cat out of the bag)

3.3. Capitalization
Transcription should follow the accepted capitalization patterns. For example, capitalize the first word of a sentence, proper names (e.g., Jeff Bezos,
France), acronym words (e.g., POTUS), initialisms (e.g., IBM, SAT), non-initialism letter sequences (e.g., J O H N), and so on.

"I want to visit Oregon" --> I want to visit Oregon.


"I work at NASA" --> I work at NASA.
"I'm going to Mexico on Thursday" --> I'm going to Mexico on Thursday.

Please note that capitalization conventions should be followed even in email addresses.
"The email is johnwhyte@gmail.com" --> The email is John W H Y T E at Gmail dot com.

3.4. Abbreviations, Acronym Words, Initialisms, Non-initialism Letter Sequences

3.4.1. Abbreviations
An abbreviation is a truncated or shortened written form of a word (e.g., "oz" for "ounce", "Cal" for "California"). Many abbreviations are
pronounced as full, non-contracted words.

Always spell out the full word when pronounced as such. Don't introduce abbreviations in the transcription.

"whoa he's 6 ft 2" --> Whoa he's six foot two! (NOT: 6 ft 2)
"talk to Professor Smith immediately" --> Talk to Professor Smith immediately. (NOT: Prof Smith)

Use an abbreviation only if the speaker explicitly pronounces the word as abbreviated. Don't add a period after an abbreviated word (unless it appears
at the end of a sentence).

"I live in Cambridge, Mass (pronounced [mass])" --> I live in Cambridge, Mass.
"Billie Jean King went to Cal (pronounced [kal]) State" --> Billie Jean King went to Cal State.
"I'll always choose UMass (pronounced [yu-mass]) over CalTech (pronounced [kal-tak])" --> I'll always choose UMass over CalTech.

The titles Ms, Mrs, Mr, and Mx (pronounced [miks]) that prefix a person's name are considered words in their own right, not abbreviations. When used
as titles, transcribe them as Ms., Mrs., Mr., and Mx. Spell all other personal titles (e.g., “doctor”, “professor”, “junior”, etc.) in full. In US English, titles
are followed by a period. When used as direct addresses (without a following name), transcribe them in their spelled-out forms (e.g., mister or missus).

"Mr. Smith this way please" --> Mr. Smith, this way please.
"hey mister can you help me with this survey" --> Hey, mister, can you help me with this survey?
"doctor chen is running late" --> Doctor Chen is running late.

3.4.2. Acronym Words


Acronym words refer to meaningful words (i.e., a lexical entry that can be listed in a dictionary) or names of entities (e.g., companies, brands,
products, etc) formed by the initial letters or letter sequences of other words. They are spoken as words (e.g. NATO, TESOL) or spoken as mixtures of
individual letters plus words (e.g., JSON, IUPAC).

Transcribe acronym words as words in upper case letters without white spaces or periods between the letters, and without any markup tags around
them.

"I work for NASA" --> I work for NASA.
"AIDS has a substantial impact on society" --> AIDS has a substantial impact on society.
"can you pretty print this JSON file" --> Can you pretty print this JSON file?
"let me walk you through some IUPAC (pronounced [i ju pæk]) nomenclatures." --> Let me walk you through some IUPAC nomenclatures. (NOT: I U PAC or <initial>IU</initial> PAC)
"the NAACP (pronounced [en-double-ey-see-pee]) kicked off its annual meeting in Orlando" --> The NAACP kicked off its annual meeting in Orlando. (NOT: N double <initial>ACP</initial>)

3.4.3. Initialisms
Like acronym words, initialisms also refer to meaningful words or names of entities formed by the initial letters or letter sequences of other words. But
unlike acronym words, initialisms are spoken as a series of individual letters (e.g., EU, IBM, HTTP).

Transcribe initialisms according to their common written forms (preserving casing and punctuation). Enclose initialisms within the <initial> and
</initial> tags.

"they fear that joining the EU will impact local businesses" --> They fear that joining the <initial>EU</initial> will impact local businesses.
"the U R L is H T T P colon two forward slashes W W W dot Gmail dot com slash" --> The <initial>URL</initial> is <initial>HTTP</initial>
colon two forward slashes <initial>WWW</initial> dot Gmail dot com slash.
"I work for IBM" --> I work for <initial>IBM</initial>.
"I like ZZ Top" --> I like <initial>ZZ</initial> Top.

Don't include inflection markers (such as -s, -ing, -ed, 's) within the <initial></initial> tags. Transcribe inflection markers immediately after the </initial>
tag.

"first, i SSHed into the server" --> First, I <initial>SSH</initial>ed into the server. (NOT: <initial>SSHed</initial>)
"folks, please place your photo IDs on this table" --> Folks, please place your photo <initial>ID</initial>s on this table. (NOT: <initial>IDs</initial>)

Don't include any additional tags within the <initial></initial> tags. If non-speech tags are used to represent the audio, place them before the <initial>
tag.

"I'll be taking my S (cough) AT next month" --> I'll be taking my [cough] <initial>SAT</initial> next month. (NOT: <initial>S[cough]AT</initial>)

Don't use <initial></initial> tags to enclose a letter sequence that doesn't stand for a single meaningful word. In those cases, transcribe them as non-
initialism letter sequences (see below).

"Charlotte's Web's author is E. B. White" --> Charlotte's Web's author is E. B. White. (NOT: <initial>E. B.</initial>, because E. B. together
don't form a meaningful word)
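
Some of the <initial> tag rules in this section can be checked mechanically. The sketch below is illustrative only: the function name initial_tag_problems and the regular expressions are hypothetical, and the inflection test is a rough heuristic (an all-capitals sequence followed by s, ed, ing, or 's) that will not catch every case. It cannot decide whether a letter sequence actually stands for a meaningful word.

```python
import re

# Captures the text between each <initial> and the next </initial>.
INITIAL_RE = re.compile(r"<initial>(.*?)</initial>")


def initial_tag_problems(text):
    """Return a list of human-readable problems with <initial> tags in text."""
    problems = []
    # Every opening tag needs a closing tag.
    if text.count("<initial>") != text.count("</initial>"):
        problems.append("unbalanced <initial> tags")
    for content in INITIAL_RE.findall(text):
        # No other tags or mark-up symbols may appear inside the tags.
        if re.search(r"[\[\]<>()#~]", content):
            problems.append(f"markup inside initialism: {content!r}")
        # Heuristic: capitals followed by a common inflection marker.
        elif re.fullmatch(r"[A-Z]{2,}(s|ed|ing|'s)", content):
            problems.append(f"possible inflection inside tag: {content!r}")
    return problems
```

For instance, "<initial>SSHed</initial>" and "<initial>S[cough]AT</initial>" are both reported, while "<initial>SSH</initial>ed" and "<initial>PhD</initial>" pass.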

3.4.4. Non-initialism Letter Sequences


Similar to initialisms, non-initialism letter sequences are spoken as series of individual letters. However, non-initialism letter sequences don't refer
to meaningful words or names of entities. They are spelled out letter-strings or single letters within an utterance (e.g., a spelled out name, email
address, chemical notations, name initials).

Transcribe non-initialism letter sequences as individual upper case letters, with each letter separated by a space, and without any markup tags around
the sequences.

"my name is John – jay, oh, eich, en" --> My name is John J O H N.
"the domain nyt.net (pronounced [en-wai-tee-dot-net]) is still available." --> The domain N Y T dot net is still available.

Transcribe single letters within an utterance as non-initialism letter sequences.

"George W. has a Scottish terrier" --> George W. has a Scottish Terrier. (NOT: <initial>W.</initial>)
"there is a difference between B2B and B2C marketing" --> There is a difference between B to B and B to C marketing. (NOT: <initial>B2B</initial> and <initial>B2C</initial>)

Transcribe chemical notations as non-initialism letter sequences.
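
As a rough illustration of this rule, the sketch below turns a written letter sequence or chemical formula into space-separated upper case letters and spelled-out digits. The names DIGIT_WORDS and spell_out are hypothetical. The sketch assumes every digit is read on its own and that "0" is spoken as "zero"; in practice the transcription must follow what the speaker actually says (e.g., "oh" or "ninety-eight").

```python
# One spoken form per digit; assumption: digits are read one at a time.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_out(sequence):
    """Spell a letter/digit sequence as upper case letters and number words,
    separated by single spaces."""
    parts = []
    for ch in sequence:
        if ch.isalpha():
            parts.append(ch.upper())
        elif ch in DIGIT_WORDS:
            parts.append(DIGIT_WORDS[ch])
        else:
            # Symbols such as "+" or "@" must be transcribed as spoken words.
            raise ValueError(f"cannot spell out {ch!r}")
    return " ".join(parts)
```

Under these assumptions, spell_out("NaOH") gives "N A O H" and spell_out("H2SO4") gives "H two S O four".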

"let's write the equation for NaOH + H2O" --> Let's write the equation for N A O H plus H two O.
"how do you draw the Lewis structure for H2SO4 sulfuric acid" --> How do you draw the Lewis structure for H two S O four, sulfuric acid?

3.4.5. Additional Examples

Common Written Form | Pronunciation | Classification | Expected Transcription
ASAP | ey sap | Acronym word | ASAP
ASAP | ey es ey pee | Initialism | <initial>ASAP</initial>
a.m. (i.e. before noon) | ey em | Initialism | <initial>a.m.</initial>
B2B (for business-to-business) | bee to bee | Single letters | B to B
cis@email.com | see ai es at email dot com | Non-initialism letter sequence | C I S at email dot com
FAQ | fak | Acronym word | FAQ
FAQ | ef ay kyu | Initialism | <initial>FAQ</initial>
FAQs | ef ay kyus | Initialism + inflection | <initial>FAQ</initial>s
MI5 | em ai faiv | Initialism | <initial>MI</initial> five
OMG | oe em gee | Initialism | <initial>OMG</initial>
PS5 | pee es faiv | Initialism | <initial>PS</initial> five
lb (or pound) | pound | Abbreviation | pound
R&D | ar and dee | Single letters | R and D
Ste. (or Suite) | sweet | Abbreviation | Suite
VIP | vee ai pee | Initialism | <initial>VIP</initial>
OK; okay | o kei | -- | OK; okay. (For this word in English, the initialism tag is not needed.)

3.5. Contractions
Standard contractions must be transcribed as they are pronounced (e.g., isn't, where's, y'all). Include the apostrophe in the spelling.

Transcribe the following contractions as a single word:

gimme
gonna
gotta
lemme
wanna
whatcha (Note: The variant spelling of "watcha" can also be used.)
kinda

Note: "all right" should be transcribed as two words, not as "alright".

3.6. Interjections
Interjections are words or expressions that speakers use within an utterance to express affirmation, surprise, or negation. Each language has its own
specific set of interjections that speakers can use. When transcribing interjections, use language-specific standardized spellings. Interjections do not
require any special mark-up symbols.

For English, we transcribe only the following interjections:

eee mm uh-oh
ew mhm whoa
huh nah whew
hmm oh yay
jeez uh-huh yep

Notes:

Interjections are not to be confused with filler words. See Section 3.10.2 for guidelines on filler words.
In particular, the interjection "hmm" is not to be confused with the filler word "#hm". Use context to disambiguate the two different uses.

3.7. Numbers

Spell out numbers in full using the alphabet according to how the speaker says them. This applies to both cardinal (e.g., 0, 215) and ordinal numbers
(e.g., 1st, 5th).

"5" --> Five


"5th" --> Fifth
"306" --> Three hundred and six / Three oh six / Three zero six (Depending on how it was pronounced).
"play radio 109.4 FM" --> Play radio one oh nine point four <initial>FM</initial>. / Play radio one zero nine point four <initial>FM</initial>.
(Depending on how it was pronounced).
"the serial number is rtma98" --> The serial number is R T M A nine eight. / The serial number is R T M A ninety-eight. (Depending on how it
was pronounced).
"the number is 1 (888) 280-4331 (pronounced [one triple eight two eight oh dash four three three one])" --> The number is one triple eight two
eight oh dash four three three one.

When spelling out numbers, use hyphens as required by the rules of the language. In US English, numbers from twenty-one through ninety-nine are
spelled with hyphens. Others are not hyphenated.
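
The hyphenation rule can be sketched in code as follows. This is only an illustration under stated assumptions: the names ONES, TENS, and spell_number are hypothetical, the function covers 0 through 999 only, and it produces a single reading per number (without "and"), whereas the conventions require transcribing the number as the speaker actually said it.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def spell_number(n):
    """Spell an integer from 0 to 999 in US English, hyphenating only the
    compound numbers twenty-one through ninety-nine."""
    if not 0 <= n <= 999:
        raise ValueError("only 0 through 999 are supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        # Round tens (twenty, thirty, ...) take no hyphen.
        return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {spell_number(rest)}"
```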

"twenty-five"
"three hundred"
"five hundred fifty-two"
"nineteen forty-five"

3.8. Proper Nouns

3.8.1. Generic Human Names


Transcribe generic human names according to standard or common spellings for names in the target locale.

If a name has multiple spellings, use the most frequent spelling. If you are not sure, make your best guess. Strive for consistency within a file (i.e., use
the same spelling when referring to the same individual). Do not add the <lang:Foreign> tags to names, even if they sound foreign to the locale; in
such cases, transliterate following common rules. See more details in Section 3.13 (Non-Target Languages).

Katherine / Catherine are two common spellings of the same human name; they are equally acceptable.
Katharyn is an uncommon spelling of the same human name above; it should be avoided.

3.8.2. Branded Names or Specific Names


When a specific or branded name contains punctuation (e.g. ! and :) or characters not allowed in the conventions (e.g., $, +), there are two strategies
to transcribe them.

1. Omit the punctuation, special characters, or symbols as long as the pronunciation is not impacted.

Yahoo! --> Yahoo


Mission: Impossible --> Mission Impossible

2. Replace the punctuation, special characters, or symbols with accepted characters or words in order to preserve the pronunciation.

Ke$ha --> Kesha


P!nk --> Pink
AT&T --> A T and T

Follow the capitalization in the official spelling of specific or branded names.

eBay
iPad

Keep the hyphens in proper nouns only if they are parts of the official spellings.

"let's go to Chick-fil-A" --> Let's go to Chick-fil-A.


"the C-SPAN network covers political events in the United States." --> The C-SPAN network covers political events in the United States.

Spell out numbers used in proper names in full using the alphabet according to how the speaker says them.

"U2 is my favorite band" --> U two is my favorite band.


"MI5 has no powers of arrest" --> <initial>MI</initial> five has no powers of arrest.

Branded names of non-English origin that are fully adopted into common American English usage should be transcribed using the standard
orthography of English. See Section 3.13 on transcribing non-target languages. Transcriptionists should not spend too much time deciding whether to
classify something as non-target; if you are not sure, default to no tagging.

Confirm the spellings of proper nouns (names of brands, companies, celebrities, well-known fictional characters) through brief research.

3.9. Punctuation

Only apostrophes, commas, exclamation points, hyphens, periods, and question marks should be used as punctuation marks. Don't use any other English
punctuation marks (e.g., semicolons and quotation marks).

Use these punctuation marks as required by the grammar rules.

3.9.1. Sentence-level Punctuation

Sentence-level Punctuation

Periods: Use a period only at the end of a complete sentence that is a statement.

That city is safe.

Note: Periods are sometimes used at the word level. See Section 3.9.2 (Word-level Punctuation) for details.

Question Marks: Use a question mark only after a direct question or a tag question.

Isn't that simple?
You know the answer, don't you?

Exclamation Points: Use an exclamation point at the end of a sentence when you feel or hear an emphatic stress or intonation. An exclamation point
usually marks an outcry or an emphatic or ironic comment.

That's the biggest pumpkin I have ever seen!
When will I ever learn!

Commas: Use commas to break up long stretches of speech. This is to facilitate reader comprehension. Below are some suggestions of
when a comma should be used:

To separate items in a list of three or more, using the serial comma (a.k.a. the Oxford comma, which comes before the
conjunction that joins the last two elements of a list):

I enjoy skydiving, snowboarding, and mountain biking.

To set off a direct address:

Meredith, listen to me carefully.
I'm not calling you, my friends, just to whine about my life.

To break up compound and complex sentences:

I would like to join you, but I'm afraid I have class at that time.
Marcos and I couldn't go to the jazz concert, so we watched it on Netflix instead.

To set off introductory words and phrases:

Therefore, they cancelled their trip.
After taking a break, the team resumed their meeting.

Around parenthetical phrases:

That report on the New York Times was, to say the least, a bombshell.
Getting a hotel by the sea, like the one we stayed last year, would be superb.

3.9.2. Word-level Punctuation

Apostrophes: Use apostrophes in contractions, possessives of individual letters, possessive "s", or as part of a person's name.

That's where it's at.
Project Q's timeline
Sinead O'Connor
Eleven o'clock
Read Jess' email.

Hyphens: Use hyphens according to the standard orthographic rules of the language. If it is not clear whether a compound word should be spelled
with a hyphen, consult the American Heritage Dictionary.

Here are a few examples of English compound words that can (or sometimes must) use hyphens:

A-line
D-day
ex-boyfriend, ex-drummer
extra-loud
self-aware
T-shirt
U-turn
V-neck
X-ray

For hyphens in numbers, see Section 3.7. For hyphens in branded names, see Section 3.8.2.

Periods: Use periods with titles (such as Mrs., Ms., Mx.) and lowercase abbreviations (such as e.g., a.m., a.k.a.). Most established
abbreviations can be found in a good dictionary. When in doubt, consult the dictionary to check if the period is needed for an
abbreviation.

Ms. Beesly

Use periods for initials standing for given names.

E. B. White
George W. Bush

Don't use periods with abbreviations that appear in full capitals, even if lowercase letters appear within the abbreviation.

"she is interviewing for the VP role" --> She is interviewing for the <initial>VP</initial> role.
"she is studying for a PhD in sociology" --> She is studying for a <initial>PhD</initial> in sociology.

Lowercase abbreviations, when pronounced as series of letters, should also be enclosed with the <initial></initial> tags.

i.e. (pronounced [ai-ee]) --> <initial>i.e.</initial>


a.k.a. (pronounced [ey-kay-ey]) --> <initial>a.k.a.</initial>

When transcribing a language other than English, use punctuation symbols and rules that are appropriate for that language. This could happen when
a speaker switches to a foreign language in the middle of a segment. In this case, the foreign punctuation symbols should be within the foreign
language tags <lang:Foreign></lang:Foreign> described in Section 3.13.

Hey, y'all. <lang:Spanish>¡Hola! ¿Cómo estás?</lang:Spanish> Sorry I'm late.

Note: Some punctuation use is stylistic/subjective. Differences of opinion are not necessarily errors.

3.10. Disfluent Speech


Disfluent speech refers to any interruption of the normal flow of speech. Speakers may stumble over their words, repeat themselves, utter truncated
words, restart phrases or sentences, and use hesitation sounds (i.e. filler words).

3.10.1. Stumbled Speech, Repetitions, and Truncated Words


Make your best effort to transcribe stumbled speech and repetitions according to what you hear after listening to the segment a few times.

"Directions to the… to the… the hotel" --> Directions to the to the the hotel.

Use tildes to indicate truncated words, whether at the beginning or the end.

"Ale… alexa … stop the mu… the music" --> Ale~ Alexa, stop the mu~ the music.
"...lexa play Janet Jackson… no wait…" --> ~lexa, play Janet Jackson. No, wait.
"N… n… no. It's Ch… Chom… Chomsky who said that" --> N~ n~ no. It’s Ch~ Chom~ Chomsky who said that.

3.10.2. Filler Words


Filler words are "words" that speakers use to indicate hesitation or fill a pause in order to maintain control of a conversation while thinking of what to
say next.

Each language has a limited set of filler words that speakers can use. For English, transcribe only the following fillers, preceded by the hashtag:

#ah
#er
#hm
#uh
#um

Don't alter the spelling of filler words to reflect how the speaker pronounces the word. If the speaker says a filler word that does not match any of the
listed filler words, transcribe the filler word that is closest in pronunciation.
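
A simple check for the closed set of fillers might look like the sketch below. The names ALLOWED_FILLERS and unknown_fillers are hypothetical, and the comparison is case-sensitive on the assumption that fillers are always written in lowercase exactly as listed above.

```python
import re

# The only filler words permitted for English (Section 3.10.2).
ALLOWED_FILLERS = {"#ah", "#er", "#hm", "#uh", "#um"}


def unknown_fillers(text):
    """Return every hashtag-prefixed token in text that is not an allowed filler."""
    return [tok for tok in re.findall(r"#\w*", text) if tok not in ALLOWED_FILLERS]
```

A token such as "#umm" or "#hmm" would be reported, prompting the transcriptionist to replace it with the closest listed filler (or, for "hmm" used as an interjection, to drop the hashtag).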

Notes:

Filler words are not to be confused with interjections. See Section 3.6 for guidelines on interjections.
In particular, the filler word "#hm" is not to be confused with the interjection "hmm". Use context to disambiguate the two different uses.

3.11. Multiple Speakers


Segment multiple speakers separately. In each Speech segment, transcribe the speech of one and only one targeted speaker. When speakers
overlap, each identified speaker should receive their own Speech segment which should only transcribe the words spoken by that speaker.

Proper and careful segmentation and speaker identification steps prior to transcription will minimize the number of segments with extensive interfering
speech. But when a resulting Speech segment still contains interfering speech from other speakers, such as from people standing nearby or in the
same room as the main speaker, or from other participants in the conversation, represent them following the instructions below.

3.11.1. Non-Overlapping Interfering Speech


When a Speech segment contains non-overlapping speech from other interfering speakers, insert the [other-speech] tag for each instance of
interfering speech in the location where it occurs in the transcription, regardless of intelligibility. Don’t transcribe interfering speech word-for-word.

The [other-speech] tag applies only to interfering speech that does not overlap (or only minimally overlaps) with speech from the main speaker being
transcribed.

"if you mention the size of my office (audible speech from an interferer with minimal overlap) I will scream" --> If you mention the size of my
office, [other-speech] I will scream.

In other words, the [other-speech] tag represents, in the transcription content, the portion of the audio signal that corresponds to speech not coming
from the target speaker.

Note: The portion of interfering speech should also have its own separate segment. If the speech corresponds to one or more identifiable speakers,
each interfering speaker should receive a segment of type Speech, and the content should be transcribed or consist of the unintelligible tag (()). If the
interfering segment corresponds to indistinct chatter, or if the individual speakers in the segment cannot be clearly identified, the segment should be of
type Babble.

3.11.2. Overlapping Interfering Speech


When there is intelligible overlapping speech between multiple speakers, transcribe the speech of each overlapping speaker as separate speech
segments. When transcribing Speaker 1, don't transcribe the speech from other interfering speakers word-for-word, and vice versa.

In each Speech segment, enclose within the <overlap></overlap> tags the portion of transcribed speech that overlaps with other interfering speech,
including the necessary punctuation. In other words, an <overlap>transcribed_speech</overlap> string represents the portion of speech from the
targeted speaker that overlaps with other speech signals.

Don’t break up a word with the <overlap></overlap> tags (initialisms are treated as words). If the overlap begins in the middle of a word, place the
<overlap> tag before the word. If the overlap ends in the middle of a word, place the </overlap> tag after the word. When a segment contains the
opening <overlap> tag, it must also contain the closing </overlap> tag.

Example:

Segment | Start time | End time | Speaker | Transcription Content
1 | 2536.778 | 2539.486 | host01 | I know. Okay, is there still <overlap>any questions ((so I want to one by one))? [laugh]</overlap>
2 | 2538.979 | 2539.486 | guest01 | <overlap>No, no, no, no, it's okay it's okay</overlap>
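
The pairing and word-boundary rules for <overlap> tags can be verified per segment with a small check like the one below. It is a sketch under assumptions: the function name overlap_tag_problems is hypothetical, and a "broken word" is detected only when a letter or digit sits directly outside a tag (e.g., ques<overlap>tions).

```python
import re


def overlap_tag_problems(text):
    """Return a list of problems with <overlap> tags in one segment's text."""
    problems = []
    depth = 0  # number of currently open <overlap> tags
    for m in re.finditer(r"</?overlap>", text):
        if m.group() == "<overlap>":
            depth += 1
            if depth > 1:
                problems.append("nested <overlap>")
            # A letter or digit right before the tag means a word was split.
            if m.start() > 0 and text[m.start() - 1].isalnum():
                problems.append("opening tag splits a word")
        else:
            depth -= 1
            if depth < 0:
                problems.append("closing tag without opening tag")
                depth = 0
            # A letter or digit right after the tag means a word was split.
            if m.end() < len(text) and text[m.end()].isalnum():
                problems.append("closing tag splits a word")
    if depth > 0:
        problems.append("missing </overlap>")
    return problems
```

Both transcriptions in the example table above pass this check, whereas a segment containing "ques<overlap>tions?</overlap>" or an unclosed <overlap> tag is reported.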

Notes:

When transcribing overlapping speech, the transcriptionist should not spend too much time trying to pick out the speech of the targeted
speaker from other overlapping speakers in a given segment. If the transcriptionist cannot easily distinguish the speech of each identified
speaker in a multi-speaker segment, consider redoing the segmentation to ease transcription. For example:
    split the stretch of overlapping speech into a couple of shorter Speech segments, each targeting a more manageable set of
    overlapping speakers; or
    replace the overlapping unintelligible Speech segment(s) with a single Babble segment, thereby removing the need to transcribe the
    speech altogether.
For applying the <overlap></overlap> tags in conjunction with initialisms and non-target languages, see Section 3.4.3 and Section 3.13,
respectively.

3.12. Unintelligible Speech

Use double parentheses (()) to mark stretches of speech that are difficult or impossible to understand or transcribe (such as when a speaker is speaking
too softly or when a speaker is speaking over another foreground speaker). There should be a space before and after the double parentheses, but not
within the parentheses themselves.
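
The spacing rule can be sketched as a check like the following. The function name unintelligible_problems is hypothetical. Judging from the examples in this document, punctuation or a tag may directly follow or precede the double parentheses (e.g., "((lights off))."), so this sketch assumes that only a letter or digit touching the parentheses counts as a missing space.

```python
import re

# Matches (()) and ((guess)), capturing whatever sits inside.
UNINTELLIGIBLE_RE = re.compile(r"\(\((.*?)\)\)")


def unintelligible_problems(text):
    """Return a list of spacing problems around unintelligible-speech markers."""
    problems = []
    for m in UNINTELLIGIBLE_RE.finditer(text):
        guess = m.group(1)
        # No padding is allowed between the parentheses and the guess.
        if guess != guess.strip():
            problems.append(f"padding inside parentheses: {m.group()!r}")
        before = text[m.start() - 1] if m.start() > 0 else " "
        after = text[m.end()] if m.end() < len(text) else " "
        # Assumption: only an adjacent letter or digit is an error.
        if before.isalnum() or after.isalnum():
            problems.append(f"missing space around: {m.group()!r}")
    return problems
```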

"Alexa play ???? on spotify" --> Alexa, play (()) on Spotify.

If the transcriptionist has a guess about the speaker's words, transcribe what they think they hear within the double parentheses.

"Alexa read ????? from audible" --> Alexa, read ((Cat In The Hat)) from Audible.
"Alexa turn the ????" --> Alexa, turn the ((lights off)).

3.13. Non-Target Languages


When a speaker switches to a language other than English, place the tag <lang:Foreign> at the location where the switch between languages begins
and </lang:Foreign> where the switch ends. When a segment contains the opening <lang:Foreign> tag, it must also contain the closing
</lang:Foreign> tag.
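
The pairing requirement can be verified mechanically for each segment. The sketch below is a hedged illustration in Python, not an official validator: the function name lang_tags_balanced is hypothetical, and treating nested language tags as an error is an assumption that the conventions do not state explicitly.

```python
import re

# Matches an opening or closing language tag and captures the language name.
LANG_TAG = re.compile(r"<(/?)lang:([A-Za-z]+)>")


def lang_tags_balanced(segment: str) -> bool:
    """True if every <lang:X> in the segment is closed by a matching </lang:X>."""
    open_language = None
    for match in LANG_TAG.finditer(segment):
        is_closing = match.group(1) == "/"
        language = match.group(2)
        if not is_closing:
            if open_language is not None:  # assumption: nested tags are rejected
                return False
            open_language = language
        else:
            if open_language != language:  # closing tag without a matching opener
                return False
            open_language = None
    return open_language is None  # an opener left unclosed fails the check
```

A segment that passes contains only complete open and close pairs, so a language switch that continues past the segment boundary has to be closed in this segment and reopened in the next one.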

If the transcriptionist can unambiguously identify the non-target language, replace "Foreign" with the language name in the tags. Capitalize the first
letter of the language name.

Transcribe the speech of the non-target language, using the standard orthography of the non-target language, if the transcriptionist understands the
language. Otherwise, transcribe the non-target language as (()).

"you have to finish todo esto, porque. I have other things to do" --> You have to finish <lang:Spanish>todo esto, porque</lang:Spanish>. I
have other things to do.
"I'd like to tell her que ya no la quiero" --> I'd like to tell her <lang:Foreign>(())</lang:Foreign>.

Words of non-English origin adopted into common American English usage (i.e. loanwords or borrowings) should be transcribed using the standard
orthography of English. Don't use the <lang:Foreign></lang:Foreign> tags around loanwords that have been grammaticalized and fully adopted into
common American English usage. If it is unclear whether a word is a loanword or not, consult a dictionary like the American Heritage Dictionary:
https://www.ahdictionary.com/. A listing in the dictionary is strong grounds for treating a word as an established loanword, even if it is of foreign origin.

Loanwords such as "tsunami" and "tacos" in the examples below don't require the <lang:Foreign></lang:Foreign> tag.

There was a tsunami in Indonesia. (NOT: <lang:Foreign>tsunami</lang:Foreign>).

Alexa, recipe for tacos. (NOT: <lang:Spanish>tacos</lang:Spanish>).

When proper names of people or other entities are used in an utterance which is otherwise spoken in English, the <lang:Foreign></lang:Foreign> tag
should not be used, regardless of whether the name is pronounced with naturalized English phonology or with foreign phonology. If the transcriptionist
cannot confidently transcribe the name in English orthography, the unintelligible tag (()) should be used.

"It was demonstrated by Ramanujan in 1913." --> It was demonstrated by Ramanujan in nineteen thirteen.
    A name of foreign origin is used in a context where the speaker is still clearly speaking English. The <lang:Foreign></lang:Foreign> tag
    should not be used, regardless of whether the name Ramanujan is pronounced with English or Hindi phonology.
"I think Zhangjie won't be able to make it." --> I think (()) won't be able to make it.
    A name of foreign origin is used in a context where the speaker is still clearly speaking English, so the <lang:Foreign></lang:Foreign> tag
    should not be used. The name is not familiar to the transcriptionist, who is not sure how to adequately transcribe it in English orthography, so
    the unintelligible tag (()) is used.
"Of course I want to go, me encanta Juanes." --> Of course I want to go, <lang:Spanish>me encanta Juanes.</lang:Spanish>
    A name of foreign origin is used in a context where the speaker has code-switched to another language. The foreign name should be
    contained in the <lang:Foreign></lang:Foreign> tag for the code-switched portion of the utterance.

Don't break up a word with the non-target language tags. Although this is rare in English, a speaker may mix languages within a single word,
for example by using a root in the non-target language with an affix in the target language. In such cases:

1. Transcribe the word as it was pronounced using the respective standard orthography of each language.
2. Enclose both the root and the affix within the <lang:Foreign></lang:Foreign> tags.

Non-target language tags can be used in conjunction with other markup tags (e.g. <initial></initial> and <overlap></overlap>):

"The story is set in Belarus after the collapse of the (pronounced [sssr]), well that's USSR in Russian." --> The story is set in Belarus after
the collapse of the <lang:Russian><initial>СССР</initial></lang:Russian>. Well, that's <initial>USSR</initial> in Russian.
"I'll sometimes start a sentence in English y termino-(another intelligible speaker begins talking)-en español (end of segment)" --> I'll
sometimes start a sentence in English <lang:Spanish>y termino <overlap>en español</overlap></lang:Spanish>.

3.14. Non-Speech
Indicate non-speech noises in the transcription by inserting the appropriate tag in square brackets at the location where the noise occurs. Only the
tags listed in the subsections below are allowed.

Don't insert a non-speech tag in the middle of a word. If a non-speech sound occurs in the middle of a word, add the tag immediately before the word in
which it occurred.

"I will abso-(ring)-lutely open it" --> I will [ring] absolutely open it.

If a non-speech sound occurs repeatedly, represent it only once.
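
As a rough illustration of this rule, the following Python sketch collapses consecutive repeats of the same bracketed tag in a draft transcript into a single tag. It is an assumption-laden helper, not part of the conventions: the function name collapse_repeated_tags is hypothetical, and it assumes that repeats are written as the same lowercase tag separated only by whitespace.

```python
import re

# A bracketed tag followed by one or more whitespace-separated copies of itself.
REPEATED_TAG = re.compile(r"(\[[a-z-]+\])(?:\s+\1)+")


def collapse_repeated_tags(transcript: str) -> str:
    """Replace runs of an identical non-speech tag with a single occurrence."""
    return REPEATED_TAG.sub(r"\1", transcript)
```

Different tags that happen to be adjacent, such as [click] followed by [ring], are left untouched because the pattern only matches a repeat of the same tag.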

"wait … click click click click there" --> Wait [click] there.

3.14.1. Human vocal noises

Tags Descriptions

[breath] Inhalation and exhalation between words, yawning

[cough] Coughing, throat clearing, sneezing

[cry] Crying/sobbing

[laugh] Laughing, chuckling

[lipsmack] Lipsmacks, tongue-clicks

3.14.2. Non-speech noises

Tags Descriptions

[applause] Clapping.

[beep] The beep sound that replaces profanity or classified information.

[click] Machine or phone click.

[dtmf] Noise made by pressing a telephone keypad.

[ring] Telephone ring.

[sta] Continuous static.

[music] Music that is one or more seconds long without anyone speaking in the foreground. This includes on-hold music, songs, or singing.
Note: Don't use this tag for music playing in the background while someone's speaking.

[noise] Other miscellaneous noises not covered in the list above (e.g., screaming, rain, punching).

3.14.3. Silence/Pauses
Despite best efforts to create tight segments according to the guidelines, a speech segment may occasionally contain a noticeable silence or pause
with no actual speech.

Use the [no-speech] tag to indicate pauses or silence of one or more seconds, even in cases when there are some foreground noises mixed in with
the pause.

"they're not (pause) (breath) (pause) coming" --> They're not [no-speech] coming.

4. Appendix A: The Complete Set of Non-Speech Tags and Other Markup Tags
This section lists all the non-speech tags and other markup tags introduced in the Transcription Conventions section for ease of reference. See the
Transcription Conventions section for the exact use case and example(s) of each tag.

4.1. Markup Tags


<initial></initial>

<lang:Foreign></lang:Foreign>

<lang:X></lang:X>

where X can be replaced by any commonly accepted language name with the first letter capitalized (e.g., Arabic, Korean, Spanish)

<overlap></overlap>

4.2. Tags for Noise, Silence, and Non-overlapping Interfering Speech


[applause]

[beep]

[breath]

[click]

[cough]

[cry]

[dtmf]

[laugh]

[lipsmack]

[music]

[no-speech]

[noise]

[other-speech]

[ring]

[sta]
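
Because only the tags in this list are allowed, the list can serve as an allowlist when checking a finished transcript. The following Python sketch is illustrative rather than an official tool: the names ALLOWED_TAGS and unknown_tags are hypothetical, and it assumes that square brackets appear in a transcript only as tag delimiters.

```python
import re

# The bracketed tags listed in Section 4.2 above, without their brackets.
ALLOWED_TAGS = {
    "applause", "beep", "breath", "click", "cough", "cry", "dtmf", "laugh",
    "lipsmack", "music", "no-speech", "noise", "other-speech", "ring", "sta",
}

# Anything written between a pair of square brackets.
BRACKET_TAG = re.compile(r"\[([^\[\]]*)\]")


def unknown_tags(transcript: str) -> list:
    """Return the bracketed tag names that are not in the allowed set."""
    return [name for name in BRACKET_TAG.findall(transcript)
            if name not in ALLOWED_TAGS]
```

The comparison is case-sensitive, so a tag written as [Click] is reported as unknown, consistent with the lowercase spelling used throughout the list.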
