Transcription Requirements AA

It's a requirements document.
SCHEDULE A. SERVICES TO BE PROVIDED BY THE PROVIDER

Minimum accuracy scores for passing validation are 98% for word accuracy at the word level and 95% for tag accuracy at the tag level. For non-tokenized languages, an equivalent accuracy will be targeted.

If the transcriptions fail validation, they should be reworked and then re-validated until they pass the accuracy threshold.

All transcript files must be delivered in .json format in accordance with the specified metadata requirements.

Do not include empty segments (i.e., nothing to be transcribed) or segments containing only non-speech tags (e.g., [no-speech] or [laugh] tags) in the final transcription.

For each conversation, two or more speakers will be recorded on separate channels. In such cases, each conversation's entire duration will be counted as the duration for billing. For example, if two speakers record a conversation of 30 minutes, the duration for billing will be 30 minutes and not the total of their individual channel recording transcriptions.

The audio must be segmented and timestamped (to the millisecond) based on speaker turns. Each segment should last no longer than 15 seconds or the logical conclusion of the sentence.

Transcription Guidelines

1. Introduction

Transcription is the commitment of an audio signal to textual representation. This can include speech data, such as conversation, as well as non-speech sounds, such as phones ringing. In order to train machine intelligence transcription systems, the training data must be high quality. In this case, "high quality" means transcribing in a consistent manner, in careful concert with the parameters outlined in these guidelines. These have been created with an eye to producing data that can be re-used across multiple projects and devices.

2. Transcription Conventions

2.1 General principles

These general principles are expanded below in each relevant section.

* Transcriptions must be produced strictly and solely by human transcribers.
  Use of any Automatic Speech Recognition (ASR) systems (online or otherwise) is strictly prohibited. If it turns out that the transcriptions were generated by an ASR system (either fully, partially, or even as a starting point), the transcriptions will be rejected.
* Transcription should represent all words as spoken, including hesitations, filler words, and false starts.
* Transcription must be orthographic, not phonetic. Refer to the American Heritage Dictionary for reference: https://ahdictionary.com/.
* Transcription should include only upper and lowercase letters, apostrophes, tildes, hyphens, periods, question marks, commas, exclamation points, and spaces. No numbers or other special characters.
* Segments should not last longer than 15 seconds. If a single speaker talks for more than 15 seconds, segment based on sentence level or pauses in speech. Longer speech segments are strongly preferred over short speech segments.
* Represent unintelligible words with double parentheses without spaces: (())
* Non-speech events are represented with square brackets: []

2.2 Speech event transcription

2.2.1 Use orthographic spelling

At the word level, transcriptions must be orthographic, not phonetic. Dialectal pronunciations should be represented in the standard orthographic form listed in the dictionary. For instance, dialectal variations such as "darlin" for "darling" or "wes' side" for "west side" should be transcribed as "darling" and "west side" respectively. Mispronunciations should also be represented in their correct orthographic forms.

* "Issall well n' good darlin'" = "It's all well and good darling."
* "Call your representive." = "Call your representative."

However, if a word is deliberately mispronounced, such as for comedic effect, do represent the variation in the transcription.

* "The volcano said: I lava you." = "The volcano said I lava you."

If the spelling of a word is unclear, use the American Heritage Dictionary as a standard reference: https://ahdictionary.com/.
To reference the names of song titles, movies, TV shows, brands, etc., use http://google.com/.

At the sentence level, transcribe a speaker's utterances verbatim, even in cases when the speaker's utterances do not conform to the standard grammar of the language. Do not correct grammatical "mistakes" or variations made by the speaker.

* "He been done work." = "He been done work."
* "We be playing basketball after work." = "We be playing basketball after work."

2.2.1.1 Contractions

Standard contractions must be transcribed as pronounced, including the apostrophe, such as "isn't", "where's", "you're", "y'all". Transcribe the following as a single word:

* gimme
* gonna
* gotta
* lemme
* watcha
* kinda

2.2.1.2 Abbreviations

Do not introduce abbreviations in the transcription. Always spell out the full word when pronounced as such. Transcribe abbreviations only if the abbreviation is explicitly articulated by the speaker. Do not add a period after abbreviated words (unless it's at the end of a sentence).

* "He's 6 ft 2." = "He's six foot two."
* "I live in Cambridge, Mass." = "I live in Cambridge, Mass."
* "Billie Jean King went to Cal State." = "Billie Jean King went to Cal State."
* "Talk to Doctor Smith immediately." = "Talk to Doctor Smith immediately."

Note that in English, the titles Ms, Mrs, Mr, and Mx that prefix a person's name are not abbreviations (they are listed as nouns in the dictionary) and should therefore be transcribed as such. However, use the spelled-out forms mister or missus when these titles are used without a name, as in direct address.

* "Mr. Smith this way please." = "Mr Smith, this way please."
* "Hey mister can you help me with this survey?" = "Hey, mister, can you help me with this survey?"

2.2.1.3 Stumbled speech and corrections

Represent all speech, including false starts, corrections, stuttering, etc. Truncated words are represented with a tilde as described in the section on "tildes" below.

* "Directions to the... to the... the hotel" = "Directions to the to the the hotel."
* "Ale... Alexa play Janet Jackson... no wait..." = "Ale~ Alexa, play Janet Jackson. No, wait."
* "N... n... no, it's Ch... Chom... Chomsky who said that." = "N~ n~ no, it's Ch~ Chom~ Chomsky who said that."

2.2.1.4 Filler words

Filler words are "words" that speakers use to indicate hesitation or to maintain control of a conversation while thinking of what to say next. Each language has a limited set of filler words that speakers can use. When transcribing filler words, use the language-specific set. The spelling of filler words should not be altered to reflect how the speaker pronounces the word, and each filler word should be preceded with a hashtag (#). For English, we transcribe only the following fillers:

* #uh
* #um
* #ah
* #er
* #hm

2.2.1.5 Interjections

Interjections are words or expressions that speakers use within an utterance to express affirmation, surprise, or negation. Each language has its own specific set of interjections that speakers can use. When transcribing interjections, use language-specific standardized spellings. Interjections do not require any special symbol. For English, we transcribe only the following interjections:

* eee
* mm
* uh-oh
* ow
* mhm
* whoa
* huh
* nah
* whew
* ohm
* oh
* yay
* jeez
* uh-huh
* yep

2.2.1.6 Overlapping speech

If there is overlapping speech where multiple speakers are talking at the same time, transcribe only the most dominant voice that you can clearly understand. If all or multiple voices are dominant and it's difficult to isolate one person's voice over the others, simply tag the overlapping speech as [overlap] and refrain from transcribing any speaker's speech.

Note: Within a single-channel audio file where only one speaker is the target of the recording, overlapping speech might still occur (e.g., as background noise when there are other people nearby or in the same room speaking). In these cases, transcribe only the speech of the target speaker.
2.2.1.7 Letters spoken as letters

When a proper name is spelled out, transcribe the spoken letters as capital letters, separated by a space.

* "My name is John, jay, oh, eich, en." = "My name is John J O H N."

This does not apply to initialisms (e.g., IBM, FBI, etc.). More on transcribing initialisms follows in Section 2.2.5.

2.2.2 Punctuation

Use punctuation as required by the grammar rules. When transcribing a language other than English, use punctuation symbols and rules that are appropriate for that language. For example, in Spanish, ¿? is used as in standard orthography.

* Use end punctuation (full stop, question mark, exclamation mark) to indicate the end of a complete sentence.
* Use punctuation symbols that are an essential part of the word, such as apostrophes and hyphens.
* Use commas to break up long stretches of speech. This is to facilitate reader comprehension.
* AVOID: semi-colons, quotation marks.

Of the list of permissible punctuation marks, we expect that commas and exclamation marks will be the most difficult ones to implement. We understand that you will have to make some relatively subjective and stylistic decisions on the use of the comma and exclamation mark, and disagreements are not necessarily errors.

2.2.2.1 Commas

Use a comma when it is necessary to make a transcript more readable. Below are some suggestions of when a comma should be used:

* To separate items in a list of three or more, using the serial (aka Oxford) comma (i.e., the comma before the conjunction that joins the last two elements):
  o I enjoy skydiving, snowboarding, and mountain biking.
* To set off a direct address:
  o Maryam, listen to me carefully.
  o I'm not calling you, my friends, just to whine about my life.
* To break up compound and complex sentences:
  o I would like to join you, but I'm afraid I have class at that time.
  o Marcos and I couldn't go to the jazz concert, so we watched it on TV instead.
* To set off introductory words and phrases:
  o Therefore, they cancelled their trip.
  o After taking a break, the team resumed their meeting.
* Around parenthetical phrases:
  o That report on the New York Times was, to say the least, a bombshell.
  o Getting a hotel by the sea, like the one we stayed last year, would be superb.

2.2.2.2 Exclamation marks

Use exclamation marks at the end of a sentence when you feel or hear an emphatic stress or intonation. An exclamation point usually marks an outcry or an emphatic or ironic comment.

* That's the biggest pumpkin I have ever seen!
* When will I ever learn!

2.2.2.3 Apostrophes

Use apostrophes in contractions, possessives of individual letters, possessive "s", or as part of a person's name.

* "That's where it's at" = "That's where it's at."
* "Eleven o'clock" = "Eleven o'clock."
* "Read Jess' email" = "Read Jess' email."

2.2.2.4 Hyphens

Use hyphens according to standard orthographic rules of the language. If it's not clear whether a compound word should be spelled with a hyphen or not, use the American Heritage Dictionary as a reference. Here are a few examples of English compound words that can/must use hyphens:

* a-line
* d-day
* ex-boyfriend, ex-drummer, ex-girlfriend, ex-husband, ex-wife
* extra-loud
* self-aware
* t-shirt
* u-turn
* v-neck
* x-ray

For product names, only use hyphens if they are part of the official product name.

* "Let's go to Chick-fil-A" = "Let's go to Chick-fil-A."

2.2.2.5 Tildes

Use tildes to indicate truncated words, whether at the beginning or the end. Use tildes also to represent false starts and stuttering.

* "Ale... Alexa, stop the mu... the music." = "Ale~ Alexa, stop the mu~ the music."

2.2.2.6 Special symbols

Special symbols should never be used in the transcription. The only ones allowed are apostrophes, tildes, hyphens, and spaces as part of the transcription convention. Everything else should be spelled out. When a special character is spoken aloud, transcribe it as it was pronounced.
* "I have like $0" = "I have like zero dollars."
* "It was great/weird" = "It was great slash weird."
* "6 + 6 = 12" = "Six plus six equals twelve."
* "My email is m-golden@..." = "My email is M dash golden at."

2.2.3 Capitalization

Capitalization should follow orthographic conventions. Capitalize the first word of a sentence and proper names. Proper names include human names (Jeff Bezos), place names (France), product names (iPad, Xbox), company names (eBay), acronyms (POTUS), initialisms (IBM), and so on.

* "I want to visit Oregon" = "I want to visit Oregon."
* "I work at NASA" = "I work at NASA."
* "I'm going to Mexico on Thursday" = "I'm going to Mexico on Thursday."

2.2.4 Numbers

Numbers should never be represented numerically. They should always be written out alphabetically. Ordinal numbers should be represented as pronounced.

* "306" = "three hundred and six", "three oh six", or "three zero six", depending on how it was pronounced
* "Play radio 109.4 FM" = "play radio one oh nine point four F. M."
* "Beverly Hills, 90210" = "Beverly Hills nine oh two one oh"

When spelling out numbers, use hyphens as required by the rules of the language. In English, numbers from twenty-one through ninety-nine are spelled with hyphens. Others are not hyphenated.

* "twenty-five"
* "three hundred"
* "five hundred fifty-two"
* "nineteen forty-five"

2.2.5 Acronyms and initialisms

Acronyms refer to terms based on the initial letters of their various elements and are spoken as words. They should be transcribed as words in upper case without white spaces or periods between the letters.

* "AIDS has a great impact on society." = "AIDS has a great impact on society."

Initialisms refer to terms spoken as a series of letters (e.g., IBM, IMDB, HTTP). Initialisms should be written as upper case letters enclosed within the start and end tags. Note the space around the tags. Use periods only for initials standing for given names (e.g., E.B. White, George W. Bush). Otherwise, no period is needed in initialisms.

* "I work for IBM." = "I work for IBM ."
* "I like ZZ Top." = "I like ZZ Top."
* "http://www.google.com" = " HTTP colon slash slash WWW dot google dot com."
* "George W Bush paints now" = "George W. Bush paints now."

Transcribe a plural initialism with an "s" following the end tag. Transcribe a possessive on an initialism with an apostrophe and an "s" after the end tag.

* "The SATs are nerve-wracking." = "The SAT s are nerve wracking."
* "George W's dog was a Scottish Terrier." = "George W. 's dog was a Scottish Terrier."

Note: OK should be transcribed as "okay".

* "He's OK." = "He's okay."

Proper names that are spelled out are not initialisms and don't require the tags. See Section 2.2.1.7 above for an example.

2.2.6 Unintelligible words and phrases

If a word cannot be understood within a larger phrase, transcribe all segments that are understandable, and use double parentheses (()) to mark the unintelligible word. There should be a space before and after the double parentheses, but not within the parentheses themselves.

* "Hey Google play ???? on spotify." = "Hey Google, play (()) on Spotify."

If you have a guess of what the word/phrase might be but are not sure, include the guess within the double parentheses.

* "Hey Google read ???? from audible." = "Hey Google, read ((Cat In The Hat)) from Audible."
* "Hey Google turn the ????" = "Hey Google, turn the ((lights off))."

For entire segments which are unintelligible, use (()).

2.2.7 Multiple Languages (code-switching)

When a speaker switches languages, place the language tag at the location where the switch between languages begins and where the switch ends. If the nature of the language is unambiguously recognized by the transcriptionist, replace "Foreign" with the name of the new language. If the content of the new language is intelligible to the transcriptionist, provide a transcription using the correct orthography of the foreign language. In cases when the transcriptionist is unable to correctly transcribe the foreign language, add the double parentheses (()).
There should be a space before and after each foreign language tag.

* "You have to finish todo esto, porque. I have other things to do." = "You have to finish todo esto, porque . I have other things to do."
* "I'd like to tell her que yo no la quiero." = "I'd like to tell her (()) ."

In cases when a speaker switches from a target language to a foreign language but continues to use grammatical affixes of the target language with the foreign word stem, include the target language affix within the foreign language tags. For example, when transcribing Tamil data, if a speaker uses the English word "engineering" with a Tamil suffix, keep the suffix inside the foreign language tags together with the English stem.

Some loanwords have been grammaticalized in English and should be transcribed as normal English words without the tag. If it is unclear whether a word is a loanword or not, consult a dictionary like the American Heritage Dictionary: https://www.ahdictionary.com/. A listing in the dictionary is strong grounds to consider a word an established loanword, even if it is of foreign origin.

* "There was a tsunami in Indonesia." = "There was a tsunami in Indonesia."
* "Alexa... where is the nearest taco bell?" = "Alexa, where is the nearest Taco Bell?"
* "Alexa... recipe for tacos" = "Alexa, recipe for tacos."
* "Remind me to spritz the flowers at eight." = "Remind me to spritz the flowers at eight."

If a recording consists of nothing but foreign speech, add the foreign language tag and refrain from annotating.

2.3 Non-speech (acoustic event) transcription

2.3.1 Non-speech sound inventory

Insert the following labels at the location where the sound occurs. If it happens in the middle of a word, add the tag exactly before the word in which it occurred.
* [lipsmack] Lipsmacks, tongue-clicks
* [breath] Inhalation and exhalation between words, yawning
* [cough] Coughing, throat clearing, sneezing
* [laugh] Laughing, chuckling
* [click] Machine or phone click
* [ring] Telephone ring
* [dtmf] Noise made by pressing a telephone keypad
* [sta] At the start of continuous background noise (static)
* [cry] Crying/sobbing
* [prompt] IVR prompts or voice recordings commonly found at the beginning of calls

If the sound occurs repeatedly, represent it only once.

* "Wait click click click click there" = "Wait [click] there."

Do not split words to insert a non-speech sound tag, even if it occurs this way in the audio.

* "I will abso-ring-lutely open it" = "I will [ring] absolutely open it."

Use the [noise] tag for all other non-speech sounds not covered by the list of non-speech tags (e.g., screaming, raining, punching, etc.). For additional non-speech tags for 16kHz data, see Section 3.2.

2.3.2 No speech

A time-stamped speech segment may contain periods with no speech. For any period greater than one second in which there is no speech, add the label [no-speech]. Even if there are some foreground sounds, just use the [no-speech] tag if there is no actual speech for more than one second.

* "silence breath silence" = "[no-speech]"

Note: In single-channel audio files, you can only hear one side of the conversation at a time. As a result, there will be segments in these audio files that contain either no speech or only non-speech sounds (e.g., laughing, breathing, etc.). These silent segments do not need to be transcribed. They should be removed from the transcription file entirely.

2.3.3 Music only

If there is music playing in the foreground or in the background and there is no other information to transcribe, such as when a customer is put on hold and there's hold music playing, transcribe it with the [music] label.

3. Additional Transcription Conventions

3.1 Speaker Labelling

Each identifiable speaker must have a unique speaker label.
The speaker label must be consistent throughout the entire file. When applicable and if known, provide the following data for each identifiable speaker in the "speakers" metadata field at the end of each transcription file:

* role
* gender
* native dialect

For the speakerId field in each given segment:

* enter the appropriate speaker label when you can accurately identify the speaker;
* enter "unknown" when you cannot accurately identify the speaker;
* enter "multiple" when the segment contains overlapping speech that is not transcribed, i.e., the content of the segment is marked only with the [overlap] label.

Note: Specific projects might not require speaker labelling. Please refer to specific project requirements or consult the project lead for further information.

3.2 Non-speech sound inventory

Use all the non-speech tags that are mentioned in Section 2.3.1 plus the following tag(s):

* [applause] Clapping to show approval or praise. Add it exactly at the location where it occurred.

3.3 Music only

Insert the [music] tag when there is only music, songs, or singing for more than 2 seconds. There is no need to transcribe lyrics in songs or singing. However, when background music is played simultaneously with speech, do transcribe the speech and don't use the [music] tag.

Example for overlap

Speaker labelling (no overlap)

    "segments": [
      {
        "start": 0,
        "end": 7,
        "segmentId": "00001",
        "primaryType": "Speech",
        "loudnessLevel": "Normal",
        "language": "en_US",
        "speakerId": "1",
        "transcriptionData": {
          "content": "[music] Please share your thoughts on current economic scenario?"
        }
      },
      {
        "start": 7,
        "end": 11,
        "segmentId": "00002",
        "primaryType": "Speech",
        "loudnessLevel": "Normal",
        "language": "en_US",
        "speakerId": "2",
        "transcriptionData": {
          "content": "I believe we are doing great right now."
        }
      },
      {
        "start": 11,
        "end": 15,
        "segmentId": "00003",
        "primaryType": "Speech",
        "loudnessLevel": "Normal",
        "language": "en_US",
        "speakerId": "1",
        "transcriptionData": {
          "content": "Why do you think so?"
        }
      }
    ]

Speaker labelling (with overlap in segment 00002)

    "segments": [
      {
        "start": 0,
        "end": 7,
        "segmentId": "00001",
        "primaryType": "Speech",
        "loudnessLevel": "Normal",
        "language": "en_US",
        "speakerId": "1",
        "transcriptionData": {
          "content": "[music] Please share your thoughts on current economic scenario?"
        }
      },
      {
        "start": 7,
        "end": 11,
        "segmentId": "00002",
        "primaryType": "Speech",
        "loudnessLevel": "Normal",
        "language": "en_US",
        "speakerId": "multiple",
        "transcriptionData": {
          "content": "[overlap]"
        }
      },
      {
        "start": 11,
        "end": 15,
        "segmentId": "00003",
        "primaryType": "Speech",
        "loudnessLevel": "Normal",
        "language": "en_US",
        "speakerId": "1",
        "transcriptionData": {
          "content": "Why do you think so?"
        }
      }
    ]
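
Some of the segment rules in these guidelines can be checked mechanically before delivery. The following is only a minimal sketch, not part of the official requirements. The function name check_segment is hypothetical, the start and end values are assumed to be in seconds as in the examples above, and a segment containing only [overlap] is assumed to be acceptable because Section 3.1 describes it.

```python
import re

# Assumption: "start" and "end" are in seconds, as in the examples above.
MAX_SEGMENT_SECONDS = 15
# Square-bracket tags such as [laugh], [no-speech], or [overlap].
TAG_PATTERN = re.compile(r"\[[a-z-]+\]")


def check_segment(segment):
    """Return a list of problems found in one segment dictionary."""
    problems = []

    # Segments should not last longer than 15 seconds.
    if segment["end"] - segment["start"] > MAX_SEGMENT_SECONDS:
        problems.append("longer than 15 seconds")

    content = segment["transcriptionData"]["content"]
    # Whatever is left after removing bracket tags is the spoken text.
    spoken = TAG_PATTERN.sub("", content).strip()

    # Empty segments and segments with only non-speech tags are not allowed.
    # Assumption: an [overlap]-only segment is exempt (see Section 3.1).
    if not spoken and content.strip() != "[overlap]":
        problems.append("empty or non-speech only")

    # Numbers must be written out in words, never as digits.
    if re.search(r"\d", spoken):
        problems.append("contains digits")

    return problems
```

An empty list means the segment passed these three checks. The sketch does not measure word or tag accuracy and does not verify the full set of permitted characters, so it complements human validation rather than replacing it.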
