Voice Morphing

Voice morphing is a technique that modifies a source speaker's speech to sound like a target speaker. It works by interpolating between the source and target speaker's LPC coefficients and pitch shifting the source speaker's residue signal. Potential applications include changing voices in public announcements or adding special effects to films, but limitations include normalization problems and incomplete voice databases.

Uploaded by

Himadri Gupta
Copyright
© Attribution Non-Commercial (BY-NC)

Voice Morphing

What is Voice Morphing?

• Voice morphing is a technique for modifying a (source) speaker's speech to sound as if it were spoken by a different (target) speaker.

• In simpler terms, it means being able to change the speech of one speaker into that of another speaker.

• Applications for voice morphing range from recreational ones to security ones.
[Figure: time-domain plots of the source and target signals, showing the pitch]
How to Morph Voice?

• We need to effectively change the pitch from that of a male speaker to that of a female speaker. Recall that the excitation signal carries information about the speaker.

• We find the LPC coefficients for the source and target signals and use these coefficients to interpolate between the two signals.

• We get the new LPC coefficients using the formula

new_lpc_coeff = const * lpc_source + (1 - const) * lpc_target

• 0 <= const <= 1
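
The interpolation formula above can be sketched in Python. This is a minimal sketch, not the presenter's implementation: it assumes NumPy, assumes the coefficients come from the autocorrelation method, and assumes the convention a[0] = 1 for the prediction-error filter. The function names `lpc` and `interpolate_lpc` are hypothetical.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC (an assumed analysis method).

    Returns a with a[0] = 1, so that
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    is the prediction-error (analysis) filter.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R p = r[1:], where x[n] is predicted as sum_k p[k] x[n-k]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    p = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -p))

def interpolate_lpc(lpc_source, lpc_target, const):
    """new_lpc = const * lpc_source + (1 - const) * lpc_target, with 0 <= const <= 1."""
    if not 0.0 <= const <= 1.0:
        raise ValueError("const must lie in [0, 1]")
    lpc_source = np.asarray(lpc_source, dtype=float)
    lpc_target = np.asarray(lpc_target, dtype=float)
    return const * lpc_source + (1.0 - const) * lpc_target

# Example with made-up coefficient vectors (not taken from the slides)
source_coeffs = np.array([1.0, -0.9, 0.2])
target_coeffs = np.array([1.0, -0.5, 0.1])
print(interpolate_lpc(source_coeffs, target_coeffs, 0.25))
```

Note that a straight linear blend of direct-form LPC coefficients is not guaranteed to give a stable synthesis filter, so a practical system may prefer to interpolate a different representation such as line spectral frequencies.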


How to Morph Voice? (contd…)

• The pitch of a female speaker is typically higher than that of a male speaker and can approach twice its value. In our example the pitch of the male speaker is 141 Hz and that of the female speaker is 210 Hz, a ratio of about 1.5.

• So we need a time-stretching algorithm in order to implement pitch shifting. We obtain the residue of the source signal and stretch it according to the value of const, which indicates the position of the morphed signal between the source and the target.

• For example, in the interpolation formula const is the weight given to the source, so const = 0.8 gives a morphed signal that is closer in pitch to the source signal, while const = 0.2 gives a pitch that is closer to the target signal.
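
As a small worked example, the pitch figures quoted for the two speakers (141 Hz and 210 Hz) can be turned into a shift ratio and a musical interval. The helper names here are hypothetical, and expressing the shift in semitones is an addition for illustration rather than something the slides use.

```python
import math

F_SOURCE_HZ = 141.0  # male (source) pitch quoted in the example
F_TARGET_HZ = 210.0  # female (target) pitch quoted in the example

def pitch_ratio(f_source, f_target):
    """Factor by which the source pitch must be multiplied to reach the target."""
    return f_target / f_source

def ratio_to_semitones(ratio):
    """Express a frequency ratio as a number of equal-tempered semitones."""
    return 12.0 * math.log2(ratio)

ratio = pitch_ratio(F_SOURCE_HZ, F_TARGET_HZ)
print(f"pitch ratio: {ratio:.3f}")
print(f"shift in semitones: {ratio_to_semitones(ratio):.2f}")
```

The ratio comes out near 1.5 rather than 2, so in this example the full morph from source to target is a shift of roughly half an octave.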
How do we shift the Pitch?

• We break the residue signal into small windows and apply a fade-in and fade-out to each block. We then recombine the blocks to form the pitch-shifted signal. By choosing alpha we can time-stretch the residue as much as required.
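
The windowing step described above can be sketched as a plain overlap-add time stretch. This is a rough sketch under stated assumptions: the slides do not define alpha precisely, so it is taken here to be the stretch factor (alpha > 1 makes the residue longer); the Hann window, the 50% output overlap, the window length and the name `ola_time_stretch` are all illustrative choices.

```python
import numpy as np

def ola_time_stretch(residue, alpha, win_len=512):
    """Stretch `residue` in time by roughly `alpha` using overlap-add.

    Blocks are read from the input every `ana_hop` samples, faded in and
    out with a Hann window, and written to the output every `syn_hop`
    samples. Reading more densely than writing lengthens the signal.
    """
    residue = np.asarray(residue, dtype=float)
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if len(residue) < win_len:
        residue = np.pad(residue, (0, win_len - len(residue)))

    syn_hop = win_len // 2                         # output hop (50% overlap)
    ana_hop = max(1, int(round(syn_hop / alpha)))  # input hop
    window = np.hanning(win_len)                   # fade-in / fade-out per block

    n_blocks = 1 + (len(residue) - win_len) // ana_hop
    out_len = (n_blocks - 1) * syn_hop + win_len
    out = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_blocks):
        block = residue[i * ana_hop : i * ana_hop + win_len] * window
        start = i * syn_hop
        out[start:start + win_len] += block
        norm[start:start + win_len] += window
    norm[norm < 1e-8] = 1.0                        # avoid dividing by ~0 at the edges
    return out / norm

# Example: stretch one second of noise (a stand-in for a residue) by 1.5x
rng = np.random.default_rng(0)
stretched = ola_time_stretch(rng.standard_normal(8000), alpha=1.5)
print(len(stretched))
```

Plain overlap-add ignores the pitch period when placing blocks, so it can sound rough on voiced speech; refinements such as WSOLA or PSOLA align the blocks before adding them.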

How do we Morph finally?

• We now have the pitch-shifted residue signal and the new LPC coefficients. We resample the pitch-shifted signal so that it is played at a faster rate (remember that the time-stretched residue lasts longer than the original). Passing the resampled residue through the all-pole synthesis filter built from the new LPC coefficients, that is, the inverse of the LPC analysis filter, produces the morphed speech.
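
The final step can be sketched as follows. This is only an illustration under assumptions: linear interpolation stands in for a proper band-limited resampler, the LPC vector is assumed to be normalized so that a[0] = 1, and the names `resample_linear`, `synthesize` and `morph` are hypothetical.

```python
import numpy as np

def resample_linear(x, n_out):
    """Resample x to n_out samples by linear interpolation (a simple stand-in)."""
    x = np.asarray(x, dtype=float)
    old_t = np.linspace(0.0, 1.0, num=len(x))
    new_t = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(new_t, old_t, x)

def synthesize(residue, lpc_coeffs):
    """All-pole synthesis filter 1/A(z), assuming a[0] = 1:

    y[n] = e[n] - a[1] y[n-1] - ... - a[p] y[n-p]
    """
    a = np.asarray(lpc_coeffs, dtype=float)
    order = len(a) - 1
    y = np.zeros(len(residue))
    for n in range(len(residue)):
        acc = residue[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def morph(stretched_residue, new_lpc, original_len):
    """Squeeze the stretched residue back to its original length, then filter it."""
    shortened = resample_linear(stretched_residue, original_len)
    return synthesize(shortened, new_lpc)

# Example: an impulse through 1 / (1 - 0.5 z^-1) decays as 0.5 ** n
print(synthesize(np.array([1.0, 0.0, 0.0, 0.0]), [1.0, -0.5]))
```

Playing the stretched residue back in the original number of samples is what raises its pitch; the synthesis filter then imposes the interpolated spectral envelope on it.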
Applications

• In public address systems we can make the voice sound like that of a popular public speaker. This could be used in many places, such as railway announcements.

• Video and image morphing is extensively used for film and graphical special effects.

• Text-to-speech systems convert normal language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech.
Limitations

• Voice detection is done via sophisticated 3D rendering, but there are a lot of normalization problems.

• Some applications require extensive sound libraries.

• Different languages require different phonetics, so updating or extending the system is tedious.

• The database is very seldom complete (we may not be able to add every piece of small talk and every phonetic unit to it).
