Statistical Machine Translation: The Basic, The Novel, and The Speculative
4 April 2006
The Basic
• Translating with data
– how can computers learn from translated text?
– what translated material is out there?
– is it enough? how much is needed?
• Statistical modeling
– framing translation as a generative statistical process
• EM Training
– how do we automatically discover hidden data?
• Decoding
– algorithm for translation
The Novel
• Automatic evaluation methods
– can computers decide which translations are good?
• Phrase-based models
– what are atomic units of translation?
– the best method in statistical machine translation
• Discriminative training
– which methods directly optimize translation performance?
The Speculative
• Syntax-based transfer models
– how can we build models that take advantage of syntax?
– how can we ensure that the output is grammatical?
• Factored translation models
– how can we integrate different levels of abstraction?
Parallel Data
• Lots of translated text available: hundreds of millions of words for
some language pairs
– a book has a few hundred thousand words
– an educated person may read 10,000 words a day
→ 3.5 million words a year
→ 300 million words in a lifetime
→ soon computers will be able to see more translated text than humans read
in a lifetime
⇒ Machines can learn how to translate foreign languages
[Figure: the components of a statistical MT system: a Translation Model and a Language Model, combined by a Decoding Algorithm]
Word-Based Models
Mary did not slap the green witch
n(3|slap)   (fertility: each English word is replicated)
Mary not slap slap slap the green witch
p-null   (NULL insertion)
Mary not slap slap slap NULL the green witch
t(la|the)   (lexical translation)
Maria no daba una bofetada a la verde bruja
d(4|4)   (distortion: words are reordered)
Maria no daba una bofetada a la bruja verde
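• The generative story above, sketched in code (a minimal illustration with hypothetical hard-coded tables; a real IBM model estimates n, t, d, and the NULL-insertion probability from data):

import random

fertility = {"slap": 3}       # n(3|slap): how many foreign words each English word spawns
lexicon = {"the": "la"}       # t(la|the): lexical translation, deterministic in this sketch
P_NULL = 0.02                 # probability of inserting a NULL-generated word

def generate(english):
    expanded = []                               # step 1: fertility
    for e in english:
        expanded.extend([e] * fertility.get(e, 1))
    with_null = []                              # step 2: NULL insertion
    for e in expanded:
        with_null.append(e)
        if random.random() < P_NULL:
            with_null.append("NULL")
    translated = [lexicon.get(e, e) for e in with_null]   # step 3: t(f|e)
    return translated                           # step 4: distortion d(j|i) would reorder here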
Phrase-Based Models
Morgen fliege ich nach Kanada zur Konferenz
Syntax-Based Models
[Figure: Yamada-Knight model: the English parse tree of "he adores listening to music" is reordered (e.g. VB1 VB2 → VB2 VB1, TO MN → MN TO), function words such as "no" are inserted, the leaves are translated ("listening" → "kiku", "music" → "ongaku", "to" → "wo"), and the leaves are read off as:]
Kare ha ongaku wo kiku no ga daisuki desu
[from Yamada and Knight, 2001]
Language Models
• Language models indicate whether a sentence is good English
– p(Tomorrow I will fly to the conference) = high
– p(Tomorrow fly me at a summit) = low
→ ensures fluent output by guiding word choice and word order
• Standard: trigram language models
p(Tomorrow|START) ×
p(I|START,Tomorrow) ×
p(will|Tomorrow,I) ×
...
p(Canada|conference,in) ×
p(END|in,Canada) ×
• Often estimated using additional monolingual data (billions of words)
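• A trigram model in miniature, to make the factorization concrete. A minimal maximum-likelihood sketch (no smoothing or backoff, which real language models need):

from collections import defaultdict

def train_trigram(sentences):
    tri, bi = defaultdict(int), defaultdict(int)
    for s in sentences:
        words = ["START", "START"] + s + ["END"]
        for i in range(2, len(words)):
            tri[tuple(words[i-2:i+1])] += 1   # count(u, v, w)
            bi[tuple(words[i-2:i])] += 1      # count(u, v)
    # p(w | u, v) = count(u, v, w) / count(u, v)
    return lambda u, v, w: tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0

def sentence_prob(p, sentence):
    words = ["START", "START"] + sentence + ["END"]
    prob = 1.0
    for i in range(2, len(words)):
        prob *= p(words[i-2], words[i-1], words[i])
    return prob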
Automatic Evaluation
• Why automatic evaluation metrics?
– Manual evaluation is too slow
– Evaluation on large test sets can detect even minor improvements
– Automatic tuning to improve machine translation performance
• History
– Word Error Rate
– BLEU since 2002
• BLEU in short: Overlap with reference translations
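• The core of BLEU as a sketch: clipped n-gram precision against a single reference, combined over n = 1..4 with a brevity penalty (real BLEU uses multiple references and corpus-level counts):

import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i+n]) for i in range(len(words) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(1, sum(cand.values())))
    if min(precisions) == 0:
        return 0.0
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)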
Automatic Evaluation
• Reference Translation
– the gunman was shot to death by the police .
• System Translations
– the gunman was police kill .
– wounded police jaya of
– the gunman was shot dead by the police .
– the gunman arrested by police kill .
– the gunmen were killed .
– the gunman was shot to death by the police .
– gunmen were killed by police
– al by the police .
– the ringer is killed by the police .
– police killed the gunman .
• Matches
– green = 4 gram match (good!)
– red = word not matched (bad!)
Automatic Evaluation
[Figure: two scatter plots of human score (2 to 3.5) against BLEU score (0.38 to 0.52), one for adequacy and one for fluency]
[from Callison-Burch et al., 2006, EACL]
• DARPA/NIST MT Eval 2005
– mostly statistical systems (all but one in the graphs)
– one submission was a manual post-edit of a statistical system's output
→ Good adequacy/fluency scores not reflected by BLEU
[Figure: human score (2 to 3.5) vs. BLEU score (0.18 to 0.3), with "SMT System 2" marked]
Competitions
• Progress driven by MT Competitions
– NIST/DARPA: yearly campaigns for Arabic-English and Chinese-English,
news texts, since 2001
– IWSLT: Yearly competitions for Asian languages and Arabic into English,
speech travel domain, since 2003
– WPT/WMT: Yearly competitions for European languages, European
Parliament proceedings, since 2005
• An increasing number of statistical MT groups participate
• Competitions won by statistical systems
Euromatrix
• Proceedings of the European Parliament
– translated into 11 official languages
– entry of new members in May 2004: more to come...
• Europarl corpus
– collected 20-30 million words per language
→ 110 language pairs
• 110 translation systems
– trained in 3 weeks on a 16-node cluster computer
• Basis of a new European Commission funded project
Clustering Languages
[Figure: clustering of the 11 languages: fi; el; de, nl; sv, da, en; pt, es, fr, it]
[from Koehn, 2005, MT Summit]
• Clustering languages based on how easily they translate into each other
⇒ Approximation of language families
Translation examples
• Spanish-English
(1) the current situation , unsustainable above all for many self-employed drivers and in the
area of agriculture , we must improve without doubt .
(2) in itself , it is good to reach an agreement on procedures , but we have to ensure that this
system is not likely to be used as a weapon policy .
• Finnish-English
(1) the current situation , which is unacceptable , in particular , for many carriers and
responsible for agriculture , is in any case , to be improved .
(2) agreement on procedures in itself is a good thing , but there is a need to ensure that the
system cannot be used as a political lyömäaseena .
• English reference
(1) the current situation , which is intolerable , particularly for many independent haulage firms
and for agriculture , does in any case need to be improved .
(2) an agreement on procedures in itself is a good thing , but we must make sure that the
system cannot be used as a political weapon .
Backtranslations
• Checking translation quality by back-translation
• The spirit is willing, but the flesh is weak
• English → Russian → English
• The vodka is good but the meat is rotten
Backtranslations II
• Back-translation quality does not correlate with one-directional translation performance
Available Data
• Available parallel text
– Europarl: 30 million words in 11 languages http://www.statmt.org/europarl/
– Acquis Communautaire: 8-50 million words in 20 EU languages
– Canadian Hansards: 20 million words from Ulrich Germann, ISI
– Chinese/Arabic to English: over 100 million words from LDC
– lots more French/English, Spanish/French/English from LDC
• Available monolingual text (for language modeling)
– 2.8 billion words of English from LDC
– 100s of billions, trillions on the web
[Figure: BLEU scores by source language: French ≈ 0.25, German ≈ 0.20, Finnish ≈ 0.15]
• Decoding
• Statistical Modeling
• EM Algorithm
• Word Alignment
• Phrase-Based Translation
• Discriminative Training
• Syntax-Based Statistical MT
Decoding Process
Maria no dio una bofetada a la bruja verde
[Figure: built up over several slides: the English output is constructed left to right, phrase by phrase, marking off the covered Spanish words]
• Reordering: the next input phrase picked need not be adjacent to the previous one
• Translation finished once all input words are covered
Translation Options
Maria no dio una bofetada a la bruja verde
[Figure: the applicable phrase translation options for each span of the input]
Hypothesis Expansion
Maria no dio una bofetada a la bruja verde
[Figure: built up over several slides: each hypothesis records its latest English output (e), its input coverage (f, one flag per Spanish word), and its probability (p); expansion appends a phrase translation and marks the newly covered input words]
e:        f: ---------  p: 1      (empty hypothesis)
e: Mary   f: *--------  p: .534
e: witch  f: -------*-  p: .182
e: slap   f: *-***----  p: .043
Hypothesis Recombination
[Figure: built up over several slides: the same partial translation "did not give" is reached along different search paths (in one step with p=0.092, or via "did not" then "give" with p=0.164 and p=0.044; and starting from "Joe" with p=0.017 vs. "Mary did not give" with p=1, p=0.534, p=0.092); paths reaching an equivalent hypothesis are merged and only the better-scoring one is kept]
Pruning
• Hypothesis recombination is not sufficient
⇒ Heuristically discard weak hypotheses early
• Organize hypotheses in stacks, e.g. by
– same foreign words covered
– same number of foreign words covered (Pharaoh does this)
– same number of English words produced
• Compare hypotheses in stacks, discard bad ones
– histogram pruning: keep top n hypotheses in each stack (e.g., n=100)
– threshold pruning: keep hypotheses that are at most α times the cost of
best hypothesis in stack (e.g., α = 0.001)
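• Both pruning schemes in a short sketch (hypotheses are schematic (log-probability, state) pairs; scores are log-probabilities):

import math

def prune(stack, n=100, alpha=0.001):
    stack.sort(key=lambda hyp: hyp[0], reverse=True)   # best log-probability first
    best = stack[0][0]
    # threshold pruning: drop hypotheses worse than alpha times the best probability
    kept = [hyp for hyp in stack if hyp[0] >= best + math.log(alpha)]
    return kept[:n]                                    # histogram pruning: keep top n only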
Hypothesis Stacks
[Figure: hypothesis stacks 1-6, one per number of foreign words covered; expanding a hypothesis places its successors in later stacks]
Comparing Hypotheses
• Comparing hypotheses with same number of foreign words covered
Maria no dio una bofetada a la bruja verde
[Figure: two hypotheses covering the same number of foreign words but different words, e.g. one ending in "to the" and one ending in "to" (cost = 0.0299); comparing them fairly requires an estimate of the cost of the words still uncovered]
Pharaoh
• A beam search decoder for phrase-based models
– works with various phrase-based models
– beam search algorithm
– time complexity roughly linear with input length
– good quality takes about 1 second per sentence
• Very good performance in DARPA/NIST Evaluation
• Freely available for researchers http://www.isi.edu/licensed-sw/pharaoh/
Trace
• Running the decoder with switch “-t”
% echo 'das ist ein kleines haus' | pharaoh -f pharaoh.ini -t
[...]
this is |0.014086|0|1| a |0.188447|2|2| small |0.000706353|3|3|
house |1.46468e-07|4|4|
Reordering Example
• Sometimes phrases have to be reordered:
% echo 'ein kleines haus ist das' | pharaoh -f pharaoh.ini -t -d 0.5
[...]
this |0.000632805|4|4| is |0.13853|3|3| a |0.0255035|0|0|
small |0.000706353|1|1| house |1.46468e-07|2|2|
Hypothesis Accounting
• The switch “-v” allows for detailed run time information:
% echo 'das ist ein kleines haus' | pharaoh -f pharaoh.ini -v 2
[...]
HYP: 114 added, 284 discarded below threshold, 0 pruned, 58 merged.
BEST: this is a small house -28.9234
Translation Options
• Even more run time information is revealed with “-v 3”:
[das;2]
the<1>, pC=-0.916291, c=-5.78855
it<2>, pC=-2.30259, c=-8.0761
this<3>, pC=-2.30259, c=-8.00205
[ist;4]
is<4>, pC=0, c=-4.92223
’s<5>, pC=0, c=-6.11591
[ein;7]
a<8>, pC=0, c=-5.5151
an<9>, pC=0, c=-6.41298
[kleines;9]
small<10>, pC=-1.60944, c=-9.72116
little<11>, pC=-1.60944, c=-10.0953
[haus;10]
house<12>, pC=0, c=-9.26607
[das ist;5]
it is<6>, pC=-1.60944, c=-10.207
this is<7>, pC=-0.223144, c=-10.2906
Hypothesis Expansion
• Start of beam search: First hypothesis (das → the)
creating hypothesis 1 from 0 ( ... </s> <s> )
base score 0
covering 0-0: das
translated as: the => translation cost -0.916291
distance 0 => distortion cost 0
language model cost for ’the’ -2.03434
word penalty -0
score -2.95064 + futureCost -29.4246 = -32.3752
new best estimate for this stack
merged hypothesis on stack 1, now size 1
Hypothesis Expansion
• Another hypothesis (das ist → this is)
creating hypothesis 12 from 0 ( ... </s> <s> )
base score 0
covering 0-1: das ist
translated as: this is => translation cost -0.223144
distance 0 => distortion cost 0
language model cost for ’this’ -3.06276
language model cost for ’is’ -0.976669
word penalty -0
score -4.26258 + futureCost -24.5023 = -28.7649
new best estimate for this stack
merged hypothesis on stack 2, now size 2
Hypothesis Expansion
• Hypothesis recombination
creating hypothesis 27 from 3 ( ... <s> this )
base score -5.36535
covering 1-1: ist
translated as: is => translation cost 0
distance 0 => distortion cost 0
language model cost for ’is’ -0.976669
word penalty -0
score -6.34202 + futureCost -24.5023 = -30.8443
worse than existing path to 12, discarding
Hypothesis Expansion
• Bad hypothesis that falls out of the beam
creating hypothesis 52 from 6 ( ... <s> a )
base score -6.65992
covering 0-0: das
translated as: this => translation cost -2.30259
distance -3 => distortion cost -3
language model cost for ’this’ -8.69176
word penalty -0
score -20.6543 + futureCost -23.9095 = -44.5637
estimate below threshold, discarding
Beam Size
• Trade-off between speed and quality via beam size
% echo 'das ist ein kleines haus' | pharaoh -f pharaoh.ini -s 10 -v 2
[...]
collected 12 translation options
HYP: 78 added, 122 discarded below threshold, 33 pruned, 20 merged.
BEST: this is a small house -28.9234
Beam size Threshold Hyp. added Hyp. discarded Hyp. pruned Hyp. merged
1000 unlimited 634 0 0 1306
100 unlimited 557 32 199 572
100 0.00001 144 284 0 58
10 0.00001 78 122 33 20
1 0.00001 9 19 4 0
Limits on Reordering
• Reordering may be limited
– Monotone Translation: No reordering at all
– Only phrase movements of at most n words
• Reordering limits speed up search
• Current reordering models are weak, so limits improve translation quality
XML Markup
• Translations for spans of the input (e.g. numbers and dates) can be specified with XML markup:
Er erzielte <NUMBER english='17.55'>17,55</NUMBER> Punkte .
• Decoding
• Statistical Modeling
• EM Algorithm
• Word Alignment
• Phrase-Based Translation
• Discriminative Training
• Syntax-Based Statistical MT
Statistical Modeling
Mary did not slap the green witch
Parallel Corpora
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
• Decoding
• Statistical Modeling
• EM Algorithm
• Word Alignment
• Phrase-Based Translation
• Discriminative Training
• Syntax-Based Statistical MT
EM Algorithm
• Incomplete data
– if we had complete data, we could estimate the model
– if we had the model, we could fill in the gaps in the data
• EM in a nutshell
1. initialize model parameters (e.g. uniform)
2. assign probabilities to the missing data (the connections)
3. estimate model parameters from completed data
4. iterate steps 2 and 3
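• The nutshell, instantiated for lexical translation probabilities, as a sketch (IBM Model 1, with the NULL word omitted):

from collections import defaultdict

def model1_em(pairs, iterations=10):
    # pairs: list of (foreign_sentence, english_sentence), each a list of words
    f_vocab = {f for fs, es in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))        # step 1: uniform initialization
    for _ in range(iterations):                        # step 4: iterate steps 2 and 3
        count, total = defaultdict(float), defaultdict(float)
        for fs, es in pairs:                           # step 2: expected alignment counts
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():                # step 3: re-estimate t(f|e)
            t[(f, e)] = c / total[e]
    return t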
EM Algorithm (2-6)
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
[Figure: repeated over several slides: the alignment links between word pairs become increasingly confident with each EM iteration]
• Convergence
• Inherent hidden structure revealed by EM
• Parameter estimates:
p(la|the) = 0.453
p(le|the) = 0.334
p(maison|house) = 0.876
p(bleu|blue) = 0.563
...
Flaws of Word-Based MT
• Multiple English words for one German word
• Phrasal translation
• Object/subject reordering
German: Den Vorschlag lehnt die Kommission ab .
Gloss: the proposal rejects the commission off .
Correct translation: The commission rejects the proposal .
MT output: The proposal rejects the commission .
• Decoding
• Statistical Modeling
• EM Algorithm
• Word Alignment
• Phrase-Based Translation
• Discriminative Training
• Syntax-Based Statistical MT
Word Alignment
• The notion of word alignment is valuable in itself
• Shared task at NAACL 2003 and ACL 2005 workshops
[Figure: word alignment matrix: Spanish "Maria no daba una bofetada a la bruja verde" (columns) against English "Mary did not slap the green witch" (rows)]
[Figure: the two directional word alignments and their intersection, shown on the same matrix]
[Figure: the alignment after growing: additional points from the union are added around the intersection]
Growing Heuristic
# Runnable Python rendering of GROW-DIAG-FINAL; e2f and f2e are the two
# directional alignments, each a set of (e, f) word-index pairs.
NEIGHBORING = [(-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1)]

def aligned_e(alignment, e):
    return any(ep == e for ep, fp in alignment)

def aligned_f(alignment, f):
    return any(fp == f for ep, fp in alignment)

def grow_diag_final(e2f, f2e):
    alignment = set(e2f) & set(f2e)              # start from the intersection
    union = set(e2f) | set(f2e)
    added = True
    while added:                                 # GROW-DIAG: until no new points added
        added = False
        for e, f in sorted(alignment):
            for de, df in NEIGHBORING:
                e_new, f_new = e + de, f + df
                if (not aligned_e(alignment, e_new) and
                        not aligned_f(alignment, f_new) and
                        (e_new, f_new) in union):
                    alignment.add((e_new, f_new))
                    added = True
    for a in (e2f, f2e):                         # FINAL(e2f); FINAL(f2e)
        for e_new, f_new in sorted(a):
            if not aligned_e(alignment, e_new) or not aligned_f(alignment, f_new):
                alignment.add((e_new, f_new))
    return alignment
• Decoding
• Statistical Modeling
• EM Algorithm
• Word Alignment
• Phrase-Based Translation
• Discriminative Training
• Syntax-Based Statistical MT
Phrase-Based Translation
Morgen fliege ich nach Kanada zur Konferenz
Phrase-Based Systems
• A number of research groups developed phrase-based systems
– RWTH Aachen – Univ. of Southern California/ISI – CMU
– IBM – Johns Hopkins U. – Cambridge U. – U. of Catalunya
– ITC-irst – Edinburgh U. – U. of Maryland – U. Valencia
• Systems differ in
– training methods
– model for phrase translation table
– reordering models
– additional feature functions
• Currently best method for SMT (MT?)
– top systems in DARPA/NIST evaluation are phrase-based
– best commercial system for Arabic-English is phrase-based
[Figure: the word alignment matrix for "Maria no daba una bofetada a la bruja verde" / "Mary did not slap the green witch" again]
• Collect all phrase pairs that are consistent with the word alignment
[Figure: built up over several slides: larger and larger consistent blocks in the alignment matrix are highlighted, each yielding a phrase pair]
(Maria, Mary), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green),
(Maria no, Mary did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the),
(bruja verde, green witch), (Maria no daba una bofetada, Mary did not slap),
(no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch),
(Maria no daba una bofetada a la, Mary did not slap the), (daba una bofetada a la bruja verde,
slap the green witch), (no daba una bofetada a la bruja verde, did not slap the green witch),
(Maria no daba una bofetada a la bruja verde, Mary did not slap the green witch)
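• The consistency check behind this extraction, as a sketch (alignment is a set of (e, f) index pairs; real extractors also handle unaligned boundary words and cap the phrase length):

def extract_phrases(alignment, en, fn):
    pairs = []
    for e1 in range(en):
        for e2 in range(e1, en):
            # foreign positions linked to the English span [e1, e2]
            fs = [f for (e, f) in alignment if e1 <= e <= e2]
            if not fs:
                continue
            f1, f2 = min(fs), max(fs)
            # consistent: nothing in the foreign span may align outside [e1, e2]
            if all(e1 <= e <= e2 for (e, f) in alignment if f1 <= f <= f2):
                pairs.append(((e1, e2), (f1, f2)))
    return pairs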
Reordering
• Monotone translation
– do not allow any reordering
→ worse translations
• Limiting reordering (to movement over at most n words) helps
• Distance-based reordering cost
– moving a foreign phrase over n words: cost ω^n (see the sketch after this list)
• Lexicalized reordering model
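• Distance-based reordering cost in one line, with a hypothetical ω = 0.9:

def distortion_cost(prev_end, next_start, omega=0.9):
    n = abs(next_start - prev_end - 1)   # number of words jumped over; 0 if adjacent
    return omega ** n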
[Figure: lexicalized reordering: each phrase transition (e3 ... e6) is labeled with its orientation, here swap (s) or discontinuous (d), alongside monotone (m)]
Training
[Figure: the orientation of each phrase pair is detected from the word-aligned training data]
• Decoding
• Statistical Modeling
• EM Algorithm
• Word Alignment
• Phrase-Based Translation
• Discriminative Training
• Syntax-Based Statistical MT
Log-Linear Models
• IBM Models provided a mathematical justification for multiplying component
models together
p_LM × p_TM × p_D
• These components may be weighted
p_LM^λLM × p_TM^λTM × p_D^λD
• Many components p_i with weights λ_i
⇒ ∏_i p_i^λi = exp(∑_i λ_i log p_i)
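• The weighted combination as a sketch (feature names and values are illustrative; real systems compute these per hypothesis during decoding):

import math

def loglinear_score(probs, weights):
    # sum_i lambda_i * log(p_i), i.e. the log of prod_i p_i^lambda_i
    return sum(weights[name] * math.log(p) for name, p in probs.items())

probs   = {"lm": 0.01, "tm": 0.002, "distortion": 0.5}
weights = {"lm": 1.0, "tm": 0.8, "distortion": 0.6}
print(loglinear_score(probs, weights))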
Knowledge Sources
• Many different knowledge sources useful
– language model
– reordering (distortion) model
– phrase translation model
– word translation model
– word count
– phrase count
– drop word feature
– phrase pair frequency
– additional language models
– additional features
[Figure: discriminative training loop: the candidate translations (ranked 1-6) are scored, and feature weights are found that move the good translations up the ranking]
Learning Task
• Task: find weights such that the feature vector of the correct translation is
ranked first
[Table: candidate translations with feature values LM, TM, WP and error metric SER]
• Decoding
• Statistical Modeling
• EM Algorithm
• Word Alignment
• Phrase-Based Translation
• Discriminative Training
• Syntax-Based Statistical MT
Syntax-based SMT
• Why Syntax?
• Yamada and Knight: translating into trees
• Wu: tree-based transfer
• Chiang: hierarchical transfer
• Collins, Kucerova, and Koehn: clause structure
• Koehn: factored translation models
• Other approaches
[Figure: translation pyramid: transfer between foreign and English at the level of words, syntax, or semantics]
[Figure: why syntax matters: the correct parse of "the house of the man is small" vs. a misanalysis reading "the house is" / "the man is small"]
[Figure: the Yamada-Knight tree transformation again: reorder, insert, translate, take leaves, yielding "Kare ha ongaku wo kiku no ga daisuki desu"]
[from Yamada and Knight, 2001]
Reordering Table
Original Order Reordering p(reorder|original)
PRP VB1 VB2 PRP VB1 VB2 0.074
PRP VB1 VB2 PRP VB2 VB1 0.723
PRP VB1 VB2 VB1 PRP VB2 0.061
PRP VB1 VB2 VB1 VB2 PRP 0.037
PRP VB1 VB2 VB2 PRP VB1 0.083
PRP VB1 VB2 VB2 VB1 PRP 0.021
VB TO VB TO 0.107
VB TO TO VB 0.893
TO NN TO NN 0.251
TO NN NN TO 0.749
Decoding as Parsing
• Chart parsing over the Japanese input: kare ha ongaku wo kiku no ga daisuki desu
[Figure: built up over several slides: the chart is filled bottom-up]
– lexical entries first: PRP (he), NN (music), TO (to), VB (listening), VB1 (adores)
• Combine entries
– NN + TO → PP ("music to")
– PP + VB → VB2 ("listening to music")
– PRP + VB2 + VB1 → VB ("he adores listening to music")
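• The chart combination step as a sketch (CKY-style, binary rules only; the grammar fragment and lexicon are hypothetical stand-ins for the rules above):

def parse(words, lexicon, rules):
    n = len(words)
    # chart[(i, j)] = set of nonterminals covering words[i:j]
    chart = {(i, i + 1): set(lexicon.get(words[i], [])) for i in range(n)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = chart.setdefault((i, j), set())
            for k in range(i + 1, j):
                for left in chart.get((i, k), ()):
                    for right in chart.get((k, j), ()):
                        for parent, children in rules:
                            if children == (left, right):
                                cell.add(parent)
    return chart

rules = [("PP", ("NN", "TO")), ("VB2", ("PP", "VB"))]     # e.g. "ongaku wo" -> PP
lexicon = {"ongaku": ["NN"], "wo": ["TO"], "kiku": ["VB"]}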
Syntax Trees
[Figure: example syntax trees]
Types of Rules
• Word translation
– X → maison | house
• Phrasal translation
– X → daba una bofetada | slap
• Mixed non-terminal / terminal
– X → X bleue | blue X
– X → ne X pas | not X
– X → X1 X2 | X2 of X1
• Technical rules
– S → S X | S X
– S → X | X
[Figure: a consistent block with a sub-phrase gap in the alignment matrix yields the hierarchical rule:]
X → X verde | green X
[Figure: likewise for the block spanning "a la" plus a gap:]
X → a la X | the X
Clause Structure
Main clause:
S PPER-SB Ich I
VAFIN-HD werde will
VP-OC PPER-DA Ihnen you
NP-OA ART-OA die the
ADJ-NK entsprechenden corresponding
NN-NK Anmerkungen comments
VVFIN aushaendigen pass on
$, , ,
Subordinate clause:
S-MO KOUS-CP damit so that
PPER-SB Sie you
VP-OC PDS-OA das that
ADJD-MO eventuell perhaps
PP-MO APRD-MO bei in
ART-DA der the
NN-NK Abstimmung vote
VVINF uebernehmen include
VMFIN koennen can
$. . .
System BLEU
baseline system 25.2%
with manual rules 26.8%
Improved Translations
• we must also this criticism should be taken seriously .
→ we must also take this criticism seriously .
• i am with him that it is necessary , the institutional balance by means of a political revaluation
of both the commission and the council to maintain .
→ i agree with him in this , that it is necessary to maintain the institutional balance by means of
a political revaluation of both the commission and the council .
• perhaps it would be a constructive dialog between the government and opposition parties ,
social representative a positive impetus in the right direction .
→ perhaps a constructive dialog between government and opposition parties and social
representative could give a positive impetus in the right direction .
Factored Translation Models
[Figure: factored representation: source and target words are vectors of factors (surface, stem, part-of-speech, morphology, word class, ...), and translation maps between factor vectors]
• Goals
– Generalization, e.g. by translating stems, not surface forms
– Additional information within model (using syntax for reordering, language
modeling)
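• A sketch of the first goal: translate at the stem level, then generate the target surface form from the translated factors (all tables hypothetical):

stems = {"haus": "house"}                            # stem translation table
generate = {("house", "NN", "plural"): "houses"}     # surface generation table

def translate_factored(word):
    surface, stem, pos, morph = word                 # factored representation
    target_stem = stems[stem]
    return generate[(target_stem, pos, morph)]

print(translate_factored(("haeuser", "haus", "NN", "plural")))   # -> houses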
[Figure: part-of-speech and morphology factors are translated directly]
• Generate surface form on target side
[Figure: the surface form is generated from stem, part-of-speech, and morphology]