Conditional Random Fields: Probabilistic Models For Segmenting and Labeling Sequence Data
Clique potentials on the chain $y_1$–$y_2$–$y_3$–$y_4$, with $s(\cdot)$ a per-label score and $t(\cdot)$ a transition score (function names reconstructed):

$c_1:\ \Phi_{c_1}(y_1, y_2, x) = \exp\big(s(y_1, x) + s(y_2, x) + t(y_1, y_2, x)\big)$
$c_2:\ \Phi_{c_2}(y_2, y_3, x) = \exp\big(s(y_3, x) + t(y_2, y_3, x)\big)$
$c_3:\ \Phi_{c_3}(y_3, y_4, x) = \exp\big(s(y_4, x) + t(y_3, y_4, x)\big)$
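Multiplying these potentials and normalizing globally gives the standard linear-chain CRF distribution for this four-label chain, where $Z(x)$ is the partition function over all label sequences:

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{k=1}^{3} \Phi_{c_k} = \frac{1}{Z(x)} \exp\Big(\sum_{i=1}^{4} s(y_i, x) + \sum_{i=1}^{3} t(y_i, y_{i+1}, x)\Big)$$

$$Z(x) = \sum_{y'} \exp\Big(\sum_{i=1}^{4} s(y'_i, x) + \sum_{i=1}^{3} t(y'_i, y'_{i+1}, x)\Big)$$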
Modeling the label bias problem
In a simple HMM, each state generates its designated symbol with probability
29/32 and each of the other symbols with probability 1/32
Train an MEMM and a CRF with the same topology
A run consists of 2,000 training examples and 500 test examples, trained to
convergence with the iterative scaling algorithm
CRF error is 4.6%, and MEMM error is 42%
The MEMM fails to discriminate between the two branches
The CRF solves the label bias problem (a numeric sketch of the effect follows below)
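To make the "two branches" point concrete, here is a minimal Python sketch; the states and scores are made up for illustration and are not the paper's experiment. With MEMM-style per-state normalization, a state with a single successor assigns it probability 1 regardless of the observation, while CRF-style global normalization lets observation evidence shift mass between whole paths:

import math

def local_prob(scores):
    # MEMM-style per-state normalization over successor scores
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

# A mid-branch state with exactly one successor: the transition gets
# probability 1 no matter how poorly the observation fits
for obs_score in (-5.0, 0.0, 5.0):
    print(local_prob({"next": obs_score}))   # always {'next': 1.0}

# CRF-style global normalization compares whole-path scores instead,
# so the observation can still favor one branch over the other
path_scores = {"branch_1": -5.0, "branch_2": 5.0}   # hypothetical path scores
z = sum(math.exp(s) for s in path_scores.values())
print({p: round(math.exp(s) / z, 4) for p, s in path_scores.items()})
# branch_2 receives almost all of the probability mass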
MEMM vs. HMM
The HMM outperforms the MEMM
MEMM vs. CRF
The CRF usually outperforms the MEMM
CRF vs. HMM
Each open square represents a data set with α < 1/2, and a solid circle indicates
a data set with α ≥ 1/2; when the data is mostly second order (α ≥ 1/2), the
discriminatively trained CRF usually outperforms the HMM
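Here $\alpha$ is the weight of the second-order component. In the paper's synthetic-data setup (restated from the original experiments), transitions and emissions are drawn from mixtures of first- and second-order models:

$$p_\alpha(y_i \mid y_{i-1}, y_{i-2}) = \alpha\, p_2(y_i \mid y_{i-1}, y_{i-2}) + (1 - \alpha)\, p_1(y_i \mid y_{i-1})$$
$$p_\alpha(x_i \mid y_i, x_{i-1}) = \alpha\, p_2(x_i \mid y_i, x_{i-1}) + (1 - \alpha)\, p_1(x_i \mid y_i)$$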
POS tagging Experiments
Compared HMMs, MEMMs, and CRFs on Penn Treebank POS tagging
Each word in a given input sentence must be labeled with one of 45 syntactic tags
Add a small set of orthographic features: whether a spelling begins with a number
or an upper-case letter, whether it contains a hyphen, and whether it ends with one
of the following suffixes: -ing, -ogy, -ed, -s, -ly, -ion, -tion, -ity, -ies
(a sketch of such a feature function follows below)
oov = out-of-vocabulary (not observed in the training set)
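As an illustration of these features, a minimal Python sketch; the function name and feature encoding are mine, not the paper's:

SUFFIXES = ("ing", "ogy", "ed", "s", "ly", "ion", "tion", "ity", "ies")

def orthographic_features(word):
    # Binary orthographic features for one token, per the list above
    feats = {
        "starts_with_digit": word[:1].isdigit(),
        "starts_with_upper": word[:1].isupper(),
        "contains_hyphen": "-" in word,
    }
    for suffix in SUFFIXES:
        feats["suffix_" + suffix] = word.endswith(suffix)
    return feats

print(orthographic_features("Walking"))
# starts_with_upper and suffix_ing are True; everything else is False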
Summary
Locally normalized conditional models such as MEMMs are prone to the label bias problem
CRFs provide the benefits of discriminative models
CRFs solve the label bias problem and demonstrate good performance
Thanks for your attention!
Special thanks to
Profs. Dietterich & Tadepalli!