Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

This document summarizes the key points of the ICML 2001 paper "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data". It introduces conditional random fields (CRFs), a discriminative framework for building probabilistic models to label and segment sequence data. CRFs address the label bias problem that can occur in maximum entropy Markov models (MEMMs) by normalizing probabilities over entire label sequences rather than per state. Experimental results show that CRFs outperform HMMs and MEMMs on tasks such as part-of-speech tagging, thanks to their ability to incorporate diverse, overlapping features while avoiding label bias.


ICML 2001

Conditional Random Fields:


Probabilistic Models for Segmenting and
Labeling Sequence Data
John Lafferty, Andrew McCallum, Fernando Pereira


Presentation by Rongkun Shen
Nov. 20, 2003
Sequence Segmenting and Labeling
Goal: mark up sequences with content tags

Applications in computational biology
DNA and protein sequence alignment
Sequence homolog searching in databases
Protein secondary structure prediction
RNA secondary structure analysis

Applications in computational linguistics & computer science
Text and speech processing, including topic segmentation and part-of-speech (POS) tagging
Information extraction
Syntactic disambiguation
Example: Protein secondary structure prediction
Conf: 977621015677468999723631357600330223342057899861488356412238
Pred: CCCCCCCCCCCCCEEEEEEECCCCCCCCCCCCCHHHHHHHHHHHHHHHCCCCEEEEHHCC
AA: EKKSINECDLKGKKVLIRVDFNVPVKNGKITNDYRIRSALPTLKKVLTEGGSCVLMSHLG
10 20 30 40 50 60


Conf: 855764222454123478985100010478999999874033445740023666631258
Pred: CCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCCCCCCCCCHHHHHHCCC
AA: RPKGIPMAQAGKIRSTGGVPGFQQKATLKPVAKRLSELLLRPVTFAPDCLNAADVVSKMS
70 80 90 100 110 120


Conf: 874688611002343044310017899999875053355212244334552001322452
Pred: CCCEEEECCCHHHHHHCCCCCHHHHHHHHHHHHHCCEEEECCCCCCCCCCCCCCCCHHHH
AA: PGDVVLLENVRFYKEEGSKKAKDREAMAKILASYGDVYISDAFGTAHRDSATMTGIPKIL
130 140 150 160 170 180
Generative Models
Hidden Markov models (HMMs) and stochastic grammars
Assign a joint probability to paired observation and label sequences
The parameters are typically trained to maximize the joint likelihood of training
examples
Generative Models (contd)
Difficulties and disadvantages
Need to enumerate all possible observation sequences
Not practical to represent multiple interacting features or long-range
dependencies of the observations
Very strict independence assumptions on the observations
Conditional Models
Conditional probability P(label sequence y | observation sequence x) rather
than joint probability P(y, x)
Specify the probability of possible label sequences given an observation
sequence

Allow arbitrary, non-independent features on the observation sequence X

The probability of a transition between labels may depend on past and
future observations
Relax strong independence assumptions in generative models
Discriminative Models
Maximum Entropy Markov Models (MEMMs)
Exponential model
Given training set X with label sequence Y:
Train a model that maximizes P(Y|X, θ)
For a new data sequence x, the predicted label sequence y maximizes P(y|x, θ)
Notice the per-state normalization
MEMMs (contd)
MEMMs have all the advantages of Conditional Models

Per-state normalization: all the mass that arrives at a state must be
distributed among the possible successor states (conservation of score
mass)

Subject to Label Bias Problem

Bias toward states with fewer outgoing transitions
Label Bias Problem
Consider this MEMM (observations r, i, o; label values 1 and 2):

P(1 and 2 | ro) = P(2 | 1 and ro) P(1 | ro) = P(2 | 1 and o) P(1 | r)
P(1 and 2 | ri) = P(2 | 1 and ri) P(1 | ri) = P(2 | 1 and i) P(1 | r)

In the training data, label value 2 is the only label value observed after label value 1
Therefore P(2 | 1) = 1, so P(2 | 1 and x) = 1 for all x
Since P(2 | 1 and x) = 1 for all x, P(1 and 2 | ro) = P(1 and 2 | ri)

However, we expect P(1 and 2 | ri) to be greater than P(1 and 2 | ro)

Per-state normalization does not allow the required expectation
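A minimal numeric sketch of this effect in Python (hypothetical scores, not the paper's exact automaton): with per-state normalization, a state with a single outgoing transition passes on all of its probability mass regardless of the observation, while a state with several successors can still be swayed by it.

    import math

    def per_state_softmax(scores):
        # Per-state normalization: softmax over the successors of one state only.
        z = sum(math.exp(v) for v in scores.values())
        return {s: math.exp(v) / z for s, v in scores.items()}

    # Label 1 has a single successor (label 2). Whatever score the observation
    # produces, normalization turns it into probability 1: P(2 | 1, i) = P(2 | 1, o) = 1.
    print(per_state_softmax({2: 4.0}))    # observation 'i' -> {2: 1.0}
    print(per_state_softmax({2: -4.0}))   # observation 'o' -> {2: 1.0}

    # A state with two successors, by contrast, can still respond to the observation.
    print(per_state_softmax({2: 4.0, 3: 0.0}))   # mass mostly on label 2
    print(per_state_softmax({2: -4.0, 3: 0.0}))  # mass mostly on label 3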
Solving the Label Bias Problem
Change the state-transition structure of the model
Not always practical to change the set of states
Start with a fully-connected model and let the training
procedure figure out a good structure
Precludes the use of prior structural knowledge, which is very valuable (e.g. in
information extraction)
Random Field
Conditional Random Fields (CRFs)
CRFs have all the advantages of MEMMs without the label bias problem
MEMM uses a per-state exponential model for the conditional probabilities of
next states given the current state
CRF has a single exponential model for the joint probability of the entire
sequence of labels given the observation sequence
Undirected acyclic graph
Allow some transitions to vote more strongly than others, depending on the
corresponding observations
Definition of CRFs
X is a random variable over data sequences to be labeled
Y is a random variable over corresponding label sequences
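Formally, following the paper's definition: let G = (V, E) be a graph such that Y = (Y_v), v ∈ V, so Y is indexed by the vertices of G. Then (X, Y) is a conditional random field when, conditioned on X, the random variables Y_v obey the Markov property with respect to the graph:

p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ~ v)

where w ~ v means that w and v are neighbors in G.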
Example of CRFs
Graphical comparison among HMMs, MEMMs and CRFs
(Graphical models shown side by side: HMM, MEMM, CRF)
Conditional Distribution
If the graph G = (V, E) of Y is a tree, the conditional distribution over the
label sequence Y = y, given X = x, by the fundamental theorem of random
fields is:

p_θ(y | x) ∝ exp( Σ_{e∈E, k} λ_k f_k(e, y|_e, x) + Σ_{v∈V, k} μ_k g_k(v, y|_v, x) )

x is a data sequence
y is a label sequence
v is a vertex from the vertex set V = set of label random variables
e is an edge from the edge set E over V
f_k and g_k are given and fixed; g_k is a Boolean vertex feature and f_k is a Boolean edge feature
k is the index over features
θ = (λ_1, λ_2, …, λ_n; μ_1, μ_2, …, μ_k); the λ_k and μ_k are parameters to be estimated
y|_e is the set of components of y defined by edge e
y|_v is the set of components of y defined by vertex v
Conditional Distribution (contd)
CRFs use the observation-dependent normalization Z(x) for the
conditional distributions:

p_θ(y | x) = (1 / Z(x)) exp( Σ_{e∈E, k} λ_k f_k(e, y|_e, x) + Σ_{v∈V, k} μ_k g_k(v, y|_v, x) )

Z(x) is a normalization over the data sequence x
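A minimal brute-force sketch of this distribution in Python (hypothetical toy features and weights, not the paper's experiments): score every candidate label sequence, sum the exponentiated scores to get Z(x), and divide.

    import math
    from itertools import product

    LABELS = ['A', 'B']

    def score(y, x, lam, mu):
        # Toy weighted features: mu * g for each vertex, lam * f for each edge.
        s = sum(mu * (label == obs) for label, obs in zip(y, x))       # vertex features g
        s += sum(lam * (y[e] == y[e + 1]) for e in range(len(y) - 1))  # edge features f
        return s

    def crf_probability(y, x, lam=1.0, mu=2.0):
        # Z(x): sum of exp(score) over every possible label sequence of length len(x).
        Z = sum(math.exp(score(cand, x, lam, mu))
                for cand in product(LABELS, repeat=len(x)))
        return math.exp(score(y, x, lam, mu)) / Z

    print(crf_probability(('A', 'A', 'B'), ('A', 'B', 'B')))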
Parameter Estimation for CRFs
The paper provides iterative scaling algorithms

It turns out to be very inefficient

Prof. Dietterich's group applied a gradient descent algorithm, which is quite efficient
Training of CRFs (From Prof. Dietterich)
First, we take the log of the conditional distribution:

log p_θ(y | x) = Σ_{e∈E, k} λ_k f_k(e, y|_e, x) + Σ_{v∈V, k} μ_k g_k(v, y|_v, x) − log Z(x)

Then we take the derivative with respect to each parameter:

∂ log p_θ(y | x) / ∂λ_k = Σ_{e∈E} f_k(e, y|_e, x) − ∂ log Z(x) / ∂λ_k
∂ log p_θ(y | x) / ∂μ_k = Σ_{v∈V} g_k(v, y|_v, x) − ∂ log Z(x) / ∂μ_k

For training, the first terms are easy to get. For example, for each λ_k, f_k
evaluated along the sequence is a string of Boolean values, such
as 00101110100111, and Σ_{e∈E} f_k(e, y|_e, x) is just the total number of 1s in that string.

The hardest thing is how to calculate Z(x).
Training of CRFs (From Prof. Dietterich) (contd)
Maximal cliques: for the chain of label variables y1 – y2 – y3 – y4, the maximal
cliques are c1 = {y1, y2}, c2 = {y2, y3}, c3 = {y3, y4}

Assign each vertex feature to exactly one clique and define a potential per clique
(here g(y_i, x) stands for the weighted vertex-feature sum Σ_k μ_k g_k and
f(y_i, y_j, x) for the weighted edge-feature sum Σ_k λ_k f_k):

c1(y1, y2, x) := exp( g(y1, x) + g(y2, x) + f(y1, y2, x) )
c2(y2, y3, x) := exp( g(y3, x) + f(y2, y3, x) )
c3(y3, y4, x) := exp( g(y4, x) + f(y3, y4, x) )

Then

Z(x) = Σ_{y1, y2, y3, y4} c1(y1, y2, x) c2(y2, y3, x) c3(y3, y4, x)
     = Σ_{y1} Σ_{y2} Σ_{y3} Σ_{y4} c1(y1, y2, x) c2(y2, y3, x) c3(y3, y4, x)
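Because each clique potential couples only adjacent labels, the sums can be pushed inward and Z(x) computed by dynamic programming instead of enumerating every label sequence. A minimal sketch for this four-variable chain (hypothetical potentials; any non-negative functions of the clique's labels and x would do):

    from itertools import product

    LABELS = ['A', 'B']

    # Hypothetical clique potentials c1, c2, c3.
    def c1(y1, y2, x): return 1.0 + (y1 == y2)
    def c2(y2, y3, x): return 1.0 + (y2 == x[2])
    def c3(y3, y4, x): return 1.0 + (y3 == y4)

    def z_brute_force(x):
        # Sum the product of clique potentials over every label sequence.
        return sum(c1(y1, y2, x) * c2(y2, y3, x) * c3(y3, y4, x)
                   for y1, y2, y3, y4 in product(LABELS, repeat=4))

    def z_dynamic_programming(x):
        # Push the sums inward so each label variable is summed out once.
        a = {y2: sum(c1(y1, y2, x) for y1 in LABELS) for y2 in LABELS}          # sum out y1
        b = {y3: sum(a[y2] * c2(y2, y3, x) for y2 in LABELS) for y3 in LABELS}  # sum out y2
        return sum(b[y3] * c3(y3, y4, x) for y3 in LABELS for y4 in LABELS)     # sum out y3, y4

    x = ['A', 'B', 'B', 'A']
    print(z_brute_force(x), z_dynamic_programming(x))  # the two values agree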
Modeling the label bias problem
In a simple HMM, each state generates its designated symbol with probability
29/32 and each of the other symbols with probability 1/32

Train the MEMM and the CRF with the same topologies

A run consists of 2,000 training examples and 500 test examples, trained to
convergence using the iterative scaling algorithm

CRF error is 4.6%, and MEMM error is 42%

The MEMM fails to discriminate between the two branches

The CRF solves the label bias problem
MEMM vs. HMM
The HMM outperforms the MEMM
MEMM vs. CRF
CRF usually outperforms the MEMM
CRF vs. HMM
Each open square represents a data set with α < 1/2, and a solid circle indicates
a data set with α ≥ 1/2; when the data is mostly second order (α ≥ 1/2), the
discriminatively trained CRF usually outperforms the HMM
POS tagging Experiments
POS tagging Experiments (contd)
Compared HMMs, MEMMs, and CRFs on Penn Treebank POS tagging
Each word in a given input sentence must be labeled with one of 45 syntactic tags
Add a small set of orthographic features: whether a spelling begins with a number
or upper case letter, whether it contains a hyphen, and if it contains one of the
following suffixes: -ing, -ogy, -ed, -s, -ly, -ion, -tion, -ity, -ies
oov = out-of-vocabulary (not observed in the training set)
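A small sketch of what these orthographic features might look like in code (hypothetical helper names, following the feature list above):

    SUFFIXES = ('ing', 'ogy', 'ed', 's', 'ly', 'ion', 'tion', 'ity', 'ies')

    def orthographic_features(word):
        # Boolean spelling features of the kind added to the taggers.
        feats = {
            'starts_with_digit': word[0].isdigit(),
            'starts_with_uppercase': word[0].isupper(),
            'contains_hyphen': '-' in word,
        }
        for suffix in SUFFIXES:
            feats['suffix_' + suffix] = word.endswith(suffix)
        return feats

    print(orthographic_features('Walking'))
    print(orthographic_features('well-known'))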
Summary
Per-state normalized discriminative models such as MEMMs are prone to the label bias problem

CRFs provide the benefits of discriminative models

CRFs solve the label bias problem well, and demonstrate good
performance
Thanks for your attention!

Special thanks to
Prof. Dietterich & Tadepalli!
