
KEA: Practical Automatic Keyphrase Extraction

Ian H. Witten,* Gordon W. Paynter,* Eibe Frank,* Carl Gutwin† and Craig G. Nevill-Manning‡

* Dept of Computer Science, University of Waikato, Hamilton, New Zealand. {ihw,gwp,eibe}@cs.waikato.ac.nz
† Dept of Computer Science, University of Saskatchewan, Saskatoon, Canada. gutwin@cs.usask.ca
‡ Dept of Computer Science, Rutgers University, Piscataway, New Jersey. nevill@cs.rutgers.edu

Keyphrases provide semantic metadata that summarize and characterize documents. Kea is an algorithm for automatically extracting keyphrases from text. We use a large test corpus to evaluate its effectiveness in terms of how many author-assigned keyphrases are correctly identified. The system is simple, robust, and publicly available. Kea identifies candidate keyphrases using lexical methods, calculates feature values for each candidate, and uses a machine-learning algorithm to predict which candidates are good keyphrases. The machine-learning scheme first builds a prediction model using training documents with known keyphrases, and then uses the model to find keyphrases in new documents.

Keyphrases are useful because they briefly summarize a document's content. As large document collections such as digital libraries become widespread, the value of such summary information increases. Keywords and keyphrases are particularly useful because they can be interpreted individually and independently of each other. They can be used in information retrieval systems as descriptions of the documents returned by a query, as the basis for search indexes, as a way of browsing a collection, and as a document clustering technique (e.g. [2], [3], [4]).

Keyphrases are usually chosen manually. In many academic contexts, authors assign keyphrases to documents they have written. Professional indexers often choose phrases from a "controlled vocabulary" that is predefined for the domain at hand. However, the great majority of documents come without keyphrases, and assigning them manually is a tedious process that requires knowledge of the subject matter. Automatic extraction techniques are potentially of great benefit.

THE KEA ALGORITHM
Kea is an algorithm for automatically extracting keyphrases from text. The algorithm has two stages:
1. Training: create a model for identifying keyphrases, using training documents where the author's keyphrases are known.
2. Extraction: choose keyphrases from a new document, using the above model.
Both stages choose a set of candidate phrases from their input documents, and then calculate the values of certain attributes, or features, for each candidate.

Candidate phrases. Kea chooses candidate phrases in three steps: it first cleans the input text, then identifies candidates, and finally stems and case-folds the phrases. After splitting the text into words and sentences, Kea considers all the word subsequences in each sentence and determines which of these are suitable candidate phrases. All words are then case-folded and stemmed.
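For concreteness, a minimal Python sketch of candidate selection follows. The tokenization rules, the three-word length limit, the stopword filter and the simple_stem helper are illustrative assumptions; Kea's actual cleaning, suitability tests and stemmer are not specified in this summary.

import re

# Assumed stopword list; Kea's actual suitability tests are richer.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to", "for", "is"}

def simple_stem(word: str) -> str:
    """Crude suffix-stripping stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def candidate_phrases(text: str, max_len: int = 3) -> set[str]:
    """Return case-folded, stemmed word sequences that could be keyphrases."""
    candidates = set()
    # Clean the input and split it into sentences, then words.
    for sentence in re.split(r"[.!?;:\n]+", text):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        # Consider every contiguous subsequence of up to max_len words.
        for i in range(len(words)):
            for j in range(i + 1, min(i + max_len, len(words)) + 1):
                phrase = words[i:j]
                # Stand-in suitability test: drop phrases that start or
                # end with a stopword.
                if phrase[0].lower() in STOPWORDS or phrase[-1].lower() in STOPWORDS:
                    continue
                # Case-fold and stem every word in the phrase.
                candidates.add(" ".join(simple_stem(w.lower()) for w in phrase))
    return candidates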
Feature calculation. Two features are calculated for each candidate phrase and used in training and extraction. They are TF×IDF, a measure of a phrase's frequency in a document compared to its rarity in general use; and first occurrence, which is the distance into the document of the phrase's first appearance.
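These two features can be written down concretely. The sketch below assumes one standard formulation of TF×IDF and normalizes first occurrence by document length; the summary above gives only informal definitions, so the exact formulas and the +1 smoothing are assumptions.

import math

def tfidf(phrase_count: int, doc_length: int,
          docs_with_phrase: int, global_corpus_size: int) -> float:
    """TF×IDF: frequency of the phrase in this document, discounted by how
    common it is in a global corpus (assumed formulation)."""
    tf = phrase_count / doc_length
    # +1 keeps the score finite for phrases unseen in the global corpus.
    idf = -math.log2((docs_with_phrase + 1) / (global_corpus_size + 1))
    return tf * idf

def first_occurrence(index_of_first_word: int, doc_length: int) -> float:
    """Distance into the document of the phrase's first appearance,
    expressed here as a fraction of the document's length (assumed
    normalization)."""
    return index_of_first_word / doc_length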
Training. The training stage uses a set of training documents for which the author's keyphrases are known. For each training document, candidate phrases are identified and their feature values are calculated as described above. Each candidate is then labelled according to whether it is one of the author's keyphrases, and the scheme generates a model that predicts this class from the two feature values.

We have experimented with a number of different machine learning schemes; Kea uses the Naïve Bayes technique because it is simple and yields good results [1]. This scheme learns two sets of numeric weights from the discretized feature values, one set applying to positive ("is a keyphrase") examples and the other to negative ("is not a keyphrase") instances.
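As an illustration of what those weights might look like, the sketch below discretizes the two features into bins and estimates smoothed per-bin probabilities for the positive and negative classes. The bin boundaries, the Laplace smoothing and the data layout are assumptions; Kea's actual discretization is not described in this summary.

from collections import defaultdict

def discretize(value: float, boundaries: list[float]) -> int:
    """Map a feature value to a bin index, given assumed bin boundaries."""
    for i, b in enumerate(boundaries):
        if value < b:
            return i
    return len(boundaries)

def train_naive_bayes(examples, tfidf_bins, dist_bins):
    """examples: (tfidf, first_occurrence, is_keyphrase) triples.
    Returns class priors and per-bin weights for each class."""
    priors = defaultdict(int)
    tfidf_counts = defaultdict(lambda: defaultdict(int))
    dist_counts = defaultdict(lambda: defaultdict(int))
    for t, d, is_key in examples:
        cls = "yes" if is_key else "no"
        priors[cls] += 1
        tfidf_counts[cls][discretize(t, tfidf_bins)] += 1
        dist_counts[cls][discretize(d, dist_bins)] += 1

    def weights(counts, n_bins):
        # Laplace-smoothed P(bin | class) for each class and bin.
        return {cls: [(counts[cls][b] + 1) / (priors[cls] + n_bins)
                      for b in range(n_bins)]
                for cls in ("yes", "no")}

    total = priors["yes"] + priors["no"]
    return {"prior": {c: priors[c] / total for c in ("yes", "no")},
            "tfidf": weights(tfidf_counts, len(tfidf_bins) + 1),
            "dist": weights(dist_counts, len(dist_bins) + 1)}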
Extracting keyphrases from new documents. To select keyphrases from a new document, Kea extracts candidate phrases, determines feature values, and then applies the model built during training. The model determines the overall probability that each candidate is a keyphrase, and then a post-processing operation selects the best set of keyphrases.
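A sketch of this last step, under the same assumptions as the training sketch above: the "yes" score is normalized against the "no" score to give a probability, candidates are ranked, and a simple de-duplication rule stands in for the post-processing, which is not spelled out in this summary.

def score_candidate(model, t_bin: int, d_bin: int) -> float:
    """Overall probability that a candidate is a keyphrase under the model."""
    yes = (model["prior"]["yes"] * model["tfidf"]["yes"][t_bin]
           * model["dist"]["yes"][d_bin])
    no = (model["prior"]["no"] * model["tfidf"]["no"][t_bin]
          * model["dist"]["no"][d_bin])
    return yes / (yes + no)

def select_keyphrases(scored, how_many: int = 5) -> list[str]:
    """scored: (phrase, probability) pairs.  Rank by probability and drop a
    phrase that is contained in a higher-ranked one (assumed rule)."""
    chosen: list[str] = []
    for phrase, _ in sorted(scored, key=lambda s: s[1], reverse=True):
        if any(phrase in kept or kept in phrase for kept in chosen):
            continue
        chosen.append(phrase)
        if len(chosen) == how_many:
            break
    return chosen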

Protocols for secure, atomic transaction execution in electronic commerce
  Author: anonymity; atomicity; auction; electronic commerce; privacy; real-time; security; transaction
  Kea: atomicity; auction; customer; electronic commerce; intruder; merchant; protocol; security; third party; transaction

Neural multigrid for gauge theories and other disordered systems
  Author: disordered systems; gauge fields; multigrid; neural multigrid; neural networks; smooth
  Kea: disordered; gauge; gauge fields; interpolation kernels; length scale; multigrid

Proof nets, garbage, and computations
  Author: cut-elimination; linear logic; proof nets; sharing graphs; typed lambda-calculus
  Kea: cut; cut elimination; garbage; proof net; weakening

Figure 1. Examples of author- and Kea-assigned keyphrases.

EVALUATION
We carried out an empirical evaluation of Kea using documents from the New Zealand Digital Library [5]. Our goals were to assess Kea's overall effectiveness, and also to investigate the effects of varying several parameters in the extraction process. We measured keyphrase quality by counting the number of matches between Kea's output and the keyphrases that were originally chosen by the document's author. Figure 1 lists the Kea- and author-assigned keyphrases for three computer science technical reports. Phrases that appear in both lists are italicized.
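A sketch of this measurement, assuming that a "match" is an exact comparison of normalized phrases (the precise matching rule is not spelled out here):

def count_matches(kea_phrases: list[str], author_phrases: list[str]) -> int:
    """Count Kea phrases that also appear in the author's list.  The
    comparison here only case-folds and collapses whitespace; a faithful
    measurement would also stem, as in the candidate-phrase step."""
    author_set = {" ".join(p.lower().split()) for p in author_phrases}
    return sum(1 for p in kea_phrases
               if " ".join(p.lower().split()) in author_set)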
Our results show that Kea can on average match between one and two of the five keyphrases chosen by the author in this collection [1]. We consider this to be good performance. Although Kea finds fewer than half of the author's phrases, it must choose from many thousands of candidates; also, it is highly unlikely that even another human would select the same set of phrases as the original author.
Furthermore, we have determined that the following are reasonable minimums on source data for using Kea effectively:
• Kea works well with a training set of as few as 20 documents, meaning that human indexers need only assign manual keyphrases to a small number of documents in order to extract good keyphrases from the rest of the collection.
• Kea works best on the full text of documents, rather than on titles and abstracts alone.
• The global document corpus (used to calculate TF×IDF scores) can contain as few as 10 documents, and does not need to contain documents that are similar to the collection being processed.
CONCLUSION
Kea is an algorithm for automatically extracting keyphrases from text. Our goal is to provide useful metadata where none existed before. By extracting reasonable summaries from text documents, we give a valuable tool to designers and users of digital libraries.

In the future, we plan to expand the evaluation of the algorithm. In particular, we have been working with the assumption that using author-specified keyphrases to evaluate the scheme is a reasonable indicator of finding 'good' keyphrases. However, in the near future we will test that assumption by evaluating Kea's output using human expert judges, and by comparing Kea to other document summarization methods.

Kea is available from the New Zealand Digital Library project (http://www.nzdl.org/).

REFERENCES
[1] Frank, E., Paynter, G.W., Witten, I.H., Gutwin, C. and Nevill-Manning, C.G. (1999) Domain-Specific Keyphrase Extraction. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann Publishers, San Francisco, CA.
[2] Gutwin, C., Paynter, G., Witten, I.H., Nevill-Manning, C.G. and Frank, E. (1999) Improving Browsing in Digital Libraries with Keyphrase Indexes. J. Decision Support Systems. To appear.
[3] Jones, S. and Paynter, G.W. (1999) Topic-Based Browsing Within a Digital Library Using Keyphrases. In Proc. DL '99.
[4] Witten, I.H. (1999) Browsing around a digital library. In Proc. Australasian Computer Science Conference, Auckland, New Zealand, 1–14.
[5] Witten, I.H., McNab, R., Jones, S., Apperley, M., Bainbridge, D. and Cunningham, S.J. (1999) Managing Complexity in a Distributed Digital Library. IEEE Computer, 32, 2, 74–79.
