KEA Practical Automatic Keyphrase Extraction
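[Figure 1: Kea- and author-assigned keyphrases for three computer science technical reports: "Protocols for secure, atomic transaction execution in electronic commerce"; "Neural multigrid for gauge theories and other disordered systems"; "Proof nets, garbage, and computations". The keyphrase lists themselves are not recoverable here.]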
EVALUATION
We carried out an empirical evaluation of Kea using documents from the New Zealand Digital Library [5]. Our goals were to assess Kea’s overall effectiveness, and also to investigate the effects of varying several parameters in the extraction process. We measured keyphrase quality by counting the number of matches between Kea’s output and the keyphrases that were originally chosen by the document’s author. Figure 1 lists the Kea- and author-assigned keyphrases for three computer science technical reports. Phrases that appear in both lists are italicized.
Our results show that Kea can on average match between one and two of the five keyphrases chosen by the author in this collection [1]. We consider this to be good performance. Although Kea finds fewer than half of the author’s phrases, it must choose from many thousands of candidates; also, it is highly unlikely that even another human would select the same set of phrases as the original author.
Furthermore, we have determined that the following are reasonable minimums on source data for using Kea effectively:
• Kea works well with a training set of as few as 20 documents, meaning that human indexers need only assign manual keyphrases to a small number of documents in order to extract good keyphrases from the rest of the collection.
• Kea works best on the full text of documents, rather than just titles and abstracts.
• The global document corpus (used to calculate TFxIDF scores) can contain as few as 10 documents, and does not need to contain documents that are similar to the collection being processed.

CONCLUSION
Kea is an algorithm for automatically extracting keyphrases from text. Our goal is to provide useful metadata where none existed before. By extracting reasonable summaries from text documents, we give a valuable tool to designers and users of digital libraries.
In future, we plan to expand the evaluation of the algorithm. In particular, we have been working with the assumption that agreement with author-specified keyphrases is a reasonable indicator of finding ‘good’ keyphrases. In the near future we will test that assumption by evaluating Kea’s output using human expert judges, and by comparing Kea to other document summarization methods.
Kea is available from the New Zealand Digital Library project (http://www.nzdl.org/).

REFERENCES
[1] Frank, E., Paynter, G.W., Witten, I.H., Gutwin, C., and Nevill-Manning, C.G. (1999) Domain-Specific Keyphrase Extraction. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann Publishers, San Francisco, CA.
[2] Gutwin, C., Paynter, G., Witten, I.H., Nevill-Manning, C.G., and Frank, E. (1999) Improving Browsing in Digital Libraries With Keyphrase Indexes. J. Decision Support Systems. To appear.
[3] Jones, S. and Paynter, G.W. (1999) Topic Based Browsing Within a Digital Library Using Keyphrases. In Proc. DL’99.
[4] Witten, I.H. (1999) Browsing around a digital library. In Proc. Australasian Computer Science Conference, Auckland, New Zealand, 1–14.
[5] Witten, I.H., McNab, R., Jones, S., Apperley, M., Bainbridge, D., and Cunningham, S.J. (1999) Managing Complexity in a Distributed Digital Library. IEEE Computer, 32, 2, 74–79.
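
The match-counting measure described under EVALUATION (the number of Kea keyphrases that also appear in the author's list, averaged over documents) might be sketched as follows. This is a minimal illustration, not Kea's own evaluation code: the function names are hypothetical, and the matching rule (case-insensitive comparison after collapsing whitespace) is an assumption, since the text does not say how two phrases are judged equal.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace (an ASSUMED matching rule)."""
    return " ".join(phrase.lower().split())


def count_matches(extracted: list[str], author: list[str]) -> int:
    """Count extracted keyphrases that also appear in the author's list."""
    extracted_set = {normalize(p) for p in extracted}
    author_set = {normalize(p) for p in author}
    return len(extracted_set & author_set)


def mean_matches(pairs: list[tuple[list[str], list[str]]]) -> float:
    """Average match count over (extracted, author) pairs, one per document."""
    counts = [count_matches(extracted, author) for extracted, author in pairs]
    return sum(counts) / len(counts) if counts else 0.0


# Made-up phrases for illustration only; these are not data from the paper.
kea_phrases = ["Digital Library", "keyphrase  extraction", "metadata"]
author_phrases = ["digital library", "Keyphrase extraction", "browsing"]
print(count_matches(kea_phrases, author_phrases))
```

Using sets means a phrase repeated in either list is counted once, which seems the natural reading of "number of matches" but is also an assumption.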