Coba Coba Upload
Keywords
Text mining, Tokenize, Filtering, Stop words, Stemming.
1. INTRODUCTION
Text mining [11, 12] is the analysis of data contained in
natural language text. It can help an organization derive
potentially valuable business insights from text-based content
such as Word documents, electronic mail, and postings on
social media streams. Mining unstructured data with natural
language processing (NLP), statistical modeling, and machine
learning techniques can be challenging because natural
language text is often inconsistent: it suffers from
ambiguities caused by inconsistent syntax and semantics.
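The inconsistency described above is commonly reduced by the
preprocessing steps named in the keywords: tokenizing, stop-word
filtering, and stemming. The following is a minimal sketch of that
idea in plain Python, not the RapidMiner process used in this paper.
The stop-word list, the suffix list, and all function names are
illustrative assumptions, and the stemmer is deliberately naive
rather than a standard algorithm such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "in", "on", "to", "from"}

# Hypothetical suffix list for a deliberately naive stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, filter stop words, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


if __name__ == "__main__":
    # Two differently worded phrases reduce to the same token list.
    print(preprocess("Mining the documents"))
    print(preprocess("mined Documents"))
```

Both example phrases reduce to the same token list, which is the
point of the normalization. The crude stem "min" also shows why a
real system would use an established stemmer instead of suffix
stripping.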
Fig. 1. Processing document from files in RapidMiner
International Journal of Applied Information Systems (IJAIS) – ISSN : 2249-0868
Foundation of Computer Science FCS, New York, USA
Volume 7– No. 2, April 2014 – www.ijais.org
automatically learned Information extraction system to extract
a structured database.
7. REFERENCES
[1] R. Agrawal and R. Srikant, “Fast algorithms for mining
association rules”, in Proceedings of the 20th International
Conference on Very Large Databases (VLDB-94), Chile,
Sept. 1994.
[2] M. H. Dunham, “Data Mining: Introductory and Advanced
Topics”.
[3] R. Baeza-Yates and B. Ribeiro-Neto, “Modern Information
Retrieval”, ACM Press, New York, 1999.
[4] R. Agrawal, T. Imielinski and A. Swami, “Database mining:
A performance perspective”, IEEE Transactions on Knowledge
and Data Eng., vol. 5, no. 6.
[5] M. E. Califf, editor, Papers from the Sixteenth National
Conference on Artificial Intelligence (AAAI-99) Workshop on
Machine Learning for Information Extraction, Orlando, FL,
1999. AAAI Press.
[6] M. E. Califf and R. J. Mooney, “Relational learning of
pattern-match rules for information extraction”, in
Proceedings of the 16th National Conference on Artificial
Intelligence (AAAI-99), pages 328–334, Orlando, FL, July 1999.
[7] C. Cardie, “Empirical methods in information extraction”,
AI Magazine, 18(4):65–79, 1997.
[8] C. Cardie and R. J. Mooney, “Machine learning and natural
language (Introduction to special issue on natural language
learning)”, Machine Learning, 34:5–9, 1999.
[9] J. Han and M. Kamber, “Data Mining Concepts and
Techniques”, Morgan Kaufmann Publisher, 722.
[10] Y. M. Yang, “An evaluation of statistical approach to text
categorization”, Technical Report CMU-CS-97-127, Computer
Science Department, Carnegie Mellon University, 1997.
[11] C. Choi and Y. Park, “R&D proposal screening system based
on text-mining approach”, Int. J. Technol. Intell. Plan.,
vol. 2, no. 1, pp. 61–72, 2006.
[12] H. C. Yang and C. H. Lee, “A text mining approach for
automatic construction of hypertexts”, Expert Syst. Appl.,
vol. 29, no. 4, pp. 723–734, 2005.
[13] R. Agrawal, T. Imielinski and A. Swami, “Mining
association rules between sets of items in large databases”,
Washington, DC: SIGMOD, 1993, pp. 207–216.