Improved Method For Pattern Discovery in Text Mining
Abstract
Digital data in the form of text documents is growing rapidly, and analyzing such data manually is tedious. Data mining techniques
have long been used to analyze such data and extract interesting patterns. Many existing methods are term-based and
cannot deal with synonymy and polysemy. Moreover, they lack the ability to use and update the discovered patterns.
Zhong et al. proposed an effective pattern discovery technique. It discovers patterns and then computes the specificities of those patterns
in order to evaluate term weights according to their distribution in the discovered patterns. It also updates patterns that exhibit
ambiguity, a feature known as pattern evolution. In this paper we implement that technique and build a prototype
application to test its efficiency. The empirical results show that the solution is useful in the text mining domain.
__________________________________________________________________________________________
Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org
574
2. PRIOR WORK
4. PROTOTYPE IMPLEMENTATION
We implemented the pattern discovery technique proposed by
Zhong et al. [10] in the Java programming language. The
implementation environment was a PC with 4 GB RAM and a
Core 2 Duo processor, running Windows, with NetBeans as the
IDE. The Java Swing API was used to build the GUI (Graphical
User Interface). The main UI of the application is shown in
Fig. 2.
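The weighting idea behind the technique, evaluating term weights from their distribution in the discovered patterns, can be illustrated with a short Java sketch. This is not the prototype's code. The class `PatternDeploySketch`, the `Pattern` container, and the rule that each pattern splits its support evenly across its terms are assumptions made for illustration only; the exact formula used by Zhong et al. [10] may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class PatternDeploySketch {

    /** Hypothetical container for one discovered pattern: a term set and its support count. */
    static final class Pattern {
        final Set<String> terms;
        final int support;

        Pattern(int support, String... terms) {
            this.support = support;
            this.terms = new LinkedHashSet<>(Arrays.asList(terms));
        }
    }

    /**
     * Assumed deploying rule (illustrative, not taken from the paper):
     * each pattern splits its support evenly across its terms, and a
     * term's weight is the sum of the shares it receives from all patterns.
     */
    static Map<String, Double> deploy(List<Pattern> patterns) {
        Map<String, Double> weights = new TreeMap<>(); // sorted keys for stable output
        for (Pattern p : patterns) {
            if (p.terms.isEmpty()) {
                continue; // an empty pattern contributes nothing
            }
            double share = (double) p.support / p.terms.size();
            for (String term : p.terms) {
                weights.merge(term, share, Double::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = Arrays.asList(
                new Pattern(3, "text", "mining"), // each term receives 3 / 2 = 1.5
                new Pattern(2, "mining"));        // "mining" receives a further 2.0
        System.out.println(deploy(patterns));
    }
}
```

Under this assumed rule, "mining" ends up with weight 3.5 and "text" with 1.5. A term that appears in many short, well-supported patterns is weighted higher than one that appears only in long patterns, which is one simple way to let pattern specificity influence term weights.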
CONCLUSIONS
Data mining techniques have been around for a long time. The
techniques used to discover knowledge include sequential
pattern mining, frequent itemset mining, closed pattern
mining, and maximal pattern mining. These data mining
techniques are not well suited to text mining, because the
discovered patterns lack high specificity. Not all frequent
patterns discovered by a mining algorithm are useful. Moreover,
they can be misinterpreted, which makes the problem worse. To
overcome the problems of misinterpretation and low
frequency, we proposed an effective pattern discovery
technique. Pattern deploying and pattern evolving are its two
parts. The empirical results revealed that the proposed
technique is effective.
REFERENCES
[1] Y. Li, C. Zhang, and J.R. Swan, An Information Filtering
Model on the Web and Its Application in Jobagent,
Knowledge-Based Systems, vol. 13, no. 5, pp. 285-296, 2000.
[2] S. Robertson and I. Soboroff, The Trec 2002 Filtering
Track Report, TREC, 2002,
trec.nist.gov/pubs/trec11/papers/OVER.FILTERING.ps.gz
[3] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information
Retrieval. Addison Wesley, 1999.
[4] D.D. Lewis, Feature Selection and Feature Extraction for
Text Categorization, Proc. Workshop Speech and Natural
Language, pp. 212-217, 1992.
Knowledge and Data Eng., vol. 18, no. 4, pp. 554-568, Apr.
2006.
[22] Y. Huang and S. Lin, Mining Sequential Patterns Using
Graph Search Techniques, Proc. 27th Ann. Int'l Computer
Software and Applications Conf., pp. 4-9, 2003.
[23] M. Seno and G. Karypis, Slpminer: An Algorithm for
Finding Frequent Sequential Patterns Using Length-Decreasing
Support Constraint, Proc. IEEE Second Int'l
Conf. Data Mining (ICDM '02), pp. 418-425, 2002.
[24] M. Zaki, Spade: An Efficient Algorithm for Mining
Frequent Sequences, Machine Learning, vol. 40, pp. 31-60,
2001.
[25] Y. Li, W. Yang, and Y. Xu, Multi-Tier Granule Mining
for Representations of Multidimensional Association Rules,
Proc. IEEE Sixth Int'l Conf. Data Mining (ICDM '06), pp.
953-958, 2006.
[26] Y. Xu and Y. Li, Generating Concise Association Rules,
Proc. ACM 16th Conf. Information and Knowledge
Management (CIKM '07), pp. 781-790, 2007.
[27] Y. Li, X. Zhou, P. Bruza, Y. Xu, and R.Y. Lau, A
Two-Stage Text Mining Model for Information Filtering, Proc.
ACM 17th Conf. Information and Knowledge Management
(CIKM '08), pp. 1023-1032, 2008.
[28] S. Shehata, F. Karray, and M. Kamel, Enhancing Text
Clustering Using Concept-Based Mining Model, Proc. IEEE
Sixth Int'l Conf. Data Mining (ICDM '06), pp. 1043-1048,
2006.
[29] S. Shehata, F. Karray, and M. Kamel, A Concept-Based
Model for Enhancing Text Categorization, Proc. 13th Int'l
Conf. Knowledge Discovery and Data Mining (KDD '07), pp.
629-637, 2007.