An Efficient Algorithm (FUFM) for Mining Frequent Item Sets
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 8, August 2013 ISSN 2319 - 4847
Pursuing M.Tech in CSE at Vignan's LARA Institute Of Technology and Science, Vadlamudi, Guntur Dist., A.P., India. Asst.Prof, Department of CSE, Vignan's LARA Institute Of Technology & Science, Vadlamudi Guntur Dist., A.P., India.
ABSTRACT
As technology advances, data mining is turning to more advanced problems. This paper addresses itemset mining. Frequent itemsets are the sets of items that occur together often in a transactional database. Utility-based data mining is a new research area concerned with all types of utility factors in the data mining process and aimed at incorporating utility considerations into data mining tasks. An advanced topic in this field is fast utility mining, which aims to give accurate results quickly. Frequent Utility Frequent Mining (FUFM) is the algorithm presented here for retrieving itemsets quickly from a transactional database. The main aim of this paper is to retrieve the frequent utility itemsets and to cluster them by keyword or by number assignment. The results are displayed without any loss of data.
Keywords: Frequent Utility Frequent Mining (FUFM), UMining, Knowledge Discovery in Databases (KDD), UP-Growth.
1. INTRODUCTION
Data mining and knowledge discovery from databases have received much attention in recent years. Data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses. Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, previously unknown and potentially useful patterns in data. These patterns are used to make predictions or classifications about new data, to explain existing data, to summarize the contents of a large database in support of decision making, and to provide graphical visualizations that help people discover deeper patterns. The main aim of this paper is to identify and group the frequently used itemsets in a transactional database. When a database is audited, identifying the items that are purchased or collected frequently and clustering those frequent items leads to a better mining process.
2. BACKGROUND WORK
KDD: The KDD process comprises a few steps leading from raw data to some form of new knowledge. The volume of data contained in a database often exceeds the ability to analyze it efficiently, resulting in a gap between the collection of data and its understanding. A new concept is proposed for generating four kinds of itemsets: high utility and high frequent itemsets (HUHF), high utility and low frequent itemsets (HULF), low utility and high frequent itemsets (LUHF), and low utility and low frequent itemsets (LULF). These itemsets are generated using the basic framework of the FUM and FUFM algorithms. Customer Relationship Management (CRM) is incorporated into the system by generating a list of the customers who are frequent buyers of each of these four kinds of itemsets.
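The four categories can be illustrated with a small sketch. It is not the paper's implementation: the toy database, the function names, and the choice to compare an itemset's total utility with minUtil and its transaction count with minSup are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (for example quantity times unit profit).
TRANSACTIONS = [
    {"A": 10, "B": 4},
    {"A": 10, "C": 1},
    {"A": 10, "B": 4, "C": 1},
    {"C": 1},
    {"D": 25},
]


def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset.issubset(t))


def utility(itemset, db):
    """Total utility of the itemset over the transactions that contain it."""
    return sum(sum(t[i] for i in itemset) for t in db if itemset.issubset(t))


def categorize(db, min_util, min_sup, max_len=2):
    """Assign each occurring itemset (up to max_len items) to one category."""
    items = sorted({i for t in db for i in t})
    result = {"HUHF": [], "HULF": [], "LUHF": [], "LULF": []}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = support(s, db)
            if sup == 0:
                continue  # itemset never occurs, so it is not classified
            high_util = utility(s, db) >= min_util
            high_freq = sup >= min_sup
            label = ("HU" if high_util else "LU") + ("HF" if high_freq else "LF")
            result[label].append(combo)
    return result


groups = categorize(TRANSACTIONS, min_util=20, min_sup=2)
for name, itemsets in groups.items():
    print(name, itemsets)
```

The exhaustive enumeration is only practical for tiny examples; it is used here to keep the category definitions visible rather than to suggest an efficient method.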
c. LUHF: Generates low utility and high frequent itemsets. It follows the basic framework of the FUFM algorithm.
d. LULF: Generates low utility and low frequent itemsets in two phases. In the first phase, the low utility itemsets (LU) are determined by exhaustive search. In the second phase, the low utility low frequent itemsets are obtained by applying the set difference function to LU and LUHF.
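The two-phase LULF procedure can be sketched as follows. This is a minimal illustration under assumptions: the toy database and all function names are hypothetical, utility is taken as the total over the transactions containing the itemset, and minSup is an absolute transaction count.

```python
from itertools import combinations

# Hypothetical toy database: item -> utility of that item in the transaction.
DB = [
    {"A": 10, "B": 4},
    {"A": 10, "C": 1},
    {"A": 10, "B": 4, "C": 1},
    {"C": 1},
]


def support(itemset, db):
    """Number of transactions containing the whole itemset."""
    return sum(1 for t in db if itemset.issubset(t))


def utility(itemset, db):
    """Total utility of the itemset over the transactions containing it."""
    return sum(sum(t[i] for i in itemset) for t in db if itemset.issubset(t))


def low_utility_itemsets(db, min_util):
    """Phase 1: exhaustive search for occurring itemsets below min_util."""
    items = sorted({i for t in db for i in t})
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if support(s, db) > 0 and utility(s, db) < min_util:
                found.add(s)
    return found


def lulf_itemsets(db, min_util, min_sup):
    """Phase 2: LULF = LU minus LUHF, computed with set difference."""
    lu = low_utility_itemsets(db, min_util)
    luhf = {s for s in lu if support(s, db) >= min_sup}
    return lu - luhf


print(lulf_itemsets(DB, min_util=20, min_sup=2))
```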
4. ALGORITHM FUFM
Task: Discovery of utility frequent itemsets
Input: Database DB; constraints minUtil and minSup
Output: High Utility High Frequent itemsets (HUHF)
[1] L = 1
[2] Find the set of candidates of length L with support >= minSup
[3] Compute extended support for all candidates and output utility frequent itemsets
[4] L += 1
[5] Use the frequent itemset mining algorithm to obtain the new set of frequent candidates of length L from the old set of frequent candidates
[6] Stop if the new set is empty, otherwise go to step [3]
Algorithm working process
The steps above find the frequently occurring high utility itemsets. The outcome depends entirely on the chosen threshold values, against which every stage is compared. The first step sets the candidate length L to 1. The candidates of that length are then compared with the minimum support, and those whose support is greater than or equal to minSup are kept. The next step computes the extended support of each candidate and outputs the utility frequent itemsets, arranged in ascending order. A frequent itemset mining algorithm then derives the frequent candidates of the next length L from the previous set of frequent candidates. If this new set is empty, the process stops and the results are recorded; otherwise the computation repeats from step [3].
The above diagram depicts the complete chain of steps for computing and displaying the frequent itemsets. Comparing each candidate with the threshold values yields the frequent utility itemsets as the result.
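Steps [1] to [6] can be sketched in Python as below. This is an illustrative reading rather than the authors' code. It assumes that the extended support of an itemset is the number of transactions that contain it with a per-transaction utility of at least minUtil, that minSup is an absolute count, and that an Apriori-style join and prune plays the role of "the frequent itemset mining algorithm". All names and the toy database are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: item -> utility of that item in the transaction.
DB = [
    {"A": 10, "B": 4},
    {"A": 10, "C": 1},
    {"A": 10, "B": 4, "C": 1},
    {"C": 1},
]


def support(itemset, db):
    """Number of transactions containing the whole itemset."""
    return sum(1 for t in db if itemset.issubset(t))


def extended_support(itemset, db, min_util):
    """Transactions containing the itemset with utility >= min_util (assumed definition)."""
    return sum(
        1
        for t in db
        if itemset.issubset(t) and sum(t[i] for i in itemset) >= min_util
    )


def next_candidates(frequent, length):
    """Apriori-style join of frequent sets, pruned by their (length - 1)-subsets."""
    joined = {a | b for a in frequent for b in frequent if len(a | b) == length}
    return {
        c
        for c in joined
        if all(frozenset(sub) in frequent for sub in combinations(c, length - 1))
    }


def fufm(db, min_util, min_sup):
    length = 1                                                   # step [1]
    items = {i for t in db for i in t}
    frequent = {frozenset([i]) for i in items
                if support(frozenset([i]), db) >= min_sup}       # step [2]
    result = set()
    while frequent:                                              # step [6]
        for s in frequent:                                       # step [3]
            if extended_support(s, db, min_util) >= min_sup:
                result.add(s)
        length += 1                                              # step [4]
        frequent = {c for c in next_candidates(frequent, length)
                    if support(c, db) >= min_sup}                # step [5]
    return result


print(sorted(sorted(s) for s in fufm(DB, min_util=10, min_sup=2)))
```

Filtering on plain support before computing extended support is safe under this definition, because an itemset's extended support can never exceed its support.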
Step 3 opens the four different mining algorithms.
1. HUHF - High Utility High Frequent Mining
To view customer details, press the customer detail button.
4. LULF - Low Utility Low Frequent Mining
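The customer details view corresponds to the CRM step described earlier, which lists the customers who are frequent buyers of a given itemset. A minimal sketch of that lookup is shown below. The purchase records, the customer identifiers, and the `min_purchases` threshold are assumptions for illustration and are not taken from the paper's system.

```python
from collections import Counter

# Hypothetical purchase log: (customer id, set of items bought in one transaction).
PURCHASES = [
    ("c1", {"A", "B"}),
    ("c1", {"A", "B", "C"}),
    ("c2", {"A", "C"}),
    ("c2", {"A", "B"}),
    ("c3", {"C"}),
]


def frequent_buyers(purchases, itemset, min_purchases):
    """Customers with at least min_purchases transactions containing the itemset."""
    counts = Counter(
        customer for customer, items in purchases if itemset <= items
    )
    return sorted(c for c, n in counts.items() if n >= min_purchases)


print(frequent_buyers(PURCHASES, {"A", "B"}, min_purchases=2))
```

The same function could be called once per itemset in each of the four categories to build the four customer lists.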
7. CONCLUSION
The UMining and FUM algorithms mine all high utility itemsets. The FUFM and FUM-F algorithms use both statistical and utility measures. From the basic framework of these algorithms, four kinds of itemsets are generated: high utility high frequent, high utility low frequent, low utility high frequent, and low utility low frequent. Customer Relationship Management (CRM) is then incorporated into the system by tracking the customers who are frequent buyers of each kind of itemset.
AUTHOR PROFILE
Nazeer Shaik is pursuing an M.Tech in Computer Science Engineering at Vignan's LARA Institute Of Technology and Science, Vadlamudi, Guntur Dist., A.P., India. His research interests are Image Processing, Pattern Recognition and Data Mining. E-mail id: nazeer723@gmail.com.
N. L. Prasanna is an Asst. Prof. in the Department of CSE, Vignan's LARA Institute Of Technology & Science, Vadlamudi, Guntur Dist., A.P., India. Her research interests are Data Mining, Data Warehousing and Image Processing. E-mail id: prasanna.manu@gmail.com.