Abstract
Frequent pattern mining has become a fundamental technique for many data mining tasks. Many modern frequent pattern mining algorithms, such as FP-growth, use a tree structure to compress the database into a compact in-memory data structure. Recent studies show that this tree structure can be mined efficiently using the frequent pattern growth methodology. Further performance improvement can be expected from parallel execution. In particular, the PC cluster is gaining popularity as a cost-effective parallel platform for data-intensive tasks such as data mining. However, mining frequent patterns efficiently from a tree structure in a shared-nothing environment requires addressing many issues, such as space distribution across the nodes and skew handling. We develop a framework that addresses these issues using a novel granularity control mechanism and tree remerging. The common framework can be enhanced with temporal constraints to mine web access patterns. We devise an improved support counting procedure to reduce the additional communication overhead. A real implementation using up to 32 nodes confirms that a good speedup ratio can be achieved even in a skewed environment.
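To illustrate the tree-based compression the abstract refers to, here is a minimal single-machine sketch of building an FP-tree-style prefix tree. It is not the parallel framework of the paper. The names `Node`, `build_fp_tree`, and `count_nodes`, the tie-breaking rule, and the sample transactions are assumptions made for illustration only.

```python
from collections import Counter


class Node:
    """One tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a prefix tree over the frequent items of each transaction."""
    # First pass: count in how many transactions each item occurs.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = Node(None)
    # Second pass: insert each transaction as a path of frequent items,
    # ordered by descending support (ties broken by item name, an
    # assumption here) so that common prefixes share nodes.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Count the nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical sample data: five small transactions.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
```

With this sample data, the 12 occurrences of frequent items are stored in 6 tree nodes, because transactions that share a prefix share a path. The infrequent item `d` is dropped before insertion.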
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pramudiono, I., Kitsuregawa, M. (2003). Tree Structure Based Parallel Frequent Pattern Mining on PC Cluster. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_53
DOI: https://doi.org/10.1007/978-3-540-45227-0_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40806-2
Online ISBN: 978-3-540-45227-0
eBook Packages: Springer Book Archive