Abstract
Data mining for spatial data has become increasingly important as more and more organizations are exposed to spatial data from sources such as remote sensing, geographical information systems, astronomy, computer cartography, and environmental assessment and planning. Recently, density-based clustering methods such as DENCLUE, DBSCAN, and OPTICS have been published and recognized as powerful clustering methods for data mining. These approaches have a run time complexity of \(O(n \log n)\) when using spatial index techniques such as R+-trees and grid cells. However, these methods are known to lack scalability with respect to dimensionality. In this paper, a unique approach to efficient neighborhood search and a new efficient density-based clustering algorithm using EIN-rings are developed. Our approach exploits compressed vertical data structures, Peano Trees (P-trees), and fast P-tree logical operations to accelerate the calculation of the density function within EIN-rings. This approach stands in contrast to the ubiquitous approach of vertically scanning horizontal data structures (records). The average run time complexity of our algorithm for spatial data in d dimensions is \(O(dn\sqrt{n})\). Our proposed method has cardinality scalability comparable to that of other density-based methods for small and medium-sized data sets, but superior speed and dimensional scalability.
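The contrast between vertical bit-level structures and record scans can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain, uncompressed bit slices stored in Python integers instead of compressed P-trees, and it approximates a ring by the difference of two nested bit-prefix cells around a query point, which is only an assumed stand-in for the EIN-ring construction (the abstract does not define it). All function names (`build_bit_slices`, `prefix_mask`, `ring_count`) are hypothetical.

```python
def build_bit_slices(points, bits):
    """Return slices[a][j]: an int whose bit i is set when bit j
    (0 = most significant) of attribute a of point i is 1."""
    dims = len(points[0])
    slices = [[0] * bits for _ in range(dims)]
    for i, point in enumerate(points):
        for a, value in enumerate(point):
            for j in range(bits):
                if (value >> (bits - 1 - j)) & 1:
                    slices[a][j] |= 1 << i
    return slices


def prefix_mask(slices, n, bits, query, k):
    """Bit vector of the points that agree with `query` on the top k bits
    of every attribute, computed with AND/NOT on whole slices."""
    full = (1 << n) - 1
    mask = full
    for a, value in enumerate(query):
        for j in range(k):
            s = slices[a][j]
            if (value >> (bits - 1 - j)) & 1:
                mask &= s
            else:
                mask &= full & ~s
    return mask


def ring_count(slices, n, bits, query, k):
    """Count points in the coarser cell (top k-1 bits) that are not in the
    finer cell (top k bits). Assumed simplification of a ring neighborhood."""
    outer = prefix_mask(slices, n, bits, query, k - 1)
    inner = prefix_mask(slices, n, bits, query, k)
    return bin(outer & ~inner).count("1")


# Illustrative data: six 2-D points with 3-bit coordinates (made up).
points = [(0, 0), (1, 1), (2, 3), (3, 2), (5, 5), (7, 0)]
slices = build_bit_slices(points, bits=3)
for k in (1, 2, 3):
    print(k, ring_count(slices, len(points), 3, (1, 1), k))
```

Each ring count costs a number of bitwise operations proportional to the number of dimensions times the number of bits, independent of how the records are laid out. A density estimate could then be formed as a weighted sum of such ring counts, although the weighting used in the paper is not stated in the abstract.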
References
Perrizo, W.: Peano Count Tree Technology. Technical Report NDSU-CSOR-TR-01-1 (2001)
Khan, M., Ding, Q., Perrizo, W.: K-Nearest Neighbor Classification on Spatial Data Streams Using P-Trees. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 517–528. Springer, Heidelberg (2002)
Hinneburg, A., Keim, D.A.: An Efficient Approach to Clustering in Large Multimedia Databases with Noise. In: Proceedings of the 4th Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park (1998)
TIFF image data sets. Available at http://midas-10cs.ndsu.nodak.edu/data/images/
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: Density-Connected Sets and their Application for Trend Detection in Spatial Databases. In: Proceedings of the 3rd Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park (1997)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd ACM SIGKDD, Portland, Oregon, pp. 226–231 (1996)
Sander, J., Ester, M., Kriegel, H.-P., Xu, X.: Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery 2, 169–194 (1998)
Ankerst, M., Breunig, M., Kriegel, H.-P., Sander, J.: OPTICS: Ordering points to identify clustering structure. In: Proceedings of the ACM SIGMOD Conference, Philadelphia, PA, pp. 49–60 (1999)
Xu, X., Ester, M., Kriegel, H.-P., Sander, J.: A distribution-based clustering algorithm for mining in large spatial databases. In: Proceedings of the 14th ICDE, Orlando, FL, pp. 324–331 (1998)
Han, J., Kamber, M.: Data Mining. Morgan Kaufmann Publishers, San Francisco (2001)
Han, J., Kamber, M., Tung, A.K.H.: Spatial clustering methods in data mining: A survey. In: Miller, H., Han, J. (eds.) Geographic Data Mining and Knowledge Discovery, Taylor and Francis (2001)
Arya, S., Mount, D.M., Narayan, O.: Accounting for boundary effects in nearest-neighbor searching. Discrete and Computational Geometry, 155–176 (1996)
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pan, F., Wang, B., Zhang, Y., Ren, D., Hu, X., Perrizo, W. (2003). Efficient Density Clustering Method for Spatial Data. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_34
DOI: https://doi.org/10.1007/978-3-540-39804-2_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive