Abstract
Declustering is a common technique for reducing query response times: data is distributed over multiple disks so that query retrieval can be parallelized. Most research on declustering targets spatial range queries and investigates schemes with low additive error. Recently, declustering using replication has been proposed to reduce the additive overhead. Replication significantly reduces the retrieval cost of arbitrary queries. In this paper, we propose a disk allocation and retrieval mechanism for arbitrary queries based on design theory. Using the proposed c-copy replicated declustering scheme, \((c-1)k^{2}+ck\) buckets can be retrieved using at most k disk accesses. The retrieval algorithm is very efficient and asymptotically optimal, with \(\Theta(|Q|)\) complexity for a query Q. In addition to the deterministic worst-case bound and efficient retrieval, the proposed algorithm handles nonuniform data and high dimensions, supports incremental declustering, and has good fault-tolerance properties. Experimental results show the feasibility of the algorithm.
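To make the stated bound concrete, the following minimal sketch evaluates \((c-1)k^{2}+ck\) for given c and k, and inverts it to find the smallest k that covers a query of a given size. It is only arithmetic on the formula quoted in the abstract: the function names `max_buckets` and `accesses_needed` are hypothetical, and the abstract does not state the conditions (for example, on the number of disks) under which the bound holds, so those are not modeled here.

```python
def max_buckets(c: int, k: int) -> int:
    """Evaluate the abstract's bound (c-1)*k^2 + c*k.

    c: number of copies of each bucket (c >= 1)
    k: number of disk accesses allowed (k >= 0)
    """
    if c < 1 or k < 0:
        raise ValueError("expected c >= 1 and k >= 0")
    return (c - 1) * k * k + c * k


def accesses_needed(c: int, num_buckets: int) -> int:
    """Smallest k such that (c-1)*k^2 + c*k >= num_buckets.

    This only inverts the formula; it is not the paper's
    retrieval algorithm.
    """
    if num_buckets < 0:
        raise ValueError("expected num_buckets >= 0")
    k = 0
    while max_buckets(c, k) < num_buckets:
        k += 1
    return k


if __name__ == "__main__":
    # With 2 copies and k = 3, the formula gives 1*9 + 2*3 = 15 buckets.
    print(max_buckets(2, 3))
    # A 16-bucket query with 2 copies is first covered at k = 4.
    print(accesses_needed(2, 16))
```

For c = 1 the quadratic term vanishes and the formula reduces to k buckets in k accesses, so under this formula the quadratic growth in covered query size comes entirely from the additional copies.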
Additional information
Recommended by: Sunil Prabhakar
Cite this article
Tosun, A.Ş. Efficient retrieval of replicated data. Distrib Parallel Databases 19, 107–124 (2006). https://doi.org/10.1007/s10619-006-8484-0
DOI: https://doi.org/10.1007/s10619-006-8484-0