A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays

Lam, Tak-Wah; Sadakane, Kunihiko; Sung, Wing-Kin; Yiu, Siu-Ming

doi:10.1007/3-540-45655-4_43

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2387)

Included in the following conference series:

International Computing and Combinatorics Conference

Abstract

With the first human DNA decoded into a sequence of about 2.8 billion base pairs, much biological research has centered on analyzing this sequence. In theory, it is now feasible to hold an index for human DNA in main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays, which reduces the space requirement from O(n log n) bits to O(n) bits. However, constructing a compressed suffix array is still not easy, because the suffix array must be computed first, which requires O(n log n) bits of working memory (i.e., more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from the text. The main contribution is a new construction algorithm that uses only O(n) bits of working memory while, more importantly, keeping the time complexity the same as before, i.e., O(n log n).
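
To make the memory bottleneck concrete, the following is a minimal sketch of the conventional route the abstract describes: build the full suffix array first, then derive from it the successor function usually written Ψ, on which compressed suffix arrays in the style of Grossi and Vitter are based. It is not the paper's algorithm. The function names, the naive sorting step, the `$` end marker, and the wrap-around treatment of the last suffix are all assumptions made for illustration.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically.

    Illustrative only. The result holds n integers of about log n bits
    each, which is the O(n log n)-bit working memory the abstract refers to.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def psi_from_suffix_array(sa):
    """Derive Psi from a full suffix array.

    Psi[i] is the rank of the suffix that starts one position after the
    suffix of rank i. Assumption: the last suffix wraps around to the
    suffix starting at position 0.
    """
    n = len(sa)
    inverse = [0] * n  # inverse[p] = rank of the suffix starting at p
    for rank, pos in enumerate(sa):
        inverse[pos] = rank
    return [inverse[(sa[i] + 1) % n] for i in range(n)]


# Assumed toy input; '$' is an end marker smaller than every letter.
text = "banana$"
sa = suffix_array(text)
psi = psi_from_suffix_array(sa)
print(sa)   # [6, 5, 3, 1, 0, 4, 2]
print(psi)  # [4, 0, 5, 6, 3, 1, 2]
```

Within the block of suffixes that share a first character, the Ψ values are increasing (here 0, 5, 6 for the suffixes starting with `a`), which is the property that lets Ψ be stored compactly. The sketch still materializes the whole suffix array, which is exactly the intermediate step a direct construction from the text avoids.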

This research was supported in part by NUS Academic Research Grant R-252-000-119-112.

Author information

Authors and Affiliations

Department of Computer Science, University of Hong Kong, Hong Kong
Tak-Wah Lam & Siu-Ming Yiu
Department of System Information Sciences, Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Kunihiko Sadakane
Department of Computer Science, National University of Singapore, Singapore
Wing-Kin Sung

Editor information

Editors and Affiliations

Department of Computer Science, University of California, Santa Barbara, California, 93106, USA
Oscar H. Ibarra
Department of Mathematics, National University of Singapore, Singapore, Singapore, 117543
Louxin Zhang

About this paper

Cite this paper

Lam, TW., Sadakane, K., Sung, WK., Yiu, SM. (2002). A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays. In: Ibarra, O.H., Zhang, L. (eds) Computing and Combinatorics. COCOON 2002. Lecture Notes in Computer Science, vol 2387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45655-4_43

DOI: https://doi.org/10.1007/3-540-45655-4_43
Published: 29 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43996-7
Online ISBN: 978-3-540-45655-1
eBook Packages: Springer Book Archive
