Finding longest increasing and common subsequences in streaming data

Liben-Nowell, David; Vee, Erik; Zhu, An

doi:10.1007/s10878-006-7125-x

Published: March 2006

Volume 11, pages 155–175, (2006)

Journal of Combinatorial Optimization

Abstract

We present algorithms and lower bounds for the Longest Increasing Subsequence (LIS) and Longest Common Subsequence (LCS) problems in the data-streaming model. To decide if the LIS of a given stream of elements drawn from an alphabet Σ has length at least k, we discuss a one-pass algorithm using O(k log |Σ|) space, with update time either O(log k) or O(log log |Σ|); for |Σ| = O(1), we can achieve O(log k) space and constant-time updates. We also prove a lower bound of Ω(k) on the space requirement for this problem for general alphabets Σ, even when the input stream is a permutation of Σ. For finding the actual LIS, we give a ⌈log(1 + 1/ε)⌉-pass algorithm using O(k^{1+ε} log |Σ|) space, for any ε > 0. For LCS, there is a trivial Θ(1)-approximate O(log n)-space streaming algorithm when |Σ| = O(1). For general alphabets Σ, the problem is much harder. We prove several lower bounds on the LCS problem, of which the strongest is the following: it is necessary to use Ω(n/ρ²) space to approximate the LCS of two n-element streams to within a factor of ρ, even if the streams are permutations of each other.
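As a rough illustration of the one-pass decision problem described in the abstract, the sketch below uses the standard patience-sorting "smallest tail" technique. The article body is not reproduced here, so this is not necessarily the authors' algorithm; it is only one well-known approach consistent with the stated bounds: it stores at most k stream elements and spends O(log k) time per update via binary search. The function name `lis_at_least` is hypothetical, and the sketch assumes "increasing" means strictly increasing.

```python
from bisect import bisect_left


def lis_at_least(stream, k):
    """Return True if `stream` has a strictly increasing subsequence of length >= k.

    Makes a single pass over `stream` and never stores more than k elements.
    """
    if k <= 0:
        return True
    # tails[i] is the smallest last element over all increasing
    # subsequences of length i + 1 seen so far; tails is always sorted.
    tails = []
    for x in stream:
        # Binary search over at most k entries: O(log k) per element.
        i = bisect_left(tails, x)
        if i == len(tails):
            # x extends the longest subsequence found so far.
            tails.append(x)
            if len(tails) >= k:
                return True  # stop early; the answer is already known
        else:
            # x is a smaller tail for subsequences of length i + 1.
            tails[i] = x
    return False
```

Because the function returns as soon as `tails` reaches length k, the list never grows beyond k entries, each of which takes O(log |Σ|) bits to store. Any iterable works as input, including a generator that yields elements as they arrive.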

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Carleton College, USA
David Liben-Nowell
IBM, Almaden Research Center, New York, USA
Erik Vee
Google, Inc., USA
An Zhu

Corresponding author

Correspondence to David Liben-Nowell.

Additional information

A preliminary version of this paper appears in the Proceedings of the 11th International Computing and Combinatorics Conference (COCOON'05), August 2005, pp. 263–272.

About this article

Cite this article

Liben-Nowell, D., Vee, E. & Zhu, A. Finding longest increasing and common subsequences in streaming data. J Comb Optim 11, 155–175 (2006). https://doi.org/10.1007/s10878-006-7125-x

Received: 25 September 2005
Accepted: 25 December 2005
Issue Date: March 2006
DOI: https://doi.org/10.1007/s10878-006-7125-x
