Skip to main content
Log in

Finding longest increasing and common subsequences in streaming data

  • Published:
Journal of Combinatorial Optimization Aims and scope Submit manuscript

Abstract

We present algorithms and lower bounds for the Longest Increasing Subsequence (LIS) and Longest Common Subsequence (LCS) problems in the data-streaming model. To decide if the LIS of a given stream of elements drawn from an alphabet αbet has length at least k, we discuss a one-pass algorithm using O(k log αbetsize) space, with update time either O(log k) or O(log log αbetsize); for αbetsize = O(1), we can achieve O(log k) space and constant-time updates. We also prove a lower bound of Ω(k) on the space requirement for this problem for general alphabets αbet, even when the input stream is a permutation of αbet. For finding the actual LIS, we give a ⌈log (1 + 1/ɛ)-pass algorithm using O(k1+ɛlog αbetsize) space, for any ɛ > 0. For LCS, there is a trivial Θ(1)-approximate O(log n)-space streaming algorithm when αbetsize = O(1). For general alphabets αbet, the problem is much harder. We prove several lower bounds on the LCS problem, of which the strongest is the following: it is necessary to use Ω(n2) space to approximate the LCS of two n-element streams to within a factor of ρ, even if the streams are permutations of each other.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Ajtai M, Jayram TS, Kumar R, Sivakumar D (2002) Approximate counting of inversions in a data stream. In: Proceedings of the ACM Symposium on Theory of Computing (STOC), pp. 370–379

  • Alon N, Matias Y, Szegedy M (1999) The space complexity of approximating the frequency moments, Journal of Computer and System Sciences 58(1):137–147

    Article  MathSciNet  Google Scholar 

  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool, Journal of Molecular Biology 215:403–410

    Article  Google Scholar 

  • Apostolico A, Guerra C (1987) The longest common subsequence problem revisited, Algorithmica 2:315–336

    Article  MathSciNet  Google Scholar 

  • Banerjee A, Ghosh J (2001) Clickstream clustering using weighted longest common subsequence. In: SIAM International Conference on Data Mining Workshop on Web Mining

  • Bar-Yossef Z, Jayram TS, Kumar R, Sivakumar D (2004) An information statistics approach to data stream and communication complexity, Journal of Computer and System Sciences 68(4):702–732

    Article  MathSciNet  Google Scholar 

  • Bender MA, Cole R, Demaine ED, Farach-Colton M (2002) Scanning and traversing: Maintaining data for traversals in a memory hierarchy. In: Proceedings of the European Symposium on Algorithms (ESA) pp. 139–151

  • Bespamyatnikh S, Segal M (2000) Enumerating longest increasing subsequences and patience sorting, Information Processing Letters 76(1–2):7–11

    Google Scholar 

  • Charikar M, Chen K, Farach-Colton M (2004) Finding frequent items in data streams, Theoretical Computer Science 312(1):3–15

    Article  MathSciNet  Google Scholar 

  • Cormen T, Leiserson C, Rivest R, Stein C (2002) Introduction to Algorithms, 2nd edition. McGraw-Hill

  • Cormode G, Muthukrishnan S (2006) What's new: Finding significant differences in network data streams. Transactions on Networking

  • Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL (1999) Alignment of whole genomes, Nucleic Acids Research 27(11):2369–2376

    Article  Google Scholar 

  • Demaine ED, López-Ortiz A, Ian Munro J (2002) Frequency estimation of internet packet streams with limited space. In: Proceedings of the European Symposium on Algorithms (ESA), pp. 348–360

  • Erdős P, Szekeres, G (1935) A combinatorial problem in geometry, Compositio Mathematica 463–470

  • Farach-Colton M, Ferragina P, Muthukrishnan S (1998) Overcoming the memory bottleneck in suffix tree construction. In: Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pp. 174–185

  • Feigenbaum J, Kannan S, Strauss M, Viswanathan M (2002) An approximate $L_1$-difference algorithm for massive data streams, SIAM Journal on Computing 32(1):131–151

    Article  MathSciNet  Google Scholar 

  • Fong JH, Strauss M (2001) An approximate $L_p$-difference algorithm for massive data streams, Discrete Mathematics & Theoretical Computer Science 4(2):301–322

    MathSciNet  Google Scholar 

  • Fredman ML (1975) On computing the length of longest increasing subsequences, Discrete Mathematics 11:29–35

    Article  MATH  MathSciNet  Google Scholar 

  • Gilbert A, Guha S, Indyk P, Kotidis Y, Muthukrishnan S, Strauss M (2002) Fast, small-space algorithms for approximate histogram maintenance. In: Proceedings of the ACM Symposium on Theory of Computing (STOC), pp. 389–398

  • Guha S, Koudas N, Shim K (2001) Data-streams and histograms. In: Proceedings of the ACM Symposium on Theory of Computing (STOC), pp. 471–475

  • Guha S, Mishra N, Motwani R, O'Callaghan L (2000) Clustering data streams. In: Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pp. 359–366

  • Henzinger MR, Raghavan P, Rajagopalan S (1998) Computing on data streams. Technical Report 1998-011, Digital Equipment Corporation, Systems Research Center

  • Hirschberg DS (1977) Algorithms for the longest common subsequence problem, Journal of the ACM 24:644–675

    Article  MathSciNet  Google Scholar 

  • Hunt J, Szymanski T (1977) A fast algorithm for computing longest common subsequences, Communications of the ACM 20:350–353

    Article  MathSciNet  Google Scholar 

  • Indyk P (2000) Stable distributions, pseudorandom generators, embeddings, and data stream computations. In: Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pp. 189–197

  • Kalyanasundaram B, Schnitger G (1992) The probabilistic communication complexity of set intersection. SIAM Journal on Discrete Mathematics 5(5):545–557

    Article  MathSciNet  Google Scholar 

  • Manku G, Rajagopalan S, Lindsay B (1998) Approximate medians and other quantiles in one pass and with limited memory. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 426–435

  • Razborov A (1984) On the distributional complexity of disjointness. Journal of Computer and System Sciences 28(2)

  • Saks ME, Sun X (2002) Space lower bounds for distance approximation in the data stream model. In: Proceedings of the ACM Symposium on Theory of Computing (STOC), pp. 360–369

  • Sankoff D, Kruskal J (1983) Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley

  • Schensted C (1961) Longest increasing and decreasing subsequences, Canadian Journal of Mathematics 13:179–191

    MATH  MathSciNet  Google Scholar 

  • van Emde Boas P (1977) Preserving order in a forest in less than logarithmic time and linear space, Information Processing Letters 6(3):80–82

    Article  MATH  Google Scholar 

  • Willard DE (August 1983) Log-logarithmic worst-case range queries are possible in space Θ N, Information Processing Letters 17(2):81–84

    Article  MATH  MathSciNet  Google Scholar 

  • Zhang H (2003) Alignment of BLAST high-scoring segment pairs based on the longest increasing subsequence algorithm, Bioinformatics 19(11):1391–1396

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to David Liben-Nowell.

Additional information

A preliminary version of this paper appears in the Proceedings of the 11th International Computing and Combinatorics Conference (COCOON'05), August 2005, pp. 263–272.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liben-Nowell, D., Vee, E. & Zhu, A. Finding longest increasing and common subsequences in streaming data. J Comb Optim 11, 155–175 (2006). https://doi.org/10.1007/s10878-006-7125-x

Download citation

  • Received:

  • Accepted:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10878-006-7125-x

Keywords

Navigation

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy