Abstract
We present algorithms and lower bounds for the Longest Increasing Subsequence (LIS) and Longest Common Subsequence (LCS) problems in the data-streaming model. To decide if the LIS of a given stream of elements drawn from an alphabet Σ has length at least k, we discuss a one-pass algorithm using O(k log |Σ|) space, with update time either O(log k) or O(log log |Σ|); for |Σ| = O(1), we can achieve O(log k) space and constant-time updates. We also prove a lower bound of Ω(k) on the space requirement for this problem for general alphabets Σ, even when the input stream is a permutation of Σ. For finding the actual LIS, we give a ⌈log(1 + 1/ε)⌉-pass algorithm using O(k^(1+ε) log |Σ|) space, for any ε > 0. For LCS, there is a trivial Θ(1)-approximate O(log n)-space streaming algorithm when |Σ| = O(1). For general alphabets Σ, the problem is much harder. We prove several lower bounds on the LCS problem, of which the strongest is the following: it is necessary to use Ω(n/ρ²) space to approximate the LCS of two n-element streams to within a factor of ρ, even if the streams are permutations of each other.
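Two of the upper bounds above can be illustrated with short one-pass sketches. Neither is claimed to be the paper's exact procedure. The first function, `lis_at_least_k` (a hypothetical name), decides whether a stream has a strictly increasing subsequence of length at least k. It uses the standard patience-sorting "tails" array capped at k entries, so it stores at most k alphabet symbols (O(k log |Σ|) bits) and spends O(log k) time per update via binary search. The second function, `lcs_lower_bound` (also hypothetical), is one natural reading of the "trivial" constant-alphabet LCS approximation; the abstract does not spell that algorithm out, so this is an assumption. It keeps one counter per symbol, which is O(|Σ| log n) = O(log n) space when |Σ| = O(1), and returns max over c of min(count_A(c), count_B(c)). That value is a lower bound on the LCS, because c repeated that many times is a common subsequence. It is also within a factor |Σ| of the LCS, because the LCS is at most the sum of those minima over all symbols.

```python
from bisect import bisect_left
from collections import Counter
from collections.abc import Hashable, Iterable


def lis_at_least_k(stream: Iterable[int], k: int) -> bool:
    """One pass: does the stream contain a strictly increasing
    subsequence of length >= k?  Stores at most k values.
    (Hypothetical helper; patience-sorting sketch, not the paper's code.)"""
    if k <= 0:
        return True
    # tails[i] = smallest possible last element of an increasing
    # subsequence of length i + 1 seen so far; tails is kept sorted.
    tails: list[int] = []
    for x in stream:
        i = bisect_left(tails, x)  # O(log k) binary search
        if i == len(tails):
            tails.append(x)        # x extends the longest subsequence found
            if len(tails) == k:
                return True        # length k reached, so stop early
        else:
            tails[i] = x           # x is a smaller tail for length i + 1
    return False


def lcs_lower_bound(stream_a: Iterable[Hashable],
                    stream_b: Iterable[Hashable]) -> int:
    """Assumed 'trivial' LCS approximation: one counter per symbol.
    Returns max_c min(count_a[c], count_b[c]), which satisfies
    result <= LCS <= |alphabet| * result."""
    count_a = Counter(stream_a)  # symbol frequencies in the first stream
    count_b = Counter(stream_b)  # symbol frequencies in the second stream
    # Counter returns 0 for symbols that never appeared in stream_b.
    return max((min(count_a[c], count_b[c]) for c in count_a), default=0)
```

For example, on the stream 3, 1, 4, 1, 5, 9, 2, 6 the tails array reaches length 4 (for instance 1, 4, 5, 9) but never 5, so the test succeeds for k = 4 and fails for k = 5.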
Additional information
A preliminary version of this paper appears in the Proceedings of the 11th International Computing and Combinatorics Conference (COCOON'05), August 2005, pp. 263–272.
Cite this article
Liben-Nowell, D., Vee, E. & Zhu, A. Finding longest increasing and common subsequences in streaming data. J Comb Optim 11, 155–175 (2006). https://doi.org/10.1007/s10878-006-7125-x