Abstract
There is an increasing demand for real-time analysis of large volumes of data streams that are produced at high velocity. The most recent data needs to be processed within a specified delay target in order for the analysis to lead to actionable results. To this end, we present an effective solution for detecting the correlation of such data streams within a micro-batch of a fixed time interval. Our solution, coined DCS (Detection of Correlated Data Streams), combines (1) incremental sliding-window computation of aggregates, to avoid unnecessary re-computations, (2) intelligent scheduling of computation steps and operations, driven by a utility function within a micro-batch, and (3) an exploration policy that tunes the utility function. Specifically, we propose nine policies that explore correlated pairs of live data streams across consecutive micro-batches. Our experimental evaluation on a real-world dataset shows that some policies are better suited to identifying high numbers of correlated pairs of live data streams already known from previous micro-batches, while others are better suited to identifying previously unseen pairs of live data streams across consecutive micro-batches.
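The incremental sliding-window idea in item (1) can be illustrated with a minimal sketch. It is not the DCS implementation: it assumes the correlation measure is the Pearson coefficient, and the class name `SlidingCorrelation` and its methods are hypothetical. Instead of rescanning the whole window, it keeps five running sums and updates them in constant time whenever a point enters or leaves the window.

```python
from collections import deque
from math import sqrt


class SlidingCorrelation:
    """Pearson correlation of two streams over the last `window` points.

    Hypothetical illustration only: the aggregates (sums of x, y, x*x,
    y*y, x*y) are updated incrementally rather than recomputed.
    """

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.window = window
        self.pairs = deque()  # points currently inside the window
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def push(self, x, y):
        """Add the newest point and evict the oldest one if the window is full."""
        self.pairs.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        if len(self.pairs) > self.window:
            ox, oy = self.pairs.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self):
        """Return the Pearson coefficient, or None if it is undefined."""
        n = len(self.pairs)
        if n < 2:
            return None
        cov = n * self.sxy - self.sx * self.sy
        var_x = n * self.sxx - self.sx ** 2
        var_y = n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return None  # at least one stream is constant in this window
        return cov / sqrt(var_x * var_y)
```

Each `push` costs O(1) regardless of window length, which is the saving that incremental aggregation is meant to provide. Long-running streams of floating-point values may accumulate rounding error in the running sums, so a production version would likely need periodic recomputation or a numerically stabler update.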
Acknowledgment
This paper was partially supported by NSF under award CBET-1609120 and by NIH under award U01HL137159. The content is solely the responsibility of the authors and does not represent the views of NSF or NIH.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Alseghayer, R., Petrov, D., Chrysanthis, P.K., Sharaf, M., Labrinidis, A. (2019). DCS: A Policy Framework for the Detection of Correlated Data Streams. In: Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds) Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017. Lecture Notes in Business Information Processing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-24124-7_12
DOI: https://doi.org/10.1007/978-3-030-24124-7_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24123-0
Online ISBN: 978-3-030-24124-7
eBook Packages: Computer Science, Computer Science (R0)