High Influencing Pattern Discovery over Time Series Data
Abstract
1. Introduction
- For an influence propagation process abstracted from epidemic dispersal scenarios, we define a semantic proximate relationship that describes the nearness between instances of distinct features, and on that basis we propose a novel influencing pattern (IP). We further introduce the high influencing pattern (high IP), which is based on the directions of influence and on the inner changes in instances revealed by an attribute-aware analysis of time series data, to meet the needs of pattern discovery in the influence propagation process.
- We propose a framework for mining high IPs and implement it in a Benchmark algorithm. Benchmark identifies IPs with a top–down method, stores them in a three-layer hashmap structure, computes the influence between instances and the influence of features in the IPs from preprocessed multi-dimensional attribute vectors and attribute weights, and filters high IPs with dedicated influencing metrics. Because these metrics do not satisfy the downward closure property, we analyze two time-saving properties, propose two corresponding pruning strategies, and design two improved algorithms to boost efficiency.
- We conducted extensive experiments on real and synthetic datasets; the results verify the effectiveness, efficiency, and scalability of our methods.
2. Related Works
2.1. Spatial Co-Location Pattern Mining
2.2. Influence Analysis
2.3. Spatio-Temporal Pattern Discovery
3. Definitions and Problem Statement
3.1. Definitions and Formulae
3.2. Problem Statement
4. Methodology
4.1. A Framework and Benchmark Algorithm
- Data preprocessing (Steps 1–4): This stage extracts time series data from the spatio-temporal datasets, identifies each influential instance that has influential media to or from other instances, and generates proximate relations for those influential instances (Steps 1–3). It then calculates the attribute matrices, the weight vectors, and the probabilities p, as described in Section 5.1.3 (Step 4).
- Identifying influencing patterns (Steps 5–12): This stage first initializes the IPs and HIPs sets and generates the star neighbor instance set from the proximate relations of the instances (Steps 5–6). It then uses a three-layer hashmap structure to extract candidate patterns from the star neighbor instance set (Step 7), generates star row instances, and extracts star participation instances (Steps 8–9). Candidate patterns are traversed top–down (Step 10) to check whether they are IPs. A candidate identified as an IP is added to the IPs set (Step 11); otherwise, it is decomposed around its central feature into sub-candidate patterns of a smaller size (Step 12).
- Mining high influencing patterns (Steps 13–19): This stage starts with the size-2 IPs (Step 13). For each pattern c among the size-k IPs (Steps 14–15), Function 1 is called to obtain the star influence index of c, SIIc (Step 16). A pattern c whose SIIc satisfies SIIthreshold is added to the HIPs set (Step 17). The while loop continues in ascending order of pattern size (Step 18), and all HIPs found are returned (Step 19).
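The pattern-identification stage (Steps 5–12) can be illustrated with a minimal Python sketch. It is not the authors' listing: the function names, the representation of an instance as a (feature, id) tuple, the directed (center, neighbor) form of a proximate relation, and the choice of hashmap layers (pattern size, then central feature, then pattern) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_star_neighbors(relations):
    """Map each central instance to the instances it is proximate to.

    `relations` holds (center, neighbor) pairs; an instance is a
    (feature, id) tuple such as ("A", 1). This layout is hypothetical.
    """
    stars = defaultdict(set)
    for center, neighbor in relations:
        if center[0] != neighbor[0]:  # keep only instances of distinct features
            stars[center].add(neighbor)
    return stars


def candidate_patterns(stars):
    """Assumed three-layer hashmap: size -> central feature -> pattern -> centers."""
    table = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for center, neighbors in stars.items():
        features = tuple(sorted({n[0] for n in neighbors}))
        if not features:
            continue
        pattern = (center[0],) + features  # central feature comes first
        table[len(pattern)][center[0]][pattern].append(center)
    return table


def decompose(pattern):
    """Top-down step: sub-patterns one size smaller that keep the central feature."""
    center, others = pattern[0], pattern[1:]
    if len(others) < 2:
        return []  # a size-2 pattern cannot be decomposed further
    return [(center,) + rest for rest in combinations(others, len(others) - 1)]


# Toy relations (made-up data): A.1 influences B.1 and C.1, A.2 influences B.2.
relations = [
    (("A", 1), ("B", 1)),
    (("A", 1), ("C", 1)),
    (("A", 2), ("B", 2)),
    (("A", 1), ("A", 3)),  # same feature, ignored
]
stars = build_star_neighbors(relations)
table = candidate_patterns(stars)
```

Keying the outer layers by size and central feature lets a top–down traversal start from the largest candidates and look up the sub-candidates produced by `decompose` without scanning the whole candidate set.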
Listing 1. A benchmark algorithm for high IP mining (Benchmark for short).
Input: , , , SIIthreshold. Output: All high influencing patterns satisfying SIIthreshold. Variables: Please refer to Table A1 in Appendix A.
Data Preprocessing:
Listing 2. Function 1: calculate_star_influence_index(c, Ns, p, A, B, , ).
Input: c, Ns, p, A, B, , . Output: Star influence index of pattern c. Variables: Please refer to Table A1 in Appendix A.
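The mining loop described for Steps 13–19 can be sketched as follows. This is a hedged illustration, not the published listing: `mine_high_ips` and its parameters are hypothetical names, Function 1 is stood in for by any callable that returns a star influence index, and the comparison `>=` is an assumption about what "satisfying SIIthreshold" means.

```python
def mine_high_ips(ips_by_size, star_influence_index, sii_threshold):
    """Collect patterns whose star influence index meets the threshold.

    ips_by_size: dict mapping pattern size k to a list of size-k IPs.
    star_influence_index: callable standing in for Function 1.
    """
    hips = []
    k = 2  # mining starts at size-2 IPs
    max_size = max(ips_by_size, default=1)
    while k <= max_size:  # ascending order of pattern size
        for c in ips_by_size.get(k, []):
            # Assumption: "satisfying" the threshold means index >= threshold.
            if star_influence_index(c) >= sii_threshold:
                hips.append(c)
        k += 1
    return hips


# Made-up index values, used only to exercise the loop.
toy_scores = {("A", "B"): 0.04, ("A", "C"): 0.01, ("A", "B", "C"): 0.15}
toy_ips = {2: [("A", "B"), ("A", "C")], 3: [("A", "B", "C")]}
result = mine_high_ips(toy_ips, toy_scores.get, sii_threshold=0.02)
```

Because every IP is scored independently of its sub-patterns, this loop cannot skip a size-k candidate on the basis of a rejected size-(k-1) pattern, which is why pruning requires separate properties.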
4.2. Analysis of Related Properties
4.3. Two Improved Algorithms with Pruning Strategies
Listing 3. Improved algorithm for high IP mining with a pruning strategy (Improved-1 for short).
Input, Output, Variables: The same as in Listing 1.
Listing 4. Improved algorithm for high IP mining with two pruning strategies (Improved-2 for short).
Input, Output, Variables: The same as in Listing 1.
4.4. Analysis of Time Complexity
5. Experimental Evaluation
5.1. Study Area and Data
5.1.1. Study Area
5.1.2. Data Description
5.1.3. Data Preprocessing
5.2. Effectiveness of High IP Mining Algorithms
5.2.1. Comparison of Top 5 Patterns at All Sizes Mined by Benchmark and HICP Mining of the Real-1 Dataset
5.2.2. Comparison of Mined Results of Benchmark and HICP Mining of the Real-1 Dataset
- Effect of distance threshold on pattern amount
- Effect of SIIthreshold/InIthreshold on pattern amount
5.3. Efficiencies of High IP Mining Algorithms
5.3.1. Efficiency Comparison of High IP and HICP Mining Algorithms over the Real-1 Dataset
- Effect of distance threshold on efficiency
- Effect of SIIthreshold/InIthreshold on efficiency
5.3.2. Efficiency Comparison of High IP and HICP Mining Algorithms over the Syn-1 Dataset
- Effect of distance threshold on efficiency
- Effect of SIIthreshold/InIthreshold on efficiency
5.4. Scalability of High IP Mining Algorithms
- Effect of instance amount on scalability
- Effect of influential media amount on scalability
- Effect of feature amount on scalability
- Effect of attribute dimensions on scalability
6. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Notation | Description |
---|---|
Cs | A set of all candidate patterns |
HIPs | A set of all high influencing patterns |
HIPsk-1 | A set of all size-(k-1) high influencing patterns |
 | Influence of feature fi in pattern c |
IPs | A set of all influencing patterns |
IPsk | A set of all size-k influencing patterns |
Ns | A set of all star neighbor instance sets |
SIIc | Star influence index of pattern c |
 | Star influence ratio of feature fi in pattern c |
 | An upper bound index for star influence index of pattern c |
 | Star participation instances of feature fi in pattern c |
 | Star row instances of feature fi in pattern c |
Appendix B
Name of Dataset | Instance Amount | Influential Media Amount | Feature Amount | Attribute Dimensions |
---|---|---|---|---|
Real-1 | 365 | 1,300 | 7 | 6 |
Syn-1 | 20,000 | 5,000 | 15 | 6 |
Syn-2/3/4/5/6 | 200,000/400,000/600,000/800,000/1,000,000 | 10,000 | 20 | 10 |
Syn-2/7/8/9/10 | 200,000 | 10,000/20,000/30,000/40,000/50,000 | 20 | 10 |
Syn-2/11/12/13/14 | 200,000 | 10,000 | 20/30/40/50/60 | 10 |
Syn-2/15/16/17/18 | 200,000 | 10,000 | 20 | 10/20/30/40/50 |
Pattern Set (Algorithm) | Size-2 Patterns | Size-3 Patterns | Size-4 Patterns | Size-5 Patterns |
---|---|---|---|---|
Top 5 high influencing patterns (mined by Benchmark) | {E,F} 0.036990; {D,E} 0.035215; {F,G} 0.019201 | {B,D,E} 0.155312; {B,E,F} 0.089215; {C,E,F} 0.025089; {D,E,F} 0.021476; {C,D,F} 0.017934 | {A,B,C,D} 0.068126; {B,C,D,E} 0.047151; {B,C,E,F} 0.013266 | {A,B,C,D,E} 0.01 |
Top 5 high influence co-location patterns (mined by HICP mining) | {F,G} 0.019537; {E,F} 0.015028; {C,F} 0.014729; {D,F} 0.013853; {C,D} 0.013547 | {C,E,F} 0.012839; {C,D,F} 0.012581; {D,E,F} 0.011468; {D,F,G} 0.011316; {C,D,E} 0.010995 | {C,D,E,F} 0.010517 | none |
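The Benchmark row above also illustrates the lack of downward closure noted in the Introduction. Under downward closure, a pattern never scores higher than any of its sub-patterns. The short Python check below, which assumes that the numbers reported in the table are star influence index values, looks for sub-pattern/super-pattern pairs that break this rule; the function name `closure_violations` is hypothetical.

```python
def closure_violations(index):
    """Return (sub_pattern, super_pattern) pairs where the super-pattern scores higher."""
    violations = []
    for small, small_value in index.items():
        for big, big_value in index.items():
            # `small < big` tests for a proper subset of features.
            if small < big and big_value > small_value:
                violations.append((small, big))
    return violations


# Four Benchmark values copied from the table above (Real-1 dataset).
benchmark = {
    frozenset("DE"): 0.035215,
    frozenset("EF"): 0.036990,
    frozenset("BDE"): 0.155312,
    frozenset("BEF"): 0.089215,
}
violations = closure_violations(benchmark)
```

Both {D,E} within {B,D,E} and {E,F} within {B,E,F} are flagged, so a size-3 pattern can pass a threshold that its size-2 sub-pattern fails. An Apriori-style bottom-up search that discards supersets of rejected patterns would therefore miss such results.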
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fang, D.; Wang, L.; Wang, J.; Wang, M. High Influencing Pattern Discovery over Time Series Data. ISPRS Int. J. Geo-Inf. 2021, 10, 696. https://doi.org/10.3390/ijgi10100696