Computer Science (计算机科学), 2018, Vol. 45, Issue 6: 216-221. doi: 10.11896/j.issn.1002-137X.2018.06.039
JI Hai-juan (季海娟), ZHOU Cong-hua (周从华), LIU Zhi-feng (刘志锋)
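Abstract: Feature representation of time series data is a key technique in time series data mining, and symbolic aggregate approximation (SAX) is one of the more commonly used representation methods. SAX has a defect: when two time series receive identical symbols in every segment, it cannot distinguish how similar the series actually are. To address this, the paper proposes a symbolic aggregate approximation method for time series based on the start-end distance (SAX_SM). Because time series exhibit strong shape trends, the proposed method uses the starting point and ending point of each segment to represent that segment's shape feature, and approximates the time series by combining each segment's shape feature with its representation symbol, mapping the data from a high-dimensional space to a low-dimensional one. A start-end distance is then constructed from the starting and ending points to compute the shape distance between two segments. Finally, a new distance measure is defined by combining the start-end distance with the symbol distance, so that similarity between time series is measured more objectively. Theoretical analysis shows that this distance measure satisfies the lower-bounding theorem. Experiments on 20 UCR time series datasets show that SAX_SM achieves the highest classification accuracy on 13 datasets (ties included), whereas SAX achieves the highest accuracy on only 6 (ties included), so SAX_SM classifies better than SAX.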
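The defect described in the abstract can be illustrated with a minimal Python sketch. It implements the standard SAX formulation (z-normalization, piecewise aggregate means, equiprobable Gaussian breakpoints, and the MINDIST symbol distance) and then records the start and end point of each segment. The function names, the toy series, and the parameter choices (2 segments, alphabet size 4) are assumptions made for illustration. The abstract does not state the SAX_SM distance formula, so the sketch does not reproduce it and only shows which shape information the start and end points carry.

```python
import math
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def segments(series, w):
    """Split a series into w equal-length segments (length must be divisible by w)."""
    if len(series) % w != 0:
        raise ValueError("series length must be divisible by w in this sketch")
    step = len(series) // w
    return [series[i * step:(i + 1) * step] for i in range(w)]


def sax_word(series, w, alphabet_size):
    """Map each segment mean to a letter: 'a' for the lowest region, and so on."""
    cuts = breakpoints(alphabet_size)
    letters = []
    for seg in segments(series, w):
        index = sum(1 for c in cuts if mean(seg) >= c)
        letters.append(chr(ord("a") + index))
    return "".join(letters)


def mindist(word_a, word_b, n, alphabet_size):
    """Standard SAX MINDIST between two words built from series of length n."""
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for ca, cb in zip(word_a, word_b):
        lo, hi = sorted((ord(ca) - ord("a"), ord(cb) - ord("a")))
        if hi - lo > 1:  # identical or adjacent symbols contribute zero
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return math.sqrt(n / len(word_a)) * math.sqrt(total)


def start_end_points(series, w):
    """Shape feature per segment: its (first value, last value) pair."""
    return [(seg[0], seg[-1]) for seg in segments(series, w)]


# Two toy series with the same segment means but opposite trends inside each segment.
rising = znormalize([0, 1, 2, 3, 4, 5, 6, 7])
falling = znormalize([3, 2, 1, 0, 7, 6, 5, 4])

word_rising = sax_word(rising, w=2, alphabet_size=4)
word_falling = sax_word(falling, w=2, alphabet_size=4)

print(word_rising, word_falling)  # both series map to the same SAX word
print(mindist(word_rising, word_falling, n=8, alphabet_size=4))  # 0.0
print(start_end_points(rising, 2))   # each segment ends above where it starts
print(start_end_points(falling, 2))  # each segment ends below where it starts
```

Both toy series receive the same word and a symbol distance of zero, although one rises and the other falls within each segment. Their start and end points differ, which is the kind of information the abstract says SAX_SM adds to the symbol distance.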