Computer Science, 2019, Vol. 46, Issue 11A: 66-71.
刘慧清, 郭延哺, 李红灵, 李维华
LIU Hui-qing, GUO Yan-bu, LI Hong-ling, LI Wei-hua
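Abstract: To address the sparse feature words and weak representational power of short texts, this paper proposes a short-text feature expansion method based on Bayesian networks. The method builds a semantic Bayesian network from the dependencies among feature words in short texts and defines a degree of association between a feature word and a short text. The association degree is computed by Bayesian network inference, and feature words closely associated with a short text are added to it, with the aim of reducing noise in the short text and alleviating feature sparsity. Short-text classification is then used as the basic text analysis task for assessing the feasibility and effectiveness of the proposed method. Experiments on an Amazon review dataset show that the method is feasible and effective.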
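The abstract does not give the network structure or the exact definition of the association degree, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes binary "word present" variables, a hand-written toy network (`PARENTS`, `CPT`, and all probabilities are hypothetical), an association degree taken to be the posterior P(word = 1 | words in the text = 1), and a hypothetical expansion threshold of 0.5. Inference is done by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical toy network over binary "word present" variables.
# PARENTS[w] lists the parents of w; CPT[w] maps a tuple of parent
# values to P(w = 1 | parents). All numbers are illustrative only.
PARENTS = {
    "battery": (),
    "charge": ("battery",),
    "screen": (),
    "phone": ("battery", "screen"),
}
CPT = {
    "battery": {(): 0.3},
    "charge": {(0,): 0.05, (1,): 0.7},
    "screen": {(): 0.4},
    "phone": {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9},
}


def joint(assign):
    """Joint probability of a full 0/1 assignment via the chain rule."""
    p = 1.0
    for w, pars in PARENTS.items():
        p1 = CPT[w][tuple(assign[q] for q in pars)]
        p *= p1 if assign[w] == 1 else 1.0 - p1
    return p


def association(word, text_words):
    """Assumed association degree: P(word = 1 | each text word = 1).

    Computed by enumerating every assignment of the unobserved variables.
    """
    evidence = {w: 1 for w in text_words if w in PARENTS}
    hidden = [w for w in PARENTS if w not in evidence and w != word]
    num = den = 0.0
    for value in (0, 1):
        for combo in product((0, 1), repeat=len(hidden)):
            assign = dict(evidence, **dict(zip(hidden, combo)))
            assign[word] = value
            p = joint(assign)
            den += p
            if value == 1:
                num += p
    return num / den


def expand(text_words, threshold=0.5):
    """Append network words whose association degree reaches the threshold."""
    candidates = [w for w in PARENTS if w not in text_words]
    scored = {w: association(w, text_words) for w in candidates}
    added = sorted(w for w, s in scored.items() if s >= threshold)
    return list(text_words) + added
```

With this toy network, calling `expand(["battery"])` scores the three remaining words against the single observed word and appends those at or above the threshold. In practice the structure and conditional probabilities would be learned from a corpus, and a dedicated inference algorithm would replace the enumeration.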