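Wu Di, Zhao Yufeng. Disease text clustering algorithm based on LDA and GloVe model[J]. Journal of Hebei University of Engineering (Natural Science Edition), 2022, 39(1): 92-98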
Disease Text Clustering Algorithm Based on LDA and GloVe Model
Submitted: 2021-06-21
DOI:10.3969/j.issn.1673-9469.2022.01.014
Keywords: disease text; LDA; GloVe; similarity combination; clustering
Funding: Natural Science Foundation of Hebei Province (F2020402003, F2019402428)
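Authors and affiliations
Wu Di, School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
Zhao Yufeng, School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China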
Abstract:
      To address the loss of semantic information when features are extracted with the latent Dirichlet allocation (LDA) model, this paper proposes LG&K-Medoide, a disease text clustering algorithm that fuses LDA with the Global Vectors for Word Representation (GloVe) model. First, the disease text data are modeled with LDA, and text similarity is computed with the Jensen-Shannon (JS) distance. Second, the same data are modeled with GloVe to obtain word vectors; the word vectors are weighted according to the contribution of each word's part of speech in disease text, and a weighted GloVe-based text similarity is computed with the cosine distance. Finally, the two similarities are combined into an improved distance formula, which is used for K-Medoide clustering. Experimental results show that LG&K-Medoide achieves higher accuracy than clustering algorithms based on LDA, LDA+TF-IDF, and LDA+Word2Vec models.
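
The three steps described in the abstract can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the combined distance formula, so the linear mix with a weight `alpha` is an assumption, as are the function names, the per-word `weights` dictionary standing in for part-of-speech contribution, the use of the square root of the base-2 JS divergence as the "JS distance", and the simple alternating k-medoids loop seeded with the first `k` documents. The topic distributions and document vectors in the toy data are made up for illustration and would normally come from trained LDA and GloVe models.

```python
import math

def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence (range 0..1)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

def cosine_distance(u, v):
    """One minus cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def weighted_doc_vector(words, vectors, weights):
    """Weighted mean of word vectors; `weights` is a hypothetical stand-in
    for the part-of-speech contribution mentioned in the abstract."""
    dim = len(next(iter(vectors.values())))
    total, wsum = [0.0] * dim, 0.0
    for w in words:
        if w in vectors:
            wt = weights.get(w, 1.0)
            wsum += wt
            for i, x in enumerate(vectors[w]):
                total[i] += wt * x
    return [x / wsum for x in total] if wsum else total

def combined_distance(p, q, u, v, alpha=0.5):
    """Assumed linear combination of the topic and word-vector distances."""
    return alpha * js_distance(p, q) + (1 - alpha) * cosine_distance(u, v)

def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix,
    seeded with the first k points as medoids."""
    n = len(dist)
    medoids = list(range(k))

    def assign(meds):
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Toy data (made up): per-document topic distributions and document vectors.
topics = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
n_docs = len(topics)
dist = [[0.0 if i == j else
         combined_distance(topics[i], topics[j], doc_vecs[i], doc_vecs[j])
         for j in range(n_docs)] for i in range(n_docs)]
medoids, labels = k_medoids(dist, k=2)
```

On the toy data, documents 0 and 2 should fall in one cluster and documents 1 and 3 in the other. Setting `alpha` closer to 1 makes the clustering rely more on the LDA topic distance, and closer to 0 more on the weighted word-vector distance.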