Disease Text Clustering Algorithm Based on LDA and GloVe Model

CLC number:

TG391

Fund project:

Natural Science Foundation of Hebei Province (F2020402003, F2019402428)

    Abstract:

    The latent Dirichlet allocation (LDA) model ignores semantic information during feature extraction. To address this problem, a disease text clustering algorithm, LG&K-Medoide, is proposed that combines LDA with the Global Vectors for Word Representation (GloVe) model. First, LDA is used to model the disease text data, and the Jensen-Shannon (JS) distance is used to compute text similarity. Second, GloVe is used to model the disease text data and obtain word vectors; the word vectors are weighted according to the contribution of each part of speech in the disease text, and the cosine distance is used to compute a weighted, GloVe-based text similarity. Finally, the two similarities are combined into an improved distance formula, which is used for K-Medoids clustering. Experimental results show that LG&K-Medoide achieves higher accuracy than clustering algorithms based on LDA, LDA+TF-IDF, and LDA+Word2Vec.
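
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the improved distance formula, so the linear mix controlled by `lam`, the use of the square root of the JS divergence as the "JS distance", the toy topic distributions, the toy word vectors, and the part-of-speech weights are all assumptions made for illustration. In practice the topic distributions would come from a trained LDA model and the word vectors from a trained GloVe model.

```python
import numpy as np

def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the base-2 JS divergence (assumed definition)."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 1.0
    return 1.0 - float(np.dot(u, v) / denom)

def weighted_doc_vector(word_vectors, pos_weights):
    """Document vector as a part-of-speech-weighted average of word vectors."""
    return np.average(np.asarray(word_vectors, dtype=float), axis=0, weights=pos_weights)

def combined_distance_matrix(topic_dists, doc_vectors, lam=0.5):
    """Hypothetical combination: lam * JS distance + (1 - lam) * cosine distance."""
    n = len(topic_dists)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = (lam * js_distance(topic_dists[i], topic_dists[j])
                 + (1.0 - lam) * cosine_distance(doc_vectors[i], doc_vectors[j]))
            D[i, j] = D[j, i] = d
    return D

def k_medoids(D, k, n_iter=100, seed=0):
    """Simple alternating K-Medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # The new medoid is the member with the smallest total distance to the others.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Toy inputs (assumed): per-document topic distributions, word vectors, and weights.
topic_dists = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
doc_vectors = [
    weighted_doc_vector([[1.0, 0.0], [0.8, 0.2]], [2.0, 1.0]),
    weighted_doc_vector([[0.9, 0.1], [0.7, 0.3]], [2.0, 1.0]),
    weighted_doc_vector([[0.0, 1.0], [0.2, 0.8]], [2.0, 1.0]),
    weighted_doc_vector([[0.1, 0.9], [0.3, 0.7]], [2.0, 1.0]),
]

D = combined_distance_matrix(topic_dists, doc_vectors, lam=0.5)
medoids, labels = k_medoids(D, k=2)
print(labels)
```

Working from a precomputed distance matrix is what makes K-Medoids a natural fit here: unlike K-Means, it only needs pairwise distances, so any mix of topic-level and embedding-level distances can be plugged in without defining a mean in the combined space.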

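Cite this article

Wu Di, Zhao Yufeng. Disease Text Clustering Algorithm Based on LDA and GloVe Model[J]. Journal of Hebei University of Engineering (Natural Science Edition), 2022, 39(1): 92-98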

History
  • Received: 2021-06-21
  • Published online: 2022-04-09