Abstract: To address the loss of semantic information in LDA-based feature extraction, a disease text clustering algorithm, LG&K-Medoide, based on the LDA and GloVe models is proposed. First, LDA is used to model the disease text data, and the Jensen-Shannon (JS) distance is used to compute text similarity. Second, GloVe is used to model the disease text data to obtain word vectors; each word vector is weighted according to the contribution of its part of speech to the disease text, and the cosine distance is used to compute a weighted text similarity based on the GloVe model. Finally, the two similarities are combined into an improved distance formula, which is used for K-Medoids clustering. Experimental results show that the LG&K-Medoide algorithm achieves higher accuracy than clustering algorithms based on the LDA, LDA+TF-IDF, and LDA+Word2Vec models.
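The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the combination formula, so the linear mix controlled by `alpha`, the use of the square-root form of the JS divergence, and the simple alternating K-Medoids loop are all assumptions made for illustration. The inputs `topic_dists` (per-document LDA topic distributions) and `doc_vecs` (part-of-speech-weighted GloVe document vectors) are hypothetical and assumed to be precomputed.

```python
import numpy as np

def js_distance(p, q):
    """Jensen-Shannon distance (sqrt of base-2 JS divergence); assumed form."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))

def cosine_distance(u, v):
    """1 - cosine similarity between two document vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def combined_distance_matrix(topic_dists, doc_vecs, alpha=0.5):
    """Pairwise distances mixing the LDA and GloVe views.

    alpha is a hypothetical mixing weight, not taken from the paper.
    """
    n = len(topic_dists)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = (alpha * js_distance(topic_dists[i], topic_dists[j])
                 + (1 - alpha) * cosine_distance(doc_vecs[i], doc_vecs[j]))
            D[i, j] = D[j, i] = d
    return D

def k_medoids(D, k, n_iter=100, seed=0):
    """Simple alternating K-Medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        # Assign each document to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # New medoid: member with the smallest total distance to the others.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Toy inputs (made up for illustration): four documents in two obvious groups.
topic_dists = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
D = combined_distance_matrix(topic_dists, doc_vecs, alpha=0.5)
medoids, labels = k_medoids(D, k=2)
```

Working from a precomputed distance matrix keeps the clustering step independent of how the two similarities are combined, so a different combination rule can be swapped in without touching `k_medoids`.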