|DISEASE DIAGNOSIS AND INDIVIDUAL IDENTIFICATION MODELING BASED ON MACHINE LEARNING AND GUT MICROBIOTA
|Place of Conferral
|Gut microbiota; Machine learning; Artificial neural network; Disease diagnostic model; Individual identification model
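The gut microbiota is a lifelong human symbiont and, because of its close relationship with the host, its many species, and its vast numbers, is known as the "second human genome". Previous studies have found that the gut microbiota is associated to varying degrees with many diseases and that it has "fingerprint" properties that reflect personal information about the host such as health status, sex, and age. Unfortunately, existing studies in China and abroad commonly suffer from single-source samples, simple host backgrounds, and inadequate analytical methods: many address a single influencing factor, while few address multi-factor, multi-variable mixed samples with complex sources, many disease types, and large individual differences. This study therefore took four diseases as its subjects: drug addiction, type 2 diabetes, Parkinson's disease, and cholecystitis. Gut microbiota were collected from 1,072 volunteers in four regions (Shanghai, Yunnan, Gansu, and Shandong) who differed in sex, age, height, and weight. On the basis of 16S rRNA gene sequencing, and using 16 machine learning algorithms and 20 regression and classification methods, single-disease diagnostic models were built first; the sequencing data were then pooled to build individual identification models, under a complex background, for eight categories of features: disease, region, sex, age, weight, height, body mass index, and smoking and drinking habits. The work provides a theoretical basis and practical exploration for applying the gut microbiota to disease diagnosis and individual identification. The main results are as follows:
Twenty-two potential microbial markers were screened from the type 2 diabetes group. The abundances of butyrate-metabolizing bacteria (Bacteroides, Faecalibacterium, Roseburia) and propionate-metabolizing bacteria (Prevotella, Megamonas, Coprococcus) decreased significantly, suggesting that bacteria related to short-chain fatty acids are a key factor in type 2 diabetes. The abundance of Escherichia Shigella increased significantly, consistent with clinical symptoms such as intractable diarrhea in patients. In addition, gluconeogenesis and glycolysis levels rose significantly, suggesting that the host's hyperglycemic environment is related to active sugar metabolism by the gut microbiota.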
The gut microbiota, a lifelong symbiont of humans, is often referred to as the "second genome" because of its close relationship with the host, its diverse species, and its large population. Previous research has demonstrated varying degrees of association between the gut microbiota and numerous diseases. Furthermore, the gut microbiota has "fingerprint" properties that reflect individual-specific information such as health condition, gender, and age. However, current domestic and international research suffers from limited sample sources, simple host backgrounds, and inadequate analysis methods. Most studies focus on a single influencing factor, while research on mixed samples with diverse sources, multiple diseases, and large individual variation remains limited. This study therefore selected four diseases as research targets: drug use disorder, type 2 diabetes, Parkinson's disease, and cholecystitis. Gut microbiota samples were collected from 1,072 volunteers in four regions (Shanghai, Yunnan, Gansu, and Shandong) who varied in gender, age, height, and weight. Based on 16S rRNA gene sequencing, 16 machine learning algorithms and 20 regression and classification methods were employed. First, single-disease diagnostic models were constructed. Subsequently, the sequencing data were pooled to establish individual identification models, against this complex background, for eight categories of features: disease, region, gender, age, weight, height, body mass index, and smoking and alcohol habits.
This research provides a theoretical foundation and practical exploration for the application of gut microbiota in disease diagnosis and individual identification. The main results are as follows:
Nineteen potential bacterial markers were selected from the drug use disorder group. Among them, bacteria related to indole and short-chain fatty acid metabolism showed significant changes, which may contribute to intestinal inflammation, irritable bowel syndrome, depression, and related conditions. For example, the up-regulation of Prevotella_9 may exacerbate intestinal inflammation and contribute to schizophrenia. The up-regulation of Alistipes may affect the level of tryptophan, a precursor of serotonin, and contribute to depression. Reduced abundance of Faecalibacterium may contribute to inflammatory bowel disease, and reduced abundance of Dorea may compromise the integrity of the intestinal mucosal barrier.
Twenty-two potential bacterial markers were selected from the type 2 diabetes group. The abundances of butyric acid metabolizing bacteria (Bacteroides, Faecalibacterium, Roseburia) and propionic acid metabolizing bacteria (Prevotella, Megamonas, Coprococcus) decreased significantly, suggesting that bacteria related to short-chain fatty acids are a key factor in type 2 diabetes. Notably, the abundance of Escherichia Shigella, a genus group associated with bacterial dysentery, increased significantly, which is consistent with clinical symptoms such as persistent diarrhea in diabetes patients. In addition, the levels of gluconeogenesis and glycolysis were significantly increased, suggesting that the host hyperglycemic environment is related to active glucose metabolism by the intestinal flora.
Thirty-one potential bacterial markers were selected from the Parkinson’s disease group. The abundance of short-chain fatty acid producing bacteria and anti-inflammatory bacteria decreased significantly, suggesting that the intestinal flora is one of the causes of gastrointestinal symptoms such as enteritis and malnutrition in Parkinson’s disease patients. Contrary to previous results from Japan, the abundance of the aging marker Eubacterium was significantly increased in our Chinese samples. In addition, the level of gluconeogenesis was significantly up-regulated, suggesting that diabetes is a potential risk factor for Parkinson’s disease.
Thirty-three potential bacterial markers were selected from the cholecystitis group. Among them, the relative abundance of Escherichia-Shigella was as high as 32.59%, which may be the main cause of long-term diarrhea in cholecystitis patients. Additionally, the level of cobalamin biosynthesis in the cholecystitis group was significantly reduced, which would increase methylmalonic acid and weaken the inhibition of fatty acid synthesis, indicating an association with abnormal fatty acid and cholesterol metabolism.
The disease diagnosis models achieved accuracies of 91.01% for drug use disorder, 95.55% for type 2 diabetes, 93.79% for Parkinson's disease, and 94.67% for cholecystitis, on par with the advanced level reported at home and abroad. The multi-factor individual identification model achieved accuracies of 77% for disease, 73% for gender, and 76% for region. The regression models had errors of ±8.4 years for age, ±6.7 cm for height, ±9.6 kg for weight, and ±2.7 for BMI.
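As a rough illustration of the two kinds of figures reported above, the following minimal Python sketch computes a classification accuracy and a regression error on toy data. It is not the thesis's evaluation code. The function names and all sample values are hypothetical, and the use of mean absolute error is an assumption, since the abstract does not say which error measure lies behind the "±" values.

```python
from statistics import mean


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class equals the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def mean_absolute_error(true_values, predicted_values):
    """Average absolute gap between predictions and true values.

    Assumption: the abstract does not name its error measure; MAE is
    used here only as one common choice.
    """
    if not true_values or len(true_values) != len(predicted_values):
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(t - p) for t, p in zip(true_values, predicted_values))


# Hypothetical toy data, not taken from the study.
true_status = ["T2D", "healthy", "T2D", "healthy", "T2D"]
pred_status = ["T2D", "healthy", "healthy", "healthy", "T2D"]
true_age = [34, 51, 62, 45]
pred_age = [40, 47, 70, 45]

print(f"accuracy: {accuracy(true_status, pred_status):.2%}")
print(f"age error: {mean_absolute_error(true_age, pred_age):.1f} years")
```

Accuracy suits the categorical targets (disease, gender, region), while an error in the target's own units suits the continuous ones (age, height, weight, BMI).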
This study is the first to systematically analyze gut microbiota diversity and function in four diseases from four regions of China at the same time. It established 4 disease diagnostic models and 7 individual identification models including disease, gender, region, age, weight, height, smoking and alcohol habits, and BMI. These models demonstrated a degree of capability in disease diagnosis and individual identification, which lays a theoretical foundation for applications in early disease screening and warning, auxiliary diagnosis, fecal transplantation treatment, forensic individual identification and appraisal, and suspect tracking.
|MOST Discipline Catalogue
|Science - Biology - Microbiology
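|Liu Yunting (刘耘汀). Disease Diagnosis and Individual Identification Modeling Based on Machine Learning and Gut Microbiota [D]. Lanzhou: Lanzhou University, 2023.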