基于机器学习的肠道菌群疾病诊断及个体识别建模研究
Alternative Title: Disease Diagnosis and Individual Identification Modeling Based on Machine Learning and Gut Microbiota
刘耘汀
Subtype: Doctoral
Thesis Advisor: 安黎哲
2023-09-03
Degree Grantor: Lanzhou University
Place of Conferral: Lanzhou
Degree Name: Doctor of Science
Degree Discipline: Microbiology
Keyword: Gut microbiota; Machine learning; Artificial neural network; Disease diagnostic model; Individual identification model
Abstract

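The gut microbiota is a lifelong symbiont of humans and, because of its close relationship with the host, its diversity, and its sheer abundance, has been called the "second human genome". Previous studies have found that the gut microbiota is associated to varying degrees with many diseases, and that it has a "fingerprint" property, reflecting personal information such as the host's health status, sex, and age. Unfortunately, existing studies in China and abroad commonly suffer from single-source samples, simple host backgrounds, and inadequate analytical methods; many address a single influencing factor, while few address multi-factor, multi-variable mixed samples with complex sources, many disease types, and large individual differences. This study therefore took four conditions as its subjects: drug addiction, type 2 diabetes, Parkinson's disease, and cholecystitis. Gut microbiota were collected from 1,072 volunteers of differing sex, age, height, and weight in four regions (Shanghai, Yunnan, Gansu, and Shandong). On the basis of 16S rRNA gene sequencing, and using 16 machine learning algorithms and 20 regression and classification methods, single-disease diagnostic models were built first; the sequencing data were then pooled to build individual identification models, against this complex background, for eight features: disease, region, sex, age, weight, height, body mass index (BMI), and smoking and drinking habits. The work provides a theoretical basis and practical exploration for applying the gut microbiota to disease diagnosis and individual identification. The main results are as follows: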

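Nineteen potential microbial markers were screened from the drug addiction group. Bacteria related to indole and short-chain fatty acid metabolism changed significantly, which may promote intestinal inflammation, irritable bowel syndrome, depression, and related conditions. For example, increased abundance of Prevotella_9 may aggravate intestinal inflammation and trigger schizophrenia; increased abundance of Alistipes may affect levels of tryptophan, the precursor of serotonin, leading to depression; decreased abundance of Faecalibacterium may cause inflammatory bowel disease; and decreased abundance of Dorea impairs maintenance of the integrity of the intestinal mucosal barrier.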

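Twenty-two potential microbial markers were screened from the type 2 diabetes group. The abundances of butyrate-metabolizing bacteria (Bacteroides, Faecalibacterium, Roseburia) and propionate-metabolizing bacteria (Prevotella, Megamonas, Coprococcus) decreased significantly, suggesting that short-chain fatty acid-related bacteria are a key factor in type 2 diabetes. The abundance of Escherichia-Shigella increased significantly, consistent with clinical symptoms such as intractable diarrhea in patients. In addition, gluconeogenesis and glycolysis levels rose significantly, suggesting that the host's hyperglycemic environment is related to active sugar metabolism in the gut microbiota.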

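Thirty-one potential microbial markers were screened from the Parkinson's disease group. The abundances of short-chain fatty acid-producing bacteria and anti-inflammatory bacteria decreased significantly, suggesting that the gut microbiota is one of the causes of digestive symptoms such as enteritis and malnutrition in Parkinson's patients. In contrast to earlier findings from Japan, the abundance of the aging marker Eubacterium was significantly higher in the Chinese samples. Gluconeogenesis was also significantly up-regulated, suggesting that diabetes is a potential risk factor for Parkinson's disease.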

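Thirty-three potential microbial markers were screened from the cholecystitis group. Escherichia-Shigella accounted for as much as 32.59% of abundance and is presumed to be the main cause of long-term diarrhea in cholecystitis patients. Cobalamin biosynthesis was significantly reduced, which raises methylmalonic acid and weakens the inhibition of fatty acid synthesis, suggesting a link to abnormal fatty acid and cholesterol metabolism.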

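The single-disease diagnostic models reached accuracies of 91.01% for drug addiction, 95.55% for type 2 diabetes, 93.79% for Parkinson's syndrome, and 94.67% for cholecystitis, reaching an advanced level by domestic and international standards. The multi-factor individual identification model reached accuracies of 77% for disease, 73% for sex, and 76% for region; the regression models had errors of ±8.4 years for age, ±6.7 cm for height, ±9.6 kg for weight, and ±2.7 for BMI.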

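This study is the first to systematically analyze both the diversity and the function of the gut microbiota in four diseases across four regions of China at the same time. For eight features (disease, sex, region, age, weight, height, smoking and drinking habits, and BMI), it built 4 disease diagnostic models and 7 individual identification models. The models perform well in single-disease diagnosis and in identifying disease, sex, and region, laying a solid theoretical foundation for applications such as early disease screening and warning, auxiliary diagnosis, fecal microbiota transplantation, forensic individual identification, and suspect tracking.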

Other Abstract

The gut microbiota, a lifelong symbiont of humans, is often referred to as the "second genome" because of its close relationship with the host, its diverse species, and its large population. Previous research has demonstrated varying degrees of association between the gut microbiota and numerous diseases. Studies have also shown that the gut microbiota has unique "fingerprint" features that reflect individual-specific information such as health condition, gender, and age. However, current domestic and international research struggles with limited sample sources, simple host backgrounds, and inadequate analysis methods. Most studies focus on a single influencing factor, while research on complex mixed samples with diverse sources, multiple diseases, and large individual variation remains limited. This study therefore selected four diseases as research targets: drug use disorder, type 2 diabetes, Parkinson's syndrome, and cholecystitis. Gut microbiota samples were collected from 1,072 volunteers in four regions (Shanghai, Yunnan, Gansu, and Shandong) who differed in gender, age, height, weight, and other factors. Based on 16S rRNA gene sequencing, 16 machine learning algorithms and 20 regression and classification methods were employed. First, single-disease diagnosis models were constructed. Then the microbiota sequencing data were pooled to build individual identification models, against this complex background, for eight features: disease, region, gender, age, weight, height, body mass index (BMI), and smoking and alcohol habits. This research provides a theoretical foundation and practical exploration for applying the gut microbiota to disease diagnosis and individual identification. The main results are as follows:

Nineteen potential bacterial markers were selected from the drug use disorder group. Among them, bacteria related to indole and short-chain fatty acid metabolism showed significant changes, which are likely to cause intestinal inflammation, irritable bowel syndrome, depression, and related conditions. For example, up-regulation of Prevotella_9 is likely to exacerbate intestinal inflammation and lead to schizophrenia. Up-regulation of Alistipes is likely to affect the level of tryptophan, a precursor of serotonin, and lead to depression. Down-regulation of Faecalibacterium can cause inflammatory bowel disease. Down-regulation of Dorea compromises the integrity of the intestinal mucosal barrier.

Twenty-two potential bacterial markers were selected from the type 2 diabetes group. The abundances of butyric acid-metabolizing bacteria (Bacteroides, Faecalibacterium, Roseburia) and propionic acid-metabolizing bacteria (Prevotella, Megamonas, Coprococcus) decreased significantly, suggesting that short-chain fatty acid-related bacteria are a key factor in type 2 diabetes. Notably, the abundance of Escherichia-Shigella, a pathogen associated with bacillary dysentery, increased significantly, which is consistent with clinical symptoms such as persistent diarrhea in diabetes patients. In addition, the levels of gluconeogenesis and glycolysis increased significantly, suggesting that the host's hyperglycemic environment is related to active glucose metabolism in the intestinal flora.

Thirty-one potential bacterial markers were selected from the Parkinson's syndrome group. The abundances of short-chain fatty acid-producing bacteria and anti-inflammatory bacteria decreased significantly, suggesting that the intestinal flora is one of the causes of gastrointestinal symptoms such as enteritis and malnutrition in Parkinson's syndrome patients. Contrary to previous results from Japan, the abundance of the aging marker Eubacterium was significantly increased in the Chinese samples. In addition, gluconeogenesis was significantly up-regulated, suggesting that diabetes is a potential risk factor for Parkinson's syndrome.

Thirty-three potential bacterial markers were selected from the cholecystitis group. Among them, Escherichia-Shigella reached a relative abundance as high as 32.59%, which may be the main cause of long-term diarrhea in cholecystitis patients. Additionally, cobalamin biosynthesis in the cholecystitis group was significantly reduced, which increases methylmalonic acid and reduces the inhibition of fatty acid synthesis, indicating an association with abnormal fatty acid and cholesterol metabolism.
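
The abstract does not state how the marker genera were screened. As a hedged illustration only, the sketch below flags genera whose mean relative abundance differs between a case group and a control group by at least a twofold change. The threshold, the pseudocount, the function names, and the abundance values are hypothetical; a real screen would normally add a statistical test and multiple-testing correction.

```python
from math import log2
from statistics import mean


def log2_fold_change(case, control, pseudo=1e-6):
    """log2 ratio of mean relative abundance, case over control.

    The small pseudocount (an assumption) avoids division by zero.
    """
    return log2((mean(case) + pseudo) / (mean(control) + pseudo))


def screen_markers(case_table, control_table, threshold=1.0):
    """Return genera whose |log2 fold change| meets the threshold."""
    markers = {}
    for genus in case_table:
        lfc = log2_fold_change(case_table[genus], control_table[genus])
        if abs(lfc) >= threshold:
            markers[genus] = "up" if lfc > 0 else "down"
    return markers


# Hypothetical relative abundances, one value per sample (not thesis data).
case = {
    "Escherichia-Shigella": [0.30, 0.35, 0.32],
    "Faecalibacterium": [0.02, 0.03, 0.01],
    "Bacteroides": [0.20, 0.22, 0.18],
}
control = {
    "Escherichia-Shigella": [0.05, 0.04, 0.06],
    "Faecalibacterium": [0.10, 0.12, 0.08],
    "Bacteroides": [0.21, 0.19, 0.20],
}

markers = screen_markers(case, control)
```

With these toy values, Escherichia-Shigella is flagged as increased and Faecalibacterium as decreased, while Bacteroides, whose group means are nearly equal, is not flagged.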

The disease diagnosis models reached accuracies of 91.01% for drug use disorder, 95.55% for type 2 diabetes, 93.79% for Parkinson's syndrome, and 94.67% for cholecystitis, reaching an advanced level by domestic and international standards. The multi-factor individual identification model achieved accuracies of 77% for disease, 73% for gender, and 76% for region. The age regression model had an error of ±8.4 years, the height model ±6.7 cm, the weight model ±9.6 kg, and the BMI model ±2.7.
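
The abstract reports classification accuracy and a "±" error for the regression models but does not define the error metric. Assuming it is a mean absolute error, the two kinds of figure could be computed as in this minimal sketch; the labels and ages are made-up values, not thesis data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between predicted and true values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions for four held-out volunteers.
true_gender = ["F", "M", "M", "F"]
pred_gender = ["F", "M", "F", "F"]
true_age = [30, 45, 60, 52]
pred_age = [38, 40, 66, 57]

gender_accuracy = accuracy(true_gender, pred_gender)  # 3 of 4 correct
age_mae = mean_absolute_error(true_age, pred_age)     # (8 + 5 + 6 + 5) / 4
```

Here `gender_accuracy` is 0.75 and `age_mae` is 6.0 years. If the thesis instead reports a root-mean-square error or a confidence interval, the formula would differ.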

This study is the first to systematically analyze gut microbiota diversity and function in four diseases from four regions of China at the same time. For eight features (disease, gender, region, age, weight, height, smoking and alcohol habits, and BMI), it established 4 disease diagnostic models and 7 individual identification models. These models demonstrated good capability in single-disease diagnosis and in identifying disease, gender, and region, which lays a solid theoretical foundation for applications such as early disease screening and warning, auxiliary diagnosis, fecal microbiota transplantation, forensic individual identification, and suspect tracking.
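
As a rough illustration of the workflow summarized in this abstract (a 16S-derived abundance table fed to a classifier that is scored on held-out samples), the sketch below normalizes counts to relative abundance and evaluates a nearest-centroid classifier with leave-one-out cross-validation. The abstract does not name the 16 algorithms used, so the classifier, the validation scheme, the function names, and the toy counts are all assumptions made for this example, not the thesis's method.

```python
from math import dist


def relative_abundance(counts):
    """Scale one sample's raw taxon counts so they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit(samples, labels):
    """Return one mean abundance profile (centroid) per class label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in groups.items()}


def predict(model, x):
    """Assign x to the class whose centroid is closest (Euclidean)."""
    return min(model, key=lambda y: dist(model[y], x))


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample in turn, train on the rest, and score it."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(fit(train_x, train_y), samples[i]) == labels[i]
    return hits / len(samples)


# Hypothetical toy table: rows are samples, columns are three genera.
raw_counts = [
    [80, 15, 5], [75, 20, 5], [70, 20, 10],    # "control"
    [20, 30, 50], [25, 25, 50], [15, 35, 50],  # "disease"
]
labels = ["control"] * 3 + ["disease"] * 3

profiles = [relative_abundance(row) for row in raw_counts]
loo_accuracy = leave_one_out_accuracy(profiles, labels)
```

On this deliberately well-separated toy table every held-out sample is classified correctly. Real cohorts with mixed regions and diseases would call for an established machine learning library and a separate test set.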

Subject Area: Medical and pharmaceutical microbiology
MOST Discipline Catalogue: Science - Biology - Microbiology
Language: Chinese
Other Code: 262010_120190905440
Document Type: Thesis
Identifier: https://ir.lzu.edu.cn/handle/262010/538012
Collection: School of Life Sciences
Affiliation: School of Life Sciences, Lanzhou University
Recommended Citation
GB/T 7714
刘耘汀. 基于机器学习的肠道菌群疾病诊断及个体识别建模研究[D]. 兰州: 兰州大学, 2023.