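Application of the ROC Method to the Selection and Combination of Biomarkers (ROC方法在生物标记物的选择与组合中的应用)
Alternative Title: Selection and combination of biomarkers using ROC method
Author: Yang Fan (杨帆)
Thesis Advisor: Li Zhouping (李周平)
2014-05-22
Degree Grantor: Lanzhou University (兰州大学)
Place of Conferral: Lanzhou
Degree Name: Bachelor
Keyword: AUC; ROC curve; SCAD penalty function; sigmoid approximation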
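Abstract: In biomedical research, microarray experiments are the main technique for studying the relationship between genes and disease phenotypes. Every DNA sequence in a microarray can be regarded as a potential biomarker. First, it is necessary to select accurately, from a large number of biomarkers, those that are related to the disease phenotype. Second, it is equally important to determine how strongly each selected biomarker influences the disease, that is, to determine the form in which the biomarkers are combined.

The first problem falls within the scope of variable selection. Commonly used variable selection methods include stepwise deletion and subset selection, but these methods are computationally expensive and ignore random error during the selection process. In this thesis we adopt a penalization approach and use the SCAD penalty function for variable selection.

The second problem concerns the accuracy of a diagnostic test. The more accurately the degree of influence is determined, the more accurate the diagnosis and the more valuable the diagnostic test. The ROC curve is currently recognized as the best overall measure of diagnostic accuracy, and the area under the ROC curve (AUC) is an important summary of the ROC curve: the larger the AUC, the more accurate the diagnostic test. By maximizing the AUC we can therefore obtain the degree to which each biomarker influences the disease, that is, the form in which the biomarkers are combined.

Based on the SCAD penalty function and analysis of the AUC, biomarkers can be selected and combined simultaneously. When analyzing the AUC, the indicator function in the empirical AUC can be approximated by a smooth function, which yields a smooth empirical AUC function. In a simulation study, we compare the Gaussian kernel approximation with the sigmoid approximation.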
Other Abstract: Microarray experiments that study the association between gene expression levels and disease phenotypes have become commonplace in biomedical research. Each DNA sequence in a microarray can be considered a potential biomarker. First, it is necessary to select, from a large number of candidates, the biomarkers that are associated with the disease phenotype. Second, it is important to determine how strongly each selected biomarker influences the disease phenotype, that is, to determine the combination of biomarkers. The first problem is one of variable selection. Commonly used approaches include stepwise deletion and subset selection, but both are computationally expensive and ignore stochastic errors in the variable selection process. In this thesis we therefore adopt a penalization approach and choose the SCAD penalty function. The second problem concerns the accuracy of a diagnostic test. The more accurately the influence of each biomarker is determined, the more accurate the result of the diagnostic test and the higher its value. The ROC curve is the most popular measure of the accuracy of a diagnostic test, and the area under the ROC curve (AUC) summarizes the ROC curve well: the larger the AUC, the more accurate the diagnostic test. We can therefore obtain the combination of biomarkers by maximizing the AUC. Based on the SCAD penalty and the AUC, we can select and combine biomarkers simultaneously. When analyzing the AUC, we can replace the indicator function in the empirical AUC with a smooth function to obtain a smoothed empirical AUC function. In a simulation study, we compare the Gaussian approximation with the sigmoid approximation.
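The abstract describes three ingredients: an empirical AUC for a linear combination of biomarkers, a smooth replacement for its indicator function (sigmoid or Gaussian), and the SCAD penalty. The following Python sketch illustrates how these pieces could fit together. It is not taken from the thesis: the function names, the bandwidth `h`, the toy data, and the SCAD constant `a = 3.7` (a conventional default) are all assumptions made for illustration, and the Gaussian approximation is assumed to be the standard normal CDF evaluated at `u / h`.

```python
import math

def sigmoid_step(u, h=0.1):
    """Sigmoid approximation to the indicator I(u > 0); h is an assumed bandwidth."""
    z = u / h
    # Branch on the sign of z so that exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def gauss_step(u, h=0.1):
    """Gaussian approximation to I(u > 0): standard normal CDF evaluated at u / h."""
    return 0.5 * (1.0 + math.erf(u / (h * math.sqrt(2.0))))

def score(beta, x):
    """Linear combination of biomarker values x with weights beta."""
    return sum(b * v for b, v in zip(beta, x))

def auc(beta, cases, controls, step=None):
    """Empirical AUC of the combined score; pass `step` for a smoothed version."""
    if step is None:
        # Exact indicator, counting ties as one half.
        step = lambda u: 1.0 if u > 0 else (0.5 if u == 0 else 0.0)
    total = 0.0
    for xc in cases:
        sc = score(beta, xc)
        for xn in controls:
            total += step(sc - score(beta, xn))
    return total / (len(cases) * len(controls))

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of a single coefficient (requires a > 2 and lam > 0)."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def penalized_objective(beta, cases, controls, lam, step=sigmoid_step):
    """Smoothed AUC minus the summed SCAD penalty; larger is better."""
    return auc(beta, cases, controls, step) - sum(scad_penalty(b, lam) for b in beta)

# Hypothetical toy data: two biomarkers, three diseased and three healthy subjects.
cases = [(2.0, 0.1), (1.5, -0.3), (0.2, 0.4)]
controls = [(0.5, 0.2), (1.0, -0.1), (-0.5, 0.0)]
beta = (1.0, 0.0)

print("empirical AUC:", auc(beta, cases, controls))
print("sigmoid-smoothed AUC:", auc(beta, cases, controls, sigmoid_step))
print("Gaussian-smoothed AUC:", auc(beta, cases, controls, gauss_step))
print("penalized objective:", penalized_objective(beta, cases, controls, lam=0.05))
```

Because the smoothed objective is differentiable in `beta`, it could in principle be handed to a gradient-based optimizer, whereas the exact empirical AUC is a step function of `beta`. Note that the exact empirical AUC does not change when `beta` is multiplied by a positive constant, so some normalization of `beta` (for example, fixing one coefficient) is usually needed; the sketch leaves that choice open.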
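Language: Chinese
Document Type: Thesis
Identifier: https://ir.lzu.edu.cn/handle/262010/225660
Collection: School of Mathematics and Statistics (数学与统计学院)
Recommended Citation (GB/T 7714):
Yang Fan. Application of the ROC Method to the Selection and Combination of Biomarkers [D]. Lanzhou: Lanzhou University, 2014.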