| 基于高斯混合模型聚类的变量选择及应用 |
Alternative Title | Variable Selection for Gaussian Mixture Model-Based Clustering and Its Application
|
| 陈玉雯 |
Thesis Advisor | 赵学靖
|
| 2016-05-14
|
Degree Grantor | Lanzhou University
|
Place of Conferral | Lanzhou
|
Degree Name | Master's
|
Keyword | Variable selection
L_∞-GMM
EM algorithm
High-dimensional clustering analysis
|
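Abstract | In cluster analysis of high-dimensional data, the growth in dimensionality prevents traditional methods from clustering effectively, so the first task in handling such data is to find a suitable way to reduce its dimension. This thesis combines the dimension-reduction idea of variable selection with clustering based on the Gaussian mixture model (GMM), and studies penalized GMMs for cluster analysis and their application. A GMM with a penalty term can identify the informative variables that matter most in high-dimensional data. We therefore first propose the L_∞-GMM penalized model, which selects the variables that are informative for clustering by shrinking the maximum mean parameter of the uninformative variables, and we use a modified Bayesian information criterion (MBIC) to choose the penalty parameter and the number of clusters K. Second, we propose the Adaptive L_∞-GMM penalized model, which adjusts the penalty parameter of each variable so that important variables are penalized lightly and unimportant ones heavily, correcting the L_∞-GMM's over-penalization of important variables. Finally, we apply the Adaptive L_∞-GMM to bioinformatics data. The results show that a penalized GMM yields effective clustering of high-dimensional data and identifies the important informative variables among mouse protein gene expression levels. |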
Other Abstract | In high-dimensional cluster analysis, traditional methods cannot cluster effectively because of the increase in data dimension. Thus, the primary problem in high-dimensional clustering is to find appropriate methods to reduce the dimension of the data. This thesis combines dimension reduction through variable selection with Gaussian mixture model (GMM)-based clustering to study penalized GMM clustering and its application. A penalized GMM can find the important informative variables in high-dimensional data. We therefore first propose the L_∞-GMM penalized model, which selects the variables that are informative for clustering by shrinking the maximum mean parameters of the unimportant variables, and we use the modified Bayesian information criterion (MBIC) to select the penalty parameter and the number of clusters K. Second, we put forward the Adaptive L_∞-GMM penalized model, which, by adjusting the penalty parameters, applies lighter shrinkage to the important variables and heavier shrinkage to the unimportant ones, making up for the L_∞-GMM's excessive penalization of important informative variables. Finally, the Adaptive L_∞-GMM is applied to biological information data. The results show that clustering high-dimensional data with a penalized GMM yields effective clustering results and identifies the important informative variables among mouse protein gene expression levels. |
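The abstract describes the penalty only in words, so the following Python sketch is an illustration under stated assumptions rather than the thesis's own formulation. It assumes the L_∞ penalty takes the form λ Σ_j max_k |μ_kj| over a K × p matrix of cluster means (with variables standardized so that a zero column of means marks an uninformative variable), and that the adaptive variant uses weights w_j = 1 / max_k |μ̃_kj| computed from an initial estimate. The function names `linf_penalty`, `adaptive_weights`, and `selected_variables` are hypothetical.

```python
import numpy as np


def linf_penalty(means, lam, weights=None):
    """Assumed penalty: lam * sum_j w_j * max_k |mu_kj|, for means of shape (K, p)."""
    col_max = np.max(np.abs(means), axis=0)  # largest |mean| per variable
    if weights is None:
        weights = np.ones_like(col_max)      # plain (non-adaptive) penalty
    return lam * float(np.sum(weights * col_max))


def adaptive_weights(initial_means, eps=1e-8):
    """Assumed adaptive weights: large for variables whose initial means are near zero."""
    return 1.0 / (np.max(np.abs(initial_means), axis=0) + eps)


def selected_variables(means, tol=1e-8):
    """Indices of variables with at least one cluster mean not shrunk to zero."""
    return np.flatnonzero(np.max(np.abs(means), axis=0) > tol)


# Toy example: K = 2 clusters, p = 3 variables; the middle variable is uninformative.
means = np.array([[2.0, 0.0, -0.5],
                  [-1.0, 0.0, 0.25]])

print(linf_penalty(means, lam=1.0))
print(selected_variables(means))
print(adaptive_weights(means))
```

Under these assumptions, a variable is dropped from the clustering once its largest cluster mean is shrunk to zero, and the adaptive weights make that shrinkage lighter for variables whose initial means are far from zero.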
URL | View original
|
Language | Chinese
|
Document Type | Thesis
|
Identifier | https://ir.lzu.edu.cn/handle/262010/225175
|
Collection | School of Mathematics and Statistics
|
Recommended Citation GB/T 7714 |
陈玉雯. 基于高斯混合模型聚类的变量选择及应用[D]. 兰州: 兰州大学, 2016.
|