基于高斯混合模型聚类的变量选择及应用
Alternative Title: Variable Selection for Gaussian Mixture Model-Based Clustering and Its Application
陈玉雯
Thesis Advisor: 赵学靖
2016-05-14
Degree Grantor: Lanzhou University
Place of Conferral: Lanzhou
Degree Name: Master's
Keyword: variable selection; L_∞-GMM; EM algorithm; high-dimensional cluster analysis
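Abstract: In cluster analysis of high-dimensional data, the growth in dimensionality prevents traditional methods from clustering effectively, so the first problem in handling high-dimensional data is to find a suitable way to reduce the dimension. This thesis combines the dimension-reduction idea of variable selection with clustering based on the Gaussian mixture model (GMM), and studies penalized GMMs for cluster analysis and their application. A GMM with a penalty term can identify the informative variables that matter most in high-dimensional data. We therefore first propose the L_∞-GMM penalized model, which selects the informative variables that are important for clustering by shrinking the maximum mean parameter of the unimportant variables, and we use a modified Bayesian information criterion (MBIC) to choose the penalty parameter and the number of clusters K. Second, we propose the Adaptive L_∞-GMM penalized model, which adjusts the penalty parameters of the variables so that important informative variables are penalized lightly and unimportant ones heavily, remedying the over-penalization of important informative variables by L_∞-GMM. Finally, the Adaptive L_∞-GMM penalized model is applied to bioinformatics data. The results show that, when clustering high-dimensional data, the penalized GMM yields effective clustering results and identifies the important informative variables of mouse protein gene expression levels.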
Other Abstract: In high-dimensional cluster analysis, traditional methods cannot cluster effectively because of the increase in data dimension. Thus, the primary problem of high-dimensional clustering is to find appropriate methods to reduce the dimension of the data. This thesis combines the dimension reduction of variable selection with Gaussian mixture model (GMM)-based clustering to study penalized GMM clustering and its application. A penalized GMM can find the important informative variables in high-dimensional data. Therefore, we first propose the L_∞-GMM penalized model, which selects the variables that are informative for clustering by shrinking the maximum mean parameter of the unimportant variables, and we use a modified Bayesian information criterion (MBIC) to select the penalty parameter and the number of clusters K. Second, we propose the Adaptive L_∞-GMM penalized model, which adjusts the penalty parameters so that important variables are shrunk lightly and unimportant variables are shrunk heavily, making up for the excessive penalization of important informative variables by L_∞-GMM. Finally, the Adaptive L_∞-GMM is applied to bioinformatics data. The results show that, when clustering high-dimensional data, the penalized GMM gives effective clustering results and identifies the important informative variables of mouse protein gene expression levels.
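The abstract describes the L_∞ penalty only in words. As a rough illustration, the sketch below computes a penalty of the form λ · Σ_j max_k |μ_kj| over a K × p matrix of cluster means and lists the variables whose largest absolute cluster mean is nonzero. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `linf_penalty` and `informative_variables`, the tolerance `tol`, and the assumption that the data are standardized to zero overall mean (so a variable with all cluster means equal to zero does not separate the clusters) are illustrative choices that do not appear in the record.

```python
from typing import List


def linf_penalty(means: List[List[float]], lam: float) -> float:
    """Return lam * sum over variables j of max over clusters k of |mu_kj|.

    `means` is a K x p matrix (one row per cluster, one column per variable).
    Assumption: data are standardized, so zero means "no cluster effect".
    """
    p = len(means[0])
    return lam * sum(max(abs(row[j]) for row in means) for j in range(p))


def informative_variables(means: List[List[float]], tol: float = 1e-8) -> List[int]:
    """Indices of variables whose largest absolute cluster mean exceeds tol."""
    p = len(means[0])
    return [j for j in range(p) if max(abs(row[j]) for row in means) > tol]


# Toy example: 2 clusters, 3 variables; variable 1 has been shrunk to zero.
means = [[1.5, 0.0, -0.25],
         [-1.0, 0.0, 0.5]]
print(linf_penalty(means, lam=2.0))
print(informative_variables(means))
```

In this toy input the second variable has zero mean in every cluster, so it would be treated as uninformative, while the other two are kept. An adaptive variant would replace the single `lam` with one weight per variable; the thesis's weighting scheme is not given in this record, so it is not sketched here.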
Language: Chinese
Document Type: Thesis
Identifier: https://ir.lzu.edu.cn/handle/262010/225175
Collection: School of Mathematics and Statistics
Recommended Citation
GB/T 7714
陈玉雯. 基于高斯混合模型聚类的变量选择及应用[D]. 兰州: 兰州大学, 2016.