Research on Sparse Principal Component Analysis and Related Methods (稀疏主成分分析及其相关方法的研究)
Alternative Title: Research on Sparse Principal Component Analysis and Related Algorithms
王小薇
Thesis Advisor: 赵学靖
2016-05-17
Degree Grantor: Lanzhou University (兰州大学)
Place of Conferral: Lanzhou
Degree Name: Bachelor's degree
Keywords: Lasso; elastic net; SCoTLASS; PCA; SPCA
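Abstract: Principal component analysis (PCA) is an important statistical method for reducing a problem in many variables to one in a few variables, and it is widely used in face recognition and gene data analysis. The main idea is to form linear combinations of the original variables such that each new variable has the largest variance among all such linear combinations; proceeding in this way yields a sequence of principal components with decreasing variance. PCA not only reduces the dimension of a multivariate data system but also simplifies the statistical characteristics of the variable system. It has two main advantages. First, the principal components attain the maximum variance in turn, so the loss of information is minimal. Second, the principal components are mutually uncorrelated, so any one of them can be considered on its own without regard to the others. An obvious drawback, however, is that each principal component is a linear combination of the original variables and the combination coefficients are typically nonzero, which makes the principal components difficult to interpret.

The lasso is a promising variable selection method. It adds a constraint to ordinary linear least squares, requiring the sum of the absolute values of the coefficients to be less than some constant. The lasso has been successful in many simulation studies, but it also has a number of limitations. The elastic net is a generalization of the lasso that further improves its variable selection.

To obtain sparse principal components, an improved form of PCA is applied: sparse principal component analysis (SPCA). This method attains sparsity by seeking a subset of the variables for the principal components. The modified principal components can be obtained with SCoTLASS (Simplified Component Technique-LASSO), a simplified principal component method based on the lasso and the elastic net.

This thesis introduces PCA and SPCA, sets out the steps of each of the above algorithms, and applies them to concrete examples.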
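As a rough illustration of the PCA properties described in the abstract (decreasing variances, uncorrelated components, dense coefficients), the following is a minimal NumPy sketch. It is not taken from the thesis: the function name `pca`, the synthetic data, and the eigendecomposition route are assumptions made for this example.

```python
import numpy as np

def pca(X):
    """Return (loadings, variances) for the principal components of X.

    X is an (n_samples, n_features) array. Each column of `loadings` is a
    unit-length coefficient vector; `variances` is sorted in decreasing order.
    """
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to decreasing variance
    return eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
# Synthetic data, made up for illustration: 200 samples of 5 correlated variables.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

loadings, variances = pca(X)
scores = (X - X.mean(axis=0)) @ loadings     # the principal components themselves

# Covariance of the components: diagonal, with the variances on the diagonal.
score_cov = np.cov(scores, rowvar=False)
# Number of nonzero coefficients: ordinary PCA loadings are dense.
n_nonzero = np.count_nonzero(loadings)
```

On data like this every entry of `loadings` is nonzero, which is the interpretability problem that sparse variants of PCA aim to address.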
Other Abstract: Principal component analysis (PCA) is an important statistical method that transforms high-dimensional variables into low-dimensional ones, and it is widely used in face recognition and gene data analysis. The main idea is to form the linear combination of the original variables that maximizes the variance of the new variable, and then to continue in the same way to obtain a series of principal components (PCs) with decreasing variance. PCA not only reduces the dimension of multivariate data but also simplifies the statistical characteristics of the variables. Its advantages lie mainly in two aspects: on the one hand, the PCs successively achieve maximum variance, which keeps information loss to a minimum; on the other hand, the PCs are uncorrelated, so any one PC can be examined without considering the others. However, PCA has an obvious disadvantage: each PC is a linear combination of the original variables with coefficients that are typically nonzero, which makes the PCs difficult to interpret. The lasso is a promising variable selection method that adds a constraint to linear least squares, keeping the sum of the absolute values of the coefficients below a constant. Although the lasso has been successful in many simulation studies, it also has many limitations. The elastic net is a generalization of the lasso that improves its variable selection. To obtain sparse principal components, we apply an improved PCA method: sparse principal component analysis (SPCA). This approach seeks a subset of the variables for the principal components in order to achieve sparsity. The modified principal components can be obtained via SCoTLASS, which is based on the lasso and the elastic net. This thesis introduces PCA, SPCA, and the related algorithms, and includes some empirical applications.
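The lasso and elastic net described in the abstract can be sketched in a few lines. The version below is only an illustration under stated assumptions, not the thesis's algorithm: it uses the penalized form of the problem rather than the constrained form (the two correspond for a suitable penalty), solves it by cyclic coordinate descent, and the names `soft_threshold`, `elastic_net_cd`, `lam1`, `lam2` and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam1, lam2=0.0, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2
    by cyclic coordinate descent. Setting lam2 = 0 gives the lasso."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n        # (1/n) * x_j' x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of variable j added back in.
            partial = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial / n
            beta[j] = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
    return beta

rng = np.random.default_rng(0)
# Synthetic regression data, made up for illustration: only the first two
# of six variables actually influence y.
X = rng.normal(size=(100, 6))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.normal(size=100)

beta_ols = elastic_net_cd(X, y, lam1=0.0)              # no penalty: least squares
beta_lasso = elastic_net_cd(X, y, lam1=0.5)            # lasso
beta_enet = elastic_net_cd(X, y, lam1=0.5, lam2=0.5)   # elastic net

# Above this penalty level every coefficient is shrunk to exactly zero.
lam_max = np.max(np.abs(X.T @ y)) / len(y)
beta_zero = elastic_net_cd(X, y, lam1=1.01 * lam_max)
```

The soft-thresholding step is what produces exact zeros; sparse PCA methods rely on the same kind of penalty to zero out individual loadings.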
Language: Chinese
Document Type: Thesis
Identifier: https://ir.lzu.edu.cn/handle/262010/224582
Collection: School of Mathematics and Statistics (数学与统计学院)
Recommended Citation
GB/T 7714
王小薇. 稀疏主成分分析及其相关方法的研究[D]. 兰州: 兰州大学, 2016.