高维数据变量筛选方法的若干模拟研究
Alternative Title: Some Simulation Studies on High-dimensional Variable Selection Methods
金开秀
Thesis Advisor: 李周平
2015-05-12
Degree Grantor: 兰州大学 (Lanzhou University)
Place of Conferral: 兰州 (Lanzhou)
Degree Name: Bachelor (学士)
Keyword: Multiple linear regression model; Variable selection; Lasso; Ridge regression; Elastic Net; SCAD
Abstract: Against the background of big data, selecting an appropriate number of explanatory variables from massive data to build a model, and thereby improving the accuracy of the analysis and its ability to explain real-world problems, is a research direction well worth exploring. The introduction and development of the Lasso have greatly advanced statistical analysis in this area. For the variable selection problem in linear regression, the earliest approach was given by the method of least squares. Ridge regression was the first to abandon the unbiasedness of the least-squares estimator, shrinking the regression coefficients to keep the model stable. Building on ridge regression, the Lasso increases the strength of the shrinkage so that the coefficients of some variables are reduced exactly to zero, yielding a regression model that satisfies the sparsity assumption. However, the Lasso does not perform well when the data are collinear or when the number of variables exceeds the number of observations; later methods such as the Adaptive Lasso, the Elastic Net and SCAD improve on the Lasso in different respects. Chapter 1 of this thesis reviews the progress of variable-selection research over the past twenty years, and Chapter 2 introduces the multiple linear regression model and the preliminaries of the Lasso and related methods. Chapter 3 is the numerical simulation study: it compares ridge regression, the Lasso, SCAD and the Elastic Net as the intensity of the white noise and the degree of collinearity in the data gradually increase, measuring their performance comprehensively with five indicators and discussing where the strengths of each method lie. The thesis ends with a summary and some questions for further research.
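For readers unfamiliar with the methods named in the abstract, their standard penalized least-squares formulations can be sketched as follows. This is the usual textbook notation (λ for the penalty level, α for the Elastic Net mixing weight, a for the SCAD parameter), not notation taken from the thesis itself.

```latex
\begin{align*}
\hat{\beta}_{\mathrm{ridge}} &= \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2,\\
\hat{\beta}_{\mathrm{lasso}} &= \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\|\beta\|_1,\\
\hat{\beta}_{\mathrm{EN}}    &= \arg\min_{\beta}\ \|y - X\beta\|_2^2
      + \lambda\bigl(\alpha\|\beta\|_1 + (1-\alpha)\|\beta\|_2^2\bigr),\\
\intertext{while SCAD replaces $\lambda\lvert\beta_j\rvert$ by a non-convex penalty $p_\lambda(\lvert\beta_j\rvert)$ with derivative}
p_\lambda'(t) &= \lambda\Bigl\{ I(t\le\lambda)
      + \tfrac{(a\lambda - t)_+}{(a-1)\lambda}\, I(t>\lambda) \Bigr\},
      \qquad a>2\ (\text{commonly } a=3.7).
\end{align*}
```

The mixing weight α interpolates between ridge (α = 0) and the Lasso (α = 1), which is why the Elastic Net handles groups of correlated predictors better than the Lasso alone, while SCAD's penalty levels off for large coefficients and so shrinks them less than the Lasso does.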
Other Abstract: In the era of big data, high-dimensional variable selection is a topic well worth exploring in statistical modelling. A regression model with an appropriate number of variables can better represent the underlying process, improving both the interpretability and the predictive accuracy of the model, and the Lasso and its related algorithms have greatly advanced this area. For the variable selection problem in linear regression, the method of least squares gives an unbiased estimate of the coefficients, but it becomes unsatisfactory once stability and interpretability are taken into account. Ridge regression was the first method to give up unbiasedness, shrinking the regression coefficients to stabilise the model. Building on ridge regression, the Lasso increases the strength of the shrinkage and drives some coefficients exactly to zero, that is, it performs variable selection. However, the Lasso does not perform well when collinearity is pronounced or when the number of variables exceeds the number of observations; the Adaptive Lasso, the Elastic Net and SCAD were subsequently proposed to overcome these limitations to varying degrees. Chapter 1 reviews the progress of variable selection for the ordinary linear model. Chapter 2 introduces the multiple regression model and the preliminaries of the Lasso and related methods. Chapter 3 is a numerical simulation study that compares ridge regression, the Lasso, SCAD and the Elastic Net as the intensity of the white noise and the degree of collinearity increase; their performance is measured by five indicators in order to explore the advantages of each method. The simulation results suggest that, when processing real data, the strengths of the different methods should be combined. Finally, the thesis summarises the work and puts forward some issues for further study.
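As an illustration of the kind of simulation comparison the abstracts describe (not code from the thesis, whose software and exact settings are not stated here), the sketch below generates correlated predictors with a sparse true coefficient vector, varies the noise level and the correlation strength, and compares ridge, Lasso and Elastic Net fits from scikit-learn. SCAD is omitted because scikit-learn does not implement it, and the sample size, dimension, ρ and σ values are assumptions, not the thesis's actual design.

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p = 100, 20                       # observations and predictors (assumed sizes)
beta_true = np.zeros(p)
beta_true[:5] = [3.0, 1.5, 0.0, 0.0, 2.0]   # sparse "true" coefficients (assumed)

def simulate(rho, sigma):
    """Draw predictors with AR(1)-type correlation rho and Gaussian noise sigma."""
    cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = X @ beta_true + rng.normal(0.0, sigma, size=n)
    return X, y

for rho in (0.3, 0.6, 0.9):          # increasing collinearity
    for sigma in (1.0, 3.0):         # increasing white-noise intensity
        X, y = simulate(rho, sigma)
        models = {
            "ridge": RidgeCV(alphas=np.logspace(-3, 3, 50)),
            "lasso": LassoCV(cv=5, random_state=0),
            "enet": ElasticNetCV(cv=5, l1_ratio=0.5, random_state=0),
        }
        for name, model in models.items():
            model.fit(X, y)
            mse = mean_squared_error(y, model.predict(X))        # in-sample error
            nonzero = int(np.sum(np.abs(model.coef_) > 1e-8))    # selected variables
            print(f"rho={rho}, sigma={sigma}, {name}: MSE={mse:.2f}, nonzero={nonzero}")
```

In such a comparison one would expect ridge regression to keep all coefficients nonzero, while the Lasso and the Elastic Net zero some of them out, with the Elastic Net typically more stable than the Lasso as ρ grows.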
Language: Chinese
Document Type: Degree thesis (学位论文)
Identifier: https://ir.lzu.edu.cn/handle/262010/225385
Collection: 数学与统计学院 (School of Mathematics and Statistics)
Recommended Citation (GB/T 7714):
金开秀. 高维数据变量筛选方法的若干模拟研究[D]. 兰州: 兰州大学, 2015.
Files in This Item:
There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.