LISA Statistics Short Course: Model selection in R featuring the lasso
LISA SHORT COURSES IN STATISTICS
LISA (Virginia Tech's Laboratory for Interdisciplinary Statistical Analysis) is providing a series of evening short courses to help graduate students use statistics in their research. The focus of these two-hour courses is on teaching practical statistical techniques for analyzing or collecting data. See www.lisa.stat.vt.edu/?q=short_courses for instructions on how to REGISTER and to learn more.
Summer 2013 Schedule:
Monday, June 10: Structural Equation Modeling (SEM) Using AMOS;
Monday, June 17: Designing Experiments and Collecting Useful Data*;
Monday, June 24 & Tuesday, June 25: Basics of R;
Monday, July 1 & Tuesday, July 2: Statistical Analysis in R;
Monday, July 8 & Tuesday, July 9: Graphing with R;
Monday, July 15: SAS Programming I;
Tuesday, July 16: SAS Programming II;
Monday, July 22: Model selection in R featuring the lasso;
*This course will be held in Fralin Auditorium; all other courses are in 3060 Torgersen Hall.
Monday, July 22, 5:15-7:15 pm;
Instructor: Dr. Chris Franck;
Location: 3060 Torgersen Hall;
Title: Model selection in R featuring the lasso;
The purpose of statistical model selection is to identify a parsimonious model: one that is as simple as possible while maintaining good predictive ability for the outcome of interest. Parsimony is a fundamental concept in statistical modeling across a wide variety of fields, and many model selection and variable subset selection approaches have been proposed. The lasso, or "least absolute shrinkage and selection operator," provides a method of continuous subset selection. Rather than completely including or excluding predictors, the lasso shrinks the coefficients of unimportant predictors and can even drive coefficients exactly to zero for variables with low predictive value for the response.
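The "drive coefficients to zero" behavior comes from soft thresholding. As an illustrative sketch (not course material): for a single standardized predictor, the lasso estimate is the least-squares estimate shrunk toward zero by the penalty, and set exactly to zero when the penalty exceeds its magnitude.

```r
# Soft-thresholding operator: shrinks each least-squares coefficient
# toward zero by lambda, and sets it exactly to zero when
# |coefficient| <= lambda.
soft_threshold <- function(beta_ols, lambda) {
  sign(beta_ols) * pmax(abs(beta_ols) - lambda, 0)
}

soft_threshold(c(-2.0, -0.3, 0.1, 1.5), lambda = 0.5)
# -> -1.5  0.0  0.0  1.0
# The small coefficients (-0.3, 0.1) are zeroed out entirely;
# the larger ones are shrunk toward zero by 0.5.
```

This is what distinguishes the lasso from ridge regression, which shrinks coefficients but never sets them exactly to zero.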
Implementing the lasso requires more technical groundwork than simpler routines such as forward, backward, or stepwise subset selection, or information criteria such as AIC and BIC. However, the lasso avoids much of the high variability associated with discrete subset selection, and it is computationally cheaper than evaluating information criteria over all candidate subsets when the number of candidate predictors is large (see, e.g., Hastie, Tibshirani, and Friedman 2009).
This short course includes lecture and computer laboratory components. In the lecture component, the mathematical formulation of the lasso will be briefly motivated and then compared and contrasted with other methods, including ordinary least squares, ridge regression, stepwise selection, and information criteria. During the laboratory portion, the lasso will be implemented in R on a classic prostate data set (Stamey et al. 1989), which includes 9 clinical measurements on 97 men. Specification of the lasso tuning parameter will be discussed and demonstrated via cross-validation, another important modeling concept. This course covers more advanced content than other LISA short courses and assumes basic R coding ability and familiarity with regression and model selection. A schedule of available LISA short courses may be found here: www.lisa.stat.vt.edu/?q=short_courses
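To give a flavor of the lab workflow, here is a minimal sketch of fitting the lasso with a cross-validated tuning parameter using the glmnet package (assumed installed; this is not the course's actual lab code). The course uses the prostate data of Stamey et al. (1989); simulated data stand in here so the sketch is self-contained.

```r
library(glmnet)

# Simulated stand-in for the real data: 100 observations, 8 predictors,
# of which only the first two truly influence the response.
set.seed(1)
n <- 100; p <- 8
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

# Choose the lasso tuning parameter lambda by 10-fold cross-validation.
# alpha = 1 selects the lasso penalty (alpha = 0 would give ridge).
cvfit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the cross-validated lambda: coefficients of
# unimportant predictors are typically driven exactly to zero.
coef(cvfit, s = "lambda.min")
```

The `lambda.min` choice minimizes cross-validated error; `lambda.1se` is a common, more parsimonious alternative.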
The Lasso Page: www-stat.stanford.edu/~tibs/lasso.html
Download R: www.r-project.org
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer, 2009.
Stamey TA, Kabalin JN, McNeal JE, Johnstone IM, Freiha FS, Redwine EA, Yang N. Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients. J Urol. 141: 1076-1083, 1989.
Follow us on Facebook (www.facebook.com/Statistical.collaboration) or Twitter (www.twitter.com/LISA_VT) to be the first to know about LISA events!