Wrapper feature selection
Feature Selection in Parkinson’s Disease (PPMI Data)
Use the 06_PPMI_ClassificationValidationData_Short dataset, setting ResearchGroup
as the class variable.
- Delete irrelevant columns (e.g., X, FID_IID) and select only the PD and Control cases
- Properly convert the variable types
- Apply Boruta to train a model and try different parameters (e.g., different values of pValue and maxRuns). What are the differences?
- Summarize and visualize the results
- Apply Recursive Feature Elimination (RFE) and tune the model size
- Evaluate the Boruta model performance by comparing it with RFE
- Report and compare the variables selected by both methods. How much overlap is there in the selected variables?
- Apply LASSO and knockoff
- Generate a table contrasting the performance of different approaches.
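The overlap question in the list above can be answered with simple set arithmetic once each method has returned its selected variable names. The following is a minimal Python sketch, not part of the original exercise: the helper name selection_overlap and the variable names var_1 ... var_5 are hypothetical placeholders, not actual PPMI results.

```python
def selection_overlap(selected_a, selected_b):
    """Summarize agreement between two collections of selected variable names."""
    a, b = set(selected_a), set(selected_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),   # chosen by both methods
        "only_a": sorted(a - b),    # chosen only by the first method
        "only_b": sorted(b - a),    # chosen only by the second method
        # Jaccard index |A & B| / |A | B|; treated as 1.0 when both sets are empty
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical variable names, for illustration only (not real PPMI output)
boruta_vars = ["var_1", "var_2", "var_3", "var_5"]
rfe_vars = ["var_2", "var_3", "var_4"]

summary = selection_overlap(boruta_vars, rfe_vars)
print("Shared:", summary["shared"])
print("Jaccard index:", round(summary["jaccard"], 2))
```

The same helper can be applied pairwise to the LASSO and knockoff selections when building the comparison table.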
Regularized Linear Modeling and Controlled Variable Selection (Knockoff Filtering)
Use the Heart Attack (CaseStudy12_ AdultsHeartAttack) data to:
- Identify and impute any missing values
- Use DIAGNOSIS as a clinically relevant outcome variable
- Randomly split the data into training (70%) and testing (30%) sets
- Standardize the predictors, fit a LASSO model, and report the model results
- Optimize the choice of the regularization parameter
- Apply cross validation to report internal statistical validity of the model
- Report and compare the OLS, Stepwise OLS with AIC, Ridge and LASSO coefficient estimates
- Calculate the predicted values for all 4 models and report the models' performance
- Apply knockoff filtering to control the false discovery rate (FDR) of the variable selection
- Compare the variables selected by Stepwise OLS, LASSO and knockoff
- Apply Bootstrap LASSO and knockoff, and compare the results.
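The selection step of knockoff filtering can be illustrated independently of any particular package. The sketch below, in Python, assumes the feature statistics W have already been computed (for example, from the difference between each original variable's and its knockoff copy's LASSO coefficient magnitude); the values in W are made up for illustration, and the function and parameter names (knockoff_threshold, knockoff_select, fdr, offset) are choices made for this sketch. It applies the data-dependent threshold T = min{t : (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q} and keeps the variables with W_j >= T.

```python
def knockoff_threshold(W, fdr=0.10, offset=1):
    """Smallest t whose estimated false discovery proportion is at most fdr.

    offset=1 corresponds to the more conservative "knockoff+" rule,
    offset=0 to the original knockoff rule.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of W
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if w <= -t)  # proxy for false selections
        positives = sum(1 for w in W if w >= t)   # variables that would be selected
        if (offset + negatives) / max(1, positives) <= fdr:
            return t
    return float("inf")  # no threshold meets the target: select nothing


def knockoff_select(W, fdr=0.10, offset=1):
    """Indices of variables whose statistic reaches the knockoff threshold."""
    t = knockoff_threshold(W, fdr, offset)
    return [j for j, w in enumerate(W) if w >= t]


# Made-up statistics for 12 variables (illustration only, not heart attack data)
W = [5.0, 4.2, 3.9, 3.5, 3.1, -0.4, 0.3, 2.8, -0.2, 2.5, 0.1, 2.2]

print("Threshold:", knockoff_threshold(W, fdr=0.2))
print("Selected indices:", knockoff_select(W, fdr=0.2))
```

Large positive W_j values favor the original variable over its knockoff copy, while negative values act as an estimate of how many false selections a given threshold would admit. With few variables or a strict target rate, the rule may select nothing at all.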