1 Wrapper Feature Selection

2 Feature Selection in Parkinson’s Disease (PPMI Data)

Use the 06_PPMI_ClassificationValidationData_Short dataset, with ResearchGroup as the class variable.

  • Delete irrelevant columns (e.g. X, FID_IID) and select only the PD and Control cases
  • Convert the variable types appropriately
  • Apply Boruta to train a model and try different parameter settings (e.g., different pValue and maxRuns values). How do the results differ?
  • Summarize and visualize the results
  • Apply Recursive Feature Elimination (RFE) and tune the model size
  • Evaluate the Boruta model performance by comparing it with RFE
  • Report and compare the variables selected by both methods. How much overlap is there in the selected variables?
  • Apply LASSO and knockoff filtering
  • Generate a table contrasting the performance of different approaches.
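
The overlap question above can be answered with plain set arithmetic once each method has returned its list of selected variables. The following is a minimal sketch, not part of the assignment itself: the helper `selection_overlap` and the variable names (`var_1`, ...) are hypothetical placeholders, not actual PPMI results, and the Jaccard index is only one reasonable way to quantify overlap.

```python
def selection_overlap(selected_a, selected_b):
    """Compare two collections of selected variable names.

    Returns the shared variables, the variables unique to each method,
    and the Jaccard index (size of intersection / size of union).
    """
    a, b = set(selected_a), set(selected_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Two empty selections are treated as identical (index 1.0).
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical selections, for illustration only (assumed, not real output).
boruta_vars = ["var_1", "var_2", "var_3", "var_4"]
rfe_vars = ["var_2", "var_3", "var_5"]

result = selection_overlap(boruta_vars, rfe_vars)
print("Shared:", result["shared"])
print("Boruta only:", result["only_a"])
print("RFE only:", result["only_b"])
print("Jaccard index:", round(result["jaccard"], 2))
```

The same helper can be reused to compare any pair of methods (for example LASSO versus knockoff) when building the final comparison table.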

3 Regularized Linear Modeling and Controlled Variable Selection (Knockoff Filtering)

Use the Heart Attack (CaseStudy12_ AdultsHeartAttack) data to:

  • Identify and impute any missing values
  • Use the DIAGNOSIS as a clinically relevant outcome variable
  • Randomly split the data into training (70%) and testing (30%) sets
  • Standardize the predictors, fit a LASSO model, and report the model results
  • Optimize the choice of the regularization parameter
  • Apply cross-validation to report the internal statistical validity of the model
  • Report and compare the OLS, Stepwise OLS with AIC, Ridge and LASSO coefficient estimates
  • Calculate the predicted values for all four models and report each model's performance
  • Apply knockoff filtering to control the false discovery rate (FDR) of the variable selection
  • Compare the variables selected by Stepwise OLS, LASSO and knockoff
  • Apply Bootstrap LASSO and knockoff, and compare the results.
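
The selection step of knockoff filtering can be sketched independently of any particular package. Given one statistic W_j per variable (large positive values favor the original variable over its knockoff copy), the filter picks the smallest threshold t for which the estimated false discovery proportion, (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}), does not exceed the target level q, and then keeps the variables with W_j >= t. The code below is a sketch under stated assumptions: the function names and the statistics in `w_stats` are hypothetical, and constructing the knockoff copies and computing the W_j values is left to whatever knockoff implementation is used.

```python
def knockoff_threshold(w, q=0.10, offset=1):
    """Data-dependent threshold for knockoff statistics w at target FDR q.

    offset=1 corresponds to the knockoff+ rule; offset=0 is the more
    liberal variant. Returns float('inf') if no threshold meets the target.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of w.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # estimate of false selections
        positives = sum(1 for x in w if x >= t)   # variables that would be kept
        if (offset + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q=0.10, offset=1):
    """Return the indices of variables whose statistic reaches the threshold."""
    t = knockoff_threshold(w, q, offset)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten variables (assumed values, not real data).
w_stats = [4.2, 3.1, -0.4, 2.7, 0.9, -1.1, 5.0, 0.0, 1.8, 2.2]

threshold = knockoff_threshold(w_stats, q=0.25)
selected = knockoff_select(w_stats, q=0.25)
print("Threshold:", threshold)
print("Selected variable indices:", selected)
```

With only a handful of variables, a strict target such as q = 0.10 may select nothing, because the offset in the numerator keeps the estimated proportion above the target. This is worth keeping in mind when comparing the knockoff selections with those of Stepwise OLS and LASSO.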
