
1 ABIDE Case-study

The Autism Brain Imaging Data Exchange (ABIDE) data include imaging, clinical, genetic, and phenotypic information for over \(1,000\) pediatric cases.

  • Apply several models (e.g., C5.0, k-means, linear models, neural nets) to predict the clinical diagnosis using part of the data (the training set)
  • Evaluate each model’s performance using confusion matrices, accuracy, \(\kappa\), precision and recall, the F-measure, etc.
  • Evaluate, compare, and interpret the results
  • Use ROC curves to examine the tradeoff between detecting true positives and avoiding false positives, and report the AUC
  • Finally, apply cross-validation to the C5.0 model and report the CV error.
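The performance metrics above can be sketched in a few lines of code. The snippet below is an illustrative pure-Python sketch, not the course's R/caret workflow (e.g., confusionMatrix); the diagnosis labels and classifier scores are made up for demonstration.

```python
# Sketch: accuracy, Cohen's kappa, precision, recall, and F-measure from a
# binary confusion matrix, plus AUC via the Mann-Whitney formulation.
# The labels and scores below are hypothetical, for illustration only.

def binary_metrics(actual, predicted, positive="autism"):
    """Return common performance metrics for binary labels."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_e) / (1 - p_e)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "kappa": kappa,
            "precision": precision, "recall": recall, "F1": f1}

def auc(scores_pos, scores_neg):
    """AUC = probability a random positive scores above a random negative
    (Mann-Whitney U formulation); ties count as half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

actual    = ["autism", "autism", "control", "control", "autism", "control"]
predicted = ["autism", "control", "control", "control", "autism", "autism"]
print(binary_metrics(actual, predicted))
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

The same quantities are reported by caret's confusionMatrix in the course's R setting; this sketch just makes the definitions explicit.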

2 Assessing Model Performance

Use some of the methods below to perform classification, prediction, and model-performance evaluation on one of the datasets included in DSPA Case-Studies 31-35.

Model                                          Learning Task   Method     Parameters
KNN                                            Classification  knn        k
Naive Bayes                                    Classification  nb         fL, usekernel
Decision Trees                                 Classification  C5.0       model, trials, winnow
OneR Rule Learner                              Classification  OneR       None
RIPPER Rule Learner                            Classification  JRip       NumOpt
Linear Regression                              Regression      lm         None
Regression Trees                               Regression      rpart      cp
Model Trees                                    Regression      M5         pruned, smoothed, rules
Neural Networks                                Dual use        nnet       size, decay
Support Vector Machines (Linear Kernel)        Dual use        svmLinear  C
Support Vector Machines (Radial Basis Kernel)  Dual use        svmRadial  C, sigma
Random Forests                                 Dual use        rf         mtry
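The method names in the table are R/caret identifiers. To make one entry concrete, here is a toy k-nearest-neighbors classifier illustrating the role of the single tuning parameter k from the first row; the 2-D points and labels are made up for demonstration, and the course itself would use the R `knn` method.

```python
# Toy kNN sketch: classify a query point by majority vote among its
# k nearest training points. Data below are hypothetical.
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k nearest neighbors of `query`."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # near the "A" cluster
```

Small k fits the training data closely (low bias, high variance); large k smooths the decision boundary, which is exactly the tradeoff tuned over in caret.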

2.1 Model improvement case study

From the course datasets, use the 05_PPMI_top_UPDRS_Integrated_LongFormat1.csv case-study to perform a multi-class prediction. Use ResearchGroup as the outcome variable; it includes three classes: “PD”, “Control”, and “SWEDD”.

  • Delete the ID column and impute the missing values using the feature mean or median (justify your choice)
  • Normalize the covariates
  • Implement an automated parameter-tuning process and report the optimal accuracy and \(\kappa\)
  • Set the tuning arguments and rerun the process, trying different method and number settings
  • Train a random forest classifier, tune its parameters, and report the results and the cross-tabulation
  • Use a bagging algorithm and report the accuracy and \(\kappa\)
  • Perform a random forest classification and report the accuracy and \(\kappa\)
  • Report the accuracy of AdaBoost
  • Finally, give a brief summary of all the model-improvement approaches.
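The first two preprocessing bullets can be sketched as follows. This is an illustrative pure-Python sketch (the course would use R, e.g., caret's preProcess); the UPDRS-like column values are hypothetical, with None standing in for missing entries.

```python
# Sketch: mean/median imputation followed by min-max normalization.
# Values below are made up for demonstration.
import statistics

def impute(column, strategy="median"):
    """Replace missing (None) entries with the column mean or median.
    The median is often preferred when a feature is skewed or has outliers."""
    observed = [v for v in column if v is not None]
    fill = (statistics.median(observed) if strategy == "median"
            else statistics.mean(observed))
    return [fill if v is None else v for v in column]

def min_max_normalize(column):
    """Rescale a numeric column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

updrs = [12.0, None, 30.0, 18.0, None, 45.0]  # hypothetical scores
clean = min_max_normalize(impute(updrs, strategy="median"))
print(clean)
```

Note the justification requirement in the bullet: for a roughly symmetric feature, mean and median imputation give similar results; for skewed clinical scores, the median is more robust.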

Try similar protocols on other datasets in the list of Case-Studies, e.g., the Traumatic Brain Injury study and its corresponding dataset.

3 Cross-validation

Use each of the following two case-studies to implement and test the following protocol

  • Review each case-study
  • Choose appropriate dichotomous, polytomous, or continuous outcome variables, e.g., use ALSFRS_slope for ALS or CHRONICDISEASESCORE for case 06, and cast it as a dichotomous outcome
  • Apply appropriate data preprocessing
  • Perform regression modeling for continuous outcomes
  • Perform classification and prediction using various methods (LDA, QDA, AdaBoost, SVM, Neural Network, KNN) for discrete outcomes
  • Apply cross-validation on these regression and classification methods, respectively
  • Report standard error for regression approaches
  • Report appropriate quality metrics that can be used to rank the forecasting approaches based on the predictive power of the corresponding prediction/classification results
  • Compare the results of model-driven and data-driven (e.g., KNN) methods
  • Compare the sensitivity and specificity of the respective methods
  • Use unsupervised clustering methods (e.g., k-Means) and spectral clustering
  • Evaluate and justify the k-means model, and assess the agreement of the clusters with the true labels
  • Report the classification error of k-means and compare it against the result of k-means++.
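The last two bullets can be sketched with a minimal k-means implementation supporting both naive random seeding and k-means++ seeding. This is an illustrative pure-Python sketch under made-up 2-D data (the course would use R, e.g., kmeans with nstart for multiple restarts); cluster labels can then be compared against the known blob membership to estimate the classification error.

```python
# Sketch: k-means with random vs. k-means++ initialization, best of 5
# restarts by within-cluster sum of squares (SSE). Data are hypothetical.
import math
import random

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def update(points, labels, centers):
    """Move each center to the mean of its members (keep it if empty)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lbl in zip(points, labels) if lbl == j]
        new_centers.append(
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
    return new_centers

def kmeans(points, k, init="random", iters=25, seed=0):
    rng = random.Random(seed)
    if init == "kmeans++":
        # k-means++: pick later centers with probability proportional to
        # squared distance from the nearest already-chosen center.
        centers = [rng.choice(points)]
        while len(centers) < k:
            d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2)[0])
    else:
        centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
    sse = sum(math.dist(p, centers[lbl]) ** 2 for p, lbl in zip(points, labels))
    return labels, sse

# Two obvious blobs; take the best of 5 restarts per seeding scheme,
# analogous to kmeans(..., nstart = 5) in R.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
best_random = min((kmeans(points, 2, "random", seed=s) for s in range(5)),
                  key=lambda r: r[1])
best_pp = min((kmeans(points, 2, "kmeans++", seed=s) for s in range(5)),
              key=lambda r: r[1])
print(best_random[0], best_pp[0])
```

On well-separated data both schemes usually recover the blobs; the practical advantage of k-means++ is that its spread-out seeding needs fewer restarts to avoid poor local optima, which is one way to frame the requested comparison.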