
These data include imaging, clinical, genetic, and phenotypic data for over \(1,000\) pediatric cases from the Autism Brain Imaging Data Exchange (ABIDE).

- Apply several models (e.g., C5.0, k-means, linear models, neural nets) to predict the clinical diagnosis using part of the data (the training data)
- Evaluate each model's performance using confusion matrices, accuracy, \(\kappa\), precision and recall, the F-measure, etc.
- Evaluate, compare, and interpret the results
- Use the ROC curve to examine the tradeoff between detecting true positives and avoiding false positives, and report the AUC
- Finally, apply cross-validation to C5.0 and report the CV error.
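The evaluation steps above can be sketched as follows. The course itself works in R (caret); this is a rough Python/scikit-learn analogue with a synthetic binary-outcome dataset standing in for ABIDE, and `DecisionTreeClassifier` standing in for C5.0.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             cohen_kappa_score, f1_score, roc_auc_score)

# Synthetic stand-in for the ABIDE features and a binary diagnosis label
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print(confusion_matrix(y_te, pred))             # TN/FP/FN/TP counts
print("accuracy:", accuracy_score(y_te, pred))
print("kappa:   ", cohen_kappa_score(y_te, pred))
print("F1:      ", f1_score(y_te, pred))
# ROC AUC scores the predicted class-1 probabilities, not the hard labels
print("AUC:     ", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

# 10-fold cross-validation error for the tree model
cv_acc = cross_val_score(clf, X, y, cv=10)
print("CV error:", 1 - cv_acc.mean())
```

The same quantities (confusion matrix, \(\kappa\), F-measure, AUC, CV error) are what the exercise asks you to report for the real ABIDE data.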

Use some of the methods below to do classification, prediction, and model performance evaluation on one of the datasets included in DSPA Case-Studies 31-35.

Model | Learning Task | Method | Parameters
---|---|---|---
KNN | Classification | `knn` | `k`
Naive Bayes | Classification | `nb` | `fL, usekernel`
Decision Trees | Classification | `C5.0` | `model, trials, winnow`
OneR Rule Learner | Classification | `OneR` | None
RIPPER Rule Learner | Classification | `JRip` | `NumOpt`
Linear Regression | Regression | `lm` | None
Regression Trees | Regression | `rpart` | `cp`
Model Trees | Regression | `M5` | `pruned, smoothed, rules`
Neural Networks | Dual use | `nnet` | `size, decay`
Support Vector Machines (Linear Kernel) | Dual use | `svmLinear` | `C`
Support Vector Machines (Radial Basis Kernel) | Dual use | `svmRadial` | `C, sigma`
Random Forests | Dual use | `rf` | `mtry`
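The "Parameters" column lists what caret's `train` tunes for each method. A rough Python/scikit-learn analogue of that automated tuning is a cross-validated grid search; the sketch below tunes the KNN neighbour count (caret's `k`) on a built-in dataset, with an illustrative grid.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scale features, then search k over a small grid with 5-fold CV
# (roughly what caret's trainControl/tuneGrid combination does)
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid = GridSearchCV(pipe,
                    {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

`grid.best_params_` and `grid.best_score_` correspond to the optimal tuning values and resampled accuracy that caret prints for its winning model.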

From the course datasets, use the 05_PPMI_top_UPDRS_Integrated_LongFormat1.csv case-study to perform a multi-class prediction. Use `ResearchGroup` as the outcome response; it includes three classes: "PD", "Control", and "SWEDD".

- Delete the ID column and impute the missing values using the feature mean or median (justify your choice)
- Normalize the covariates
- Implement an automated parameter tuning process and report the optimal accuracy and \(\kappa\)
- Set the arguments and rerun the tuning, trying different `method` and `number` settings
- Train a random forest classifier, tune its parameters, and report the result and the cross table
- Use a bagging algorithm and report the accuracy and \(\kappa\)
- Perform a random forest classification and report the accuracy and \(\kappa\)
- Report the accuracy of AdaBoost
- Finally, give a brief summary of all the model-improvement approaches.
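A compact sketch of the preprocessing and ensemble comparison above, again as a Python/scikit-learn analogue of the R/caret workflow. The data here are synthetic (three classes mirroring PD/Control/SWEDD, with random missing values punched in), not the actual PPMI file.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                              AdaBoostClassifier)
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Three-class synthetic stand-in for the PPMI features/ResearchGroup outcome
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=1)
rng = np.random.default_rng(1)
X[rng.random(X.shape) < 0.05] = np.nan   # simulate missing values

for name, model in [("random forest", RandomForestClassifier(random_state=1)),
                    ("bagging",       BaggingClassifier(random_state=1)),
                    ("AdaBoost",      AdaBoostClassifier(random_state=1))]:
    pipe = make_pipeline(SimpleImputer(strategy="median"),  # or "mean"
                         StandardScaler(),                  # normalize covariates
                         model)
    pred = cross_val_predict(pipe, X, y, cv=5)
    print(f"{name}: accuracy={accuracy_score(y, pred):.3f} "
          f"kappa={cohen_kappa_score(y, pred):.3f}")
```

Putting the imputer and scaler inside the pipeline keeps the preprocessing inside each cross-validation fold, which avoids leaking test-fold statistics into training.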

Try similar protocols on other data in the list of Case-Studies, e.g., the Traumatic Brain Injury Study and its corresponding dataset.

Use each of the following two case-studies

- Example 1: ALS (Amyotrophic Lateral Sclerosis)
- Example 2: Quality of Life in Chronic Illness (Case06_QoL_Symptom_ChronicIllness.csv)

to implement and test the following protocol:

- Review each case-study
- Choose appropriate dichotomous, polytomous, or continuous outcome variables, e.g., use `ALSFRS_slope` for ALS and `CHRONICDISEASESCORE` for *case 06*, and cast it as a dichotomous outcome
- Apply appropriate data preprocessing
- Perform regression modeling for continuous outcomes
- Perform classification and prediction using various methods (LDA, QDA, AdaBoost, SVM, Neural Network, KNN) for discrete outcomes
- Apply *cross-validation* to these regression and classification methods, respectively
- Report the standard error for the regression approaches
- Report appropriate quality metrics that can be used to rank the forecasting approaches by the predictive power of the corresponding prediction/classification results
- Compare the results of model-driven and data-driven (e.g., KNN) methods
- Compare sensitivity and specificity, respectively
- Use unsupervised clustering methods (e.g., *k-means*) and spectral clustering
- Evaluate and justify the *k-means* model and assess the agreement of the clusters with the labels
- Report the classification error of *k-means* and compare it against the result of *k-means++*.
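The last two steps can be illustrated with scikit-learn, where *k-means++* is simply the `init="k-means++"` seeding option of `KMeans`. The sketch below uses the built-in iris data rather than the case-study files, and scores cluster/label agreement with the adjusted Rand index (one reasonable agreement metric; you could also build a cross table of clusters vs. labels).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)   # normalize before clustering

for init in ("random", "k-means++"):
    km = KMeans(n_clusters=3, init=init, n_init=10, random_state=0).fit(X)
    # Agreement between the cluster assignments and the true species labels
    print(init, adjusted_rand_score(y, km.labels_))
```

With multiple restarts (`n_init=10`) the two initializations often converge to similar solutions; the advantage of *k-means++* shows up mainly in fewer restarts being needed and less variance across seeds.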