
1 ABIDE Case-study

The Autism Brain Imaging Data Exchange (ABIDE) data include imaging, clinical, genetic, and phenotypic information for over \(1,000\) pediatric cases.

  • Apply several models (e.g., C5.0, k-means, linear models, neural nets) to predict the clinical diagnosis using part of the data (the training set)
  • Evaluate each model’s performance using confusion matrices, accuracy, \(\kappa\), precision and recall, the F-measure, etc.
  • Evaluate, compare, and interpret the results
  • Use ROC curves to examine the tradeoff between detecting true positives and avoiding false positives, and report the AUC
  • Finally, apply cross-validation to the C5.0 model and report the CV error.
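The performance metrics above can be sketched in a few lines of code. The snippet below is an illustrative pure-Python sketch, not the course's R/caret workflow (e.g., confusionMatrix); the diagnosis labels and classifier scores are made up for demonstration.

```python
# Sketch: accuracy, Cohen's kappa, precision, recall, and F-measure from a
# binary confusion matrix, plus AUC via the Mann-Whitney formulation.
# The labels and scores below are hypothetical, for illustration only.

def binary_metrics(actual, predicted, positive="autism"):
    """Return common performance metrics for binary labels."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_e) / (1 - p_e)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "kappa": kappa,
            "precision": precision, "recall": recall, "F1": f1}

def auc(scores_pos, scores_neg):
    """AUC = probability a random positive scores above a random negative
    (Mann-Whitney U formulation); ties count as half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

actual    = ["autism", "autism", "control", "control", "autism", "control"]
predicted = ["autism", "control", "control", "control", "autism", "autism"]
print(binary_metrics(actual, predicted))
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

The same quantities are reported by caret's confusionMatrix in the course's R setting; this sketch just makes the definitions explicit.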

2 Assessing Model Performance

Use some of the methods below to perform classification, prediction, and model-performance evaluation on one of the datasets included in DSPA Case-Studies 31-35.

Model                                          Learning Task   Method     Parameters
KNN                                            Classification  knn        k
Naive Bayes                                    Classification  nb         fL, usekernel
Decision Trees                                 Classification  C5.0       model, trials, winnow
OneR Rule Learner                              Classification  OneR       None
RIPPER Rule Learner                            Classification  JRip       NumOpt
Linear Regression                              Regression      lm         None
Regression Trees                               Regression      rpart      cp
Model Trees                                    Regression      M5         pruned, smoothed, rules
Neural Networks                                Dual use        nnet       size, decay
Support Vector Machines (Linear Kernel)        Dual use        svmLinear  C
Support Vector Machines (Radial Basis Kernel)  Dual use        svmRadial  C, sigma
Random Forests                                 Dual use        rf         mtry
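The method names in the table are R/caret identifiers. To make one entry concrete, here is a toy k-nearest-neighbors classifier illustrating the role of the single tuning parameter k from the first row; the 2-D points and labels are made up for demonstration, and the course itself would use the R `knn` method.

```python
# Toy kNN sketch: classify a query point by majority vote among its
# k nearest training points. Data below are hypothetical.
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k nearest neighbors of `query`."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # near the "A" cluster
```

Small k fits the training data closely (low bias, high variance); large k smooths the decision boundary, which is exactly the tradeoff tuned over in caret.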

2.1 Model improvement case study

From the course datasets, use the 05_PPMI_top_UPDRS_Integrated_LongFormat1.csv case-study to perform a multi-class prediction. Use ResearchGroup as the outcome variable; it includes three classes: “PD”, “Control”, and “SWEDD”.

  • Delete the ID column and impute the missing values using the feature mean or median (justify your choice)
  • Normalize the covariates
  • Implement an automated parameter-tuning process and report the optimal accuracy and \(\kappa\)
  • Set the tuning arguments and rerun the process, trying different method and number settings
  • Train a random forest classifier, tune its parameters, and report the results and the cross-tabulation
  • Use a bagging algorithm and report the accuracy and \(\kappa\)
  • Perform a random forest classification and report the accuracy and \(\kappa\)
  • Report the accuracy of AdaBoost
  • Finally, give a brief summary of all the model-improvement approaches.
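The first two preprocessing bullets can be sketched as follows. This is an illustrative pure-Python sketch (the course would use R, e.g., caret's preProcess); the UPDRS-like column values are hypothetical, with None standing in for missing entries.

```python
# Sketch: mean/median imputation followed by min-max normalization.
# Values below are made up for demonstration.
import statistics

def impute(column, strategy="median"):
    """Replace missing (None) entries with the column mean or median.
    The median is often preferred when a feature is skewed or has outliers."""
    observed = [v for v in column if v is not None]
    fill = (statistics.median(observed) if strategy == "median"
            else statistics.mean(observed))
    return [fill if v is None else v for v in column]

def min_max_normalize(column):
    """Rescale a numeric column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

updrs = [12.0, None, 30.0, 18.0, None, 45.0]  # hypothetical scores
clean = min_max_normalize(impute(updrs, strategy="median"))
print(clean)
```

Note the justification requirement in the bullet: for a roughly symmetric feature, mean and median imputation give similar results; for skewed clinical scores, the median is more robust.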

Try similar protocols on other datasets in the list of Case-Studies, e.g., the Traumatic Brain Injury study and its corresponding dataset.

3 Cross-validation

Use each of the following two case-studies to implement and test the following protocol

  • Review each case-study
  • Choose appropriate dichotomous, polytomous, or continuous outcome variables, e.g., use ALSFRS_slope for ALS or CHRONICDISEASESCORE for case 06, and cast it as a dichotomous outcome
  • Apply appropriate data preprocessing
  • Perform regression modeling for continuous outcomes
  • Perform classification and prediction using various methods (LDA, QDA, AdaBoost, SVM, Neural Network, KNN) for discrete outcomes
  • Apply cross-validation on these regression and classification methods, respectively
  • Report standard error for regression approaches
  • Report appropriate quality metrics that can be used to rank the forecasting approaches based on the predictive power of the corresponding prediction/classification results
  • Compare the results of model-driven and data-driven (e.g., KNN) methods
  • Compare the sensitivity and specificity of the respective methods
  • Use unsupervised clustering methods (e.g., k-Means) and spectral clustering
  • Evaluate and justify the k-means model, and assess the agreement of the clusters with the true labels
  • Report the classification error of k-means and compare it against the result of k-means++.
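The last two bullets can be sketched with a minimal k-means implementation supporting both naive random seeding and k-means++ seeding. This is an illustrative pure-Python sketch under made-up 2-D data (the course would use R, e.g., kmeans with nstart for multiple restarts); cluster labels can then be compared against the known blob membership to estimate the classification error.

```python
# Sketch: k-means with random vs. k-means++ initialization, best of 5
# restarts by within-cluster sum of squares (SSE). Data are hypothetical.
import math
import random

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def update(points, labels, centers):
    """Move each center to the mean of its members (keep it if empty)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lbl in zip(points, labels) if lbl == j]
        new_centers.append(
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
    return new_centers

def kmeans(points, k, init="random", iters=25, seed=0):
    rng = random.Random(seed)
    if init == "kmeans++":
        # k-means++: pick later centers with probability proportional to
        # squared distance from the nearest already-chosen center.
        centers = [rng.choice(points)]
        while len(centers) < k:
            d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2)[0])
    else:
        centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
    sse = sum(math.dist(p, centers[lbl]) ** 2 for p, lbl in zip(points, labels))
    return labels, sse

# Two obvious blobs; take the best of 5 restarts per seeding scheme,
# analogous to kmeans(..., nstart = 5) in R.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
best_random = min((kmeans(points, 2, "random", seed=s) for s in range(5)),
                  key=lambda r: r[1])
best_pp = min((kmeans(points, 2, "kmeans++", seed=s) for s in range(5)),
              key=lambda r: r[1])
print(best_random[0], best_pp[0])
```

On well-separated data both schemes usually recover the blobs; the practical advantage of k-means++ is that its spread-out seeding needs fewer restarts to avoid poor local optima, which is one way to frame the requested comparison.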