Data Science and Predictive Analytics (UMich HS650)

Demonstrate cross-validation on two case-studies, analyzed independently: the ALS data and Case06_QoL_Symptom_ChronicIllness.csv.

Go through the following protocol:

Review each case-study.
Choose appropriate dichotomous, polytomous, or continuous outcome variables. For example, use ALSFRS_slope for ALS; for Case06_QoL_Symptom_ChronicIllness.csv, use CHRONICDISEASESCORE and binarize it with a cutoff of 1.2.
Apply proper data preprocessing.
Perform regression modeling (OLS, glmnet, Forward or Backward model selection, etc.) for continuous outcomes.
Perform classification and prediction using various methods (e.g., LDA, QDA, AdaBoost, SVM, Neural Network, KNN) for discrete outcomes.
Apply cross-validation on these regression and classification methods, respectively.
Report the standard error for the regression-type approaches.
Report appropriate quality metrics that can be used to rank the forecasting approaches based on the predictive power of their results.
Compare the results of model-driven and data-driven (e.g., KNN) techniques.
Compare the sensitivity and specificity of the classification methods.
Use unsupervised classification methods, e.g., k-means and spectral clustering.
Evaluate and justify the k-means model and determine the level of agreement between the model's clusters and the real class labels.
Report the discrepancy (difference in agreement) between k-means and k-means++, and include diagnostics for the k-means++ solution.
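
As a starting point for the classification steps of the protocol, here is a minimal Python sketch of k-fold cross-validation for a KNN classifier that reports pooled sensitivity and specificity. It is not the course's reference solution: the data are synthetic stand-ins (not the ALS or Case06 data), all function names are hypothetical, and treating scores strictly above 1.2 as the positive class is an assumption, since the protocol only gives the cutoff value.

```python
import random

def binarize(scores, cutoff=1.2):
    """Label 1 when the score is strictly above the cutoff (assumed direction), else 0."""
    return [1 if s > cutoff else 0 for s in scores]

def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0

def cross_validate(X, y, n_folds=5, k=5, seed=0):
    """Predict every case from a model that never saw it; pool the confusion counts."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for fold in k_fold_indices(len(X), n_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            pred = knn_predict(train_X, train_y, X[i], k)
            # "tp"/"tn" when the prediction matches the label, "fp"/"fn" otherwise
            key = ("t" if pred == y[i] else "f") + ("p" if pred == 1 else "n")
            counts[key] += 1
    return counts

def sensitivity_specificity(c):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return c["tp"] / (c["tp"] + c["fn"]), c["tn"] / (c["tn"] + c["fp"])

# Synthetic stand-in data: two features and a noisy score centered near the cutoff.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
scores = [1.2 + x0 + 0.5 * x1 + rng.gauss(0, 0.3) for x0, x1 in X]
y = binarize(scores, cutoff=1.2)

counts = cross_validate(X, y, n_folds=5, k=5)
sens, spec = sensitivity_specificity(counts)
print(counts)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

The same loop can be reused to rank methods: swap knn_predict for another classifier, keep the folds fixed through the seed, and compare the pooled metrics. Features on very different scales should be standardized within each training fold before computing distances.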

Assessment: 20. Prediction and Internal Statistical Cross Validation