Use the kNN algorithm to classify the TBI SOCR data. Dichotomize the field.gcs outcome variable using field.gcs>=7. Determine an appropriate \(k\), then train the classification model and evaluate its performance on the data. Report some model quality statistics for a couple of different values of \(k\), and use these to rank-order (and perhaps plot the classification results of) the models.
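The workflow described above can be sketched in R as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins rather than the actual TBI file, the predictors age and first_day_gcs are hypothetical names, and accuracy is used as the single quality statistic. It relies on knn() from the class package.

```r
library(class)  # provides knn(); a recommended package shipped with R

set.seed(1234)
n <- 150

# Synthetic stand-in for the TBI data. Only field.gcs is named in the
# exercise; age and first_day_gcs are hypothetical predictors.
field.gcs <- sample(3:15, n, replace = TRUE)
tbi <- data.frame(
  field.gcs     = field.gcs,
  age           = round(runif(n, 16, 80)),
  first_day_gcs = pmin(15, pmax(3, field.gcs + sample(-2:2, n, replace = TRUE)))
)

# Dichotomize the outcome: "high" when field.gcs >= 7, otherwise "low"
outcome <- factor(ifelse(tbi$field.gcs >= 7, "high", "low"),
                  levels = c("low", "high"))

# Min-max normalize the predictors so no single scale dominates the distance
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
X <- as.data.frame(lapply(tbi[, c("age", "first_day_gcs")], normalize))

# Random train/test split (100 training cases, 50 testing cases)
train_idx <- sample(seq_len(n), size = 100)

# Evaluate test-set accuracy for a few candidate values of k
ks <- c(1, 5, 11, 21)
acc <- sapply(ks, function(k) {
  pred <- knn(train = X[train_idx, ], test = X[-train_idx, ],
              cl = outcome[train_idx], k = k)
  mean(pred == outcome[-train_idx])
})

# Rank-order the models from most to least accurate
results <- data.frame(k = ks, accuracy = acc)
results <- results[order(-results$accuracy), ]
print(results)
```

With the real data, the synthetic tbi data frame would be replaced by the imported TBI table, and additional statistics (for example, a confusion matrix from table(pred, truth)) could be reported for each \(k\).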
Use 05_PPMI_top_UPDRS_Integrated_LongFormat data to practice kNN classification.
- Preprocessing: remove the Index, FID_IID, and VisitID columns; convert the response variable ResearchGroup to a binary factor (consider SWEDD as disease); detect NA values (impute if necessary).
- Summarize the data using str, summary, cor, and ggpairs.
- Transform the data: normalize it, log-transform it using \(\log(x-\min(x))\), and discretize it to either 0 or 1.
- Partition the data using set.seed and random sampling, with \(train:test = 2:1\).
- Use caret::knn.tuning or caret::train to verify the results (Hint: select the same folds, or you may get slightly different results).
- Try all the above again, but select only the following variables as predictors: UPDRS_Part_I_Summary_Score_Baseline, UPDRS_Part_I_Summary_Score_Month_24, UPDRS_Part_II_Patient_Questionnaire_Summary_Score_Baseline, UPDRS_Part_II_Patient_Questionnaire_Summary_Score_Month_24, UPDRS_Part_III_Summary_Score_Baseline, UPDRS_Part_III_Summary_Score_Month_24.

Then compare the specific \(k\) you select and the error rates for each kind of data (original data, normalized data, log-transformed data, and binary data). Comment on any interesting observations.
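A rough sketch of how the steps above might fit together in R is shown below. Everything here is illustrative rather than a reference solution. The data are synthetic and not the actual 05_PPMI_top_UPDRS_Integrated_LongFormat file, and the group labels "PD" and "Control" are assumed. Median imputation and median-based discretization are arbitrary choices. A +1 offset is added inside the logarithm (an assumption, not part of the exercise) so that the minimum value does not produce log(0). Only two of the listed predictors are simulated, and class::knn is used instead of caret to keep the example dependency-free.

```r
library(class)  # provides knn(); a recommended package shipped with R

set.seed(2024)
n <- 300

# Synthetic stand-in for the PPMI data (group labels PD/Control are assumed)
group   <- sample(c("PD", "Control", "SWEDD"), n, replace = TRUE,
                  prob = c(0.5, 0.4, 0.1))
disease <- group != "Control"
ppmi <- data.frame(
  Index         = seq_len(n),
  ResearchGroup = group,
  UPDRS_Part_I_Summary_Score_Baseline   = rpois(n, ifelse(disease, 6, 3)),
  UPDRS_Part_III_Summary_Score_Baseline = rpois(n, ifelse(disease, 20, 4))
)
ppmi$UPDRS_Part_I_Summary_Score_Baseline[sample(n, 5)] <- NA  # inject some NAs

# Preprocessing: drop the index column, build a two-level response
# (SWEDD counted as disease), and impute NAs with the column median
ppmi$Index <- NULL
y <- factor(ifelse(ppmi$ResearchGroup == "Control", "Control", "Disease"))
X <- ppmi[, setdiff(names(ppmi), "ResearchGroup")]
X[] <- lapply(X, function(x) { x[is.na(x)] <- median(x, na.rm = TRUE); x })

# Four versions of the predictors to compare
versions <- list(
  original   = X,
  normalized = as.data.frame(lapply(X, function(x) (x - min(x)) / (max(x) - min(x)))),
  log        = as.data.frame(lapply(X, function(x) log(x - min(x) + 1))),  # +1 offset assumed
  binary     = as.data.frame(lapply(X, function(x) as.numeric(x > median(x))))
)

# Random 2:1 train/test partition
train_idx <- sample(seq_len(n), size = round(2 * n / 3))

# Test error for each data version and each candidate k
ks <- seq(1, 25, by = 4)
error_table <- do.call(rbind, lapply(names(versions), function(v) {
  tr <- versions[[v]][train_idx, ]
  te <- versions[[v]][-train_idx, ]
  test_err <- sapply(ks, function(k) {
    mean(knn(tr, te, y[train_idx], k = k) != y[-train_idx])
  })
  data.frame(data = v, k = ks, test_error = test_err)
}))

# The k with the lowest test error for each data version
best <- do.call(rbind, lapply(split(error_table, error_table$data),
                              function(d) d[which.min(d$test_error), ]))
print(best)
```

The error_table data frame holds one row per data version and \(k\), so it can be plotted directly to compare how the error rate changes with \(k\) across the four kinds of data. Cross-validated errors from a tuning function could then be added alongside the test errors to check the chosen \(k\).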