Use the kNN algorithm to classify the TBI SOCR data. Dichotomize the field.gcs outcome variable using field.gcs>=7. Determine an appropriate \(k\), then train and evaluate the performance of the classification model on the data. Report some model quality statistics for a couple of different values of \(k\), and use these to rank-order (and perhaps plot the classification results of) the models.
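The workflow above (dichotomize, normalize, split, fit for several \(k\), rank) could be sketched roughly as follows. This is only an illustrative outline, not the reference solution: the data frame is a synthetic stand-in generated on the spot, the predictor columns er.gcs and age are hypothetical choices, and the candidate values of \(k\) are arbitrary. Only field.gcs and the >=7 cutoff come from the exercise. It relies on knn from the class package.

```r
library(class)  # provides knn()

set.seed(1234)
n <- 150

# Synthetic stand-in for the TBI data (NOT the real SOCR data).
# field.gcs is named in the exercise; er.gcs and age are hypothetical predictors.
field.gcs <- sample(3:15, n, replace = TRUE)
tbi <- data.frame(
  field.gcs = field.gcs,
  er.gcs    = pmin(15, pmax(3, field.gcs + sample(-3:3, n, replace = TRUE))),
  age       = round(runif(n, 16, 80))
)

# Dichotomize the outcome: 1 when field.gcs >= 7, 0 otherwise
y <- factor(as.integer(tbi$field.gcs >= 7), levels = c(0, 1))

# Min-max normalize the predictors so both contribute on the same scale
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
X <- as.data.frame(lapply(tbi[, c("er.gcs", "age")], normalize))

# Random 2:1 train/test split
train_idx <- sample(seq_len(n), size = round(2 * n / 3))

# Fit kNN for one k and return a few quality statistics on the test set
evaluate_k <- function(k) {
  pred <- knn(X[train_idx, ], X[-train_idx, ], cl = y[train_idx], k = k)
  tab  <- table(predicted = pred, actual = y[-train_idx])
  c(k           = k,
    accuracy    = as.numeric(sum(diag(tab)) / sum(tab)),
    sensitivity = as.numeric(tab["1", "1"] / sum(tab[, "1"])),
    specificity = as.numeric(tab["0", "0"] / sum(tab[, "0"])))
}

# Compare a few candidate values of k and rank-order them by accuracy
results <- as.data.frame(t(sapply(c(1, 5, 11, 21), evaluate_k)))
results <- results[order(-results$accuracy), ]
print(results)
```

With the real data, the predictor columns would be replaced by whichever TBI variables are chosen for the model, and the ranking could be based on any of the reported statistics rather than accuracy alone.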
Use 05_PPMI_top_UPDRS_Integrated_LongFormat data to practice kNN classification.
Remove the Index, FID_IID, and VisitID columns; convert the response variable ResearchGroup to a binary (two-level) factor (consider SWEDD as disease); detect NA values (impute if necessary).
Summarize the data using str, summary, cor, and ggpairs.
Prepare transformed versions of the data: normalize, apply log(x-min(x)), and discretize to either 0 or 1.
Partition the data using set.seed and a random sample, with \(train:test = 2:1\).
Use caret::knn.tuning or caret::train
to verify the results (Hint: select the same folds, or you may get slightly different results).
Try all the above again, but select only the following variables as predictors: UPDRS_Part_I_Summary_Score_Baseline, UPDRS_Part_I_Summary_Score_Month_24, UPDRS_Part_II_Patient_Questionnaire_Summary_Score_Baseline, UPDRS_Part_II_Patient_Questionnaire_Summary_Score_Month_24, UPDRS_Part_III_Summary_Score_Baseline, and UPDRS_Part_III_Summary_Score_Month_24. Now, which specific \(k\) do you select, and what are the error rates for each kind of data (original data, normalized data, log-transformed data, and binary data)? Comment on any interesting observations.
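One possible way to compare \(k\) and the error rates across the four kinds of data is sketched below. It rests on several assumptions: the data are synthetic (the predictors score_a and score_b are hypothetical placeholders for the UPDRS variables), the log transform is shifted by +1 because log(x-min(x)) is -Inf at the minimum, the binary version thresholds at the median, and leave-one-out cross-validation via class::knn.cv stands in for the fold-based tuning in caret.

```r
library(class)  # provides knn() and knn.cv()

set.seed(2024)
n <- 150

# Synthetic stand-in for the PPMI data (NOT the real data set).
group <- sample(c("Control", "PD", "SWEDD"), n, replace = TRUE)
ppmi <- data.frame(
  ResearchGroup = group,
  score_a = rgamma(n, shape = 2, scale = 3) + 4 * (group != "Control"),
  score_b = rgamma(n, shape = 3, scale = 5)
)

# Binary response: SWEDD is counted as disease, together with PD
y <- factor(ifelse(ppmi$ResearchGroup == "Control", 0, 1), levels = c(0, 1))

normalize <- function(x) (x - min(x)) / (max(x) - min(x))
log_shift <- function(x) log(x - min(x) + 1)    # +1 is an assumption to avoid log(0)
binarize  <- function(x) as.integer(x > median(x))  # median cutoff is an assumption

predictors <- ppmi[, c("score_a", "score_b")]
versions <- list(
  original   = predictors,
  normalized = as.data.frame(lapply(predictors, normalize)),
  log        = as.data.frame(lapply(predictors, log_shift)),
  binary     = as.data.frame(lapply(predictors, binarize))
)

# Random 2:1 train/test split, shared by all four versions
train_idx <- sample(seq_len(n), size = round(2 * n / 3))
ks <- seq(1, 15, by = 2)

# Training, leave-one-out CV, and testing error for every k
error_rates <- function(X) {
  tr  <- X[train_idx, , drop = FALSE]
  te  <- X[-train_idx, , drop = FALSE]
  ytr <- y[train_idx]
  yte <- y[-train_idx]
  t(sapply(ks, function(k) c(
    k     = k,
    train = mean(knn(tr, tr, ytr, k = k) != ytr),
    cv    = mean(knn.cv(tr, ytr, k = k) != ytr),
    test  = mean(knn(tr, te, ytr, k = k) != yte)
  )))
}

errors <- lapply(versions, error_rates)

# k with the lowest cross-validation error for each kind of data
best_k <- sapply(errors, function(e) e[which.min(e[, "cv"]), "k"])
print(best_k)

# Error curves for one version (here the normalized data)
e <- errors$normalized
matplot(e[, "k"], e[, c("train", "cv", "test")], type = "l",
        lty = 1:3, col = 1:3, xlab = "k", ylab = "error rate")
legend("topright", legend = c("train", "cv", "test"), lty = 1:3, col = 1:3)
```

Choosing \(k\) by cross-validation error rather than test error keeps the test set as an untouched check; the binary version tends to produce many tied distances, which is worth keeping in mind when interpreting its error rates.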