1 Traumatic Brain injury

Use the kNN algorithm to provide a classification of the data in the TBI case study, (CaseStudy11_TBI). Determine an appropriate k, train and evaluate the performance of the classification model on the data. Report some model quality statistics for a couple of different values of k and use these to rank-order (and perhaps plot the classification results of) the models.

2 Parkinson’s Disease

Use 05_PPMI_top_UPDRS_Integrated_LongFormat1 data to practice KNN classification.

3 High Dimensional Space KNN Classification

Preprocess the data: delete the index and ID columns; convert the response variable ResearchGroup to binary 0-1 factor; detect NA (missing) values (impute if necessary)
Summarize the dataset: use str, summary, cor, ggpairs
Scale/Normalize the data: As appropriate, scale to 0 to 1; transform \(log(x+1)\); discretize (0 or 1)
Partition data into training and testing sets: use set.seed and random sample, train:test = 2:1
Select the optimal \(k\) for each of the scaled data: Plot a error graph for \(k\), including three lines: training_error, cross-validation_error and testing_error, respectively
What is the impact of \(k\)?: Formulate a hypothesis about the relation between \(k\) and the error rates. You can try to use knn.tunning to verify the results (Hint: select the same folds, all you may obtain a result slightly different)
Interpret the results: Hint: Considering the number of dimension of the data, how many points are necessary to obtain the same density result for 100 dimensional space compared to a 1 dimensional space?
Report the error rates for both the training and the testing data. What do you find?

4 Lower Dimension Space KNN Classification

Try the above protocol again but select only columns 1 to 5 as predictors (after deleting the index and ID columns). Now, what about the \(k\) you select and the error rates for each kind of scaled data (original data, normalized data)? Comment on any interesting observations.

SOCR Resource Visitor number

Data Science and Predictive Analytics (UMich HS650)

Assignment 6: Lazy Learning - Classification Using Nearest Neighbors