Use the kNN algorithm to classify the data in the TBI case study (CaseStudy11_TBI). Determine an appropriate k, then train the classification model and evaluate its performance on the data. Report some model quality statistics for a couple of different values of k and use these to rank-order the models (and perhaps plot their classification results).
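One way to report quality statistics for several values of k and rank-order the resulting models is sketched below. It is only an illustration on synthetic data: the predictors x1 and x2, the outcome y, and the helper knn_stats are hypothetical stand-ins, not columns or functions from the TBI case study.

```r
# Minimal sketch on synthetic data; the real TBI columns are NOT assumed here.
library(class)  # provides knn()

set.seed(1234)
n <- 150
# Two hypothetical numeric predictors and a binary outcome
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(ifelse(dat$x1 + dat$x2 + rnorm(n, sd = 0.5) > 0, "pos", "neg"))

# Hold out 50 of the 150 cases for testing
train_idx <- sample(n, size = 100)
train_x <- dat[train_idx, c("x1", "x2")]
test_x  <- dat[-train_idx, c("x1", "x2")]
train_y <- dat$y[train_idx]
test_y  <- dat$y[-train_idx]

# Hypothetical helper: accuracy, sensitivity, and specificity for one k
knn_stats <- function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  tab <- table(predicted = pred, actual = test_y)
  c(k = k,
    accuracy    = sum(diag(tab)) / sum(tab),
    sensitivity = tab["pos", "pos"] / sum(tab[, "pos"]),
    specificity = tab["neg", "neg"] / sum(tab[, "neg"]))
}

# Evaluate a few candidate values of k (one row per k)
results <- as.data.frame(t(sapply(c(1, 5, 11, 21), knn_stats)))

# Rank-order the models from most to least accurate
results[order(-results$accuracy), ]
```

Accuracy alone can be misleading when the classes are unbalanced, which is why the sketch also reports sensitivity and specificity; any of these columns could serve as the ranking criterion.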
Use the 05_PPMI_top_UPDRS_Integrated_LongFormat1 data to practice kNN classification.
Delete the index and ID columns; convert the response variable ResearchGroup to a binary 0-1 factor; detect NA (missing) values (impute if necessary).
Summarize the data using str, summary, cor, and ggpairs.
Partition the data using set.seed and a random sample, with train:test = 2:1.
Use knn.tunning to verify the results (Hint: select the same folds, although you may still obtain a slightly different result).
Try the above protocol again but select only columns 1 to 5 as predictors (after deleting the index and ID columns). Now, what about the \(k\) you select and the error rates for each kind of scaled data (original data, normalized data)? Comment on any interesting observations.
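The preprocessing, partitioning, and error-comparison steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the intended solution: the data frame is synthetic, the predictors feat1 and feat2 are hypothetical, the two ResearchGroup labels and the choice of which one maps to 0 are assumptions, and median imputation and min-max normalization are just one reasonable choice each.

```r
library(class)  # provides knn()

set.seed(2024)
n <- 300
# Synthetic stand-in for the PPMI data. Only index, ID, and ResearchGroup are
# named in the exercise; feat1, feat2, and the group labels are hypothetical.
ppmi <- data.frame(
  index = seq_len(n),
  ID = sample(1000:9999, n),
  feat1 = rnorm(n, mean = 50, sd = 10),
  feat2 = rnorm(n, mean = 0, sd = 1),
  ResearchGroup = sample(c("Control", "PD"), n, replace = TRUE)
)
is_pd <- ppmi$ResearchGroup == "PD"
ppmi$feat1[is_pd] <- ppmi$feat1[is_pd] + 8   # inject some class signal
ppmi$feat2[sample(n, 5)] <- NA               # inject a few missing values

# Delete the index and ID columns
ppmi <- ppmi[, !(names(ppmi) %in% c("index", "ID"))]

# Convert the response to a binary 0-1 factor (assumption: Control = 0)
ppmi$ResearchGroup <- factor(ifelse(ppmi$ResearchGroup == "Control", 0, 1))

# Detect NA values and impute them with the column median
colSums(is.na(ppmi))
num_cols <- sapply(ppmi, is.numeric)
ppmi[num_cols] <- lapply(ppmi[num_cols], function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
})

# Min-max normalized copy of the predictors
normalize <- function(v) (v - min(v)) / (max(v) - min(v))
ppmi_norm <- ppmi
ppmi_norm[num_cols] <- lapply(ppmi[num_cols], normalize)

# Random 2:1 train:test partition
train_idx <- sample(n, size = round(2 * n / 3))

# Test-set misclassification rate for a given data frame and k
test_error <- function(d, k) {
  x <- d[, num_cols, drop = FALSE]
  pred <- knn(train = x[train_idx, , drop = FALSE],
              test = x[-train_idx, , drop = FALSE],
              cl = d$ResearchGroup[train_idx], k = k)
  mean(pred != d$ResearchGroup[-train_idx])
}

# Compare original vs. normalized data over a grid of odd k values
ks <- seq(1, 25, by = 2)
errors <- data.frame(
  k = ks,
  original = sapply(ks, function(k) test_error(ppmi, k)),
  normalized = sapply(ks, function(k) test_error(ppmi_norm, k))
)
errors
```

Because kNN relies on distances, predictors on very different scales (such as feat1 and feat2 here) can dominate the neighbor search, so the original and normalized columns of errors need not agree on the best k.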
Load the SOCR 2011 US Job Satisfaction data. The last column (Description) contains free text describing each job type. Notice that spaces are replaced by underscores, __. To mine the text field and suggest some meta-data analytics, construct an R protocol for:
Stress_Category and Hiring_Potential.
The sparsity of a matrix is the fraction: \(Sparsity(A) =\frac{\text{number of zero-valued elements}}{\text{total number of matrix elements (} m\times n\text{)}}\).
Use the SOCR Neonatal Pain data to build and display a decision tree recursively partitioning the data using the provided features and attributes to split the data into clusters.
Fit decision trees using C5.0 and rpart, separately.
Compare the C5.0 and rpart results.
Refine the rpart model and evaluate again.
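A minimal sketch of the rpart part of this exercise might look like the following. The data are synthetic and every column name (heart_rate, resp_rate, crying, pain_level) is hypothetical, not taken from the SOCR Neonatal Pain data. Pruning at the cross-validated complexity parameter is shown as one possible way to refine the tree before evaluating again; that choice is an assumption. A C5.0 model could be fit on the same split with the separate C50 package, which is omitted here to keep the sketch dependent only on packages shipped with R.

```r
library(rpart)  # recursive partitioning trees

set.seed(42)
n <- 200
# Synthetic stand-in for the Neonatal Pain data; all column names are hypothetical
pain <- data.frame(
  heart_rate = rnorm(n, mean = 140, sd = 15),
  resp_rate  = rnorm(n, mean = 45, sd = 8),
  crying     = factor(sample(c("no", "yes"), n, replace = TRUE))
)
pain$pain_level <- factor(
  ifelse(pain$heart_rate > 145 | pain$crying == "yes", "high", "low")
)

# Random 75/25 train/test split
train_idx <- sample(n, size = round(0.75 * n))
train <- pain[train_idx, ]
test  <- pain[-train_idx, ]

# Fit and display a classification tree
fit <- rpart(pain_level ~ ., data = train, method = "class")
print(fit)

# Evaluate on the held-out data
pred <- predict(fit, test, type = "class")
accuracy_full <- mean(pred == test$pain_level)

# Prune at the complexity parameter with the lowest cross-validated error,
# then evaluate again
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)
pred_pruned <- predict(pruned, test, type = "class")
accuracy_pruned <- mean(pred_pruned == test$pain_level)

c(full = accuracy_full, pruned = accuracy_pruned)
```

Calling plot(fit) followed by text(fit) draws the tree with base graphics, which is one simple way to display the partitioning.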