Explain the following concepts
- Information Gain Measure
- Impurity
- Entropy
- Gini
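The four concepts above are closely linked: entropy and the Gini index are two impurity measures, and information gain is the drop in impurity produced by a split. A minimal sketch using only the Python standard library follows. The function names and the toy labels are assumptions made for illustration and are not taken from the assignment data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)) over the class proportions."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 - sum(p_i ** 2) over the class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted average impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Toy labels, invented for illustration only (not the Neonatal Pain data).
parent = ["A"] * 5 + ["B"] * 5
pure_split = [["A"] * 5, ["B"] * 5]
mixed_split = [["A", "A", "A", "B", "B"], ["A", "A", "B", "B", "B"]]

print("entropy(parent):", entropy(parent))
print("gini(parent):", gini(parent))
print("gain, pure split:", information_gain(parent, pure_split))
print("gain, mixed split:", information_gain(parent, mixed_split))
```

A split that separates the classes perfectly recovers all of the parent's impurity as gain, while a split that leaves both children nearly as mixed as the parent yields a gain close to zero. Passing `impurity=gini` gives the analogous Gini-based reduction.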
Decision Tree Partitioning
Use the SOCR Neonatal Pain data to build and display a decision tree that recursively partitions the data, using the provided features and attributes to split the cases into clusters.
- Create two classes using the variable Cluster
- Create random training and test datasets
- Train a decision tree model on the data, using C5.0 and rpart separately
- Evaluate the model performance and compare the C5.0 and rpart results
- Tune the parameter for rpart and evaluate again
- Make predictions on the testing data, assess the prediction accuracy, and report the confusion matrix
- Comment on the classification performance
- Apply Random Forest classification, report the variable importance plot, make predictions on the testing data, and assess the prediction accuracy.
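The workflow in the list above (random split, fit, predict, confusion matrix, accuracy) can be sketched in a few lines. The following is a hedged illustration only: it is written in plain Python, it is not C5.0, rpart, or Random Forest, and it fits just a depth-one tree (a single Gini-based threshold split) on made-up toy rows. All function names, the seed, and the data are assumptions.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=1234):
    """Shuffle a copy of the rows with a fixed seed and split off a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows):
    """Find the threshold on x that minimizes the size-weighted Gini of both sides."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        if not left or not right:
            continue  # a valid split needs cases on both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left), majority(right))
    if best is None:
        raise ValueError("no valid split: all x values are identical")
    return best[1:]


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of cases on the diagonal of the confusion matrix."""
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / sum(matrix.values())


# Toy rows (x, label), invented for illustration; not the Neonatal Pain data.
rows = [(x, "low" if x < 10 else "high") for x in range(20)]
train, test = train_test_split(rows)
threshold, left_label, right_label = fit_stump(train)
predicted = [left_label if x <= threshold else right_label for x, _ in test]
matrix = confusion_matrix([y for _, y in test], predicted)

print("threshold:", threshold)
print("confusion matrix:", dict(matrix))
print("accuracy:", accuracy(matrix))
```

A full tree would apply `fit_stump` recursively to each side until a stopping rule is met. In the assignment itself, that recursion, the pruning, and the tuning are handled by the named R packages.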