Explain these concepts:
- Information Gain Measure
- Impurity
- Entropy
- Gini
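As a starting point for the concepts above, here is a minimal, language-agnostic sketch in Python. It assumes the usual textbook definitions: entropy as the negative sum of p*log2(p) over class proportions p, Gini impurity as 1 minus the sum of squared proportions, and information gain as the parent impurity minus the size-weighted impurity of the child nodes. The function names and the toy labels are hypothetical and are not taken from the SOCR data or from any R package.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy node: 4 "yes" and 4 "no" cases, split into two children.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes"] * 3 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 3

print(entropy(parent))                               # 1.0 (maximally impure for two classes)
print(gini(parent))                                  # 0.5
print(information_gain(parent, [left, right]))       # entropy-based gain
print(information_gain(parent, [left, right], gini)) # Gini-based gain
```

A pure node (all labels identical) has zero entropy and zero Gini impurity, so a split is useful to the extent that it moves the children toward purity.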
Decision Tree Partitioning
Load the SOCR Neonatal Infant Pain score data and follow these steps:
- Collect and preprocess the data, e.g., data conversion and variable selection.
- Randomly split the data into training and testing sets.
- Train decision tree models on the data using C5.0 and rpart.
- Evaluate and compare the two models.
- Tune the rpart parameter and repeat the evaluation and comparison.
- Assess the prediction accuracy and report the confusion matrix.
- Comment on different aspects of the prediction performance.
- Use various impurity measures and re-estimate the models.
- Try to use the RWeka package to train decision models and compare the results.
- Try to apply Random Forest and obtain a variable importance plot.
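The splitting and evaluation steps in the list above can be sketched independently of any particular tree package. The following Python example is only an illustration under stated assumptions: the helper names, the 25% test fraction, the seed, and the "pain" / "no pain" labels are all hypothetical and do not come from the SOCR dataset. In the exercise itself these steps would be carried out in R on the predictions of the fitted C5.0 and rpart models.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=1234):
    """Shuffle a copy of the rows and split off a test fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of cases where the predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical row indices standing in for the records of a data set.
train, test = train_test_split(list(range(20)))

# Hypothetical test-set labels and model predictions (not real results).
actual = ["pain", "pain", "no pain", "no pain", "pain", "no pain"]
predicted = ["pain", "no pain", "no pain", "no pain", "pain", "pain"]

cm = confusion_matrix(actual, predicted)
for (a, p), count in sorted(cm.items()):
    print(f"actual={a!r:10} predicted={p!r:10} count={count}")
print("accuracy:", accuracy(actual, predicted))
```

Fixing the seed makes the split reproducible, so that two models (for example C5.0 and rpart) are compared on exactly the same test cases.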