Regression Forecasting for Numerical Data
Use the Quality of Life data (Case06_QoL_Symptom_ChronicIllness) to fit several Multiple Linear Regression models predicting clinically relevant outcomes, e.g., Chronic Disease Score. Complete the following protocol:
- Collect data and preprocess it carefully.
- Summarize and visualize the data using summary(), str(), pairs.panels(), and ggplot().
- Report correlations for numerical features and try to visualize these associations (e.g., heatmap, pairs plot).
- Examine potential dependencies between the predictors and the response variable.
- Fit several Multiple Linear Regression models, report your results, and explain the summary, residuals, effect-size coefficients and the coefficient of determination, \(R^2\).
- Draw model diagnostic plots, including a QQ plot, a residuals plot, and a leverage plot (half-normal plot).
- Interpret the results in terms of the data.
- Predict the outcomes for new data and assess the predictions using several criteria (e.g., correlation coefficient, MSRE).
- Try to improve the model performance using the step() function and interpret both AIC and BIC.
- Fit a regression tree model, visualize the model and compare it with the earlier OLS model.
- Use M5P() in RWeka to obtain a better model.
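
The protocol above could be sketched in R roughly as follows. This is a minimal outline under stated assumptions, not a reference solution: the data frame is simulated as a stand-in for Case06_QoL_Symptom_ChronicIllness, and the column names (age, symptoms, qol_score, noise, cd_score) and the helper eval_pred() are hypothetical. RMSE and MAE are used here as example error criteria alongside the correlation coefficient. Only base R and the rpart package are run; the calls that need psych, ggplot2, or RWeka are left as comments.

```r
library(rpart)

# --- Simulated stand-in data (ASSUMPTION: hypothetical columns, not the real QoL file)
set.seed(1234)
n <- 300
qol <- data.frame(
  age       = round(runif(n, 30, 85)),
  symptoms  = rpois(n, 4),
  qol_score = rnorm(n, mean = 60, sd = 10),
  noise     = rnorm(n)                      # irrelevant predictor, for step() to drop
)
qol$cd_score <- 1.5 + 0.02 * qol$age + 0.30 * qol$symptoms -
  0.03 * qol$qol_score + rnorm(n, sd = 0.5)

# --- Train/test split so predictions are assessed on unseen cases
idx   <- sample(n, size = round(0.75 * n))
train <- qol[idx, ]
test  <- qol[-idx, ]

# --- Summaries and correlations
str(train)
print(summary(train))
print(round(cor(train), 2))
# psych::pairs.panels(train) would give the pairs plot if psych is installed

# --- Multiple linear regression (OLS) with all predictors
fit_full <- lm(cd_score ~ ., data = train)
print(summary(fit_full))   # coefficients, residual summary, R^2

# --- Diagnostics: residuals vs fitted, normal QQ, residuals vs leverage
op <- par(mfrow = c(1, 3))
plot(fit_full, which = c(1, 2, 5))
par(op)

# --- Stepwise selection: default penalty k = 2 is AIC, k = log(n) is BIC
fit_aic <- step(fit_full, direction = "backward", trace = 0)
fit_bic <- step(fit_full, direction = "backward", trace = 0,
                k = log(nrow(train)))
print(c(AIC_full = AIC(fit_full), AIC_step = AIC(fit_aic)))
print(c(BIC_full = BIC(fit_full), BIC_step = BIC(fit_bic)))

# --- Regression tree on the same training data
fit_tree <- rpart(cd_score ~ ., data = train)
print(fit_tree)            # plot(fit_tree); text(fit_tree) draws the tree

# --- Hypothetical helper: correlation, RMSE, and MAE on new data
eval_pred <- function(model, newdata, truth) {
  p <- as.numeric(predict(model, newdata))
  c(cor  = cor(p, truth),
    rmse = sqrt(mean((p - truth)^2)),
    mae  = mean(abs(p - truth)))
}

res_ols  <- eval_pred(fit_full, test, test$cd_score)
res_bic  <- eval_pred(fit_bic,  test, test$cd_score)
res_tree <- eval_pred(fit_tree, test, test$cd_score)
print(round(rbind(ols = res_ols, step_bic = res_bic, tree = res_tree), 3))

# --- Model tree (not run here: RWeka needs a Java installation)
# fit_m5p <- RWeka::M5P(cd_score ~ ., data = train)
# eval_pred(fit_m5p, test, test$cd_score)
```

With the real data, the simulated qol data frame would be replaced by the imported file and cd_score by the chosen outcome column. The same eval_pred() comparison can then be extended to the M5P model once RWeka is available.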