Use the ALS dataset. This case study examines the patterns, symmetries, associations, and causality in a rare but devastating disease, amyotrophic lateral sclerosis (ALS). A major clinically relevant question in this biomedical study is: What patient phenotypes can be automatically and reliably identified and used to predict the change of the ALSFRS slope over time? The goal of this problem is to explore the dataset using unsupervised learning.
Load and prepare the data.
Perform summary and preliminary visualization.
Train a k-means model on the data, selecting \(k\) as described in Chapter 12.
Evaluate the model performance by reporting the cluster centers and the silhouette values, and explain the details (with 100 dimensions, a bar plot of the centers would be too cluttered to read).
Tune the parameters using k-means++ and plot the results.
Rerun the model with optimal parameters and interpret the clustering results.
Apply hierarchical clustering with three different linkages and compare the corresponding silhouette plots.
Fit a Gaussian mixture model, select the optimal model, and draw the BIC and silhouette plots. (Hint: fit the model on a sample of the data; otherwise the computation can be very time-consuming.)
Compare the results of the above methods.
Try some additional datasets from the list of our Case-Studies.
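The k-means steps above (k-means++ seeding, then choosing \(k\) by the average silhouette width) can be sketched in plain Python. This is only a minimal illustration under stated assumptions: it runs on synthetic two-dimensional blobs rather than the ALS data, and the helper names (`sq_dist`, `kmeans_pp_init`, `kmeans`, `inertia`, `mean_silhouette`) are hypothetical, written here for clarity. For the real 100-dimensional data, a library implementation would normally be used instead.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's algorithm started from k-means++ seeds."""
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers


def inertia(points, labels, centers):
    """Total within-cluster sum of squared distances."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def mean_silhouette(points, labels):
    """Average silhouette width s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic stand-in for the real data: three well-separated Gaussian blobs.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in true_centers for _ in range(40)]

scores = {}
for k in range(2, 7):
    # Several restarts per k; keep the run with the lowest inertia.
    runs = [kmeans(points, k, rng) for _ in range(5)]
    labels, centers = min(runs, key=lambda run: inertia(points, *run))
    scores[k] = mean_silhouette(points, labels)

best_k = max(scores, key=scores.get)
print("average silhouette by k:", scores)
print("selected k:", best_k)
```

The same loop structure carries over to the other methods in the list: replace the `kmeans` call with a hierarchical or mixture-model fit and compare the resulting silhouette values for each candidate number of clusters.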