This is Part 2 of the larger DSPA Visualization Chapter. The chapter was split into parts because rendering it in a single browser window is too memory-intensive. Part 1 of the Visualization Chapter covers data handling, statistical measures of centrality and dispersion, categorical and numeric data, uniform and normal distributions, missing data imputation, web page parsing, visualization of tabular HTML data, and cohort-rebalancing of imbalanced groups.
In this chapter, we present a number of complementary strategies for data wrangling, harmonization, manipulation, aggregation, visualization, and graphical exploration. Specifically, we discuss alternative methods for loading and saving computable data objects, importing and exporting different data structures, computing sample statistics for quantitative variables, plotting sample histograms and model distribution functions, and scraping data from websites. We also cover exploratory data analysis (EDA) techniques, the handling of incomplete (missing) data, and cohort-rebalancing of imbalanced groups.
In this section, we present a broad range of simulations and hands-on activities that illustrate basic data visualization techniques in R. A brief discussion of alternative visualization methods is followed by demonstrations of histograms, density, pie, jitter, bar, line, and scatter plots, as well as strategies for displaying trees, graphs, and 3D surface plots. Many of these techniques are also used throughout the textbook to address the graphical needs of specific case studies.
It is practically impossible to cover every option of every visualization routine. Readers are encouraged to experiment with each visualization type, change the input data and parameters, explore the function documentation through R help (e.g., ?plot), and search for new R visualization packages and functionality, which are continuously being developed.
Scientific data-driven or simulation-driven visualization methods are hard to classify. The following list of criteria can be used for classification:
The following table lists common data visualization methods, organized by task type: