User-data driven visual analytics of extremely high-dimensional studies using TensorBoard

SOCR Team

This SOCR HTML5 resource demonstrates:
  1. The t-distributed stochastic neighbor embedding (t-SNE) statistical method for manifold dimension reduction,
  2. The TensorBoard machine learning platform,
  3. A hands-on Big Data analytics activity using the UK Biobank data, and
  4. Interactive visual analytics using user-provided data.
In this case study, we use the tab-delimited SOCR longitudinal data of human electrocardiogram (ECG) signals.
The ECG data structure used here is a 2D array (tensor) of dimensions [1:162, 1:3500]. The complete data, of dimensions [1:162, 1:65536], is available here.
In the complete data, each row [1:162, ] is an ECG recording of 65536 temporal measurements over a period of 512 seconds, sampled at 128 Hz.
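The recording length follows directly from the sampling rate and duration. The short sketch below checks that arithmetic and estimates how much signal the 3500-column version covers, assuming (this is not stated above) that the truncated data keeps the same 128 Hz sampling rate. The variable names are illustrative only.

```python
# Values taken from the description of the complete ECG data.
SAMPLING_RATE_HZ = 128
DURATION_S = 512

# Number of measurements per recording in the complete data.
samples_full = SAMPLING_RATE_HZ * DURATION_S

# Assumption: the 3500-column version is sampled at the same 128 Hz rate.
truncated_samples = 3500
truncated_duration_s = truncated_samples / SAMPLING_RATE_HZ

print(samples_full)          # 65536
print(truncated_duration_s)  # 27.34375
```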
The Labels file represents a vector of 162 diagnostic labels, one for each row of Data. The three diagnostic categories are 'ARR', 'CHF', and 'NSR':
  ARR: 96 recordings from persons with arrhythmia,
  CHF: 30 recordings from persons with congestive heart failure,
  NSR: 36 recordings from persons with normal sinus rhythms.
Research Goal: Train a classifier to distinguish between the 3 clinical phenotypes: ARR, CHF, and NSR.
ECGDataTensor_T3500.tsv includes only 3500 timepoints; ECGDataTensor.tsv includes the complete temporal data (65536 timepoints).
Note that longitudinal data has to be in the wide format, where each row is a case, each column represents a time index, and each cell holds the value observed for that case at that time.
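If your longitudinal data is stored in long format (one row per case and time index), it needs to be pivoted before loading. The following is a minimal sketch of such a conversion using only the Python standard library. The function name `long_to_wide`, the record layout `(case_id, time_index, value)`, and the sample values are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def long_to_wide(records):
    """Pivot (case_id, time_index, value) records into wide format.

    Returns (case_ids, time_points, wide), where wide[i][j] is the value
    observed for case_ids[i] at time_points[j].
    """
    by_case = defaultdict(dict)
    for case_id, time_index, value in records:
        by_case[case_id][time_index] = value

    # Every case must share the same set of time indices.
    time_points = sorted({t for row in by_case.values() for t in row})
    case_ids = sorted(by_case)

    wide = []
    for case_id in case_ids:
        row = by_case[case_id]
        missing = [t for t in time_points if t not in row]
        if missing:
            raise ValueError(f"case {case_id!r} has no value at times {missing}")
        wide.append([row[t] for t in time_points])
    return case_ids, time_points, wide


# Hypothetical toy input: two recordings, three time indices each.
long_records = [
    ("rec1", 0, 0.10), ("rec1", 1, 0.12), ("rec1", 2, 0.09),
    ("rec2", 0, -0.05), ("rec2", 1, 0.00), ("rec2", 2, 0.03),
]
case_ids, time_points, wide = long_to_wide(long_records)
print(wide)
```

The sketch raises an error on missing observations rather than filling them in, since any imputation choice would depend on the study.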
Before you begin, review the SOCR hands-on high-dimensional t-SNE Data Analytics Learning Module and the DSPA Dimensionality Reduction Chapter.

Similarly to the analysis of the UK Biobank study and the User-Specified Case-Study, you can use your own dataset.
This will require you to provide a pair of ASCII text files that can be loaded from your computer.

The first file contains tab-delimited (TSV) data including the predictor vectors (rows = cases, columns = features).
The second file is an optional TSV file including metadata like labels for each case (row), if any.
Examples of the two data formats that can be loaded from your computer are included below. Once the data is loaded in the app, you can run the analysis on your own data much like we did in the similar dimensionality reduction activity using UK Biobank.
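As a rough illustration of preparing such a file pair, the sketch below serializes a small feature matrix and a matching label vector as TSV text with Python's standard `csv` module. The helper names `to_tsv` and `build_tsv_pair` and the toy values are hypothetical. Whether the app expects a header row is not stated here, so this sketch writes none; compare the output against the example files before loading.

```python
import csv
import io


def to_tsv(rows):
    """Serialize a list of rows as tab-delimited text, one case per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def build_tsv_pair(vectors, labels=None):
    """Return (data_tsv, labels_tsv); labels_tsv is None if no labels are given."""
    n_features = len(vectors[0])
    if any(len(row) != n_features for row in vectors):
        raise ValueError("all cases must have the same number of features")
    if labels is not None and len(labels) != len(vectors):
        raise ValueError("need exactly one label per case (row)")

    data_tsv = to_tsv(vectors)
    labels_tsv = to_tsv([[label] for label in labels]) if labels is not None else None
    return data_tsv, labels_tsv


# Hypothetical toy data: two cases with three features each.
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
labels = ["ARR", "NSR"]
data_tsv, labels_tsv = build_tsv_pair(vectors, labels)
print(data_tsv)
print(labels_tsv)
```

The two strings can then be saved as plain text files (for example with `pathlib.Path(...).write_text(...)`, using file names of your choice) and loaded into the app from your computer.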

Video Demonstration using UKBB Data


You can see the complete UK Biobank activity here.