DSPA Overview and Motivation
Examples of Big Biomedical Challenges (AD, PD, ALS, AWD)
Brain Visualization
Neurodegeneration
Genomics computing
Neuroimaging-genetics
Common Characteristics of Big (Biomedical and Health) Data
High-throughput Big Data Analytics
DSPA Chapter 1
Statistical Software – Pros/Cons Comparison
Getting started
Install Basic Shell-based R
GUI based R Invocation (RStudio)
RStudio GUI Layout
Help
Simple Long-to-Wide Data format translation
Data generation
I/O
Slicing and extracting data
Variable conversion
Variable information
Data selection and manipulation
Math Functions
Matrix Operations
Advanced Data Processing
Strings
Plotting
QQ Normal Probability Plots
Low-level plotting commands
Graphics parameters
Optimization and model fitting
Statistics
Distributions
Programming
Data Simulation Primer
DSPA Chapter 2
Managing Data in R
Saving and Loading R Data Structures
Importing and Saving Data from CSV Files
Exploring the Structure of Data
Exploring Numeric Variables
Measuring the Central Tendency - mean and median
Measuring Spread - quartiles and the five-number summary
Visualizing Numeric Variables - boxplots
Visualizing Numeric Variables - histograms
Understanding Numeric Data - uniform and normal distributions
Measuring Spread - variance and standard deviation
Exploring Categorical Variables
Measuring the Central Tendency - the mode
Exploring Relationships Between Variables
Missing Data
Parsing webpages and visualizing tabular HTML data
Cohort-Rebalancing (for Imbalanced Groups)
DSPA Chapter 3
Classification of visualization methods
Composition
Histograms and density plots
Pie Chart
Heat map
Comparison
Paired ScatterPlots
Barplots
Trees and Graphs
Correlation Plots
Relationships
Line plots using ggplot
Density Plots
Distributions
2D Kernel Density and 3D Surface Plots
Jitter plot
Appendix
Hands-on Activity (Health Behavior Risks)
DSPA Chapter 4
Linear Algebra & Matrix Computing
Building Matrices
Create matrices
Adding columns and rows
Matrix subscripts
Matrix Operations
Addition
Subtraction
Multiplication
Elementwise multiplication
Matrix multiplication
Division
Transpose
Inverse
Matrix Operations
Matrix Algebra Notation
Matrix Notation
Solving Systems of Equations
The identity matrix
Vectors, Matrices, and Scalars
Sample Statistics
Mean
Variance
Applications of Matrix Algebra: Linear modeling
Finding function extrema (min/max) using calculus
Least Square Estimation
The R lm Function
Eigenvalues and Eigenvectors
Other important functions
Matrix notation
Linear regression
Sample covariance matrix
DSPA Chapter 5
Principal Component Analysis (PCA)
Independent Component Analysis (ICA)
Factor Analysis (FA)
Singular Value Decomposition (SVD)
DSPA Chapter 6
Understanding classification using nearest neighbors
The kNN algorithm
Calculating distance
Choosing an appropriate k
Preparing data for use with kNN
Why is the kNN algorithm lazy?
Predictive Diagnostics
DSPA Chapter 7
The Naive Bayes Algorithm
Assumptions
Bayes Formula
The Laplace Estimator
Case Study: Head and Neck Cancer Medication
DSPA Chapter 8
Understanding decision trees
Divide and conquer
The C5.0 decision tree algorithm
Choosing the best split
Pruning the decision tree
Boosting the accuracy of decision trees
Making some mistakes more costly than others
Understanding classification rules
Separate and conquer
The One Rule algorithm
The RIPPER algorithm
Rules from decision trees
DSPA Chapter 9
Simple linear regression
Ordinary least squares estimation
Correlations
Multiple Linear Regression
Case Study 1: Baseball Players
Step 2 - exploring and preparing the data
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Regression trees and model trees
Heart Attack Data
DSPA Chapter 10
Neural Networks
Network topology
Training neural networks with backpropagation
Case Study 1: Google Trends and the Stock Market
Support Vector Machines (SVM)
Case Study 2: Optical Character Recognition (OCR)
Case Study 3: Iris Flowers
DSPA Chapter 11
Association Rules
Rule support and confidence
Case Study 1: Head and Neck Cancer Medications
Practice Problems: Groceries
DSPA Chapter 12
Clustering as a machine learning task
The k-Means Clustering Algorithm
Case Study 1: Divorce and Consequences on Young Adults
Case Study 2: Pediatric Trauma
Practice Problem: Youth Development
DSPA Chapter 13
Measuring performance for classification
Working with classification prediction data
Evaluation: Confusion matrices
Other performance measures
Visualizing performance tradeoffs
Estimating future performance (internal statistical validation)
The holdout method
DSPA Chapter 14
Tuning stock models for better performance
Using caret for automated parameter tuning
Creating a simple tuned model
Customizing the tuning process
Improving model performance with meta-learning
Understanding ensembles
Bagging
Boosting
Random forests
Training random forests
Evaluating random forest performance
DSPA Chapter 15
Working with specialized data and databases
Querying data in SQL databases
Downloading the complete text of web pages
Web-page Data Scraping
Parsing JSON from web APIs
Reading and writing Microsoft Excel spreadsheets using XLSX
Visualizing network data
Data Streams and Streaming Classification
Optimization and improving the computational performance
Generalizing tabular data structures with dplyr
Parallel computing
GPU computing
DSPA Chapter 16
Variable selection methods
Filtering-based, wrapper-based, and embedded methods
Comparing random forest classification, recursive feature elimination, and stepwise variable selection
Case Study - ALS
Evaluating model performance
DSPA Chapter 17
Regularized Linear Modeling
Ridge Regression
Least Absolute Shrinkage and Selection Operator (LASSO) Regression
Linear Regression
Assessing Prediction Accuracy
Estimating Prediction Error
Improving Prediction Accuracy
General Regularization Framework
Example: Neuroimaging-genetics study of Parkinson's Disease Dataset
Computational Complexity
n-Fold Cross Validation
Controlled Variable Selection: Knockoff Filtering: Simulated Example
PD Neuroimaging-genetics Case-Study
Visualization
DSPA Chapter 18
Time series analysis
Identifying the Diff, AR and MA parameters
Structural Equation Modeling (SEM)
Case study - Parkinson's Disease (PD)
Linear Mixed model
GLMM and GEE Longitudinal data analysis
DSPA Chapter 19
Term Frequency (TF), Inverse Document Frequency (IDF)
Document Term Matrix (DTM)
Case-Study: Job ranking
NLP
Cosine similarity
Sentiment Analysis
DSPA Chapter 20
Forecasting types and assessment approaches
Overfitting
Internal Statistical Cross-validation is an iterative process
Example (Linear Regression)
Cross-validation methods
Case-Studies
Summary of CS output
Alternative predictor functions
Prediction Models
Appendix: R Debugging
DSPA Chapter 21
Free (unconstrained) optimization
Constrained Optimization
Equality and Inequality Constraints
Lagrange Multipliers
Linear and Quadratic Programming
Manual vs. Automated Lagrange Multiplier Optimization
Data Denoising
DSPA Chapter 22
Perceptrons
Biological Relevance
Simple Neural Net Examples: XOR and NAND Operators
Sonar data example
Schizophrenia Neuroimaging Study
Spirals 2D Data
IBS Study
Country QoL Ranking Data
Handwritten Digits Classification
Classifying Real-World Images