Examples of Big Biomedical Challenges (AD, PD, ALS, AWD)
Brain Visualization
Neurodegeneration
Genomics computing
Neuroimaging-genetics
Common Characteristics of Big (Biomedical and Health) Data
High-throughput Big Data Analytics
Statistical Software – Pros/Cons Comparison
Getting started
Install Basic Shell-based R
GUI based R Invocation (RStudio)
RStudio GUI Layout
Help
Simple Long-to-Wide Data format translation
Data generation
I/O
Slicing and extracting data
Variable conversion
Variable information
Data selection and manipulation
Math Functions
Matrix Operations
Advanced Data Processing
Strings
Plotting
QQ Normal Probability Plots
Low-level plotting commands
Graphics parameters
Optimization and model fitting
Statistics
Distributions
Programming
Data Simulation Primer
Managing Data in R
Saving and Loading R Data Structures
Importing and Saving Data from CSV Files
Exploring the Structure of Data
Exploring Numeric Variables
Measuring the Central Tendency - mean and median
Measuring Spread - quartiles and the five-number summary
Visualizing Numeric Variables - boxplots
Visualizing Numeric Variables - histograms
Understanding Numeric Data - uniform and normal distributions
Measuring Spread - variance and standard deviation
Exploring Categorical Variables
Measuring the Central Tendency - the mode
Exploring Relationships Between Variables
Missing Data
Parsing webpages and visualizing tabular HTML data
Cohort-Rebalancing (for Imbalanced Groups)
Classification of visualization methods
Composition
Histograms and density plots
Pie Chart
Heat map
Comparison
Paired Scatter Plots
Barplots
Trees and Graphs
Correlation Plots
Relationships
Line plots using ggplot
Density Plots
Distributions
2D Kernel Density and 3D Surface Plots
Jitter plot
Appendix
Hands-on Activity (Health Behavior Risks)
Linear Algebra & Matrix Computing
Building Matrices
Create matrices
Adding columns and rows
Matrix subscripts
Matrix Operations
Addition
Subtraction
Multiplication
Elementwise multiplication
Matrix multiplication
Division
Transpose
Inverse
Matrix Operations
Matrix Algebra Notation
Matrix Notation
Solving Systems of Equations
The identity matrix
Vectors, Matrices, and Scalars
Sample Statistics
Mean
Variance
Applications of Matrix Algebra: Linear modeling
Finding function extrema (min/max) using calculus
Least Square Estimation
The R lm Function
Eigenvalues and Eigenvectors
Other important functions
Matrix notation
Linear regression
Sample covariance matrix
Principal Component Analysis (PCA)
Independent Component Analysis (ICA)
Factor Analysis (FA)
Singular Value Decomposition (SVD)
Understanding classification using nearest neighbors
The kNN algorithm
Calculating distance
Choosing an appropriate k
Preparing data for use with kNN
Why is the kNN algorithm lazy?
Predictive Diagnostics
The Naive Bayes Algorithm
Assumptions
Bayes Formula
The Laplace Estimator
Case Study: Head and Neck Cancer Medication
Understanding decision trees
Divide and conquer
The C5.0 decision tree algorithm
Choosing the best split
Pruning the decision tree
Boosting the accuracy of decision trees
Making some mistakes more costly than others
Understanding classification rules
Separate and conquer
The One Rule algorithm
The RIPPER algorithm
Rules from decision trees
Simple linear regression
Ordinary least squares estimation
Correlations
Multiple Linear Regression
Case Study 1: Baseball Players
Step 2 - exploring and preparing the data
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Regression trees and model trees
Heart Attack Data
Neural Networks
Network topology
Training neural networks with backpropagation
Case Study 1: Google Trends and the Stock Market
Support Vector Machines (SVM)
Case Study 2: Optical Character Recognition (OCR)
Case Study 3: Iris Flowers
Association Rules
Rule support and confidence
Case Study 1: Head and Neck Cancer Medications
Practice Problems: Groceries
Clustering as a machine learning task
The k-Means Clustering Algorithm
Case Study 1: Divorce and Consequences on Young Adults
Case Study 2: Pediatric Trauma
Practice Problem: Youth Development
Measuring performance for classification
Working with classification prediction data
Evaluation: Confusion matrices
Other performance measures
Visualizing performance tradeoffs
Estimating future performance (internal statistical validation)
The holdout method
Tuning stock models for better performance
Using caret for automated parameter tuning
Creating a simple tuned model
Customizing the tuning process
Improving model performance with meta-learning
Understanding ensembles
Bagging
Boosting
Random forests
Training random forests
Evaluating random forest performance
Working with specialized data and databases
Querying data in SQL databases
Downloading the complete text of web pages
Web-page Data Scraping
Parsing JSON from web APIs
Reading and writing Microsoft Excel spreadsheets using XLSX
Visualizing network data
Optimization and improving the computational performance
Generalizing tabular data structures with dplyr
Parallel computing
GPU computing
Variable selection methods
Case Study - ALS
Evaluating model performance
Regularized Linear Modeling
Ridge Regression
Least Absolute Shrinkage and Selection Operator (LASSO) Regression
Linear Regression
Assessing Prediction Accuracy
Estimating Prediction Error
Improving Prediction Accuracy
General Regularization Framework
Example: Neuroimaging-genetics study of Parkinson's Disease Dataset
Computational Complexity
n-Fold Cross Validation
Controlled Variable Selection: Knockoff Filtering: Simulated Example
PD Neuroimaging-genetics Case-Study
Visualization
Time series analysis
Identifying the Diff, AR and MA parameters
Structural Equation Modeling (SEM)
Case study - Parkinson's Disease (PD)
Linear Mixed model
GLMM and GEE Longitudinal data analysis
Term Frequency (TF), Inverse Document Frequency (IDF)
Document Term Matrix (DTM)
Case-Study: Job ranking
NLP
Forecasting types and assessment approaches
Overfitting
Internal Statistical Cross-validation is an iterative process
Example (Linear Regression)
Cross-validation methods
Case-Studies
Summary of CS output
Alternative predictor functions
Prediction Models
Appendix: R Debugging
Free (unconstrained) optimization
Constrained Optimization
Equality and Inequality constraints
Lagrange Multipliers
Linear and Quadratic Programming
Manual vs. Automated Lagrange Multiplier Optimization
Data Denoising
Perceptrons
Biological Relevance
Simple Neural Net Examples: XOR and NAND Operators
Sonar data example
Schizophrenia Neuroimaging Study
Spirals 2D Data
IBS Study
Country QoL Ranking Data
Handwritten Digits Classification
Classifying Real-World Images