The Data Science and Predictive Analytics (DSPA) course aims to build computational
abilities, inferential thinking, and practical
skills for tackling core data scientific challenges. It explores foundational concepts in
data management, processing, statistical computing, and dynamic visualization using modern
programming tools and agile web-services. Concepts, ideas, and protocols are illustrated
through examples of real observational, simulated and research-derived datasets. Some prior
quantitative experience in programming, calculus, statistics, mathematical models, or linear
algebra will be necessary.
This open graduate course will provide a general overview of the principles, concepts,
techniques, tools and services for managing, harmonizing, aggregating, preprocessing, modeling,
analyzing and interpreting large, multi-source, incomplete, incongruent, and heterogeneous data
(Big Data). The focus will be to expose students to common challenges related to handling
Big Data and present the enormous opportunities and power associated with our ability to
interrogate such complex datasets, extract useful information, derive knowledge, and provide
actionable forecasting. Biomedical, healthcare, and social datasets will provide context
for addressing specific driving challenges. Students will learn about modern data analytic
techniques and develop skills for importing and exporting, cleaning and fusing, modeling
and visualizing, analyzing and synthesizing complex datasets. The collaborative design,
implementation, sharing and community validation of high-throughput analytic workflows
will be emphasized throughout the course.
You can view the General DSPA Prerequisites. To make sure you are comfortable in this DSPA course, consider taking the self-assessment (pretest) before enrolling.
To summarize, students should have prior experience with college-level (undergraduate) mathematical modeling, statistical analysis, or programming courses, or permission of the instructor. Some MOOCs may be taken as prerequisites, e.g., Coursera, EdX1, EdX2. Additional examples of remediation courses are provided in the self-assessment (pretest).
Trainees successfully completing the course will:
(1) Gain an understanding of the computational foundations of Big Data Science.
(2) Develop critical inferential thinking.
(3) Gather a tool chest of R libraries for managing and interrogating raw, derived,
observed, experimental, and simulated big healthcare datasets.
(4) Possess practical skills for handling complex datasets.
This course is appropriate for trainees who have a significant interest in learning data science and predictive analytic methods and who can commit a substantial amount of time and undivided attention to studying, practicing, and interacting with other trainees in the course. Review the DSPA Topics to decide if the course coverage is of interest to you.
Class notes, datasets, and learning materials will be provided. This course will cover topics such as managing data with R, various learning classifiers, model-based and model-free forecasting and predictive analytics, evaluation of classification performance, and ensemble methods.
The following topics will be covered in varying degrees of depth.
Ivo D. Dinov, SOCR, MIDAS, HBBS/UMSN.
This course is designed to build specific data science skills and predictive analytic competencies.