Introduction to Statistics: From Data to Knowledge to Decisions

Summary: This short course is intended for engineers who want to learn how to use statistics, data, and machine learning techniques to tackle industrially relevant problems. It focuses on foundations of statistics and optimization that apply well beyond the specific techniques covered. Numerical examples implemented in MATLAB are provided.

Statistics and Probability:

- Properties of Random Variables (e.g., probability density, cumulative distribution)

- Types of Random Variables (e.g., discrete, continuous, univariate, multivariate)

- Descriptive Statistics (e.g., mean, variance, moments)

- Gaussian Random Variables

- Modeling Phenomena with Random Variables (e.g., reliability, failure)

Estimation:

- Monte Carlo Sampling

- Maximum Likelihood

- Law of Large Numbers

- Central Limit Theorem

- Extreme Value Theorem

Data-Driven Models:

- Linear Regression

- Nonlinear Regression

- Classification (Logistic Regression, Support Vector Machines)

- Kernel Methods

- Neural Networks

Data Analysis:

- Principal Component Analysis (PCA)

- Singular Value Decomposition

- Dynamic Mode Decomposition

- Convolutional Neural Networks

Decision-Making:

- Risk Measures

- Stochastic Dominance

Course Dates:

- September 2019, UW-Madison

- January 2021, Guanajuato

- June 2022, Guanajuato

About the instructor: Victor M. Zavala is the Baldovin-DaPra Professor in the Department of Chemical and Biological Engineering at the University of Wisconsin-Madison and a senior computational mathematician in the Mathematics and Computer Science Division at Argonne National Laboratory. He holds a B.Sc. degree from Universidad Iberoamericana and a Ph.D. degree from Carnegie Mellon University, both in chemical engineering. He is the recipient of a Department of Energy Early Career Award, under which he develops scalable optimization algorithms. He is also a technical editor of the journal Mathematical Programming Computation. His research interests are in the areas of mathematical modeling of energy systems, high-performance computing, stochastic optimization, and predictive control.