This DSPA Appendix presents the mathematical foundations, computational algorithms, and analytical applications of Reinforcement Learning (RL).
In principle, most modern artificial intelligence (AI) methods can be classified into one of three groups:
Supervised learning methods: These are applicable when a meaningful outcome feature can be identified, measured, and modeled. Supervised strategies aim to expose the (multivariate) relationships between a set of predictor variables (covariates) and the observed outcome variable of interest. In general, we want to accurately model, predict, and track the outcome with respect to the other covariates. The most frequently used supervised learning approaches generate classification or regression models that can be estimated (fitted or trained) on some a priori observed (training) dataset that includes instances of the covariates and the outcome. Typically, these models rely on some parametric assumptions, independence requirements, and/or normalization constraints.
Unsupervised learning methods: These methods treat all features in the data equally, without assuming there is a specific outcome variable that needs to be forecast in terms of the remaining features. The goals of unsupervised learning methods are to (1) model the joint distribution of all variables, or (2) uncover hidden (latent) structure in the high-dimensional data that may suggest intricate (mechanistic, causal, or relational) interdependences between all features included in the observed information.
Reinforcement learning techniques: These approaches are useful for adaptive, i.e., temporally-dynamic, decision-making based on stochastic interactions between an agent and an ambient environment, where the agent takes actions at discrete time steps within the state-space of the constraining environment. The (AI) agent is driven by a learning algorithm that rewards (carrot) or penalizes (stick) the agent to reinforce optimal decision-making, balancing short- and long-term benefits (instant and delayed gratification). In this learning process, the agent is subject to some predefined rules, and the environment controls all other aspects, iteratively curtailing the actions/decisions of the agent during the RL process. In general, outside of the penalty/reward reinforcement, the agent has only limited information about the ambient environment or the end-game. During the temporally-dynamic exploration of the environment, the agent is simply trying to decide on the most advantageous behavior to increase the total reward. RL using deep neural networks tries to mimic biological behavior and human brain plasticity, especially during neurodevelopment.
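To make the agent-environment interaction described above more concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions and not a method taken from this appendix: the five-state corridor environment, the function names `step`, `train`, and `greedy_policy`, and all parameter values are hypothetical. The learning rule is tabular Q-learning with epsilon-greedy exploration, chosen here only as one common RL algorithm. The agent starts at the left end of the corridor, receives a small penalty (stick) for every move, and a reward (carrot) on reaching the right end.

```python
import random

N_STATES = 5        # corridor states 0..4; state 4 is the terminal goal
MOVES = (-1, +1)    # action 0 = step left, action 1 = step right


def step(state, action):
    """Hypothetical environment: return (next_state, reward, done)."""
    next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01   # carrot at the goal, small stick otherwise
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1,
          max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    # q[state][action] estimates the discounted total reward of taking
    # `action` in `state` and acting greedily afterwards.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _t in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)          # explore: random action
            else:
                # exploit: best-known action (ties go to action 0)
                action = 0 if q[state][0] >= q[state][1] else 1
            next_state, reward, done = step(state, action)
            # gamma trades off immediate reward against delayed reward.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-known action for every non-terminal state."""
    return [0 if row[0] >= row[1] else 1 for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the agent never sees the layout of the corridor; it only observes states and rewards, which mirrors the limited information available to an RL agent. After training, the greedy policy should prefer moving right in every non-terminal state, accepting the small per-step penalties in exchange for the delayed reward at the goal.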
In previous DSPA chapters, we introduced many alternative model-based and model-free methods for supervised and unsupervised regression, classification, clustering, and forecasting. The following figure shows a schematic of the main components of any reinforcement learning technique: agent, state-space, environment, action, and reward.