
SOCR/MIDAS (Ivo Dinov)

Machine learning relies heavily on entropy-based (e.g., Rényi-entropy) information theory and kernel-based methods. For instance, Parzen-kernel windows may be used to estimate various probability density functions, which facilitates expressing information-theoretic concepts as kernel matrices or statistics, e.g., mean vectors, in a Mercer kernel feature space. These parallels between machine learning and information theory allow computational methods from one field to be interpreted and understood in terms of their dual representations in the other.
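As a concrete illustration of the Parzen-window idea, the minimal sketch below estimates a univariate density by averaging Gaussian kernels centered at each observation. The function name `parzen_kde`, the bandwidth value, and the simulated data are illustrative assumptions, not part of the text above.

```python
import numpy as np

def parzen_kde(x_grid, samples, h):
    """Parzen-window density estimate with a Gaussian kernel of bandwidth h."""
    # Scaled differences between each grid point and each sample
    diffs = (x_grid[:, None] - samples[None, :]) / h
    # Average the kernels centered at the samples (normalized Gaussian)
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=500)   # hypothetical training data
grid = np.linspace(-4, 4, 81)
density = parzen_kde(grid, samples, h=0.4)

# A valid density estimate should integrate to (approximately) 1
print(np.trapz(density, grid))
```

The bandwidth \(h\) controls the bias-variance trade-off of the estimate: small \(h\) yields a spiky, high-variance estimate, while large \(h\) oversmooths the density.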

Machine learning (ML) is the process of data-driven estimation
(quantitative, evidence-based learning) of *optimal* parameters of
a model, network, or system that lead to output prediction,
classification, regression, or forecasting based on a specific input
(prospective, validation, or testing data, which may or may not be
related to the original training data). Parameter optimality is tracked
and assessed iteratively by a learning criterion that depends on the
specific type of ML problem. Classical learning assessment criteria,
including mean squared error (MSE), accuracy, and \(R^2\) (see Chapter
9), may only capture low-order statistics of the data, e.g., first- or
second-order moments. Higher-order learning criteria enable solving
problems where sensitivity to higher moments is important, e.g.,
matching skewness or kurtosis for non-linear clustering, classification,
or dimensionality reduction.
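The distinction between low- and higher-order statistics can be made concrete with a small sketch: two samples constructed to share the same first- and second-order statistics (mean 0, variance 1) can still differ sharply in skewness and kurtosis. The standardized-moment formulas below are standard; the simulated distributions are illustrative choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric (normal) vs. skewed (exponential) samples, both standardized
# so their first- and second-order statistics agree (mean 0, variance 1)
normal = rng.normal(size=10_000)
normal = (normal - normal.mean()) / normal.std()
expo = rng.exponential(size=10_000)
expo = (expo - expo.mean()) / expo.std()

def skewness(x):
    # Third standardized moment
    z = (x - x.mean()) / x.std()
    return (z**3).mean()

def excess_kurtosis(x):
    # Fourth standardized moment, minus 3 so the normal baseline is 0
    z = (x - x.mean()) / x.std()
    return (z**4).mean() - 3.0

# Low-order criteria (mean, variance) cannot tell the samples apart...
print(normal.mean(), normal.var(), expo.mean(), expo.var())
# ...but higher-order moments can: skewness ~0 vs. ~2
print(skewness(normal), skewness(expo))
print(excess_kurtosis(normal), excess_kurtosis(expo))
```

A learning criterion built only on MSE would treat these two samples as equivalent error distributions, whereas a higher-order criterion would distinguish them.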

The figure below provides a schematic description of the machine learning workflow.