
In this chapter, we cover several powerful black-box machine learning and artificial intelligence techniques. These techniques have complex mathematical formulations; however, efficient algorithms and reliable software packages have been developed to apply them in a variety of practical settings. We will (1) describe neural networks as analogues of biological neurons, (2) develop, hands-on, a neural network that can be trained to compute the square-root function, (3) describe support vector machine (SVM) classification, (4) present the random forest as an ensemble ML technique, and (5) analyze several case studies, including optical character recognition (OCR), the Iris flowers, Google Trends and the Stock Market, and Quality of Life in chronic disease.

Later, in Chapter 14, we will provide more details and additional examples of deep neural network learning. For now, let’s start by exploring the mechanics inside black-box machine learning approaches.

1 Neural Networks

1.1 From biological to artificial neurons

An Artificial Neural Network (ANN) model mimics the biological brain's response to multisource (sensory-motor) stimuli (inputs). An ANN emulates the brain using a network of interconnected artificial neurons that together act as a massively parallel processor. Unlike the brain, however, ANNs rely on graphs of artificial nodes, not brain cells, to model intrinsic process characteristics from observational data.

The basic ANN component is a cell node. Suppose the node receives the inputs \(x=\{x_i\}\), which carry information from upstream network nodes, and produces one output that propagates the information downstream through the network. The first step in fitting an ANN involves estimating a weight coefficient for each input feature. These weights (\(w\)’s) reflect the relative importance of each input. The weighted signals are then summed by the “neuron cell”, and this sum is passed through an activation function, denoted by \(f(\cdot)\), which generates the output \(y\) of the node. The weights \(\{w_i\}_{i\ge 1}\) control the weighted combination of the inputs \(\{x_i\}\) that forms the argument of the activation function, while the constant weight \(w_o\) and the corresponding bias term \(b\) allow us to shift, or offset, the entire activation function to the left or right. Thus, a typical node output has the following mathematical relationship to its inputs: \[\underbrace{y(x)}_{output}=f\left (w_o \underbrace{b}_{bias}+\sum_{i=1}^n \overbrace{w_i}^{weights} \underbrace{x_i}_{inputs}\right ).\]
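As a minimal sketch, the node equation can be evaluated directly. It assumes the sigmoid, introduced below, as the activation \(f(\cdot)\) and uses NumPy arrays for the inputs and weights. The helper name `node_output` and all numeric values are hypothetical and chosen only for illustration; they are not estimated from data.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def node_output(x, w, w_o, b, f=sigmoid):
    """Hypothetical helper: y(x) = f(w_o * b + sum_i w_i * x_i) for a single node."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # Weighted sum of the inputs plus the shifted bias, passed through f.
    return f(w_o * b + np.dot(w, x))

# Illustrative inputs and weights (assumed values, not fitted ones).
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, 0.1]
y = node_output(x, w, w_o=0.2, b=1.0)
print(round(float(y), 4))  # 0.5744, i.e., sigmoid(0.2 - 0.3 + 0.2 + 0.2) = sigmoid(0.3)
```

Changing `w_o * b` while keeping the weights fixed moves the point at which the node "switches on". This is the left/right shift of the activation function described above.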

There are three important components for building a neural network:

  • Activation function: transforms the weighted and aggregated inputs into an output.
  • Network topology: describes the number of “neuron cells”, the number of layers, the number of nodes per layer, and the manner in which the cells are connected.
  • Training algorithm: the optimization strategy used to estimate the network weights \(\{w_i\}\).
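The following minimal sketch connects these three components. It assumes a fully connected feed-forward topology described by a list of layer sizes and uses the sigmoid activation at every node. Randomly initialized weights stand in for the values a training algorithm would estimate, and each node's bias entry plays the role of the product \(w_o b\). The names `init_network` and `forward` are hypothetical, and no training step is shown.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def init_network(layer_sizes, seed=0):
    """Topology: one (weight matrix, bias vector) pair per pair of consecutive layers.
    Random values are placeholders for weights that training would estimate."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(size=(n_out, n_in)), rng.normal(size=n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(network, x, f=sigmoid):
    """Propagate inputs downstream; every node applies f to its weighted sum plus bias."""
    a = np.asarray(x, dtype=float)
    for W, b in network:
        a = f(W @ a + b)
    return a

# A 3-4-1 topology: 3 inputs, one hidden layer of 4 nodes, and 1 output node.
net = init_network([3, 4, 1])
out = forward(net, [0.5, -1.0, 2.0])
print(out.shape)  # (1,)
```

Here the topology is fully determined by `[3, 4, 1]`. A training algorithm, such as gradient-based optimization, would then adjust the arrays stored in `net` to fit observed data.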

Let’s look at each of these components one by one.

1.2 Activation functions

There are many alternative activation functions. One example is the threshold (step) activation function, which produces an output signal only when the input reaches a specified threshold (here, zero).

\[f(x)= \left\{ \begin{array}{ll} 0 & x<0 \\ 1 & x\geq 0 \\ \end{array} \right. .\]

This is the simplest form of an activation function, and it is rarely used in real-world applications. A more commonly used alternative is the sigmoid (logistic) activation function, \(f(x)=\frac{1}{1+e^{-x}}\), where Euler's number \(e\) is defined by the limit \(e=\displaystyle\lim_{n\longrightarrow\infty}{\left ( 1+\frac{1}{n}\right )^n}\). The output signal is no longer binary; it can be any real number strictly between 0 and 1.
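To compare the two activation functions, here is a minimal sketch that implements both as vectorized NumPy functions and evaluates them on a few illustrative inputs. It also checks numerically that \((1+1/n)^n\) approaches \(e\). The function names and test points are illustrative choices.

```python
import numpy as np

def threshold(x):
    """Step activation: 0 for x < 0, 1 for x >= 0."""
    return np.where(np.asarray(x) >= 0, 1.0, 0.0)

def sigmoid(x):
    """Logistic activation 1 / (1 + e^(-x)); values lie strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

z = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
print(threshold(z))            # [0. 0. 1. 1. 1.]
print(np.round(sigmoid(z), 3)) # smooth, increasing values between 0 and 1

# The sequence (1 + 1/n)^n approaches Euler's number e as n grows.
for n in (10, 1000, 100000):
    print(n, (1 + 1 / n) ** n)
print("e =", np.e)
```

The step function jumps abruptly from 0 to 1 at the threshold. The sigmoid changes gradually and is differentiable everywhere, which is one reason it is preferred when weights are estimated by gradient-based training.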