SOCR ≫ BPAD Website ≫ BPAD GitHub ≫

# Install the `eegUtils` package
# library(remotes)
# remotes::install_github("craddm/eegUtils")
library(eegUtils)

1 Potentials

2 Brain and Heart Electrical Signaling

3 Electrocardiography Model

4 Exterior Conductivity

5 Isotropic and Anisotropic Conductivity

6 Electrical Stimulation

7 Electroencephalography (EEG)

The R package eegUtils can be used to model EEG data and meta-data, e.g., channel locations. There are different types of EEG data, e.g., classes for epoched (eeg_epochs) and continuous (eeg_data), and time-frequency representation data (eeg_tfr).

7.1 Data Format

The generic eeg_data object represents all data types. The raw data class is a list of entries:

  • signals - A data frame containing the actual EEG data in wide format, where each column contains data from a single electrode and each row represents a single point in time.
  • srate - A single numeric value giving the sampling rate of the data in \(Hz\).
  • events - A 3-column data frame (tibble) describing the events recorded in the data; event_onset (in samples), relative to recording onset; event_time (in seconds), relative to recording onset; and event_type (integer or other).
  • chan_info - Initially NULL, since BioSemi data format (BDF) and Neuroscan CNT (CNT) data format types do not contain (usable) electrode locations. Use head(eegUtils:::electrodeLocs) to see format of this data frame (tibble), which stores channel location information similar to EEGLAB (Electrode locations) style including the following columns:
    • electrode - electrode names
    • radius - Spherical coordinates (Radius is typically normalized to 1)
    • theta - Spherical coordinates (theta)
    • phi - Spherical coordinates (theta)
    • cart_x - Cartesian 3D coordinates
    • cart_y - Cartesian 3D coordinates
    • cart_z - Cartesian 3D coordinates
    • x and y - 2D Stereographic projection of the spherical coordinates.
  • timings - A data frame describing each row in time (\(s\)) and sample numbers (\(samples\)).
  • epochs - For epoched data, a data frame with information about the epochs in the data.
  • reference - Initially NULL list, without a recorded reference channel. After the data is referenced, it includes pairs of ref_chans (labels for channels used to calculate the reference data or “average”) and excluded (labels for any channels excluded from the reference data).

The eeg_epochs objects have the same structure as eeg_data objects, however, the events table has two additional columns, epoch (storing the epoch number to which a given event belongs) and time (codes the time point at which the event occurs relative to the epoch onset. The event_time contains the time point of event occurrence relative to the recording onset, see events(demo_epochs).

The eeg_tfr objects contain the time-frequency representations of the eeg_epochs objects including:

  • signals - A 4D tensor - \(epoch \times time \times electrode \times frequency\),
  • dimensions - Tracking matrix dimensions and corresponding properties.

The eeg_ICA objects store analytical results, e.g., ICA or SSD decomposition, applied to an eeg_epochs object:

  • mixing_matrix- weights converting the source data into the original data,
  • unmixing_matrix - weights converting the original data into source data,
  • signals - individual component activations.

7.2 EEG Preprocessing

The eegUtils package performs basic EEG preprocessing and plotting of EEG signals. EEG processing starts with importing raw data, import_raw(), in .BDF (BioSemi), .CNT (32-bit Neuroscan), or .vhdr/.vmrk/.dat (Brain Vision Analyzer 2.0) file formats. Internally, the structure of eeg_data objects consists of the raw data and the corresponding meta-data.

There are many EEG, MRI, PET and other free imaging datasets available here. Let’s use this eegUtils example data, in .bdf format, which represents an experiment with volunteers attending to left or right visual fields indicated by specific visual cues (\(\leftarrow\) arrows \(\rightarrow\)). About a second following the visual cue, a Gabor patch target appears in the left or right visual field and the participants have to determine if the target patch shows a vertical or a horizontal grating. Most of the time (\(\frac{4}{5}\)), the target appears in the cued location and the remaining times (\(\frac{1}{5}\)) in the contralateral location. The data file “Matt-task-spatcue.bdf” is available here.

In the experiments below, we will use the Matt-task-spatcue.bdf dataset. More information about this EEG study design is available in this article (DOI: 10.1523/JNEUROSCI.1993-18.2019). This is a study of alpha brain oscillations (\(8–12 Hz\)), which can be used to study human attention and distraction. This study of alpha oscillations aims to identify differences between focused (\(0/100\%\)) vs. divided (\(40/60\%\)) attention. It demonstrates that human attention is associated with spatially distinct synchronized alpha sources in occipital and parietal brain regions.

library(eegUtils)
temp_dir <- tempdir()
temp_file <- file.path(temp_dir, "Matt-task-spatcue.bdf")
download.file("https://osf.io/hy5wq/download", temp_file, mode = "wb")
eegExample <- import_raw(temp_file)
## Importing C:\Users\Dinov\AppData\Local\Temp\RtmpysPBBi/Matt-task-spatcue.bdf as BDF
eegExample
## EEG data
## 
## Number of channels   : 72 
## Electrode names      : Fp1 AF7 AF3 F1 F3 F5 F7 FT7 FC5 FC3 FC1 C1 C3 C5 T7 TP7 CP5 CP3 CP1 P1 P3 P5 P7 P9 PO7 PO3 O1 Iz Oz POz Pz CPz Fpz Fp2 AF8 AF4 AFz Fz F2 F4 F6 F8 FT8 FC6 FC4 FC2 FCz Cz C2 C4 C6 T8 TP8 CP6 CP4 CP2 P2 P4 P6 P8 P10 PO8 PO4 O2 EXG1 EXG2 EXG3 EXG4 EXG5 EXG6 EXG7 EXG8 
## Sampling rate        : 256 Hz
## Reference        : 
## Signal length: 0 1468.996 seconds

To reduce the size of the data, the original EEG data recorded at \(1024 Hz\) was downsampled to \(256 Hz\). The data is acquired using \(64\) electrodes that are placed and named according to the EEG 10-05 international system convention. Eight additional electrodes included (EXG1-EXG4 positioned around the eyes to record eye movements), reference electrodes (EXG5 and EXG6 positioned on the left and right mastoids), and empty channels with no attached electrodes (EXG7 and EXG8).

To begin with, let’s first re-reference the data by using the eegUtils::eeg_reference() command. The reference electrode used in recording EEG data represents a common atlas reference for the data, including TP10 in the 10-20 System, digitally-linked mastoids, computed post-hoc, the vertex electrode (Cz), single or linked earlobes, the nose tip, etc. Some systems with active electrodes (BIOSEMI Active Two) may record data without reference locations and rely on post-hoc processing during data import to avoid the presence of \(40 dB\) noise in the data. When the data are recorded in one reference framework, they can be re-referenced to another atlas, reference channel, or channel combination. If no electrodes meta-data is available, the data may be referenced to a common average calculated from all the electrodes in the data.

In our case, using the select_elecs() function, we can remove the two empty channels, EXG7 and EXG8 (down to 70 channels), and then re-reference the data using the eeg_reference() method (the prior empty reference changes to “average”).

eegExample <- select_elecs(eegExample, electrode = c("EXG7", "EXG8"), keep = FALSE)
eegExample <- eeg_reference(eegExample,ref_chans = "average")
eegExample
## EEG data
## 
## Number of channels   : 70 
## Electrode names      : Fp1 AF7 AF3 F1 F3 F5 F7 FT7 FC5 FC3 FC1 C1 C3 C5 T7 TP7 CP5 CP3 CP1 P1 P3 P5 P7 P9 PO7 PO3 O1 Iz Oz POz Pz CPz Fpz Fp2 AF8 AF4 AFz Fz F2 F4 F6 F8 FT8 FC6 FC4 FC2 FCz Cz C2 C4 C6 T8 TP8 CP6 CP4 CP2 P2 P4 P6 P8 P10 PO8 PO4 O2 EXG1 EXG2 EXG3 EXG4 EXG5 EXG6 
## Sampling rate        : 256 Hz
## Reference        : average 
## Signal length: 0 1468.996 seconds

Next, we can filter the data using the eeg_filter() function, which utilizes infinite impulse response (IIR) or finite impulse response (FIR) filters to modify the frequency response of the raw signal, and as necessary, remove low or high frequency fluctuations in the time-courses. As a demonstration, we can use IIR filtering – high-pass filter at \(0.2 Hz\) and low-pass filter at \(70 Hz\) and plot the power spectral density of the data before and after filtering.

library(plotly)
# Raw data power spectral density (PSD)
# plot_psd(eegExample, freq_range = c(0, 60), legend = FALSE)
# Filtered data power spectral density
eegExample <- eeg_filter(eegExample, method = "iir", low_freq = 0.2, high_freq = 70,
                          filterOrder = 4) # specify a bandpass filter
# Create a PSD plot
#' @param psd_out PSD to plot
#' @param freq_range Frequency range to plot
#' @return plotly object showing power spectral density
psdPlot <- function(psd_out, freq_range, chan_names) {
  if (!is.null(freq_range)) {
    if (length(freq_range) < 2 | length(freq_range) > 2) {
      message("freq_range must be a vector of length 2 ...")
    } else {
      rows <- psd_out$frequency >= freq_range[[1]] & psd_out$frequency <= freq_range[[2]]
      psd_out <- psd_out[rows, ]
    }
  }

  psd_out <- tidyr::pivot_longer(psd_out, cols = dplyr::all_of(chan_names),
                        names_to = "electrode", values_to = "power")
  # str(psd_out)
  # tibble [4,200 x 3] (S3: tbl_df/tbl/data.frame)
  # $ frequency: num [1:4200] 1 1 1 1 1 1 1 1 1 1 ...
  # $ electrode: chr [1:4200] "Fp1" "AF7" "AF3" "F1" ...
  # $ power    : num [1:4200] 64.22 60.07 16.16 5.15 4.75 ...

  # ggplot(psd_out, aes(x = frequency, y = 10 * log10(power), colour = electrode)) +
  #   stat_summary(geom = "line", fun = mean) + # geom_line() +
  #   theme_bw() +
  #   ylab(expression(paste(mu, V^2, "/ Hz(dB)"))) +
  #   xlab("Frequency (Hz)") +
  #   scale_x_continuous(expand = c(0, 0))
  psd_out %>% plot_ly(x=~frequency, y=~(10*log10(power)), color=~electrode,
                      type="scatter", mode="lines") %>%
    layout(title="EEG data power spectral density (PSD)", xaxis=list(title="Frequency (Hz)"),
           yaxis=list(title="mu*V^2/Hz(dB)"), legend = list(orientation='h'))
}

# Compute the data power spectral density (PSD)
psd_out <- eegUtils::compute_psd(eegExample, keep_trials=TRUE, n_fft=256, seg_length=NULL, noverlap=NULL)
psdPlot(psd_out, freq_range = c(0, 70), chan_names = channel_names(eegExample))  

EEG data can be epoched around events or triggers using the epoch_data() function. The resulting eeg_epochs objects represent lists of the event triggers discovered in the data. We can examine these triggers by the list_events() or events(eegExample) methods. We can epoch around event types \(120\) and \(132\) - these correspond to onsets of a visual target on the left and right of fixation, respectively. The length of epochs around the trigger is specified by the time_lim variable. In this process, we will baseline-correct the data using the average of the time points between \(-0.1s\) and to \(0.0s\) (stimulus onset). Then, we can use the epochs() method to retrieve meta-information for the data epochs.

list_events(eegExample)
##    event_type
## 1         254
## 2         100
## 3          60
## 4         200
## 5         122
## 6          20
## 7          62
## 8         120
## 9          25
## 10        132
## 11        130
epochedExample <- epoch_data(eegExample, events = c(120,132),  baseline = c(-0.1, 0.0),
    epoch_labels = c("valid_left", "valid_right"), time_lim = c(-0.1, 0.4))
epochs(epochedExample)
## # A tibble: 160 x 5
##    epoch participant_id recording         event_type epoch_labels
##    <dbl> <lgl>          <chr>                  <dbl> <chr>       
##  1     1 NA             Matt-task-spatcue        120 valid_left  
##  2     2 NA             Matt-task-spatcue        132 valid_right 
##  3     3 NA             Matt-task-spatcue        120 valid_left  
##  4     4 NA             Matt-task-spatcue        120 valid_left  
##  5     5 NA             Matt-task-spatcue        120 valid_left  
##  6     6 NA             Matt-task-spatcue        120 valid_left  
##  7     7 NA             Matt-task-spatcue        120 valid_left  
##  8     8 NA             Matt-task-spatcue        120 valid_left  
##  9     9 NA             Matt-task-spatcue        132 valid_right 
## 10    10 NA             Matt-task-spatcue        120 valid_left  
## # ... with 150 more rows

Averaging over epochs or electrodes (e.g., in parietal lobe), the eeg_epochs data can be plotted using plot_ly(), plot_butterfly(), or plot_timecourse(). Baseline correction can also be applied for plotting only using the baseline parameter in the plotting call. Below is a schematic of the EEG electrode positions in the 10-10 system combinatorial nomenclature,

A butterfly plot of the EEG data, over a short time-frame for all locations, is shown below.

plotButterfly <- function(data, time_lim=NULL, baseline=NULL, continuous=FALSE, allow_facets=FALSE, ...) {
  data <- dplyr::group_by(data, time, electrode)
  data <- dplyr::summarise(data, amplitude = mean(amplitude))
  data <- dplyr::ungroup(data)
 

  # select time-range of interest
  if (!is.null(time_lim)) data <- select_times(data, time_lim)

  if (!is.null(baseline)) data <- rm_baseline(data, baseline)

  plot_ly(data=data, x=~time, y=~amplitude, color=~electrode, type="scatter", mode="lines") %>%
      layout(title="EEG Data Butterfly Plot", xaxis=list(title="time"),
           yaxis=list(title="Amplitude (mu*V)"), legend = list(orientation='h'))
}

bf_DF <- function(data, time_lim = NULL, baseline = NULL, quantity = "coefficients") {
  # Select specific times
  if (!is.null(time_lim)) data <- select_times(data, time_lim = time_lim)

  # If asked, perform baseline correction
  if (!is.null(baseline)) data <- rm_baseline(data, time_lim = baseline)

  data <- as.data.frame(data, long = TRUE, coords = FALSE, values = {{quantity}})
  data
}

# Display the Butterfly Plot
data1 <- bf_DF(epochedExample)
plotButterfly(data1)

We can also plot the individual time-course of one specific electrode or a list of electrodes (e.g., occupying the same neighborhood).

# Wrapper to the time-course-plot method (tcPlot)
plotTimecourse_DF <- function(data, electrode = NULL, time_lim = NULL, 
                              add_CI = FALSE, baseline = NULL, title="", ...) {
  if (!is.null(electrode))  data <- select_elecs(data,electrode)
  if (!is.null(baseline)) data <- rm_baseline(data,time_lim = baseline)
  if (!is.null(time_lim)) data <- select_times(data,time_lim)
  
  tc_plot <- tcPlot(data,add_CI = add_CI, title=title)
  tc_plot
}

# Actual time-course plotter
tcPlot <- function(data, add_CI, quantity = amplitude, title="") {
  #  GGPlot solution
  tc_plot <- ggplot2::ggplot(data, aes(x = time, y = {{quantity}}))

  if (add_CI) {  tc_plot <- tc_plot +
          stat_summary(fun.data = mean_cl_normal,geom = "ribbon",linetype = "dashed",
                       fill = NA, colour = "black", size = 1,alpha = 0.5)
  }

  tc_plot <- tc_plot +
    stat_summary(fun = "mean", geom = "line", size = 1.2)

  if(title=="") title = "Electrode = POz (Amplitude +/- SE"
  
  tc_plot +
    labs(title=title, x="Time (s)", y="Amplitude (muV)", fill = "") +
    geom_vline(xintercept = 0, linetype = "solid", size = 0.5) +
    geom_hline(yintercept = 0, linetype = "solid", size = 0.5) +
    scale_x_continuous(breaks = scales::pretty_breaks(n = 4), expand = c(0, 0)) +
    scale_y_continuous(breaks = scales::pretty_breaks(n = 4), expand = c(0, 0)) +
    theme_minimal(base_size = 11) +
    theme(panel.grid = element_blank(), axis.ticks = element_line(size = .5)) +
    guides(colour = guide_legend(override.aes = list(alpha = 1)))

  ggplotly(tc_plot) %>%
    layout(title=title, 
           xaxis=list(title="Time (s)"), yaxis=list(title="Amplitude (muV)"))
  
  # # Plot_ly() Solution 
  # data <- dplyr::group_by(data, time, electrode)
  # data <- dplyr::summarise(data, amplitude = mean(amplitude))
  # data <- dplyr::ungroup(data)
  # 
  # tc_plot <- plot_ly(data=data, x = ~time, y = ~amplitude, type="scatter", mode="lines")
  # if (add_CI) {  
  #   data2 <- smean.cl.normal(data1$amplitude)
  #   data$amplitudeLowerBound <- data$amplitude - data2["Lower"]
  #   data$amplitudeUpperBound <- data$amplitude - data2["Upper"]
  #   
  #   tc_plot <- tc_plot %>%
  #     add_trace(y=~data$amplitudeLowerBound, name="Lower-Bound") %>%
  #     add_trace(y=~data$amplitudeUpperBound, name="Upper-Bound")
  # }
  # 
  # tc_plot <- tc_plot +
  #   stat_summary(fun = "mean", geom = "line", size = 1.2)
  # 
  # tc_plot +
  #   labs(x = "Time (s)", y = expression(paste("Amplitude (", mu, "V)")), fill = "") +
  #   geom_vline(xintercept = 0, linetype = "solid", size = 0.5) +
  #   geom_hline(yintercept = 0, linetype = "solid", size = 0.5) +
  #   scale_x_continuous(breaks = scales::pretty_breaks(n = 4), expand = c(0, 0)) +
  #   scale_y_continuous(breaks = scales::pretty_breaks(n = 4), expand = c(0, 0)) +
  #   theme_minimal(base_size = 12) +
  #   theme(panel.grid = element_blank(), axis.ticks = element_line(size = .5)) +
  #   guides(colour = guide_legend(override.aes = list(alpha = 1)))
  #
  # Tester
  # plotTimecourse_DF(data1, electrode = "POz") # Plot POz
}

# Display some time-sources at specific electrodes
plotTimecourse_DF(epochedExample, add_CI=TRUE, electrode = "POz",
                  title="Electrode = POz (Amplitude +/- SE") # Plot POz
plotTimecourse_DF(epochedExample, electrode=c("POz", "Oz", "O1", "O2"),
                  title="Average over Electrodes POz, Oz, O1, and O2") # average over four occipital electrodes

The electrode_locations() method can be used to add standard channel locations. The topoplot() method is used to render a 2D flat topographical representation of the data.

epochedExample <- electrode_locations(epochedExample)
channels(epochedExample)
## # A tibble: 70 x 9
##    electrode radius theta   phi cart_x cart_y cart_z     x     y
##    <chr>      <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl>
##  1 Fp1            1   -94   -71  -29.4   83.9  -6.99 -30.6  88.9
##  2 AF7            1   -97   -51  -54.8   68.6 -10.6  -61.0  75.4
##  3 AF3            1   -76   -66  -33.7   76.8  21.2  -30.9  69.4
##  4 F1             1   -46   -64  -27.5   56.9  60.3  -20.2  41.3
##  5 F3             1   -60   -47  -50.2   53.1  42.2  -40.9  43.9
##  6 F5             1   -78   -37  -64.5   48.0  16.9  -62.3  46.9
##  7 F7             1   -98   -31  -70.3   42.5 -11.4  -84    50.5
##  8 FT7            1   -98   -10  -80.8   14.1 -11.1  -96.5  17.0
##  9 FC5            1   -73   -14  -77.2   18.6  24.5  -70.8  17.7
## 10 FC3            1   -49   -21  -60.2   22.7  55.5  -45.8  17.6
## # ... with 60 more rows
ggplotly(topoplot(epochedExample, time_lim = c(0.22, 0.24)))

Note that eegUtils objects may be transformed into data frames, as needed.

library(ggplot2)
library(dplyr)
gp <- epochedExample %>%
  select_epochs(epoch_no = 1:10) %>%
  select_elecs(c("PO8", "Cz", "POz")) %>%
  as.data.frame(long = TRUE) %>%
  ggplot(aes(x = time, y = amplitude)) +
  geom_line(aes(group = epoch), alpha = 0.2) + 
  stat_summary(fun = mean,geom = "line", size = 2, aes(colour = electrode)) + 
  facet_wrap(~electrode) + 
  theme_classic()

ggplotly(gp)

The method dplyr::mutate() may be used to add new columns - effectively constructing regions of interest (ROIs) using multiple electrodes in a neighborhood (e.g., left parietal lobe).

gp <- epochedExample %>%
  rm_baseline(time_lim = c(-.1, 0)) %>%
  mutate(leftParietal = (Pz + P1 +P3 + P5 + P7 +P9) / 6) %>%
  select(Pz, P7, leftParietal) %>%
  filter(epoch <= 60, time > -0.1, time < 0.3) %>%
  as.data.frame(long = TRUE) %>%
  ggplot(aes(x = time, y = amplitude)) +
  geom_line(aes(group = epoch), alpha = 0.2) + 
  stat_summary(fun = mean, geom = "line", size = 2, aes(colour = electrode)) +
  facet_wrap(~electrode) + 
  scale_colour_viridis_d() +
  theme_classic()
ggplotly(gp) %>% layout(legend = list(orientation='h'))

7.3 Frequency analysis

EEG signals often exhibit periodicity, such as cyclic oscillations in the alpha frequency range (\(\sim 8-13 Hz\)) observed in human EEG. The Power Spectral Density (PSD) can be used to analyze periodicity via the Fast Fourier transform (FFT).

The eegUtils::compute_psd() method calculates the PSD for each trial separately, returning a dataframe with spectral power for each frequency and each electrode location. For epoched data, it computes the PSD for each epoch prior to averaging over all epochs and plotting the spectral harmonics.

psdExample <- compute_psd(epochedExample)
#  psdPlot(psdExample, freq_range = c(0, 70), chan_names = channel_names(eegExample))  
#  psdPlot(psdExample, freq_range=c(0,125), chan_names=channel_names(eegExample)) 

# plot_psd(epochedExample)
psd_out <- tidyr::pivot_longer(psdExample, cols = channel_names(eegExample), 
                               names_to = "electrode", values_to = "power")
ggplot(psd_out, aes(x = frequency, y = 10 * log10(power), colour = electrode)) +
    stat_summary(geom = "line", fun = mean) +# geom_line() +
    theme_bw() +
    ylab("mu  V^2 / Hz(dB)") +
    xlab("Frequency (Hz)") +
    scale_x_continuous(expand = c(0, 0)) +
    theme(legend.position="bottom")

ggplotly(gp) %>%
  layout(title="Epoched EEG data power spectral density (PSD)", legend = list(orientation='h'))
# # Plot_ly() graph of epoched PSD
# psdEpochPlot <- function(psd_out, freq_range, chan_names) {
#   if (!is.null(freq_range)) {
#     if (length(freq_range) < 2 | length(freq_range) > 2) {
#       message("freq_range must be a vector of length 2 ...")
#     } else {
#       rows <- psd_out$frequency >= freq_range[[1]] & psd_out$frequency <= freq_range[[2]]
#       psd_out <- psd_out[rows, ]
#     }
#   }
# 
#   psd_out <- tidyr::pivot_longer(psd_out, cols = dplyr::all_of(chan_names),
#                                  names_to = "electrode", values_to = "power")
#   # ggplot(psd_out, aes(x = frequency, y = 10 * log10(power), colour = electrode)) +
#   #   stat_summary(geom = "line", fun = mean) +# geom_line() +
#   #   theme_bw() +
#   #   ylab(expression(paste(mu, V^2, "/ Hz(dB)"))) +
#   #   xlab("Frequency (Hz)") +
#   #   scale_x_continuous(expand = c(0, 0))
#   psd_out %>% plot_ly(x=~frequency, y=~(10*log10(power)), color=~electrode,
#                       type="scatter", mode="lines") %>%
#     layout(title="Epoched EEG data power spectral density (PSD)", xaxis=list(title="Frequency (Hz)"),
#            yaxis=list(title="mu*V^2/Hz(dB)"), legend = list(orientation='h'))
# }
# psdEpochPlot(psdExample, freq_range = c(0, 125), chan_names = channel_names(eegExample))  

Assuming stationarity, i.e., stable process in terms of frequency and power across the entire time domain, frequency analysis may discard some temporal information. EEG data is rarely stationary and its temporal dynamics change across wider timescales.

Time-frequency analysis attempts to account for (globally) non-stationary behaviors. This can be done by moving windows that decompose the signal into time-frequency space tiles of regional stationary signals.

The method eegUtils::compute_tfr() calculates a time-frequency representation of the epoched EEG data using Morlet wavelets. Windowing the signal with these local oscillatory base functions controls for spectral leakage and provides time-frequency specificity. The support (temporal extent) of the Morlet wavelets specifies their cycles as integer numbers of cycles at each frequency level (frequency of interest, FOI).

tfrEpochedExample <- compute_tfr(epochedExample, method="morlet", foi=c(4,30), n_freq=12, n_cycles=3)
tfrEpochedExample
## Epoched EEG TFR data
## 
## Frequency range      :    4 6.36 8.73 11.09 13.45 15.82 18.18 20.55 22.91 25.27 27.64 30 
## Number of channels   :    70 
## Electrode names      :    Fp1 AF7 AF3 F1 F3 F5 F7 FT7 FC5 FC3 FC1 C1 C3 C5 T7 TP7 CP5 CP3 CP1 P1 P3 P5 P7 P9 PO7 PO3 O1 Iz Oz POz Pz CPz Fpz Fp2 AF8 AF4 AFz Fz F2 F4 F6 F8 FT8 FC6 FC4 FC2 FCz Cz C2 C4 C6 T8 TP8 CP6 CP4 CP2 P2 P4 P6 P8 P10 PO8 PO4 O2 EXG1 EXG2 EXG3 EXG4 EXG5 EXG6 
## Number of epochs :    160 
## Epoch limits     :    -0.102 - 0.398 seconds
## Sampling rate        :    256  Hz

We can report the temporal and frequency standard deviations as follows.

tfrEpochedExample$freq_info$morlet_resolution
##    frequency   sigma_f    sigma_t n_cycles
## 1   4.000000  1.333333 0.11936621        3
## 2   6.363636  2.121212 0.07503019        3
## 3   8.727273  2.909091 0.05470951        3
## 4  11.090909  3.696970 0.04305011        3
## 5  13.454545  4.484848 0.03548725        3
## 6  15.818182  5.272727 0.03018456        3
## 7  18.181818  6.060606 0.02626057        3
## 8  20.545455  6.848485 0.02323944        3
## 9  22.909091  7.636364 0.02084172        3
## 10 25.272727  8.424242 0.01889249        3
## 11 27.636364  9.212121 0.01727669        3
## 12 30.000000 10.000000 0.01591549        3

Then we can plot the resulting time-frequency wavelet transformation.

# https://rdrr.io/github/neuroconductor/eegUtils/src/R/frequency_plotting.R 
ggplotly(plot_tfr(tfrEpochedExample)) %>%
  layout(title="Epoched EEG time-frequency wavelet transformation (Morlet)", legend = list(orientation='h'))

EEG baseline correction to allow comparison, contrast, and statistical inference on the signals across participants. The plots below show baseline correction using the following two strategies:

  • db - uses subtraction to convert the EEG data to dB-change,
  • absolute - simple absolute value thresholding correction,
  • and other options such as “divide”, “ratio”, and “pc”.
ggplotly(plot_tfr(tfrEpochedExample, baseline_type = "absolute", baseline = c(-.1, 0))) %>%
  layout(title="Epoched EEG Baseline Correction by dB-change", legend = list(orientation='h'))
ggplotly(plot_tfr(tfrEpochedExample, baseline_type = "db", baseline = c(-.1, 0))) %>%
  layout(title="Epoched EEG Baseline Correction by Absolute Value Thresholding", 
         legend = list(orientation='h'))
# subplot(pl1, pl2)

7.4 EEG Linear Modeling, Prediction and Inference

For linear modeling and inference, instead of using the first EEG study of alpha brain oscillations we used above, we will now use a more specialized study specifically designed for linear modeling (LIMO). More information about the LIMO EEG Dataset is available here (DOI: 10.7488/ds/1556) and here (DOI: 10.3389/fnins.2020.610388).

The study-design represents a controlled experiment where participants discriminate between two different types of human faces with varying phase coherence in the range \(0\% - 85\%\). The phase coherence of the facial images determine how easy or hard it is to discriminate between faces.

Prior studies have reported that theta phase coherence in the frontal lobes represents a key EEG signature of working memory, indicating that visual imagery of faces is associated with frontal interhemispheric synchronization in the theta frequency range (DOI: 10.1038/s41598-021-81336-y).

In this experiment, the real data is very large, e.g., the raw data from subject 1 is over \(300Mb\) in size.

#  install.packages("hdf5r")
library(eegUtils)
library(R.matlab)  # needed to load in matlab data (*.mat files)
setwd("E:/Ivo.dir/Research/Proposals/SOCR/2021/IHPI_SOCR_Summer_2021_Fellowships_Apps/Matt_Ratanapanichkich_MedicalPhysics_2021/other/EEG_Data/")
limoSubj1 <- import_set("limo_dataset_S1.set")
limoContinuousVar <- R.matlab::readMat("continuous_variable.mat")
limoCategoricalVar <- readr::read_csv("categorical_variable.txt", 
                                      col_names = c("cond_lab"), show_col_types = FALSE)

The method eegUtils::fit_glm() fits a linear model using the epochs data field as predictors. In this LIMO experiment, we will use the categorical (limoCategoricalVar) and continuous (limoContinuousVar) predictors in addition to the epochs.

library(DT)
epochs(limoSubj1) <- dplyr::mutate(epochs(limoSubj1), phase_coherence = unlist(limoContinuousVar),
              face = factor(limoCategoricalVar$cond_lab,levels = c(1, 2), labels=c("Face_A","Face_B")))
epochs(limoSubj1)
## # A tibble: 1,055 x 5
##    epoch participant_id  recording       phase_coherence face  
##    <int> <chr>           <chr>                     <dbl> <fct> 
##  1     1 limo_dataset_S1 limo_dataset_S1            0.6  Face_B
##  2     2 limo_dataset_S1 limo_dataset_S1            0.55 Face_A
##  3     3 limo_dataset_S1 limo_dataset_S1            0.3  Face_B
##  4     4 limo_dataset_S1 limo_dataset_S1            0.1  Face_A
##  5     5 limo_dataset_S1 limo_dataset_S1            0.15 Face_A
##  6     6 limo_dataset_S1 limo_dataset_S1            0.1  Face_B
##  7     7 limo_dataset_S1 limo_dataset_S1            0.85 Face_A
##  8     8 limo_dataset_S1 limo_dataset_S1            0.8  Face_B
##  9     9 limo_dataset_S1 limo_dataset_S1            0.25 Face_A
## 10    10 limo_dataset_S1 limo_dataset_S1            0.65 Face_B
## # ... with 1,045 more rows
Hmisc::describe(epochs(limoSubj1))
## epochs(limoSubj1) 
## 
##  5  Variables      1055  Observations
## --------------------------------------------------------------------------------
## epoch 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1055        0     1055        1      528      352     53.7    106.4 
##      .25      .50      .75      .90      .95 
##    264.5    528.0    791.5    949.6   1002.3 
## 
## lowest :    1    2    3    4    5, highest: 1051 1052 1053 1054 1055
## --------------------------------------------------------------------------------
## participant_id 
##               n         missing        distinct           value 
##            1055               0               1 limo_dataset_S1 
##                           
## Value      limo_dataset_S1
## Frequency             1055
## Proportion               1
## --------------------------------------------------------------------------------
## recording 
##               n         missing        distinct           value 
##            1055               0               1 limo_dataset_S1 
##                           
## Value      limo_dataset_S1
## Frequency             1055
## Proportion               1
## --------------------------------------------------------------------------------
## phase_coherence 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1055        0       18    0.997    0.422   0.3006     0.00     0.05 
##      .25      .50      .75      .90      .95 
##     0.20     0.40     0.65     0.80     0.85 
## 
## lowest : 0.00 0.05 0.10 0.15 0.20, highest: 0.65 0.70 0.75 0.80 0.85
##                                                                             
## Value       0.00  0.05  0.10  0.15  0.20  0.25  0.30  0.35  0.40  0.45  0.50
## Frequency     63    59    58    60    59    58    55    59    61    62    58
## Proportion 0.060 0.056 0.055 0.057 0.056 0.055 0.052 0.056 0.058 0.059 0.055
##                                                     
## Value       0.55  0.60  0.65  0.70  0.75  0.80  0.85
## Frequency     58    57    55    56    58    60    59
## Proportion 0.055 0.054 0.052 0.053 0.055 0.057 0.056
## --------------------------------------------------------------------------------
## face 
##        n  missing distinct 
##     1055        0        2 
##                         
## Value      Face_A Face_B
## Frequency     524    531
## Proportion  0.497  0.503
## --------------------------------------------------------------------------------
DT::datatable(epochs(limoSubj1)[1:20,])

Using fit_glm() to fit a LM to each electrode, across all time-points, the model argument is blank on the left of the model specification \(\sim\). The specific covariates are listed as predictors on the right hand side of the model formula. The categorical variable face and the continuous predictor phase_coherence are used as an example, \(\sim face + phase\_coherence\).

The fitted LIMO will estimate 3 model coefficients:

  • an intercept term corresponding to the amplitude for the first level of the categorical predictor (e.g.,. Face A) at \(0\) phase coherence,
  • a face coefficient accounting for the amplitude difference between the intercept level Face A and the alternative level Face B,
  • a phase-coherence term modeling the increase in amplitude corresponding to image phase coherence increases from \(0\) to \(1\).

Let’s plot the three coefficients (Intercept, faceFace_B, and phase_coherence) for each electrode across all time points.

fitted_model <- fit_glm(~ face + phase_coherence, data = limoSubj1)

# GGplot display
# as.data.frame(fitted_model, long = TRUE) %>%
#   ggplot(aes(x = time, y = amplitude, colour = electrode)) +
#   geom_line() +
#   facet_wrap(~coefficient, scales = "free") +
#   theme(legend.position = "none")

# plotly graph
df <- as.data.frame(fitted_model, long = TRUE) 
df_intercept <- df[which(df$coefficient=='(Intercept)'), ]
df_faceFace_B <- df[which(df$coefficient=='faceFace_B'), ]
df_phase_coherence <- df[which(df$coefficient=='phase_coherence'), ]

namesInt <- paste0("Int:", df_intercept$electrode)
namesFaceB <- paste0("FaceB:", df_faceFace_B$electrode)
namesPhaCoh <- paste0("PhaCoh:", df_phase_coherence$electrode)

pl_int <- plot_ly(data=df_intercept, x=~time, y=~amplitude, color=~electrode, 
                  type="scatter", mode="lines", name=namesInt) %>% layout(showlegend = FALSE)
pl_faceB <- plot_ly(data=df_faceFace_B, x=~time, y=~amplitude, color=~electrode, 
                    type="scatter", mode="lines", name=namesFaceB)  %>% layout(showlegend = FALSE)
pl_phaseCoh <- plot_ly(data=df_phase_coherence, x=~time, y=~amplitude, color=~electrode, 
                       type="scatter", mode="lines", name=namesPhaCoh) %>% layout(showlegend = FALSE)

subplot(pl_int, pl_faceB, pl_phaseCoh, nrows=3, shareX = TRUE) %>%
  layout(title="LM Coefficient Estimates (Intercept, Face-B, Phase-Coherence)")

In addition to rendering the raw coefficient estimates, we can also display various statistics, e.g., “std_err”, “t_stats”, “r_sq”, captured in the fitted_model object. Below we show just the GLM model coefficient estimates for Electrode=A1, Epoch=2, and the categorical Factor=Face_B.

# Recall the model:   fitted_model <- fit_glm(~ face + phase_coherence, data = limoSubj1)

# dim(fitted_model$coefficients)  # dim(fitted_model$std_err)  #  [1] 603 117  # time * electrode
# dataA1 <- round(data.frame(time = fitted_model$timings$time, A1 = fitted_model$coefficients[,1],
#                            lower = fitted_model$coefficients[,1]-fitted_model$std_err[,1],
#                            upper = fitted_model$coefficients[,1]+fitted_model$std_err[,1]), 2)
# colnames(dataA1) <- c("time", "A1", "lower", "upper")

df_A1 <- df[which(df$coefficient=='faceFace_B' & df$electrode=='A1'), ]
err <- fitted_model$std_err$A1[seq(from=1, to=dim(fitted_model$std_err)[1], by=3)]
dataA1 <- round(data.frame(time = fitted_model$timings$time[which(fitted_model$timings$epoch==1)],
                           A1_faceFace_B = df_A1$amplitude, lower = df_A1$amplitude - err,
                           upper = df_A1$amplitude + err), 2)
# colnames(dataA1) <- c("time", "A1_faceFace_B", "lower", "upper")

plot_ly(data=dataA1, x=~time, y=~A1_faceFace_B, name = 'Electrode=A1, Epoch=2, Factor=faceFace_B', 
        type = 'scatter', mode = 'lines', line = list(width = 2)) %>% 
  add_trace(y=~lower, name = 'A1 lower CI', line = list(width = 1, dash = 'dash'))  %>% 
  add_trace(y=~upper, name = 'A1 upper CI', line = list(width = 1, dash = 'dash')) %>%
  layout(title="GLM Model Coeff: Electrode=A1, Epoch=2, Factor=faceFace_B", 
         xaxis=list(title="Frequency (Hz)"), yaxis=list(title="mu*V^2/Hz(dB)"), 
         legend = list(orientation='h'))

Alternatively, we can report separate model coefficient estimates for each of the two levels, Face A and Face B, which may facilitate a subsequent second-level data analysis.

fitted_model_NoInt <- fit_glm( ~ 0 + face + phase_coherence, data = limoSubj1)
# as.data.frame(fitted_model_NoInt, long = TRUE) %>%
#   ggplot(aes(x = time, y = amplitude, colour = electrode)) +
#   geom_line() +
#   facet_wrap(~coefficient, scales = "free") +
#   theme(legend.position = "none")

#  retrieve the coefficients for each time point for each electrode.
df <- as.data.frame(fitted_model_NoInt, long = TRUE) 
df_faceFace_A <- df[which(df$coefficient=='faceFace_A'), ]
df_faceFace_B <- df[which(df$coefficient=='faceFace_B'), ]
df_phase_coherence <- df[which(df$coefficient=='phase_coherence'), ]

namesFaceA <- paste0("FaceA:", df_faceFace_A$electrode)
namesFaceB <- paste0("FaceB:", df_faceFace_B$electrode)
namesPhaCoh <- paste0("PhaCoh:", df_phase_coherence$electrode)

# plotly graph
pl_faceA <- plot_ly(data=df_faceFace_A, x=~time, y=~amplitude, color=~electrode, 
                  type="scatter", mode="lines", name=namesFaceA) %>% layout(showlegend = FALSE)
pl_faceB <- plot_ly(data=df_faceFace_B, x=~time, y=~amplitude, color=~electrode, 
                    type="scatter", mode="lines", name=namesFaceB)  %>% layout(showlegend = FALSE)
pl_phaseCoh <- plot_ly(data=df_phase_coherence, x=~time, y=~amplitude, color=~electrode, 
                       type="scatter", mode="lines", name=namesPhaCoh) %>% layout(showlegend = FALSE)

subplot(pl_faceA, pl_faceB, pl_phaseCoh, nrows=3, shareX = TRUE) %>%
  layout(title="LM Coefficient Estimates (Face-A, Face-B, Phase-Coherence)", hovermode = 'x unified')

If necessary, we can scale continuous covariates (e.g., phase_coherence) using the scale() function to make these unitless quantities, like standard z-scores. Then we can report the linear model \(R^2\) value, which quantifies the model fit. \[R^2 =\frac{Variance\ Explained\ by\ the \ Model}{Total\ Variance\ in\ the\ Data}.\]

fitted_scaledModel <- fit_glm( ~ 0 + face + scale(phase_coherence), data=limoSubj1)
# as.data.frame(fitted_scaledModel, long = TRUE) %>%
#   ggplot(aes(x = time, y = amplitude, colour = electrode)) +
#   geom_line() +
#   facet_wrap(~coefficient, scales = "free") +
#   theme(legend.position = "none")

## ggplot graph
# fitted_scaledModel$r_sq %>%
#   tidyr::pivot_longer(cols=channel_names(limoSubj1), names_to="electrode", values_to="r_sq") %>%
#   ggplot(aes(x = time, y = r_sq, colour = electrode)) +
#   geom_line() +
#   theme(legend.position = "none")

df <- fitted_scaledModel$r_sq %>%
  tidyr::pivot_longer(cols=channel_names(limoSubj1), names_to="electrode", values_to="r_sq")

plot_ly(data=df, x=~time, y=~r_sq, color=~electrode, 
                  type="scatter", mode="lines", name=~electrode) %>% 
  layout(title="GLM Model (Y ~ 0 + face + scale(phase_coherence))\n R^2 for all Electrodes", 
         xaxis=list(title="time"), yaxis=list(title="R^2"), 
         legend = list(orientation='h'))

Let’s define a new glm.predict() method to forecast the EEG channel amplitudes using the GLM model. Recall that electrodes (\(EXG1-EXG4\)) are placed around the eyes to record eye movements.

fitted_Model <- fit_glm( ~ 0 + face + phase_coherence, data=limoSubj1)
# there is no predict(fitted_scaledModel, limoSubj1) method in eegUtils package

df <-  as.data.frame(fitted_scaledModel, long = TRUE)

df_faceFace_A <- df[which(df$coefficient=='faceFace_A'), ]
df_faceFace_B <- df[which(df$coefficient=='faceFace_B'), ]
df_phase_coherence <- df[which(df$coefficient=='phase_coherence'), ]

# Matrix multiplication model estimation, 
# Note that if the design matrix x has an intercept, we need to include a first column of 1's
#     x <- cbind(1, face, df_phase_coherence)
# match the order of predictors/covariates
# X <- cbind(face, df_phase_coherence)
# fittedValues <- X %*% coeff_df

#  face           : Factor w/ 2 levels "Face_A","Face_B": 2 1 2 1 1 ...
#  phase_coherence: Named num [1:1055] 0.6 0.55 0.3 0.1  ............
fitted_model <- fit_glm(~ face + phase_coherence, data = limoSubj1)

df <- as.data.frame(fitted_model, long = TRUE) 
# stratify/subset the model DF, as desired
# electrodes <- c("EXG1", "EXG2", "EXG3", "EXG4")
# df1 <- select_elecs(df, electrodes)
# # df_2 <- select_times(df, time_lim=c(-0.3, 0.1))
# 
# df_intercept <- df1[which(df1$coefficient=='(Intercept)'), ]            ##  & df1$electrode=='EXG1'), ]
# df_faceFace_B <- df1[which(df1$coefficient=='faceFace_B'), ]            ##  & df1$electrode=='EXG1'), ]
# df_phase_coherence <- df1[which(df1$coefficient=='phase_coherence'), ]  ## & df1$electrode=='EXG1'), ]

#### Model Coefficients (\hat{\beta})
df_intercept <- df[which(df$coefficient=='(Intercept)'& df$electrode=='EXG1'), ]
df_faceFace_B <- df[which(df$coefficient=='faceFace_B'& df$electrode=='EXG1'), ]
df_phase_coherence <- df[which(df$coefficient=='phase_coherence' & df$electrode=='EXG1'), ]
coeff_df <- as.data.frame(cbind(interc=1, face=df_faceFace_B$amplitude, phaCoh=df_phase_coherence$amplitude))
dim(coeff_df)  #  [1] 201   3
## [1] 201   3
#### Design matrix (X)
X <- select_elecs(limoSubj1, electrode='EXG1')

## Note that length of vector X is 212055 = 201 (timepoints) * 1055 (epochs) # str(X)
## The "face" and "phase_coherence" variables only change with epoch, not with time!
X <- select_epochs(X, epoch_no=2)
Y <- X$signals$EXG1
X1 <- cbind(rep.int(1, length(Y)), 
            rep.int(X$epochs$face, length(Y)), rep.int(X$epochs$phase_coherence, length(Y)))

fittedValues <- array()

for (time in 1:dim(coeff_df)[1]) {
  fittedValues[time] <- X1[time,] %*% t(coeff_df)[, time]
}

times <- X$timings$time

# plotly graph
plot_ly(x=~times, y=~Y, type="scatter", mode="lines", name="Data") %>% 
  add_trace(y=~fittedValues, name="LIMO Fitted Values")  %>% 
  layout(title=paste0("Data vs. LIMO Model Prediction (corr=", round(cor(Y, fittedValues), 2),")"),
         xaxis=list(title="time"), yaxis=list(title="value"), legend = list(orientation='h'))

Once we select an electrode, e.g., ‘EXG1’, the length of the EEG signal \(X\), as a vector X, \(212,055 \equiv 201 (timepoints) \times 1055 (epochs)\). Also, the face and phase_coherence covariates (predictive variables) only change with epoch, but are unchanged over time.

The predictions \(\hat{y}\) are computed using the coefficient estimates, \(\hat{\beta}(Time)_{3\times 1}\) that are time-dependent and obtained via the GLM model fitting.

\[\underbrace{\hat{y}_{Time\times 1}}_{fitted\\ values} = {\overbrace{X_{Time\times 3}}^{design\ matrix}} \times {\underbrace{\ \hat{\beta}(Time)_{3\times 1}\ }_{LIMO \\ coefficients}}.\]

8 Electrocardiography (ECG or EKG)

Electrocardiograms (ECG or EKG) capture the electrical conduction of the heart and serve as proxy measures of normal and pathological cardiac processes.

ECG can be interpreted using pattern recognition or by ML/AI modeling of the electrical ECG signal related to cardiac electrophysiology. Standard ECG has 12 leads - the 6 limb leads (I, II, III, aVL, aVR, and aVF) are placed on the extremities (arms, legs), and the other 6 precordial (torso) leads are plaved on the precordium (V1, V2, V3, V4, V5 and V6). For limb leads, the augmented label a refers to leads that calculate a combination of leads I, II, and III.

The image below shows a normal 12-lead ECG trace of a 26-year-old male with an incomplete right bundle branch block (RBBB), i.e., a heart block in the right bundle branch of the electrical conduction system. The ECG waves and the intervals between them have (1) an expected patterns (morphology) and time durations (periods); and (2) a range of acceptable amplitudes (millivoltages). Deviation from acceptable normal tracing may be indicative of potentially pathological states. When tracking ECG signals, the amplitudes and intervals are printed on a standard scale graphing paper where a pixel of \(1 mm^2\) spatial resolution corresponds to \(40\ ms\) (milliseconds) of time along the longitudinal \(x\)-axis and \(0.1 millivolts\) along the amplitude \(y\)-axis.

Normal ECGs contain waves, intervals, segments and a single complex:

  • Waves - positive or negative deflection from a baseline indicative of different electrical events. ECG waves include P waves, Q waves, R waves, S waves, T waves, and U waves. In general, the QRS complex is about \(80-100 ms\), but is shorter in children. The quick repetition of the Q, R, and S waves may not be visible in all 12 leads but it reflects synchronous cardiac events. P waves indicate atrial depolarization. The QRS complex, composed of sequential Q, R, and S waves, represents the ventricular depolarization pattern. T waves trail the QRS complex to capture ventricular repolarization.
    • P waves - represent depolarization of the atria which spreads from the SA node towards the AV node, and from the right atrium to the left atrium. Typical duration is \(<80 ms\); unusually long P waves may indicate atrial enlargement.In principle, a large right atrium yields a tall and peaked P wave, whereas a large left atrium may manifest as a two-humped P wave.
    • T waves - represent the repolarization of the ventricles. Commonly, these are \(\sim 160 ms\) and appear upright in all leads except aVR and V1. Inverted T waves may reflect myocardial ischemia, left ventricular hypertrophy, high intracranial pressure, or other metabolic abnormalities. Peaked T waves can indicate hyperkalemia or very early myocardial infarction.
  • Intervals - time periods between two specific ECG events, which include PR intervals, QRS intervals (QRS durations), QT intervals, or RR intervals.
    • PR interval are usually of duration \([120,200] ms\), from the beginning of the P wave to the beginning of the QRS complex. It represents the time it takes the electrical impulse to travel from the sinus node through the AV node. Shorter (\(< 120 ms\)) or longer (\(>200 ms\)) PR intervals correspond to electrical impulses bypassing the AV node, and first degree atrioventricular block, resp[ectively. The PR segment is expected to be mostly completely flat, with depressions indicative of potential pericarditis.
  • Segments - represent the length between two specific time points on the ECG that are expected to be at the baseline amplitude (neither negative nor positive) including PR segments, ST segments, and TP segments. The ST segment represents the period of ventriclar depolarization. It is usually electrically neutral, with depressed or elevated amplitudes indicative of myocardial infarction, ischemia, pericarditis, or a normal phenotype.
  • Complex - an aggregate of multiple waves pooled together. The most central and visually noticeable part of the ECG is the QRS complex. It combines three ECG deflections corresponding to (1) the depolarization of the right heart ventricle; (2) depolarization of the left ventricles; and (3) contraction of the large ventricular muscles.
  • Point - the J point on the ECG indicates the ending location of the QRS complex and the start of the ST segment.

The main ECG parts include P waves, theQRS complex, and T waves. The model ECG framework is shown below.

Electrical activities of the heart are tracked via electrodes strategically positioned to measure and display the cardiac cycle. ECG graphs show the overall cardiac electrical potential (voltage) across the 12 leads representing 12 complementary view angles. The six limb leads the heart activity in a vertical plane. Voltage measures require a pair of poles (one negative and one positive) to track electrical currents. The negative poles serve as baseline references and the placement of their corresponding positive poles indicate “point of view” vectors. Some leads (e.g., I, II, and III) are bi-polar, measure electrical potential between two of the three limb electrodes. For instance, the heart view from the left is tracked by lead I, which captures the electrical potential between the right arm (a negative pole) and the left arm (a positive pole). Similarly, to capture an inferior view of the heart dynamics, lead II detects voltage between the right arm (negative) and the left leg (positive).

The augmented limb leads (e.g., aVR, aVL, and aVF) are uni-polar; solely using a single limb electrode as the positive pole paired with a reference negative pole computed as the average of inputs from the other two leads. For instance, aVR detects the upper-right side of the heart, whereas aVL tracks the upper-left side of the heart. Electrical depolarization towards a lead generates a positive signal (deflection), while depolarization away from a lead yields a negative deflection. The opposite is true for electrical repolarization where leads capturing the heart from different angles may have waves pointing in different directions.

The animated gif below shows the dynamics of the heart electrical conduction and ECG wave generation in the limb leads. The positive deflection on the ECG signal manifests as positive currents created by depolarization of cardiac cells travel from negative to positive electrodes.

8.1 Common ECG tasks

There are a number of clinically important phenotypes can be traced and interpreted using ECG signals. Examples include:

  • Determining the heart rate. The heart rate is important for monitoring for bradycardia (slower than normal heart rate), normal heart rate rhythm, or tachycardia (higher than normal rate, typically \(>100\) beats per second). The atrial rate is measured by the rate of the P wave, whereas the ventricular rate is tracked by the rate of the QRS complex.
  • Determining Axis on the ECG. Different ECG axes provide clues to some states of cardiopathologies. The main ECG axis to measure is the QRS complex axis. Each axis is computationally estimated in terms of degrees of deviation from zero. The QRS axis represents the mean electrical vector in the frontal plane and indicates the general direction of the ventricular depolarization. Its tertiary classification (normal, left deviated, or right deviated) represents (1) \(< -30^o\) representing left axis deviation; (2) normal QRS axis from \([−30,105^o]\), where \(0^o\) is along lead I with positive=inferior and negative=superior; and (3) \(>+105^o\) corresponding to right axis deviation. A rule-of-thumb in determining if the QRS axis is normal involves inspecting if the QRS complex is mostly positive in lead I and lead II. Generally, normal QRS axes point down and to the left, reflecting the anatomical orientation of the heart within the chest. Atypical QRS complex axis may be indicative of a change in the heart shape and orientation, or an abnormal electric conduction affecting the normal process of ventricular depolarization.
  • Searching for Abnormal Heart Rhythms. Arrhythmia is an abnormal heartbeat which may be harmless or may indicate a life-threatening condition. Brief or lasting arrhythmia may increase or decrease the heart rate, or manifest as erratic heart rhythm. The most common type is atrial fibrillation (AFib), where the upper heart chambers contract irregularly. Overtime, AFib may lead to blood clots, stroke, heart failure and other heart-related complications.
  • Identifying Acute MI and Ischemia. Some of the clear acute MI ECG readings include anterior ST segment elevations and inferior ST segment elevation MIs. However, identifying more subtle ECG changes is particularly difficult. Also is is hard to determine whether ST segment elevation may be due to ischemia of other causes, e.g., left ventricular aneurysm or left ventricular hypertrophy, as well as discriminating between ST segment depressions in the ECG signal due to digoxin (a medication used to treat heart failure and arrhythmias) or other causes.

This 2021 article demonstrated the use of standard 12-lead ECG data for rhythm-type ECG classification problems.

Let’s use the following PhysioNet ECG data (ecgiddb) to demonstrate some data ingestion, processing, modeling, and visualization. This study is based on 2005 Master’s thesis by Lugovaya TS, entitled Biometric human identification based on electrocardiogram, at the Faculty of Computing Technologies and Informatics, Electrotechnical University in Saint-Petersburg, Russia. The data is managed at PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals.

The ECG database includes 310 signals from 90 people where each recording consists of:

  • ECG lead I (.dat file) recorded for 20 seconds, digitized at 500 Hz with 12-bit resolution over a nominal \(±10 mV\) range,
  • 10 annotated beats (.atr file) with unaudited R- and T-wave peaks annotations from an automated detector, and
  • Meta-data (header) information (.hea file) containing age, gender and recording date.

The raw ECG signals are noisy and contain high and low frequency noise. Each record includes raw and filtered signals:

  • Signal 0: ECG I (raw signal),
  • Signal 1: ECG I filtered (filtered signal).

More about the MIT ECG data format is available here.

8.1.1 Detecting R peaks in ECG data

### 1. Load ECG data 
path1 <- paste0(tempdir(),"rec_1.dat")
download.file("https://physionet.org/files/ecgiddb/1.0.0/Person_01/rec_1.dat?download", path1)
ecg1 <- readBin(path1, integer(), 500*30)
ecg1.df <- data.frame(ECG = ecg1, Seconds = c(1:length(ecg1))/500)

path2 <- paste0(tempdir(),"rec_2.dat")
download.file("https://physionet.org/files/ecgiddb/1.0.0/Person_01/rec_2.dat?download", path2)
ecg2 <- readBin(path2,integer(),500*30)
ecg.df <- data.frame(ECG1=ecg1[1:10000], ECG2=ecg2[1:10000], Seconds=(c(1:length(ecg2))/500)[1:10000])
head(ecg.df)
##       ECG1    ECG2 Seconds
## 1 -1441809 -720916   0.002
## 2 -1441808 -458769   0.004
## 3 -1507342 -262146   0.006
## 4 -1507343 -196608   0.008
## 5 -1507347  -65541   0.010
## 6 -1507346 -131079   0.012
#### 2. Detect R peaks in ECG data
# implements the part of the Pan & Tompkins algorithm for R peak detection in raw ECG data
# Pan, Jiapu, and Willis J. Tompkins. "A real-time QRS detection algorithm." IEEE Trans. Biomed. Eng.
# https://ieeexplore.ieee.org/document/4122029
# @param numerical vector of ECG signal
# @param sRate ECG sample rate
# @param lowCut Butterworth bandpass filter low cut value
# @param highCut Butterworth bandpass filter high cut value
# @param filterOrder Butterworth bandpass filter order value
# @param integrationWindow Convolution window size
# @param refractory Minimal space between peaks in milliseconds
# @param returnIndex If TRUE, the index for each R peak is returned, If FALSE, the corresponding time 
# @return numeric vector of detected R peaks 
detectRpeaks <- function(signal, sRate, lowCut = 0, highCut = 15, filterOrder = 1,
                         integrationWindow = 15, refractory = 200, returnIndex = FALSE){
  nyquist_freq = 0.5 * sRate
  low = lowCut / nyquist_freq
  high = highCut / nyquist_freq

  # Apply bandpass butterworth filter.
  bandpass <- signal::butter(n = filterOrder, W = c((lowCut/(sRate/2)), high), type = "pass")
  signal_filt <- signal::filtfilt(bandpass, c(rep(signal[1],sRate),signal,rep(signal[length(signal)],sRate)))
  signal_filt <- signal_filt[(sRate+1):(length(signal_filt)-sRate)]

  # Apply lag difference
  signal_diff <- diff(signal_filt)

  # Square signal
  signal_squared <- signal_diff^2
  signal_squared <- c(signal_squared[1], signal_squared)

  # Apply convolution on chunks of signal (100000 values)
  split_ecg <- split(signal_squared, ceiling(seq_along(signal_squared)/100000))
  split_ecg2 <- lapply(split_ecg, function(x) {
      xc <- stats::convolve(x,rep(1,integrationWindow),type="open")
      difflen <- length(xc) - length(x)
      xc <- xc[(difflen/2+1):(length(xc)-(difflen/2))]
      xc})
  signal_conv <- unlist(split_ecg2, use.names=FALSE)

  # Detect peaks on pre processed signal
  peaks <- c(0)
  limit <- mean(signal_conv)*3

  refractory <- sRate*(refractory/1000)
  x <- signal_conv
  for(i in c(2:(length(x)-1))){
    if((x[i] > limit) && (x[i]>x[i-1]) && (x[i]>x[i+1]) && ((i-peaks[length(peaks)])>refractory)) {
      peaks <- c(peaks,i)
    }
  }
  peaks <- peaks[2:length(peaks)]
  if(returnIndex) return(peaks)
  return(peaks/sRate)
}

peakRtimes1 <- detectRpeaks(ecg.df$ECG1, sRate=500) # these are times in seconds
peakRtimes2 <- detectRpeaks(ecg.df$ECG2, sRate=500)

# peakRindices1 <- detectRpeaks(ecg.df$ECG1, sRate=500, returnIndex=TRUE)   # for time indices!
# peakRindices2 <- detectRpeaks(ecg.df$ECG2, sRate=500, returnIndex=TRUE)

### 3. Plot R-peaks on top of ECG
library(plotly)
ecg.df %>% plot_ly(x=~Seconds, y=~ECG1, type="scatter", mode="lines", name="ECG1", 
                   line = list(color = 'red', width = 1)) %>%
  add_trace(y=~ECG2, type="scatter", mode="lines", name="ECG2", 
            line = list(color = 'blue', width = 1))  %>%
  add_segments(x=peakRtimes1, y=min(ecg.df$ECG1)/2, xend=peakRtimes1+0.01, yend=max(ecg.df$ECG1)/2,
               line = list(color = 'gray', width = 4), opacity=0.3, name="R Peak ECG 1") %>%
  add_segments(x=peakRtimes2, y=min(ecg.df$ECG2)/2, xend=peakRtimes2+0.01, yend=max(ecg.df$ECG2)/2,
               line = list(color = 'brown', width = 4), opacity=0.3, name="R Peak ECG 2") %>%
  layout(title="Two ECG Signals with Detected R-Peaks", xaxis=list(title="seconds"),
           yaxis=list(title="Amplitude"), legend = list(orientation='h'))

8.1.2 Detecting Heart Beats

The QRS complex occurrences determine the heart rate variability (HRV) time series capturing the R-to-R series (RR intervals) between consecutive heart beats, inter-beat intervals or interval function. The RR intervals are the differences between successive R-wave oscillations. Recall that above we calculated the times \(t_n\) of the R-wave occurrences.

The \(n^{th}\) RR interval is \[RR_n = \alpha (t_n − t_{n−1}),\] where the normalizing parameter \(\alpha\) accounts for the desired measuring unit of the RR time series. The standard RR interval measuring unit is \(ms\) (millisecond). When the time occurrences are tracked in seconds (\(s\)), then \(\alpha = 1,000\).

Complementary to the HRV, we can define the sequence of instantaneous heart rates

\[HR_n = \frac{\beta}{t_n − t_{n−1}},\] where a normalization parameter \(\beta=60\) expresses the heart rate in Beats Per Minute (bpm) and corresponds to time occurrence measures in seconds.

The \(\{RR_n\}_n\) time-series consist of paired time and heart rates \(\{(t_n, RR_n)\}\), where the time series \(\{t_n\}\) must be explicitly specified as R-wave occurrences are not necessarily regular. This irregular, non-equidistant, sampling requires tracking the (time-value, heart rates) pairs, not just the RR intervals.

Prior to frequency-domain analysis (e.g., using Fourier or Wavelet base representations), we need to take care of this incongruence, e.g., by uniformly resampling the time series signal. Multiple alternative approaches for homogenizing the sampling rate can be applied. The package RHRV uses signal-interpolation for transforming the non-uniformly sampled RR series into regularly sampled signal.

If we assume that the sampling times of the RR signals are regular (equidistant), we can construct a tachogram, of the RR intervals as a function of the heart beat index. The result of this assumption would lead to power spectrum signal representation that is not a function of the frequency, but a function of cycles per beat.

The RHRV handbook provides significant details about spatial and frequency domain HRV analytics.

We can use the following heart beat detection function heartBeat() to compute the wavelet decomposition of the ECG time series signal and called with the following input arguments:

heartBeat() Inputs:

  • data - ECG Data frame
  • SampleFreq - sampling frequency of ECG signal in Hz
  • waveBase - specification of a wavelet basis to use in the wavelet decomposition of the ECG signal
  • waveThr - wavelet coefficient threshold value to detect heart beats in the wavelet domain.

heartBeat() Outputs:

  • dataECG - original ECG data frame
  • coeff - coefficients of the 2nd level (W2) wavelet decomposition of the ECG signal (dataECG.ecg),
  • Rpeaks - indices of R peaks in {coeff},
  • Rall - data frame with the detected heart-beat results including the following columns: Rtrue_idx (indices of R peaks in {dataECG}), Rtrue_sec (time moments of {Rtrue_idx} in seconds), RtoRext (R-R intervals in index space, number of samples), and RtoR_secext (R-R intervals in time, seconds).
### install.packages("RHRV")
### install.packages("wavelets")

path1 <- paste0(tempdir(),"rec_1.dat")
download.file("https://physionet.org/files/ecgiddb/1.0.0/Person_01/rec_1.dat?download", path1)
ecg1 <- readBin(path1, integer(), 500*30)
ecg1.df <- data.frame(ecg = ecg1, time = c(1:length(ecg1))/500)

path2 <- paste0(tempdir(),"rec_2.dat")
download.file("https://physionet.org/files/ecgiddb/1.0.0/Person_01/rec_2.dat?download", path2)
ecg2 <- readBin(path2,integer(),500*30)
ecg.df <- data.frame(ECG1=ecg1[1:10000], ECG2=ecg2[1:10000], Seconds=(c(1:length(ecg2))/500)[1:10000])
head(ecg.df)
##       ECG1    ECG2 Seconds
## 1 -1441809 -720916   0.002
## 2 -1441808 -458769   0.004
## 3 -1507342 -262146   0.006
## 4 -1507343 -196608   0.008
## 5 -1507347  -65541   0.010
## 6 -1507346 -131079   0.012
################# Define a heartBeat estimating function
#' @param data ECG Data frame 
#' @param SampleFreq ECG sampling frequency in Hz
#' @param waveBase  wavelet basis to use in the wavelet decomposition of the ECG signal
#' @param waveThr wavelet coefficient threshold value to detect heart beats in the wavelet domain
#' @return returns a data object containing 4 members:
#'    {dataECG}: original ECG data frame
#'    {coeff}:  coefficients of the 2nd level (L2) wavelet decomposition of the ECG signal (dataECG$ecg),
#'    {Rpeaks}: indices of R peaks in {coeff},
#'    {Rall}:   data frame with the heart-beat detection results. The columns include"
#'              {Rtrue_idx}:  indices of R peaks in {dataECG},
#'              {Rtrue_sec}:  time moments of {Rtrue_idx} in seconds,
#'              {RtoRext}:    R-R intervals in index space (number of samples), starting with zero,
#'              {RtoR_secext}: R-R intervals in time (seconds), starting with zero
heartBeat <- function(data, SampleFreq = 500, waveBase="d10", waveThr = 9) {

  # Choose a wavelet basis (wavelets::wt.filter())
  # Daubechies      (d)   2,4,6,8,10,12,14,16,18,20
  # Least Asymetric (as)  8,10,12,14,16,18,20
  # Best Localized  (bl)  14,18,20
  # Coiflet         (c)   6,12,18,24,30
  waveletBaseList <- c("d2","d4","d6","d8","d10","d12","d14","d16","d18","d20",
                       "as8","as10","as12","as14","as16","as18","as20",
                        "bl14","bl18","bl20",
                        "c6","c12","c18","c24","c30")
  if (!(waveBase %in%  waveletBaseList)) wavelet <- "d10" # deafult Daubechies d10 wavelet decomposition
  else wavelet <- waveBase
  level <- 5       # presenting the highest frequency level of wavelet decomposition
  # message(sprintf("wavelet-bases set to %s", wavelet))
  
  df <- data.frame(matrix(NA, nrow = sum(!is.na(data$ecg)), ncol = 2))
  names(df) <- c("idx", "ecg")
  df$idx <- which(!is.na(data$ecg))
  df$ecg <- data$ecg[df$idx]

  X <- as.numeric(df$ecg)

  ecg_wav <- wavelets::dwt(X, filter=wavelet, n.levels=level, boundary="periodic", fast=TRUE)

  # for R peak detection, use the coefficients of the second level (W2) of wavelet decomposition 
  x <- ecg_wav@W$W2

  # initialize the vector for detected R peaks
  Rpeaks <- matrix(0,1,length(x))

  # Iterate sweeping the L2 coefficients for local maxima
  i <- 2
  while (i < length(x)-1) {
    if ((x[i]-x[i-1]>=0) && (x[i+1]-x[i]<0) && x[i]>waveThr)  Rpeaks[i] <- i
    i <- i+1
  }

  # Clear all zero values from Rpeaks vector
  Rpeaks <- Rpeaks[Rpeaks!=0]

  # Since L2 coeffs are used, the multiply the results by 4, to scale the results to the original signal length
  Rtrue <- Rpeaks*4

  # Check results with the original signal
  for (k in 1:length(Rtrue)) {
    if (Rtrue[k] > 10) Rtrue[k] <- Rtrue[k]-10+(which.max(X[(Rtrue[k]-10):(Rtrue[k]+10)]))-1
    else  Rtrue[k] <- which.max(X[1:(Rtrue[k]+10)])
  }

  Rtrue <- unique(Rtrue)
  Rtrue_idx <- df$idx[Rtrue]

  # Determine the Rpeak-Rpeak intervals in samples and seconds and average heart rate
  RtoR <- Rtrue_idx[-1]-Rtrue_idx[1:length(Rtrue_idx)-1]
  RtoR_sec <- (data$time[Rtrue_idx[-1]] - data$time[Rtrue_idx[1:length(Rtrue_idx)-1]])/1000

  avgHR = 60/mean(RtoR_sec)
  avgHR = as.integer(avgHR)

  # save the information about detected R peaks to Rsec_data (ascii file)
  Rtrue_sec <- (data$time[Rtrue_idx] - data$time[1])/1000;
  Rtrue_sec <- round(Rtrue_sec, 3)
  # write(Rtrue_sec,"Rsec_data.txt", 1)

  RtoRext <- c(0, RtoR)
  RtoR_secext <- c(0, RtoR_sec)
  Rall <- data.frame(Rtrue_idx, Rtrue_sec, RtoRext, RtoR_secext)

  return(list(dataECG = df, coeff = x, Rpeaks = Rpeaks, Rall = Rall))
}

hr_df <- heartBeat(ecg1.df, SampleFreq = 500, waveBase="d10", waveThr = 9)

# Plot the detected heart-beats on top of the ECG time-course
ecg1.df %>% plot_ly(x=~time, y=~ecg, type="scatter", mode="lines", name="Raw ECG", 
                   line = list(color = 'red', width = 1)) %>%
  add_segments(x=~time[hr_df$Rall$Rtrue_idx], y=min(ecg.df$ECG1)/2, 
               xend=~(time[hr_df$Rall$Rtrue_idx]+0.0001), yend=max(ecg.df$ECG1)/2,
               line = list(color = 'gray', width = 4), opacity=0.3, name="R Peaks ECG") %>%
  layout(title="Raw ECG Signal with Detected R-Peaks", xaxis=list(title="seconds"),
           yaxis=list(title="Amplitude"), legend = list(orientation='h'))

SOCR Resource Visitor number Web Analytics SOCR Email