SOCR ≫ | BPAD Website ≫ | BPAD GitHub ≫ |
# Install the `eegUtils` package
# library(remotes)
# remotes::install_github("craddm/eegUtils")
library(eegUtils)
The R package eegUtils
can be used to model EEG data and meta-data, e.g., channel locations. There are different types of EEG data, e.g., classes for epoched (eeg_epochs
) and continuous (eeg_data
), and time-frequency representation data (eeg_tfr
).
The generic eeg_data
object represents all data types. The raw data class is a list of entries:
signals
- A data frame containing the actual EEG data in wide format, where each column contains data from a single electrode and each row represents a single point in time.srate
- A single numeric value giving the sampling rate of the data in \(Hz\).events
- A 3-column data frame (tibble) describing the events recorded in the data; event_onset
(in samples), relative to recording onset; event_time
(in seconds), relative to recording onset; and event_type
(integer or other).chan_info
- Initially NULL, since BioSemi data format (BDF) and Neuroscan CNT (CNT) data format types do not contain (usable) electrode locations. Use head(eegUtils:::electrodeLocs)
to see format of this data frame (tibble), which stores channel location information similar to EEGLAB (Electrode locations) style including the following columns:
electrode
- electrode namesradius
- Spherical coordinates (Radius is typically normalized to 1)theta
- Spherical coordinates (theta)phi
- Spherical coordinates (theta)cart_x
- Cartesian 3D coordinatescart_y
- Cartesian 3D coordinatescart_z
- Cartesian 3D coordinatesx
and y
- 2D Stereographic projection of the spherical coordinates.timings
- A data frame describing each row in time (\(s\)) and sample numbers (\(samples\)).epochs
- For epoched data, a data frame with information about the epochs in the data.reference
- Initially NULL list, without a recorded reference channel. After the data is referenced, it includes pairs of ref_chans
(labels for channels used to calculate the reference data or “average”) and excluded
(labels for any channels excluded from the reference data).The eeg_epochs
objects have the same structure as eeg_data
objects, however, the events
table has two additional columns, epoch
(storing the epoch number to which a given event belongs) and time
(codes the time point at which the event occurs relative to the epoch onset. The event_time
contains the time point of event occurrence relative to the recording onset, see events(demo_epochs)
.
The eeg_tfr
objects contain the time-frequency representations of the eeg_epochs
objects including:
signals
- A 4D tensor - \(epoch \times time \times electrode \times frequency\),dimensions
- Tracking matrix dimensions and corresponding properties.The eeg_ICA
objects store analytical results, e.g., ICA or SSD decomposition, applied to an eeg_epochs
object:
mixing_matrix
- weights converting the source data into the original data,unmixing_matrix
- weights converting the original data into source data,signals
- individual component activations.The eegUtils
package performs basic EEG preprocessing and plotting of EEG signals. EEG processing starts with importing raw data, import_raw()
, in .BDF (BioSemi), .CNT (32-bit Neuroscan), or .vhdr/.vmrk/.dat (Brain Vision Analyzer 2.0) file formats. Internally, the structure of eeg_data
objects consists of the raw data and the corresponding meta-data.
There are many EEG, MRI, PET and other free imaging datasets available here. Let’s use this eegUtils example data, in .bdf format, which represents an experiment with volunteers attending to left or right visual fields indicated by specific visual cues (\(\leftarrow\) arrows \(\rightarrow\)). About a second following the visual cue, a Gabor patch target appears in the left or right visual field and the participants have to determine if the target patch shows a vertical or a horizontal grating. Most of the time (\(\frac{4}{5}\)), the target appears in the cued location and the remaining times (\(\frac{1}{5}\)) in the contralateral location. The data file “Matt-task-spatcue.bdf” is available here.
In the experiments below, we will use the Matt-task-spatcue.bdf dataset. More information about this EEG study design is available in this article (DOI: 10.1523/JNEUROSCI.1993-18.2019). This is a study of alpha brain oscillations (\(8–12 Hz\)), which can be used to study human attention and distraction. This study of alpha oscillations aims to identify differences between focused (\(0/100\%\)) vs. divided (\(40/60\%\)) attention. It demonstrates that human attention is associated with spatially distinct synchronized alpha sources in occipital and parietal brain regions.
library(eegUtils)
<- tempdir()
temp_dir <- file.path(temp_dir, "Matt-task-spatcue.bdf")
temp_file download.file("https://osf.io/hy5wq/download", temp_file, mode = "wb")
<- import_raw(temp_file) eegExample
## Importing C:\Users\Dinov\AppData\Local\Temp\RtmpysPBBi/Matt-task-spatcue.bdf as BDF
eegExample
## EEG data
##
## Number of channels : 72
## Electrode names : Fp1 AF7 AF3 F1 F3 F5 F7 FT7 FC5 FC3 FC1 C1 C3 C5 T7 TP7 CP5 CP3 CP1 P1 P3 P5 P7 P9 PO7 PO3 O1 Iz Oz POz Pz CPz Fpz Fp2 AF8 AF4 AFz Fz F2 F4 F6 F8 FT8 FC6 FC4 FC2 FCz Cz C2 C4 C6 T8 TP8 CP6 CP4 CP2 P2 P4 P6 P8 P10 PO8 PO4 O2 EXG1 EXG2 EXG3 EXG4 EXG5 EXG6 EXG7 EXG8
## Sampling rate : 256 Hz
## Reference :
## Signal length: 0 1468.996 seconds
To reduce the size of the data, the original EEG data recorded at \(1024 Hz\) was downsampled to \(256 Hz\). The data is acquired using \(64\) electrodes that are placed and named according to the EEG 10-05 international system convention. Eight additional electrodes included (EXG1-EXG4 positioned around the eyes to record eye movements), reference electrodes (EXG5 and EXG6 positioned on the left and right mastoids), and empty channels with no attached electrodes (EXG7 and EXG8).
To begin with, let’s first re-reference the data by using the eegUtils::eeg_reference()
command. The reference electrode used in recording EEG data represents a common atlas reference for the data, including TP10 in the 10-20 System, digitally-linked mastoids, computed post-hoc, the vertex electrode (Cz), single or linked earlobes, the nose tip, etc. Some systems with active electrodes (BIOSEMI Active Two) may record data without reference locations and rely on post-hoc processing during data import to avoid the presence of \(40 dB\) noise in the data. When the data are recorded in one reference framework, they can be re-referenced to another atlas, reference channel, or channel combination. If no electrodes meta-data is available, the data may be referenced to a common average calculated from all the electrodes in the data.
In our case, using the select_elecs()
function, we can remove the two empty channels, EXG7 and EXG8 (down to 70 channels), and then re-reference the data using the eeg_reference()
method (the prior empty reference changes to “average”).
<- select_elecs(eegExample, electrode = c("EXG7", "EXG8"), keep = FALSE)
eegExample <- eeg_reference(eegExample,ref_chans = "average")
eegExample eegExample
## EEG data
##
## Number of channels : 70
## Electrode names : Fp1 AF7 AF3 F1 F3 F5 F7 FT7 FC5 FC3 FC1 C1 C3 C5 T7 TP7 CP5 CP3 CP1 P1 P3 P5 P7 P9 PO7 PO3 O1 Iz Oz POz Pz CPz Fpz Fp2 AF8 AF4 AFz Fz F2 F4 F6 F8 FT8 FC6 FC4 FC2 FCz Cz C2 C4 C6 T8 TP8 CP6 CP4 CP2 P2 P4 P6 P8 P10 PO8 PO4 O2 EXG1 EXG2 EXG3 EXG4 EXG5 EXG6
## Sampling rate : 256 Hz
## Reference : average
## Signal length: 0 1468.996 seconds
Next, we can filter the data using the eeg_filter()
function, which utilizes infinite impulse response (IIR) or finite impulse response (FIR) filters to modify the frequency response of the raw signal, and as necessary, remove low or high frequency fluctuations in the time-courses. As a demonstration, we can use IIR filtering – high-pass filter at \(0.2 Hz\) and low-pass filter at \(70 Hz\) and plot the power spectral density of the data before and after filtering.
library(plotly)
# Raw data power spectral density (PSD)
# plot_psd(eegExample, freq_range = c(0, 60), legend = FALSE)
# Filtered data power spectral density
<- eeg_filter(eegExample, method = "iir", low_freq = 0.2, high_freq = 70,
eegExample filterOrder = 4) # specify a bandpass filter
# Create a PSD plot
#' @param psd_out PSD to plot
#' @param freq_range Frequency range to plot
#' @return plotly object showing power spectral density
<- function(psd_out, freq_range, chan_names) {
psdPlot if (!is.null(freq_range)) {
if (length(freq_range) < 2 | length(freq_range) > 2) {
message("freq_range must be a vector of length 2 ...")
else {
} <- psd_out$frequency >= freq_range[[1]] & psd_out$frequency <= freq_range[[2]]
rows <- psd_out[rows, ]
psd_out
}
}
<- tidyr::pivot_longer(psd_out, cols = dplyr::all_of(chan_names),
psd_out names_to = "electrode", values_to = "power")
# str(psd_out)
# tibble [4,200 x 3] (S3: tbl_df/tbl/data.frame)
# $ frequency: num [1:4200] 1 1 1 1 1 1 1 1 1 1 ...
# $ electrode: chr [1:4200] "Fp1" "AF7" "AF3" "F1" ...
# $ power : num [1:4200] 64.22 60.07 16.16 5.15 4.75 ...
# ggplot(psd_out, aes(x = frequency, y = 10 * log10(power), colour = electrode)) +
# stat_summary(geom = "line", fun = mean) + # geom_line() +
# theme_bw() +
# ylab(expression(paste(mu, V^2, "/ Hz(dB)"))) +
# xlab("Frequency (Hz)") +
# scale_x_continuous(expand = c(0, 0))
%>% plot_ly(x=~frequency, y=~(10*log10(power)), color=~electrode,
psd_out type="scatter", mode="lines") %>%
layout(title="EEG data power spectral density (PSD)", xaxis=list(title="Frequency (Hz)"),
yaxis=list(title="mu*V^2/Hz(dB)"), legend = list(orientation='h'))
}
# Compute the data power spectral density (PSD)
<- eegUtils::compute_psd(eegExample, keep_trials=TRUE, n_fft=256, seg_length=NULL, noverlap=NULL)
psd_out psdPlot(psd_out, freq_range = c(0, 70), chan_names = channel_names(eegExample))
EEG data can be epoched around events or triggers using the epoch_data()
function. The resulting eeg_epochs
objects represent lists of the event triggers discovered in the data. We can examine these triggers by the list_events()
or events(eegExample)
methods. We can epoch around event types \(120\) and \(132\) - these correspond to onsets of a visual target on the left and right of fixation, respectively. The length of epochs around the trigger is specified by the time_lim
variable. In this process, we will baseline-correct the data using the average of the time points between \(-0.1s\) and to \(0.0s\) (stimulus onset). Then, we can use the epochs()
method to retrieve meta-information for the data epochs.
list_events(eegExample)
## event_type
## 1 254
## 2 100
## 3 60
## 4 200
## 5 122
## 6 20
## 7 62
## 8 120
## 9 25
## 10 132
## 11 130
<- epoch_data(eegExample, events = c(120,132), baseline = c(-0.1, 0.0),
epochedExample epoch_labels = c("valid_left", "valid_right"), time_lim = c(-0.1, 0.4))
epochs(epochedExample)
## # A tibble: 160 x 5
## epoch participant_id recording event_type epoch_labels
## <dbl> <lgl> <chr> <dbl> <chr>
## 1 1 NA Matt-task-spatcue 120 valid_left
## 2 2 NA Matt-task-spatcue 132 valid_right
## 3 3 NA Matt-task-spatcue 120 valid_left
## 4 4 NA Matt-task-spatcue 120 valid_left
## 5 5 NA Matt-task-spatcue 120 valid_left
## 6 6 NA Matt-task-spatcue 120 valid_left
## 7 7 NA Matt-task-spatcue 120 valid_left
## 8 8 NA Matt-task-spatcue 120 valid_left
## 9 9 NA Matt-task-spatcue 132 valid_right
## 10 10 NA Matt-task-spatcue 120 valid_left
## # ... with 150 more rows
Averaging over epochs or electrodes (e.g., in parietal lobe), the eeg_epochs
data can be plotted using plot_ly()
, plot_butterfly()
, or plot_timecourse()
. Baseline correction can also be applied for plotting only using the baseline
parameter in the plotting call. Below is a schematic of the EEG electrode positions in the 10-10 system combinatorial nomenclature,
A butterfly plot of the EEG data, over a short time-frame for all locations, is shown below.
<- function(data, time_lim=NULL, baseline=NULL, continuous=FALSE, allow_facets=FALSE, ...) {
plotButterfly <- dplyr::group_by(data, time, electrode)
data <- dplyr::summarise(data, amplitude = mean(amplitude))
data <- dplyr::ungroup(data)
data
# select time-range of interest
if (!is.null(time_lim)) data <- select_times(data, time_lim)
if (!is.null(baseline)) data <- rm_baseline(data, baseline)
plot_ly(data=data, x=~time, y=~amplitude, color=~electrode, type="scatter", mode="lines") %>%
layout(title="EEG Data Butterfly Plot", xaxis=list(title="time"),
yaxis=list(title="Amplitude (mu*V)"), legend = list(orientation='h'))
}
<- function(data, time_lim = NULL, baseline = NULL, quantity = "coefficients") {
bf_DF # Select specific times
if (!is.null(time_lim)) data <- select_times(data, time_lim = time_lim)
# If asked, perform baseline correction
if (!is.null(baseline)) data <- rm_baseline(data, time_lim = baseline)
<- as.data.frame(data, long = TRUE, coords = FALSE, values = {{quantity}})
data
data
}
# Display the Butterfly Plot
<- bf_DF(epochedExample)
data1 plotButterfly(data1)
We can also plot the individual time-course of one specific electrode or a list of electrodes (e.g., occupying the same neighborhood).
# Wrapper to the time-course-plot method (tcPlot)
<- function(data, electrode = NULL, time_lim = NULL,
plotTimecourse_DF add_CI = FALSE, baseline = NULL, title="", ...) {
if (!is.null(electrode)) data <- select_elecs(data,electrode)
if (!is.null(baseline)) data <- rm_baseline(data,time_lim = baseline)
if (!is.null(time_lim)) data <- select_times(data,time_lim)
<- tcPlot(data,add_CI = add_CI, title=title)
tc_plot
tc_plot
}
# Actual time-course plotter
<- function(data, add_CI, quantity = amplitude, title="") {
tcPlot # GGPlot solution
<- ggplot2::ggplot(data, aes(x = time, y = {{quantity}}))
tc_plot
if (add_CI) { tc_plot <- tc_plot +
stat_summary(fun.data = mean_cl_normal,geom = "ribbon",linetype = "dashed",
fill = NA, colour = "black", size = 1,alpha = 0.5)
}
<- tc_plot +
tc_plot stat_summary(fun = "mean", geom = "line", size = 1.2)
if(title=="") title = "Electrode = POz (Amplitude +/- SE"
+
tc_plot labs(title=title, x="Time (s)", y="Amplitude (muV)", fill = "") +
geom_vline(xintercept = 0, linetype = "solid", size = 0.5) +
geom_hline(yintercept = 0, linetype = "solid", size = 0.5) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 4), expand = c(0, 0)) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 4), expand = c(0, 0)) +
theme_minimal(base_size = 11) +
theme(panel.grid = element_blank(), axis.ticks = element_line(size = .5)) +
guides(colour = guide_legend(override.aes = list(alpha = 1)))
ggplotly(tc_plot) %>%
layout(title=title,
xaxis=list(title="Time (s)"), yaxis=list(title="Amplitude (muV)"))
# # Plot_ly() Solution
# data <- dplyr::group_by(data, time, electrode)
# data <- dplyr::summarise(data, amplitude = mean(amplitude))
# data <- dplyr::ungroup(data)
#
# tc_plot <- plot_ly(data=data, x = ~time, y = ~amplitude, type="scatter", mode="lines")
# if (add_CI) {
# data2 <- smean.cl.normal(data1$amplitude)
# data$amplitudeLowerBound <- data$amplitude - data2["Lower"]
# data$amplitudeUpperBound <- data$amplitude - data2["Upper"]
#
# tc_plot <- tc_plot %>%
# add_trace(y=~data$amplitudeLowerBound, name="Lower-Bound") %>%
# add_trace(y=~data$amplitudeUpperBound, name="Upper-Bound")
# }
#
# tc_plot <- tc_plot +
# stat_summary(fun = "mean", geom = "line", size = 1.2)
#
# tc_plot +
# labs(x = "Time (s)", y = expression(paste("Amplitude (", mu, "V)")), fill = "") +
# geom_vline(xintercept = 0, linetype = "solid", size = 0.5) +
# geom_hline(yintercept = 0, linetype = "solid", size = 0.5) +
# scale_x_continuous(breaks = scales::pretty_breaks(n = 4), expand = c(0, 0)) +
# scale_y_continuous(breaks = scales::pretty_breaks(n = 4), expand = c(0, 0)) +
# theme_minimal(base_size = 12) +
# theme(panel.grid = element_blank(), axis.ticks = element_line(size = .5)) +
# guides(colour = guide_legend(override.aes = list(alpha = 1)))
#
# Tester
# plotTimecourse_DF(data1, electrode = "POz") # Plot POz
}
# Display some time-sources at specific electrodes
plotTimecourse_DF(epochedExample, add_CI=TRUE, electrode = "POz",
title="Electrode = POz (Amplitude +/- SE") # Plot POz
plotTimecourse_DF(epochedExample, electrode=c("POz", "Oz", "O1", "O2"),
title="Average over Electrodes POz, Oz, O1, and O2") # average over four occipital electrodes
The electrode_locations()
method can be used to add standard channel locations. The topoplot()
method is used to render a 2D flat topographical representation of the data.
<- electrode_locations(epochedExample)
epochedExample channels(epochedExample)
## # A tibble: 70 x 9
## electrode radius theta phi cart_x cart_y cart_z x y
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Fp1 1 -94 -71 -29.4 83.9 -6.99 -30.6 88.9
## 2 AF7 1 -97 -51 -54.8 68.6 -10.6 -61.0 75.4
## 3 AF3 1 -76 -66 -33.7 76.8 21.2 -30.9 69.4
## 4 F1 1 -46 -64 -27.5 56.9 60.3 -20.2 41.3
## 5 F3 1 -60 -47 -50.2 53.1 42.2 -40.9 43.9
## 6 F5 1 -78 -37 -64.5 48.0 16.9 -62.3 46.9
## 7 F7 1 -98 -31 -70.3 42.5 -11.4 -84 50.5
## 8 FT7 1 -98 -10 -80.8 14.1 -11.1 -96.5 17.0
## 9 FC5 1 -73 -14 -77.2 18.6 24.5 -70.8 17.7
## 10 FC3 1 -49 -21 -60.2 22.7 55.5 -45.8 17.6
## # ... with 60 more rows
ggplotly(topoplot(epochedExample, time_lim = c(0.22, 0.24)))
Note that eegUtils
objects may be transformed into data frames, as needed.
library(ggplot2)
library(dplyr)
<- epochedExample %>%
gp select_epochs(epoch_no = 1:10) %>%
select_elecs(c("PO8", "Cz", "POz")) %>%
as.data.frame(long = TRUE) %>%
ggplot(aes(x = time, y = amplitude)) +
geom_line(aes(group = epoch), alpha = 0.2) +
stat_summary(fun = mean,geom = "line", size = 2, aes(colour = electrode)) +
facet_wrap(~electrode) +
theme_classic()
ggplotly(gp)
The method dplyr::mutate()
may be used to add new columns - effectively constructing regions of interest (ROIs) using multiple electrodes in a neighborhood (e.g., left parietal lobe).
<- epochedExample %>%
gp rm_baseline(time_lim = c(-.1, 0)) %>%
mutate(leftParietal = (Pz + P1 +P3 + P5 + P7 +P9) / 6) %>%
select(Pz, P7, leftParietal) %>%
filter(epoch <= 60, time > -0.1, time < 0.3) %>%
as.data.frame(long = TRUE) %>%
ggplot(aes(x = time, y = amplitude)) +
geom_line(aes(group = epoch), alpha = 0.2) +
stat_summary(fun = mean, geom = "line", size = 2, aes(colour = electrode)) +
facet_wrap(~electrode) +
scale_colour_viridis_d() +
theme_classic()
ggplotly(gp) %>% layout(legend = list(orientation='h'))
EEG signals often exhibit periodicity, such as cyclic oscillations in the alpha frequency range (\(\sim 8-13 Hz\)) observed in human EEG. The Power Spectral Density (PSD) can be used to analyze periodicity via the Fast Fourier transform (FFT).
The eegUtils::compute_psd()
method calculates the PSD for each trial separately, returning a dataframe with spectral power for each frequency and each electrode location. For epoched data, it computes the PSD for each epoch prior to averaging over all epochs and plotting the spectral harmonics.
<- compute_psd(epochedExample)
psdExample # psdPlot(psdExample, freq_range = c(0, 70), chan_names = channel_names(eegExample))
# psdPlot(psdExample, freq_range=c(0,125), chan_names=channel_names(eegExample))
# plot_psd(epochedExample)
<- tidyr::pivot_longer(psdExample, cols = channel_names(eegExample),
psd_out names_to = "electrode", values_to = "power")
ggplot(psd_out, aes(x = frequency, y = 10 * log10(power), colour = electrode)) +
stat_summary(geom = "line", fun = mean) +# geom_line() +
theme_bw() +
ylab("mu V^2 / Hz(dB)") +
xlab("Frequency (Hz)") +
scale_x_continuous(expand = c(0, 0)) +
theme(legend.position="bottom")
ggplotly(gp) %>%
layout(title="Epoched EEG data power spectral density (PSD)", legend = list(orientation='h'))
# # Plot_ly() graph of epoched PSD
# psdEpochPlot <- function(psd_out, freq_range, chan_names) {
# if (!is.null(freq_range)) {
# if (length(freq_range) < 2 | length(freq_range) > 2) {
# message("freq_range must be a vector of length 2 ...")
# } else {
# rows <- psd_out$frequency >= freq_range[[1]] & psd_out$frequency <= freq_range[[2]]
# psd_out <- psd_out[rows, ]
# }
# }
#
# psd_out <- tidyr::pivot_longer(psd_out, cols = dplyr::all_of(chan_names),
# names_to = "electrode", values_to = "power")
# # ggplot(psd_out, aes(x = frequency, y = 10 * log10(power), colour = electrode)) +
# # stat_summary(geom = "line", fun = mean) +# geom_line() +
# # theme_bw() +
# # ylab(expression(paste(mu, V^2, "/ Hz(dB)"))) +
# # xlab("Frequency (Hz)") +
# # scale_x_continuous(expand = c(0, 0))
# psd_out %>% plot_ly(x=~frequency, y=~(10*log10(power)), color=~electrode,
# type="scatter", mode="lines") %>%
# layout(title="Epoched EEG data power spectral density (PSD)", xaxis=list(title="Frequency (Hz)"),
# yaxis=list(title="mu*V^2/Hz(dB)"), legend = list(orientation='h'))
# }
# psdEpochPlot(psdExample, freq_range = c(0, 125), chan_names = channel_names(eegExample))
Assuming stationarity, i.e., stable process in terms of frequency and power across the entire time domain, frequency analysis may discard some temporal information. EEG data is rarely stationary and its temporal dynamics change across wider timescales.
Time-frequency analysis attempts to account for (globally) non-stationary behaviors. This can be done by moving windows that decompose the signal into time-frequency space tiles of regional stationary signals.
The method eegUtils::compute_tfr()
calculates a time-frequency representation of the epoched EEG data using Morlet wavelets. Windowing the signal with these local oscillatory base functions controls for spectral leakage and provides time-frequency specificity. The support (temporal extent) of the Morlet wavelets specifies their cycles as integer numbers of cycles at each frequency level (frequency of interest, FOI).
<- compute_tfr(epochedExample, method="morlet", foi=c(4,30), n_freq=12, n_cycles=3)
tfrEpochedExample tfrEpochedExample
## Epoched EEG TFR data
##
## Frequency range : 4 6.36 8.73 11.09 13.45 15.82 18.18 20.55 22.91 25.27 27.64 30
## Number of channels : 70
## Electrode names : Fp1 AF7 AF3 F1 F3 F5 F7 FT7 FC5 FC3 FC1 C1 C3 C5 T7 TP7 CP5 CP3 CP1 P1 P3 P5 P7 P9 PO7 PO3 O1 Iz Oz POz Pz CPz Fpz Fp2 AF8 AF4 AFz Fz F2 F4 F6 F8 FT8 FC6 FC4 FC2 FCz Cz C2 C4 C6 T8 TP8 CP6 CP4 CP2 P2 P4 P6 P8 P10 PO8 PO4 O2 EXG1 EXG2 EXG3 EXG4 EXG5 EXG6
## Number of epochs : 160
## Epoch limits : -0.102 - 0.398 seconds
## Sampling rate : 256 Hz
We can report the temporal and frequency standard deviations as follows.
$freq_info$morlet_resolution tfrEpochedExample
## frequency sigma_f sigma_t n_cycles
## 1 4.000000 1.333333 0.11936621 3
## 2 6.363636 2.121212 0.07503019 3
## 3 8.727273 2.909091 0.05470951 3
## 4 11.090909 3.696970 0.04305011 3
## 5 13.454545 4.484848 0.03548725 3
## 6 15.818182 5.272727 0.03018456 3
## 7 18.181818 6.060606 0.02626057 3
## 8 20.545455 6.848485 0.02323944 3
## 9 22.909091 7.636364 0.02084172 3
## 10 25.272727 8.424242 0.01889249 3
## 11 27.636364 9.212121 0.01727669 3
## 12 30.000000 10.000000 0.01591549 3
Then we can plot the resulting time-frequency wavelet transformation.
# https://rdrr.io/github/neuroconductor/eegUtils/src/R/frequency_plotting.R
ggplotly(plot_tfr(tfrEpochedExample)) %>%
layout(title="Epoched EEG time-frequency wavelet transformation (Morlet)", legend = list(orientation='h'))
EEG baseline correction to allow comparison, contrast, and statistical inference on the signals across participants. The plots below show baseline correction using the following two strategies:
db
- uses subtraction to convert the EEG data to dB-change,absolute
- simple absolute value thresholding correction,ggplotly(plot_tfr(tfrEpochedExample, baseline_type = "absolute", baseline = c(-.1, 0))) %>%
layout(title="Epoched EEG Baseline Correction by dB-change", legend = list(orientation='h'))
ggplotly(plot_tfr(tfrEpochedExample, baseline_type = "db", baseline = c(-.1, 0))) %>%
layout(title="Epoched EEG Baseline Correction by Absolute Value Thresholding",
legend = list(orientation='h'))
# subplot(pl1, pl2)
For linear modeling and inference, instead of using the first EEG study of alpha brain oscillations we used above, we will now use a more specialized study specifically designed for linear modeling (LIMO). More information about the LIMO EEG Dataset is available here (DOI: 10.7488/ds/1556) and here (DOI: 10.3389/fnins.2020.610388).
The study-design represents a controlled experiment where participants discriminate between two different types of human faces with varying phase coherence in the range \(0\% - 85\%\). The phase coherence of the facial images determine how easy or hard it is to discriminate between faces.
In this experiment, the real data is very large, e.g., the raw data from subject 1 is over \(300Mb\) in size.
# install.packages("hdf5r")
library(eegUtils)
library(R.matlab) # needed to load in matlab data (*.mat files)
setwd("E:/Ivo.dir/Research/Proposals/SOCR/2021/IHPI_SOCR_Summer_2021_Fellowships_Apps/Matt_Ratanapanichkich_MedicalPhysics_2021/other/EEG_Data/")
<- import_set("limo_dataset_S1.set")
limoSubj1 <- R.matlab::readMat("continuous_variable.mat")
limoContinuousVar <- readr::read_csv("categorical_variable.txt",
limoCategoricalVar col_names = c("cond_lab"), show_col_types = FALSE)
The method eegUtils::fit_glm()
fits a linear model using the epochs
data field as predictors. In this LIMO experiment, we will use the categorical (limoCategoricalVar
) and continuous (limoContinuousVar
) predictors in addition to the epochs
.
library(DT)
epochs(limoSubj1) <- dplyr::mutate(epochs(limoSubj1), phase_coherence = unlist(limoContinuousVar),
face = factor(limoCategoricalVar$cond_lab,levels = c(1, 2), labels=c("Face_A","Face_B")))
epochs(limoSubj1)
## # A tibble: 1,055 x 5
## epoch participant_id recording phase_coherence face
## <int> <chr> <chr> <dbl> <fct>
## 1 1 limo_dataset_S1 limo_dataset_S1 0.6 Face_B
## 2 2 limo_dataset_S1 limo_dataset_S1 0.55 Face_A
## 3 3 limo_dataset_S1 limo_dataset_S1 0.3 Face_B
## 4 4 limo_dataset_S1 limo_dataset_S1 0.1 Face_A
## 5 5 limo_dataset_S1 limo_dataset_S1 0.15 Face_A
## 6 6 limo_dataset_S1 limo_dataset_S1 0.1 Face_B
## 7 7 limo_dataset_S1 limo_dataset_S1 0.85 Face_A
## 8 8 limo_dataset_S1 limo_dataset_S1 0.8 Face_B
## 9 9 limo_dataset_S1 limo_dataset_S1 0.25 Face_A
## 10 10 limo_dataset_S1 limo_dataset_S1 0.65 Face_B
## # ... with 1,045 more rows
::describe(epochs(limoSubj1)) Hmisc
## epochs(limoSubj1)
##
## 5 Variables 1055 Observations
## --------------------------------------------------------------------------------
## epoch
## n missing distinct Info Mean Gmd .05 .10
## 1055 0 1055 1 528 352 53.7 106.4
## .25 .50 .75 .90 .95
## 264.5 528.0 791.5 949.6 1002.3
##
## lowest : 1 2 3 4 5, highest: 1051 1052 1053 1054 1055
## --------------------------------------------------------------------------------
## participant_id
## n missing distinct value
## 1055 0 1 limo_dataset_S1
##
## Value limo_dataset_S1
## Frequency 1055
## Proportion 1
## --------------------------------------------------------------------------------
## recording
## n missing distinct value
## 1055 0 1 limo_dataset_S1
##
## Value limo_dataset_S1
## Frequency 1055
## Proportion 1
## --------------------------------------------------------------------------------
## phase_coherence
## n missing distinct Info Mean Gmd .05 .10
## 1055 0 18 0.997 0.422 0.3006 0.00 0.05
## .25 .50 .75 .90 .95
## 0.20 0.40 0.65 0.80 0.85
##
## lowest : 0.00 0.05 0.10 0.15 0.20, highest: 0.65 0.70 0.75 0.80 0.85
##
## Value 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
## Frequency 63 59 58 60 59 58 55 59 61 62 58
## Proportion 0.060 0.056 0.055 0.057 0.056 0.055 0.052 0.056 0.058 0.059 0.055
##
## Value 0.55 0.60 0.65 0.70 0.75 0.80 0.85
## Frequency 58 57 55 56 58 60 59
## Proportion 0.055 0.054 0.052 0.053 0.055 0.057 0.056
## --------------------------------------------------------------------------------
## face
## n missing distinct
## 1055 0 2
##
## Value Face_A Face_B
## Frequency 524 531
## Proportion 0.497 0.503
## --------------------------------------------------------------------------------
::datatable(epochs(limoSubj1)[1:20,]) DT
Using fit_glm() to fit a LM to each electrode, across all time-points, the model argument is blank on the left of the model specification \(\sim\). The specific covariates are listed as predictors on the right hand side of the model formula. The categorical variable face
and the continuous predictor phase_coherence
are used as an example, \(\sim face + phase\_coherence\).
The fitted LIMO will estimate 3 model coefficients:
Face A
) at \(0\) phase coherence,Face A
and the alternative level Face B
,Let’s plot the three coefficients (Intercept, faceFace_B, and phase_coherence) for each electrode across all time points.
<- fit_glm(~ face + phase_coherence, data = limoSubj1)
fitted_model
# GGplot display
# as.data.frame(fitted_model, long = TRUE) %>%
# ggplot(aes(x = time, y = amplitude, colour = electrode)) +
# geom_line() +
# facet_wrap(~coefficient, scales = "free") +
# theme(legend.position = "none")
# plotly graph
<- as.data.frame(fitted_model, long = TRUE)
df <- df[which(df$coefficient=='(Intercept)'), ]
df_intercept <- df[which(df$coefficient=='faceFace_B'), ]
df_faceFace_B <- df[which(df$coefficient=='phase_coherence'), ]
df_phase_coherence
<- paste0("Int:", df_intercept$electrode)
namesInt <- paste0("FaceB:", df_faceFace_B$electrode)
namesFaceB <- paste0("PhaCoh:", df_phase_coherence$electrode)
namesPhaCoh
<- plot_ly(data=df_intercept, x=~time, y=~amplitude, color=~electrode,
pl_int type="scatter", mode="lines", name=namesInt) %>% layout(showlegend = FALSE)
<- plot_ly(data=df_faceFace_B, x=~time, y=~amplitude, color=~electrode,
pl_faceB type="scatter", mode="lines", name=namesFaceB) %>% layout(showlegend = FALSE)
<- plot_ly(data=df_phase_coherence, x=~time, y=~amplitude, color=~electrode,
pl_phaseCoh type="scatter", mode="lines", name=namesPhaCoh) %>% layout(showlegend = FALSE)
subplot(pl_int, pl_faceB, pl_phaseCoh, nrows=3, shareX = TRUE) %>%
layout(title="LM Coefficient Estimates (Intercept, Face-B, Phase-Coherence)")
In addition to rendering the raw coefficient estimates, we can also display various statistics, e.g., “std_err”, “t_stats”, “r_sq”, captured in the fitted_model
object. Below we show just the GLM model coefficient estimates for Electrode=A1, Epoch=2, and the categorical Factor=Face_B.
# Recall the model: fitted_model <- fit_glm(~ face + phase_coherence, data = limoSubj1)
# dim(fitted_model$coefficients) # dim(fitted_model$std_err) # [1] 603 117 # time * electrode
# dataA1 <- round(data.frame(time = fitted_model$timings$time, A1 = fitted_model$coefficients[,1],
# lower = fitted_model$coefficients[,1]-fitted_model$std_err[,1],
# upper = fitted_model$coefficients[,1]+fitted_model$std_err[,1]), 2)
# colnames(dataA1) <- c("time", "A1", "lower", "upper")
<- df[which(df$coefficient=='faceFace_B' & df$electrode=='A1'), ]
df_A1 <- fitted_model$std_err$A1[seq(from=1, to=dim(fitted_model$std_err)[1], by=3)]
err <- round(data.frame(time = fitted_model$timings$time[which(fitted_model$timings$epoch==1)],
dataA1 A1_faceFace_B = df_A1$amplitude, lower = df_A1$amplitude - err,
upper = df_A1$amplitude + err), 2)
# colnames(dataA1) <- c("time", "A1_faceFace_B", "lower", "upper")
plot_ly(data=dataA1, x=~time, y=~A1_faceFace_B, name = 'Electrode=A1, Epoch=2, Factor=faceFace_B',
type = 'scatter', mode = 'lines', line = list(width = 2)) %>%
add_trace(y=~lower, name = 'A1 lower CI', line = list(width = 1, dash = 'dash')) %>%
add_trace(y=~upper, name = 'A1 upper CI', line = list(width = 1, dash = 'dash')) %>%
layout(title="GLM Model Coeff: Electrode=A1, Epoch=2, Factor=faceFace_B",
xaxis=list(title="Frequency (Hz)"), yaxis=list(title="mu*V^2/Hz(dB)"),
legend = list(orientation='h'))
Alternatively, we can report separate model coefficient estimates for each of the two levels, Face A and Face B, which may facilitate a subsequent second-level data analysis.
<- fit_glm( ~ 0 + face + phase_coherence, data = limoSubj1)
fitted_model_NoInt # as.data.frame(fitted_model_NoInt, long = TRUE) %>%
# ggplot(aes(x = time, y = amplitude, colour = electrode)) +
# geom_line() +
# facet_wrap(~coefficient, scales = "free") +
# theme(legend.position = "none")
# retrieve the coefficients for each time point for each electrode.
<- as.data.frame(fitted_model_NoInt, long = TRUE)
df <- df[which(df$coefficient=='faceFace_A'), ]
df_faceFace_A <- df[which(df$coefficient=='faceFace_B'), ]
df_faceFace_B <- df[which(df$coefficient=='phase_coherence'), ]
df_phase_coherence
<- paste0("FaceA:", df_faceFace_A$electrode)
namesFaceA <- paste0("FaceB:", df_faceFace_B$electrode)
namesFaceB <- paste0("PhaCoh:", df_phase_coherence$electrode)
namesPhaCoh
# plotly graph
<- plot_ly(data=df_faceFace_A, x=~time, y=~amplitude, color=~electrode,
pl_faceA type="scatter", mode="lines", name=namesFaceA) %>% layout(showlegend = FALSE)
<- plot_ly(data=df_faceFace_B, x=~time, y=~amplitude, color=~electrode,
pl_faceB type="scatter", mode="lines", name=namesFaceB) %>% layout(showlegend = FALSE)
<- plot_ly(data=df_phase_coherence, x=~time, y=~amplitude, color=~electrode,
pl_phaseCoh type="scatter", mode="lines", name=namesPhaCoh) %>% layout(showlegend = FALSE)
subplot(pl_faceA, pl_faceB, pl_phaseCoh, nrows=3, shareX = TRUE) %>%
layout(title="LM Coefficient Estimates (Face-A, Face-B, Phase-Coherence)", hovermode = 'x unified')
If necessary, we can scale continuous covariates (e.g., phase_coherence
) using the scale()
function to make these unitless quantities, like standard z-scores. Then we can report the linear model \(R^2\) value, which quantifies the model fit. \[R^2 =\frac{Variance\ Explained\ by\ the \ Model}{Total\ Variance\ in\ the\ Data}.\]
<- fit_glm( ~ 0 + face + scale(phase_coherence), data=limoSubj1)
fitted_scaledModel # as.data.frame(fitted_scaledModel, long = TRUE) %>%
# ggplot(aes(x = time, y = amplitude, colour = electrode)) +
# geom_line() +
# facet_wrap(~coefficient, scales = "free") +
# theme(legend.position = "none")
## ggplot graph
# fitted_scaledModel$r_sq %>%
# tidyr::pivot_longer(cols=channel_names(limoSubj1), names_to="electrode", values_to="r_sq") %>%
# ggplot(aes(x = time, y = r_sq, colour = electrode)) +
# geom_line() +
# theme(legend.position = "none")
<- fitted_scaledModel$r_sq %>%
df ::pivot_longer(cols=channel_names(limoSubj1), names_to="electrode", values_to="r_sq")
tidyr
plot_ly(data=df, x=~time, y=~r_sq, color=~electrode,
type="scatter", mode="lines", name=~electrode) %>%
layout(title="GLM Model (Y ~ 0 + face + scale(phase_coherence))\n R^2 for all Electrodes",
xaxis=list(title="time"), yaxis=list(title="R^2"),
legend = list(orientation='h'))
Let’s define a new glm.predict()
method to forecast the EEG channel amplitudes using the GLM model. Recall that electrodes (\(EXG1-EXG4\)) are placed around the eyes to record eye movements.
<- fit_glm( ~ 0 + face + phase_coherence, data=limoSubj1)
fitted_Model # there is no predict(fitted_scaledModel, limoSubj1) method in eegUtils package
<- as.data.frame(fitted_scaledModel, long = TRUE)
df
<- df[which(df$coefficient=='faceFace_A'), ]
df_faceFace_A <- df[which(df$coefficient=='faceFace_B'), ]
df_faceFace_B <- df[which(df$coefficient=='phase_coherence'), ]
df_phase_coherence
# Matrix multiplication model estimation,
# Note that if the design matrix x has an intercept, we need to include a first column of 1's
# x <- cbind(1, face, df_phase_coherence)
# match the order of predictors/covariates
# X <- cbind(face, df_phase_coherence)
# fittedValues <- X %*% coeff_df
# face : Factor w/ 2 levels "Face_A","Face_B": 2 1 2 1 1 ...
# phase_coherence: Named num [1:1055] 0.6 0.55 0.3 0.1 ............
<- fit_glm(~ face + phase_coherence, data = limoSubj1)
fitted_model
<- as.data.frame(fitted_model, long = TRUE)
df # stratify/subset the model DF, as desired
# electrodes <- c("EXG1", "EXG2", "EXG3", "EXG4")
# df1 <- select_elecs(df, electrodes)
# # df_2 <- select_times(df, time_lim=c(-0.3, 0.1))
#
# df_intercept <- df1[which(df1$coefficient=='(Intercept)'), ] ## & df1$electrode=='EXG1'), ]
# df_faceFace_B <- df1[which(df1$coefficient=='faceFace_B'), ] ## & df1$electrode=='EXG1'), ]
# df_phase_coherence <- df1[which(df1$coefficient=='phase_coherence'), ] ## & df1$electrode=='EXG1'), ]
#### Model Coefficients (\hat{\beta})
<- df[which(df$coefficient=='(Intercept)'& df$electrode=='EXG1'), ]
df_intercept <- df[which(df$coefficient=='faceFace_B'& df$electrode=='EXG1'), ]
df_faceFace_B <- df[which(df$coefficient=='phase_coherence' & df$electrode=='EXG1'), ]
df_phase_coherence <- as.data.frame(cbind(interc=1, face=df_faceFace_B$amplitude, phaCoh=df_phase_coherence$amplitude))
coeff_df dim(coeff_df) # [1] 201 3
## [1] 201 3
#### Design matrix (X)
<- select_elecs(limoSubj1, electrode='EXG1')
X
## Note that length of vector X is 212055 = 201 (timepoints) * 1055 (epochs) # str(X)
## The "face" and "phase_coherence" variables only change with epoch, not with time!
<- select_epochs(X, epoch_no=2)
X <- X$signals$EXG1
Y <- cbind(rep.int(1, length(Y)),
X1 rep.int(X$epochs$face, length(Y)), rep.int(X$epochs$phase_coherence, length(Y)))
<- array()
fittedValues
for (time in 1:dim(coeff_df)[1]) {
<- X1[time,] %*% t(coeff_df)[, time]
fittedValues[time]
}
<- X$timings$time
times
# plotly graph
plot_ly(x=~times, y=~Y, type="scatter", mode="lines", name="Data") %>%
add_trace(y=~fittedValues, name="LIMO Fitted Values") %>%
layout(title=paste0("Data vs. LIMO Model Prediction (corr=", round(cor(Y, fittedValues), 2),")"),
xaxis=list(title="time"), yaxis=list(title="value"), legend = list(orientation='h'))
Once we select an electrode, e.g., ‘EXG1’, the length of the EEG signal \(X\), as a vector X, \(212,055 \equiv 201 (timepoints) \times 1055 (epochs)\). Also, the face and phase_coherence covariates (predictive variables) only change with epoch, but are unchanged over time.
The predictions \(\hat{y}\) are computed using the coefficient estimates, \(\hat{\beta}(Time)_{3\times 1}\) that are time-dependent and obtained via the GLM model fitting.
\[\underbrace{\hat{y}_{Time\times 1}}_{fitted\\ values} = {\overbrace{X_{Time\times 3}}^{design\ matrix}} \times {\underbrace{\ \hat{\beta}(Time)_{3\times 1}\ }_{LIMO \\ coefficients}}.\]
Electrocardiograms (ECG or EKG) capture the electrical conduction of the heart and serve as proxy measures of normal and pathological cardiac processes.
ECG can be interpreted using pattern recognition or by ML/AI modeling of the electrical ECG signal related to cardiac electrophysiology. Standard ECG has 12 leads - the 6 limb leads (I, II, III, aVL, aVR
, and aVF
) are placed on the extremities (arms, legs), and the other 6 precordial (torso) leads are plaved on the precordium (V1, V2, V3, V4, V5
and V6
). For limb leads, the augmented label a
refers to leads that calculate a combination of leads I, II
, and III
.
The image below shows a normal 12-lead ECG trace of a 26-year-old male with an incomplete right bundle branch block (RBBB), i.e., a heart block in the right bundle branch of the electrical conduction system. The ECG waves and the intervals between them have (1) an expected patterns (morphology) and time durations (periods); and (2) a range of acceptable amplitudes (millivoltages). Deviation from acceptable normal tracing may be indicative of potentially pathological states. When tracking ECG signals, the amplitudes and intervals are printed on a standard scale graphing paper where a pixel of \(1 mm^2\) spatial resolution corresponds to \(40\ ms\) (milliseconds) of time along the longitudinal \(x\)-axis and \(0.1 millivolts\) along the amplitude \(y\)-axis.
Normal ECGs contain waves, intervals, segments and a single complex:
The main ECG parts include P waves, theQRS complex, and T waves. The model ECG framework is shown below.
Electrical activities of the heart are tracked via electrodes strategically positioned to measure and display the cardiac cycle. ECG graphs show the overall cardiac electrical potential (voltage) across the 12 leads representing 12 complementary view angles. The six limb leads the heart activity in a vertical plane. Voltage measures require a pair of poles (one negative and one positive) to track electrical currents. The negative poles serve as baseline references and the placement of their corresponding positive poles indicate “point of view” vectors. Some leads (e.g., I, II, and III) are bi-polar, measure electrical potential between two of the three limb electrodes. For instance, the heart view from the left is tracked by lead I, which captures the electrical potential between the right arm (a negative pole) and the left arm (a positive pole). Similarly, to capture an inferior view of the heart dynamics, lead II detects voltage between the right arm (negative) and the left leg (positive).
The augmented limb leads (e.g., aVR, aVL, and aVF) are uni-polar; solely using a single limb electrode as the positive pole paired with a reference negative pole computed as the average of inputs from the other two leads. For instance, aVR detects the upper-right side of the heart, whereas aVL tracks the upper-left side of the heart. Electrical depolarization towards a lead generates a positive signal (deflection), while depolarization away from a lead yields a negative deflection. The opposite is true for electrical repolarization where leads capturing the heart from different angles may have waves pointing in different directions.
The animated gif below shows the dynamics of the heart electrical conduction and ECG wave generation in the limb leads. The positive deflection on the ECG signal manifests as positive currents created by depolarization of cardiac cells travel from negative to positive electrodes.
There are a number of clinically important phenotypes can be traced and interpreted using ECG signals. Examples include:
Let’s use the following PhysioNet ECG data (ecgiddb) to demonstrate some data ingestion, processing, modeling, and visualization. This study is based on 2005 Master’s thesis by Lugovaya TS, entitled Biometric human identification based on electrocardiogram, at the Faculty of Computing Technologies and Informatics, Electrotechnical University in Saint-Petersburg, Russia. The data is managed at PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals.
The ECG database includes 310 signals from 90 people where each recording consists of:
The raw ECG signals are noisy and contain high and low frequency noise. Each record includes raw and filtered signals:
More about the MIT ECG data format is available here.
### 1. Load ECG data
<- paste0(tempdir(),"rec_1.dat")
path1 download.file("https://physionet.org/files/ecgiddb/1.0.0/Person_01/rec_1.dat?download", path1)
<- readBin(path1, integer(), 500*30)
ecg1 <- data.frame(ECG = ecg1, Seconds = c(1:length(ecg1))/500)
ecg1.df
<- paste0(tempdir(),"rec_2.dat")
path2 download.file("https://physionet.org/files/ecgiddb/1.0.0/Person_01/rec_2.dat?download", path2)
<- readBin(path2,integer(),500*30)
ecg2 <- data.frame(ECG1=ecg1[1:10000], ECG2=ecg2[1:10000], Seconds=(c(1:length(ecg2))/500)[1:10000])
ecg.df head(ecg.df)
## ECG1 ECG2 Seconds
## 1 -1441809 -720916 0.002
## 2 -1441808 -458769 0.004
## 3 -1507342 -262146 0.006
## 4 -1507343 -196608 0.008
## 5 -1507347 -65541 0.010
## 6 -1507346 -131079 0.012
#### 2. Detect R peaks in ECG data
# implements the part of the Pan & Tompkins algorithm for R peak detection in raw ECG data
# Pan, Jiapu, and Willis J. Tompkins. "A real-time QRS detection algorithm." IEEE Trans. Biomed. Eng.
# https://ieeexplore.ieee.org/document/4122029
# @param numerical vector of ECG signal
# @param sRate ECG sample rate
# @param lowCut Butterworth bandpass filter low cut value
# @param highCut Butterworth bandpass filter high cut value
# @param filterOrder Butterworth bandpass filter order value
# @param integrationWindow Convolution window size
# @param refractory Minimal space between peaks in milliseconds
# @param returnIndex If TRUE, the index for each R peak is returned, If FALSE, the corresponding time
# @return numeric vector of detected R peaks
<- function(signal, sRate, lowCut = 0, highCut = 15, filterOrder = 1,
detectRpeaks integrationWindow = 15, refractory = 200, returnIndex = FALSE){
= 0.5 * sRate
nyquist_freq = lowCut / nyquist_freq
low = highCut / nyquist_freq
high
# Apply bandpass butterworth filter.
<- signal::butter(n = filterOrder, W = c((lowCut/(sRate/2)), high), type = "pass")
bandpass <- signal::filtfilt(bandpass, c(rep(signal[1],sRate),signal,rep(signal[length(signal)],sRate)))
signal_filt <- signal_filt[(sRate+1):(length(signal_filt)-sRate)]
signal_filt
# Apply lag difference
<- diff(signal_filt)
signal_diff
# Square signal
<- signal_diff^2
signal_squared <- c(signal_squared[1], signal_squared)
signal_squared
# Apply convolution on chunks of signal (100000 values)
<- split(signal_squared, ceiling(seq_along(signal_squared)/100000))
split_ecg <- lapply(split_ecg, function(x) {
split_ecg2 <- stats::convolve(x,rep(1,integrationWindow),type="open")
xc <- length(xc) - length(x)
difflen <- xc[(difflen/2+1):(length(xc)-(difflen/2))]
xc
xc})<- unlist(split_ecg2, use.names=FALSE)
signal_conv
# Detect peaks on pre processed signal
<- c(0)
peaks <- mean(signal_conv)*3
limit
<- sRate*(refractory/1000)
refractory <- signal_conv
x for(i in c(2:(length(x)-1))){
if((x[i] > limit) && (x[i]>x[i-1]) && (x[i]>x[i+1]) && ((i-peaks[length(peaks)])>refractory)) {
<- c(peaks,i)
peaks
}
}<- peaks[2:length(peaks)]
peaks if(returnIndex) return(peaks)
return(peaks/sRate)
}
<- detectRpeaks(ecg.df$ECG1, sRate=500) # these are times in seconds
peakRtimes1 <- detectRpeaks(ecg.df$ECG2, sRate=500)
peakRtimes2
# peakRindices1 <- detectRpeaks(ecg.df$ECG1, sRate=500, returnIndex=TRUE) # for time indices!
# peakRindices2 <- detectRpeaks(ecg.df$ECG2, sRate=500, returnIndex=TRUE)
### 3. Plot R-peaks on top of ECG
library(plotly)
%>% plot_ly(x=~Seconds, y=~ECG1, type="scatter", mode="lines", name="ECG1",
ecg.df line = list(color = 'red', width = 1)) %>%
add_trace(y=~ECG2, type="scatter", mode="lines", name="ECG2",
line = list(color = 'blue', width = 1)) %>%
add_segments(x=peakRtimes1, y=min(ecg.df$ECG1)/2, xend=peakRtimes1+0.01, yend=max(ecg.df$ECG1)/2,
line = list(color = 'gray', width = 4), opacity=0.3, name="R Peak ECG 1") %>%
add_segments(x=peakRtimes2, y=min(ecg.df$ECG2)/2, xend=peakRtimes2+0.01, yend=max(ecg.df$ECG2)/2,
line = list(color = 'brown', width = 4), opacity=0.3, name="R Peak ECG 2") %>%
layout(title="Two ECG Signals with Detected R-Peaks", xaxis=list(title="seconds"),
yaxis=list(title="Amplitude"), legend = list(orientation='h'))
The QRS complex occurrences determine the heart rate variability (HRV) time series capturing the R-to-R series (RR intervals) between consecutive heart beats, inter-beat intervals or interval function. The RR intervals are the differences between successive R-wave oscillations. Recall that above we calculated the times \(t_n\) of the R-wave occurrences.
The \(n^{th}\) RR interval is \[RR_n = \alpha (t_n − t_{n−1}),\] where the normalizing parameter \(\alpha\) accounts for the desired measuring unit of the RR time series. The standard RR interval measuring unit is \(ms\) (millisecond). When the time occurrences are tracked in seconds (\(s\)), then \(\alpha = 1,000\).
Complementary to the HRV, we can define the sequence of instantaneous heart rates
\[HR_n = \frac{\beta}{t_n − t_{n−1}},\] where a normalization parameter \(\beta=60\) expresses the heart rate in Beats Per Minute (bpm) and corresponds to time occurrence measures in seconds.
The \(\{RR_n\}_n\) time-series consist of paired time and heart rates \(\{(t_n, RR_n)\}\), where the time series \(\{t_n\}\) must be explicitly specified as R-wave occurrences are not necessarily regular. This irregular, non-equidistant, sampling requires tracking the (time-value, heart rates) pairs, not just the RR intervals.
Prior to frequency-domain analysis (e.g., using Fourier or Wavelet base representations), we need to take care of this incongruence, e.g., by uniformly resampling the time series signal. Multiple alternative approaches for homogenizing the sampling rate can be applied. The package RHRV
uses signal-interpolation for transforming the non-uniformly sampled RR series into regularly sampled signal.
If we assume that the sampling times of the RR signals are regular (equidistant), we can construct a tachogram, of the RR intervals as a function of the heart beat index. The result of this assumption would lead to power spectrum signal representation that is not a function of the frequency, but a function of cycles per beat.
The RHRV handbook provides significant details about spatial and frequency domain HRV analytics.
We can use the following heart beat detection function heartBeat()
to compute the wavelet decomposition of the ECG time series signal and called with the following input arguments:
heartBeat()
Inputs:
heartBeat()
Outputs:
### install.packages("RHRV")
### install.packages("wavelets")
<- paste0(tempdir(),"rec_1.dat")
path1 download.file("https://physionet.org/files/ecgiddb/1.0.0/Person_01/rec_1.dat?download", path1)
<- readBin(path1, integer(), 500*30)
ecg1 <- data.frame(ecg = ecg1, time = c(1:length(ecg1))/500)
ecg1.df
<- paste0(tempdir(),"rec_2.dat")
path2 download.file("https://physionet.org/files/ecgiddb/1.0.0/Person_01/rec_2.dat?download", path2)
<- readBin(path2,integer(),500*30)
ecg2 <- data.frame(ECG1=ecg1[1:10000], ECG2=ecg2[1:10000], Seconds=(c(1:length(ecg2))/500)[1:10000])
ecg.df head(ecg.df)
## ECG1 ECG2 Seconds
## 1 -1441809 -720916 0.002
## 2 -1441808 -458769 0.004
## 3 -1507342 -262146 0.006
## 4 -1507343 -196608 0.008
## 5 -1507347 -65541 0.010
## 6 -1507346 -131079 0.012
################# Define a heartBeat estimating function
#' @param data ECG Data frame
#' @param SampleFreq ECG sampling frequency in Hz
#' @param waveBase wavelet basis to use in the wavelet decomposition of the ECG signal
#' @param waveThr wavelet coefficient threshold value to detect heart beats in the wavelet domain
#' @return returns a data object containing 4 members:
#' {dataECG}: original ECG data frame
#' {coeff}: coefficients of the 2nd level (L2) wavelet decomposition of the ECG signal (dataECG$ecg),
#' {Rpeaks}: indices of R peaks in {coeff},
#' {Rall}: data frame with the heart-beat detection results. The columns include"
#' {Rtrue_idx}: indices of R peaks in {dataECG},
#' {Rtrue_sec}: time moments of {Rtrue_idx} in seconds,
#' {RtoRext}: R-R intervals in index space (number of samples), starting with zero,
#' {RtoR_secext}: R-R intervals in time (seconds), starting with zero
<- function(data, SampleFreq = 500, waveBase="d10", waveThr = 9) {
heartBeat
# Choose a wavelet basis (wavelets::wt.filter())
# Daubechies (d) 2,4,6,8,10,12,14,16,18,20
# Least Asymetric (as) 8,10,12,14,16,18,20
# Best Localized (bl) 14,18,20
# Coiflet (c) 6,12,18,24,30
<- c("d2","d4","d6","d8","d10","d12","d14","d16","d18","d20",
waveletBaseList "as8","as10","as12","as14","as16","as18","as20",
"bl14","bl18","bl20",
"c6","c12","c18","c24","c30")
if (!(waveBase %in% waveletBaseList)) wavelet <- "d10" # deafult Daubechies d10 wavelet decomposition
else wavelet <- waveBase
<- 5 # presenting the highest frequency level of wavelet decomposition
level # message(sprintf("wavelet-bases set to %s", wavelet))
<- data.frame(matrix(NA, nrow = sum(!is.na(data$ecg)), ncol = 2))
df names(df) <- c("idx", "ecg")
$idx <- which(!is.na(data$ecg))
df$ecg <- data$ecg[df$idx]
df
<- as.numeric(df$ecg)
X
<- wavelets::dwt(X, filter=wavelet, n.levels=level, boundary="periodic", fast=TRUE)
ecg_wav
# for R peak detection, use the coefficients of the second level (W2) of wavelet decomposition
<- ecg_wav@W$W2
x
# initialize the vector for detected R peaks
<- matrix(0,1,length(x))
Rpeaks
# Iterate sweeping the L2 coefficients for local maxima
<- 2
i while (i < length(x)-1) {
if ((x[i]-x[i-1]>=0) && (x[i+1]-x[i]<0) && x[i]>waveThr) Rpeaks[i] <- i
<- i+1
i
}
# Clear all zero values from Rpeaks vector
<- Rpeaks[Rpeaks!=0]
Rpeaks
# Since L2 coeffs are used, the multiply the results by 4, to scale the results to the original signal length
<- Rpeaks*4
Rtrue
# Check results with the original signal
for (k in 1:length(Rtrue)) {
if (Rtrue[k] > 10) Rtrue[k] <- Rtrue[k]-10+(which.max(X[(Rtrue[k]-10):(Rtrue[k]+10)]))-1
else Rtrue[k] <- which.max(X[1:(Rtrue[k]+10)])
}
<- unique(Rtrue)
Rtrue <- df$idx[Rtrue]
Rtrue_idx
# Determine the Rpeak-Rpeak intervals in samples and seconds and average heart rate
<- Rtrue_idx[-1]-Rtrue_idx[1:length(Rtrue_idx)-1]
RtoR <- (data$time[Rtrue_idx[-1]] - data$time[Rtrue_idx[1:length(Rtrue_idx)-1]])/1000
RtoR_sec
= 60/mean(RtoR_sec)
avgHR = as.integer(avgHR)
avgHR
# save the information about detected R peaks to Rsec_data (ascii file)
<- (data$time[Rtrue_idx] - data$time[1])/1000;
Rtrue_sec <- round(Rtrue_sec, 3)
Rtrue_sec # write(Rtrue_sec,"Rsec_data.txt", 1)
<- c(0, RtoR)
RtoRext <- c(0, RtoR_sec)
RtoR_secext <- data.frame(Rtrue_idx, Rtrue_sec, RtoRext, RtoR_secext)
Rall
return(list(dataECG = df, coeff = x, Rpeaks = Rpeaks, Rall = Rall))
}
<- heartBeat(ecg1.df, SampleFreq = 500, waveBase="d10", waveThr = 9)
hr_df
# Plot the detected heart-beats on top of the ECG time-course
%>% plot_ly(x=~time, y=~ecg, type="scatter", mode="lines", name="Raw ECG",
ecg1.df line = list(color = 'red', width = 1)) %>%
add_segments(x=~time[hr_df$Rall$Rtrue_idx], y=min(ecg.df$ECG1)/2,
xend=~(time[hr_df$Rall$Rtrue_idx]+0.0001), yend=max(ecg.df$ECG1)/2,
line = list(color = 'gray', width = 4), opacity=0.3, name="R Peaks ECG") %>%
layout(title="Raw ECG Signal with Detected R-Peaks", xaxis=list(title="seconds"),
yaxis=list(title="Amplitude"), legend = list(orientation='h'))
…