This DSPA2 module is Part 2 of the DSPA2 Chapter 14 (Deep Learning, Neural Networks). Learners are encouraged to complete Part 1 of the DSPA2 Chapter 14 (Deep Learning, Neural Networks) before continuing with transfer learning in this Part 2.
Part 3 and Part 4 of the DSPA2 Chapter 14 (Deep Learning, Neural Networks) build on this Part 2 and cover the Torch and TensorFlow image pre-processing and image classification pipelines.
Humans learn complex tasks by capitalizing on their prior experiences, no matter how remote these previous encounters may appear to be. By the age of 5, most kids can learn how to ride a bicycle in a couple of training sessions. This riding ability is acquired after they have already mastered running (a form of controlled falling), navigating complex 3D environments, and anticipating dynamic 4D spatio-temporal events. In effect, before kids start pedaling, their many prior holistic training experiences ensure that they “know” the basics of bike balancing. Children’s formative years include a very large number of trial-and-error attempts, parental guidance sessions, and societal cues. These events provide the basic building blocks necessary to learn bicycle riding, well in advance of the actual “bicycle training” experience that we typically associate with learning to ride.
This learning process is very different for machines. It’s extremely difficult to train a machine (a robot) to ride a bike, because it lacks the prior experiences kids go through, and these experiences cannot be easily built and transferred to the new task of learning how to balance a bike. In a way, humans learn new tasks easily because (1) they already have a large collection of mastered skills, and (2) they can transfer, mix, match, integrate, and harness their prior experiences when learning a new task. Transfer machine learning attempts to replicate this human transfer learning process in the domain of artificial intelligence. The goals are to expedite the ML training process by capitalizing on prior knowledge, expand the realm of ML/AI applications, and enable “last mile” training that ensures models generate “reasonable decisions and actions” without starting de novo learning from a blank slate.
One of the main challenges of AI/ML interpretation of free text is the extreme heterogeneity of the information and the unstructured format of the text content. This problem can be addressed by structuring the input text and establishing homologies between multiple text samples (e.g., clinical notes). In a nutshell, transfer learning facilitates this process and enables (1) synthetic text generation (new data) that simulates realistic textual content (non-human data), and (2) transformation of unstructured text into structured data elements. For instance, if \(input=clinical\ notes\), a DNN model generates \(output=vector\), a quantitative signature vector of the input text; think of it as a vector of principal components associated with the specific free text.
The result of this AI process is that, regardless of the text length or type, the DNN always generates a numeric vector of a fixed size (say, 128 values). This canonical representation establishes homologies between any given set of strings (character arrays).
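To make the fixed-size idea concrete, here is a minimal base-R sketch. It assumes a tiny, hypothetical vocabulary and a random embedding table (`embedding_table` and `text_signature()` are illustrative names, not part of any package); real pre-trained models learn their embeddings and use more dimensions. Averaging the token embeddings, roughly what a pooling step does, yields a vector whose length does not depend on the length of the note.

```r
set.seed(1)
embedding_dim <- 8                      # real models use, e.g., 20 or 128 values
vocab <- c("patient", "presents", "with", "allergies", "consult",
           "for", "laparoscopic", "gastric", "bypass")
# Hypothetical random embedding table: one row per vocabulary word
embedding_table <- matrix(rnorm(length(vocab) * embedding_dim),
                          nrow = length(vocab), dimnames = list(vocab, NULL))

# Average the embeddings of known tokens; unknown tokens are ignored
text_signature <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens <- tokens[tokens %in% vocab]
  if (length(tokens) == 0) return(rep(0, embedding_dim))
  colMeans(embedding_table[tokens, , drop = FALSE])
}

v_short <- text_signature("Consult for laparoscopic gastric bypass.")
v_long  <- text_signature("Patient presents with allergies; consult for gastric bypass.")
c(length(v_short), length(v_long))      # same size for notes of different length
```

Mean pooling is only one way to combine token embeddings; it is used here because it makes the length-invariance easy to see.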
Let’s demonstrate transfer machine learning using the medical-specialty text-mining example of clinical notes that we saw in Chapter 5. From these data, we will derive a binary outcome indicating whether the medical specialty unit (there are 40 such units) is a surgical unit or not. We’ll split the 4,999 cases, each containing 6 data elements, including the medical specialty unit and the clinical notes, into training and testing sets.
The key will be to use keras to build and train an ML model for predicting surgical vs. non-surgical units from the content of the corresponding medical notes, using a previously trained text-mining DNN that maps text of any length to a fixed-size numeric vector. One also needs to install the tfhub package, an R interface to TensorFlow Hub, which provides reusable pre-trained machine learning models.
venv_name <- "r-tensorflow"
reticulate::use_virtualenv(virtualenv = venv_name, required = TRUE)
library(reticulate)
# load the necessary libraries
# May need some installations first, e.g.,
# in conda%> pip install tensorflow_datasets
# install.packages("remotes")
# remotes::install_github("rstudio/tfds")
# tfds::install_tfds()
library(keras)
# library(reticulate)
# Install package TFhub: https://github.com/rstudio/tfhub
# devtools::install_github("rstudio/tfhub")
library(tfhub)
# library(tfds)
# Install TFdatasets: https://cran.r-project.org/web/packages/tfdatasets/vignettes/introduction.html
# devtools::install_github("rstudio/tfdatasets")
library(tfdatasets)
library(utf8)
# specify r-reticulate or r-tensorflow python Anaconda environment
# use_condaenv("r-tensorflow")
# use_condaenv("r-reticulate", required = TRUE)
# there are many ways of "finding" your conda environments and using the reticulate package to set them
# conda_list()[[1]][1] %>% use_condaenv(required = TRUE)
# Check tensorflow install configuration
tensorflow::tf_config()
## TensorFlow v2.15.1 (C:\Users\IvoD\DOCUME~1\VIRTUA~1\R-TENS~1\lib\site-packages\tensorflow_hub\__init__.p)
## Python v3.10 (C:/Users/IvoD/Documents/.virtualenvs/r-tensorflow/Scripts/python.exe)
# py_module_available("tensorflow_hub")
# py_install("tensorflow_hub", pip = TRUE) # py_install("tensorflow_hub")
# py_install("tfds", pip = TRUE) # py_install("tfds")
# py_install("tensorflow_datasets", pip = TRUE)
# py_module_available("tensorflow_datasets")
# py_module_available("tfds") # tensorflow_datasets

Let’s now design a full DNN binary-classification model composed of 4 layers stacked sequentially. The first transfer-learning layer is the pre-trained TensorFlow Hub layer (prior model), which is loaded as the left-most base layer of the full DNN and maps clinical notes (description sentences) into their embedding vectors (canonical signature vectors). There are a number of pre-trained text embedding models we can choose from in this transfer-learning example. For instance, we can use google/tf2-preview/gnews-swivel-20dim/1, which splits the sentences into tokens, embeds each token, and then combines the embeddings, yielding an output of dimensions (num_examples, embedding_dimension). The output of this initial transfer-learning prior-model layer is a fixed-length vector, which is fed into the next fully-connected (Dense) layer-2 with 16 hidden units. The layer-2 output feeds into the next (dense) layer-3 with 6 nodes. Finally, the layer-3 output goes into the last layer-4, which is also a densely connected layer with a single output (class label). Using the sigmoid activation function, this output represents a probability value between 0 and 1 indicating the model-predicted chance, or confidence level, that the medical note text was written in a hospital surgical unit.
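The forward pass through the 4-layer stack described above can be sketched in base R. This is only an illustration under stated assumptions: the 20-value input stands in for a gnews-swivel-20dim embedding, and the weights are random (the helper `dense()` is hypothetical), so the resulting probability is meaningless until the layers are trained. It does show how the shapes shrink from 20 to 16 to 6 to 1, and why the sigmoid output always lies between 0 and 1.

```r
set.seed(2)
relu    <- function(z) pmax(z, 0)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical dense layer with random weights: activation(W %*% x + b)
dense <- function(x, n_out, activation) {
  W <- matrix(rnorm(length(x) * n_out, sd = 0.3), nrow = n_out)  # n_out x n_in
  b <- rep(0, n_out)                                             # biases
  activation(as.vector(W %*% x) + b)
}

embedding  <- rnorm(20)               # stand-in for a 20-dim text embedding (layer-1)
h2 <- dense(embedding, 16, relu)      # layer-2: 16 hidden units
h3 <- dense(h2, 6, relu)              # layer-3: 6 hidden units
p_surgical <- dense(h3, 1, sigmoid)   # layer-4: single sigmoid output
p_surgical                            # a probability strictly between 0 and 1
```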
Other examples of pre-trained text mining models that can be used for transfer learning include:
We will demonstrate NN augmentation (transfer learning) by modifying a base model pre-trained on the English Google News 200B corpus (nnlm-en-dim128) and adding 4 extra layers at the end, which will be tuned for our specific clinical text (medical notes). Similarly, any of the other pre-trained models can be used as an alternative.
Download the clinical dataset and split it into training:testing (80:20) sets.
Note that in this clinical-notes example, the input data consist of medical text transcriptions stored as string sentences. In the first demonstration, we will try to predict a binary integer label, 0 or 1, representing a non-surgical or surgical clinical unit where the clinical note was transcribed. To structure the free text as a computable data object (a matrix), we will automatically convert sentences into embedding vectors. This can be accomplished using text2vec or keras::layer_text_vectorization() transformations, or by including a pre-trained text embedding as the first layer. This takes care of the text preprocessing, facilitates transfer learning, and makes the text-to-matrix conversion independent of the length of the clinical note.
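The integer text-vectorization step can be sketched in base R as follows. The function `vectorize_text()` is a hypothetical helper that only loosely mimics what keras::layer_text_vectorization() does after adapt(): it builds a frequency-ranked vocabulary, maps words to integer indices (assuming index 0 for padding and 1 for out-of-vocabulary words), and pads or truncates every note to the same length.

```r
# Hypothetical helper: loosely mimics integer text vectorization
# (assumed convention: 0 = padding, 1 = out-of-vocabulary word)
vectorize_text <- function(texts, max_tokens = 10, max_length = 6) {
  tokenize <- function(s) {
    tok <- strsplit(tolower(s), "[^a-z]+")[[1]]
    tok[nchar(tok) > 0]                          # drop empty strings
  }
  token_lists <- lapply(texts, tokenize)
  freq  <- sort(table(unlist(token_lists)), decreasing = TRUE)
  vocab <- head(names(freq), max_tokens - 2)     # most frequent words get indices 2, 3, ...
  rows <- lapply(token_lists, function(tok) {
    ids <- match(tok, vocab) + 1L                # known words -> 2, 3, ...
    ids[is.na(ids)] <- 1L                        # unknown words -> 1
    ids <- head(ids, max_length)                 # truncate long notes
    c(ids, rep(0L, max_length - length(ids)))    # pad short notes with 0
  })
  do.call(rbind, rows)                           # one row per note
}

notes <- c("Consult for laparoscopic gastric bypass.",
           "Patient presents with complaint of allergies and asthma.")
X <- vectorize_text(notes)
dim(X)   # 2 notes x 6 integer positions (max_length)
```

Every note becomes a row of the same width, which is what lets the downstream embedding layer treat a batch of notes as one integer matrix.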
# install.packages("SnowballC")
library(keras)
library(SnowballC)
dataCT <- read.csv('https://umich.instructure.com/files/21152999/download?download_frd=1', header=T)
str(dataCT)
## 'data.frame': 4999 obs. of 6 variables:
## $ Index : int 0 1 2 3 4 5 6 7 8 9 ...
## $ description : chr " A 23-year-old white female presents with complaint of allergies." " Consult for laparoscopic gastric bypass." " Consult for laparoscopic gastric bypass." " 2-D M-Mode. Doppler. " ...
## $ medical_specialty: chr " Allergy / Immunology" " Bariatrics" " Bariatrics" " Cardiovascular / Pulmonary" ...
## $ sample_name : chr " Allergic Rhinitis " " Laparoscopic Gastric Bypass Consult - 2 " " Laparoscopic Gastric Bypass Consult - 1 " " 2-D Echocardiogram - 1 " ...
## $ transcription : chr "SUBJECTIVE:, This 23-year-old white female presents with complaint of allergies. She used to have allergies w"| __truncated__ "PAST MEDICAL HISTORY:, He has difficulty climbing stairs, difficulty with airline seats, tying shoes, used to p"| __truncated__ "HISTORY OF PRESENT ILLNESS: , I have seen ABC today. He is a very pleasant gentleman who is 42 years old, 344 "| __truncated__ "2-D M-MODE: , ,1. Left atrial enlargement with left atrial diameter of 4.7 cm.,2. Normal size right and left "| __truncated__ ...
## $ keywords : chr "allergy / immunology, allergic rhinitis, allergies, asthma, nasal sprays, rhinitis, nasal, erythematous, allegr"| __truncated__ "bariatrics, laparoscopic gastric bypass, weight loss programs, gastric bypass, atkin's diet, weight watcher's, "| __truncated__ "bariatrics, laparoscopic gastric bypass, heart attacks, body weight, pulmonary embolism, potential complication"| __truncated__ "cardiovascular / pulmonary, 2-d m-mode, doppler, aortic valve, atrial enlargement, diastolic function, ejection"| __truncated__ ...
## [1] "Index" "description" "medical_specialty"
## [4] "sample_name" "transcription" "keywords"
# Binarize the 40 hospital units as Surgery-type and Non-Surgery types
dataCT$surgLabel <- ifelse(grepl('Surg', dataCT$medical_specialty), 1, 0)
table(grepl('Surg', dataCT$medical_specialty))
##
## FALSE TRUE
## 3869 1130
# Fix the descriptions to UTF-8 encoding
library(stringi)
# table(stri_enc_mark(dataCT$description)) # ASCII native # 4994 5
dataCT$description <- stri_encode(dataCT$description, "", "UTF-8")
dataCT$transcription <- stri_encode(dataCT$transcription, "", "UTF-8")
dataCT$clinicalNotes <- paste(dataCT$description, dataCT$transcription)
# Clean the clinical notes
library(tm)
## Build a text corpus from the clinical notes
train_corpus <- VCorpus(VectorSource(dataCT$clinicalNotes))
## Remove Punctuation
train_corpus <- tm_map(train_corpus, content_transformer(removePunctuation))
## Remove numbers
train_corpus <- tm_map(train_corpus, removeNumbers)
## Convert text to lower case
train_corpus <- tm_map(train_corpus, content_transformer(tolower))
## Remove stop words
train_corpus <- tm_map(train_corpus, content_transformer(removeWords), stopwords("english"))
## Stemming
train_corpus <- tm_map(train_corpus, stemDocument)
## Remove multiple whitespaces
train_corpus <- tm_map(train_corpus, stripWhitespace)
# Extract only the simplified text from the complex train_corpus object
dataCT$clinicalNotes <- unlist(lapply(train_corpus, `[[`, 1))
# Split the data 80:20 (call set.seed() first for a reproducible split)
train_set_ind <- sample(nrow(dataCT), floor(nrow(dataCT)*0.8)) # 80:20 split training:testing
train_data <- dataCT[train_set_ind , ]
test_data <- dataCT[-train_set_ind , ]
num_words <- 10000
max_length <- 300
text_vectorization <- layer_text_vectorization(max_tokens = num_words, output_sequence_length = max_length)
# # `adapt()` the Clinical Notes Text Vectorization layer. Calling adapt allows the input layer to learn about
# # the unique Medical Text in this dataset and assign an integer value for each word
# text_vectorization %>% adapt(train_data$clinicalNotes)
#
# # Confirm the Medical Notes vocabulary is in the text vectorization layer.
# get_vocabulary(text_vectorization)
#
# # Input layer shape - the text vectorization layer transforms its inputs
# trainDataX <- text_vectorization(matrix(train_data$clinicalNotes, ncol = 1))
# trainDataY_one_hot_labels <- to_categorical(train_data$surgLabel, num_classes = 2)
text_vectorization %>% adapt(train_data$clinicalNotes)
# Define and fit the model - the input data consists of an array of word-indices.
# The predicted labels are either 0 or 1.
# The classifier is based on sequentially stacking the network layers
# The first embedding layer takes the integer-encoded vocabulary and looks up the embedding vector for each word-index.
# These vectors are learned as the model trains.
# The vectors add a dimension to the output array. The resulting dimensions are: (batch, sequence, embedding).
# A global_average_pooling_1d layer returns a fixed-length output vector for each example by averaging over the sequence dimension.
# This allows the model to handle *variable-length* inputs
# The fixed-length output vector is piped through a fully-connected (dense) layer with 16 hidden units.
# The last output layer is densely connected with a single output node.
# Sigmoid activation function yields a probability between 0 and 1 indicating the confidence in the binary label.

model1 de novo

# 1. Define a new fresh model1 de novo
# input <- layer_input(shape = c(300)) # For numerical input, e.g., trainDataX
# library(reticulate)
# reticulate::repl_python()
# use_condaenv(condaenv = "pytorch_env", required = TRUE)
input <- layer_input(shape = c(1), dtype = "string") # raw text input as strings; must match the input expected by the next layer
output <- input %>%
text_vectorization() %>%
layer_embedding(input_dim = num_words + 1, output_dim = 32) %>%
layer_global_average_pooling_1d() %>%
layer_dense(units = 16, activation = "relu") %>%
layer_dropout(0.5) %>%
layer_dense(units = 1, activation = "sigmoid")
model1 <- keras_model(input, output)
model1 %>% compile(
optimizer = 'adam',
loss = 'binary_crossentropy',
metrics = list('accuracy')
)
history <- model1 %>% fit(train_data$clinicalNotes,
as.numeric(train_data$surgLabel),
epochs = 10, batch_size = 512, validation_split = 0.2, verbose=0)
# Evaluate the model1 performance
results <- model1 %>% evaluate(test_data$clinicalNotes, as.numeric(test_data$surgLabel), verbose = 0)
results
## loss accuracy
## 0.5366074 0.7750000
In a naive approach, we can even evaluate the performance of the prior model (English Google News 200B) almost out of the box, i.e., assess transfer learning without retraining any of the prior model's parameters and only fitting a single output layer on the new problem-specific data. Remember that we have a univariate (binary) outcome; if we use dataset_batch(32), the output will include a vector of 32 probability estimates.
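The batch arithmetic can be checked directly. Assuming the 80:20 split of the 4,999 notes used below, the training set has 3,999 notes, so `batch_size = 128` requires 32 mini-batches per epoch, and 1,000 testing notes at an assumed default batch size of 32 also require 32 batches; this is consistent with the `32/32` counts printed in the Keras training and evaluation logs.

```r
n_total <- 4999
n_train <- floor(n_total * 0.8)           # 3999 training notes
n_test  <- n_total - n_train              # 1000 testing notes

steps_per_epoch <- ceiling(n_train / 128) # fit(..., batch_size = 128)
eval_steps      <- ceiling(n_test / 32)   # assumed default batch size of 32
c(steps_per_epoch, eval_steps)

# Split the testing row indices into consecutive batches of 32;
# the last batch holds the remaining 1000 - 31 * 32 = 8 notes
batches <- split(seq_len(n_test), ceiling(seq_len(n_test) / 32))
lengths(batches)[c(1, length(batches))]
```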
We will see next that Keras can extract elements from TensorFlow Datasets automatically, which makes this a much more memory-efficient alternative to loading the entire dataset into RAM before passing it to Keras.
To build the DNN model, we need to specify the network topology as a stack of network layers, which includes (1) the schema representing the unstructured text data (clinical note descriptions), and (2) the number and complexity of each subsequent layer in the model. For simplicity, in this example we will convert the 40 different medical units into binary “surgical” unit labels (0 or 1).
The unstructured text can be converted into embedding vectors of a fixed size, which simplifies the text processing. We use the transfer-learning prior model, which includes a pre-trained text embedding, as the first DNN layer. This allows us to outsource the text preprocessing and the transformation of text into a quantitative tensor. This is the key step illustrating the benefits of add-on transfer learning for fine-tuning previously trained models.
The result of using this transfer-learning prior is that the model is invariant with respect to the length of the input clinical text - the output shape of the embeddings is \((num\_examples\times embedding\_dimension)\).
# 2. Naive - out-of-the-box prior-model assessment (without any retraining)
# Transfer Learning based on nnlm-en-dim128 (prior model) Define only output layer structure
library(tfhub)
library(keras)
# Clear the TF Hub cache: stale downloads from prior runs may break layer_hub().
# By default, TF Hub caches modules in <system temp directory>/tfhub_modules
# (e.g., C:/Users/IvoD/AppData/Local/Temp/tfhub_modules), unless TFHUB_CACHE_DIR is set;
# here the system temp directory is assumed to be dirname(tempdir())
tfhub_cache <- file.path(dirname(tempdir()), "tfhub_modules")
if (dir.exists(tfhub_cache)) unlink(tfhub_cache, recursive = TRUE)
model2 <- keras_model_sequential() %>%
layer_hub(
handle = "https://www.kaggle.com/models/google/nnlm/tensorFlow2/tf2-preview-en-dim128/1",
# handle = "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1",
input_shape = list(),
dtype = tf$string,
trainable = FALSE # Set to TRUE for full model retraining, we use FALSE for quick transfer learning
) %>%
layer_dense(units = 1, activation = "sigmoid") # add the binary labeling output layer format
summary(model2)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param # Trainable
## ================================================================================
## keras_layer (KerasLayer) (None, 128) 124642688 N
## dense_2 (Dense) (None, 1) 129 Y
## ================================================================================
## Total params: 124642817 (475.47 MB)
## Trainable params: 129 (516.00 Byte)
## Non-trainable params: 124642688 (475.47 MB)
## ________________________________________________________________________________
model2 %>% compile(
optimizer = 'adam',
loss = 'binary_crossentropy',
metrics = list('accuracy')
)
# Just estimate the final 128+1 coefficients of the final layer
history <- model2 %>% fit(
train_data$clinicalNotes, train_data$surgLabel,
epochs = 5, ### increase epochs for better performance
batch_size = 128
)
## Epoch 1/5
## 32/32 - 1s - loss: 0.7686 - accuracy: 0.3688 - 1s/epoch - 36ms/step
## Epoch 2/5
## 32/32 - 0s - loss: 0.6069 - accuracy: 0.7504 - 403ms/epoch - 13ms/step
## Epoch 3/5
## 32/32 - 0s - loss: 0.5589 - accuracy: 0.7734 - 413ms/epoch - 13ms/step
## Epoch 4/5
## 32/32 - 0s - loss: 0.5382 - accuracy: 0.7737 - 381ms/epoch - 12ms/step
## Epoch 5/5
## 32/32 - 0s - loss: 0.5217 - accuracy: 0.7737 - 356ms/epoch - 11ms/step
## 32/32 - 0s - loss: 0.5153 - accuracy: 0.7750 - 305ms/epoch - 10ms/step
## loss accuracy
## 0.5153179 0.7750000
## 32/32 - 0s - 262ms/epoch - 8ms/step
##
## y_pred 0 1
## 0 731 209
## 1 44 16
Clearly, these surgical-unit predictions cannot be expected to be very reliable: only the single output layer was fitted, and the model has not yet been fine-tuned to respond specifically to clinical text.
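The 2×2 table printed above (rows are predicted labels, columns are true labels) can be summarized with a few standard metrics. This short base-R sketch copies the counts from that output; it shows that the reported accuracy of 0.775 equals the fraction of non-surgical notes in the testing set (775 of 1,000), i.e., the accuracy of a trivial classifier that always predicts the majority class, while the sensitivity for surgical notes is low.

```r
# Counts copied from the model2 output above (rows = predicted, columns = truth)
cm <- matrix(c(731, 44, 209, 16), nrow = 2,
             dimnames = list(y_pred = c("0", "1"), truth = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)          # (731 + 16) / 1000
sensitivity <- cm["1", "1"] / sum(cm[, "1"])    # surgical notes correctly flagged
specificity <- cm["0", "0"] / sum(cm[, "0"])    # non-surgical notes correctly kept
no_info     <- max(colSums(cm)) / sum(cm)       # always predict the majority class
round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, no_info_rate = no_info), 3)
```

Comparing any accuracy against this no-information rate is a quick sanity check for imbalanced outcomes such as the surgical label here.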
The next step is to compile the transfer-learning model by specifying a loss function and an optimizer to facilitate transfer learning during the iterative network model fitting (fine-tuning). In this binary classification problem, we will use the binary_crossentropy() loss function. The model generates a probability value, which is the output of the final DNN layer (the right-most single-unit layer with a sigmoid activation).
Another possible loss function for a binary outcome is mean_squared_error(). However, binary_crossentropy is often better for dealing with probabilities, as it measures the “distance” between the probability distributions of the predicted outcome and the ground truth in supervised problems, whereas mean_squared_error() is more commonly used in regression settings. We will also employ Adaptive Moment Estimation (Adam), as it is an effective general-purpose optimizer.
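A small numeric comparison shows why binary cross-entropy suits probability outputs. Here `bce()` and `mse()` are our own base-R helpers (not the Keras loss objects), evaluated for a surgical note (y = 1) at several predicted probabilities: as the prediction becomes confidently wrong, the cross-entropy grows without bound (about 4.6 at p = 0.01), while the squared error never exceeds 1.

```r
# Per-example losses (base-R helpers, not the Keras loss objects)
bce <- function(y, p) -(y * log(p) + (1 - y) * log(1 - p))   # binary cross-entropy
mse <- function(y, p) (y - p)^2                               # squared error

p <- c(0.99, 0.75, 0.5, 0.25, 0.01)   # predicted P(surgical) for a surgical note
round(data.frame(p = p, bce = bce(1, p), mse = mse(1, p)), 3)
```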
Let’s use the nnlm-en-dim128 (prior model) to define an expanded DNN model by adding four additional layers at the end to customize the deep neural network to our specific clinical data.
# 3. Transfer Learning based on the nnlm-en-dim128 (prior model) Define expanded DNN model structure + 4 layers
model3 <- keras_model_sequential() %>%
layer_hub(
handle = "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1",
input_shape = list(),
dtype = tf$string,
trainable = FALSE # Set to TRUE for full model retraining, we use FALSE for quick transfer learning
) %>%
# modify default pre-trained model by adding 4 extra layers at the end tuned for our clinical text (medical notes)
layer_dense(units = 64, activation = "sigmoid") %>%
#layer_dropout(rate = 0.5) %>%
layer_dense(units = 32, activation = "sigmoid") %>%
#layer_dropout(rate = 0.5) %>%
layer_dense(units = 16, activation = "sigmoid") %>%
#layer_dropout(rate = 0.5) %>%
layer_dense(units = 1, activation = "sigmoid")
# layer_dense(units = 16, activation = "relu") %>%
# layer_dense(units = 6, activation = "relu") %>%
# layer_dense(units = 1, activation = "sigmoid")
summary(model3)
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param # Trainable
## ================================================================================
## keras_layer_1 (KerasLayer) (None, 128) 124642688 N
## dense_6 (Dense) (None, 64) 8256 Y
## dense_5 (Dense) (None, 32) 2080 Y
## dense_4 (Dense) (None, 16) 528 Y
## dense_3 (Dense) (None, 1) 17 Y
## ================================================================================
## Total params: 124653569 (475.52 MB)
## Trainable params: 10881 (42.50 KB)
## Non-trainable params: 124642688 (475.47 MB)
## ________________________________________________________________________________
model3 %>% compile(
optimizer = 'adam',
loss = 'binary_crossentropy',
metrics = list('accuracy')
)
history <- model3 %>% fit(
train_data$clinicalNotes, train_data$surgLabel,
epochs = 10, ### increase epochs for better performance
batch_size = 128
)
## Epoch 1/10
## 32/32 - 2s - loss: 0.5621 - accuracy: 0.7737 - 2s/epoch - 52ms/step
## Epoch 2/10
## 32/32 - 0s - loss: 0.5328 - accuracy: 0.7737 - 437ms/epoch - 14ms/step
## Epoch 3/10
## 32/32 - 0s - loss: 0.5257 - accuracy: 0.7737 - 440ms/epoch - 14ms/step
## Epoch 4/10
## 32/32 - 0s - loss: 0.5114 - accuracy: 0.7737 - 421ms/epoch - 13ms/step
## Epoch 5/10
## 32/32 - 0s - loss: 0.4846 - accuracy: 0.7737 - 419ms/epoch - 13ms/step
## Epoch 6/10
## 32/32 - 0s - loss: 0.4531 - accuracy: 0.7737 - 388ms/epoch - 12ms/step
## Epoch 7/10
## 32/32 - 0s - loss: 0.4285 - accuracy: 0.7727 - 401ms/epoch - 13ms/step
## Epoch 8/10
## 32/32 - 0s - loss: 0.4146 - accuracy: 0.7714 - 444ms/epoch - 14ms/step
## Epoch 9/10
## 32/32 - 0s - loss: 0.4046 - accuracy: 0.7707 - 404ms/epoch - 13ms/step
## Epoch 10/10
## 32/32 - 0s - loss: 0.3979 - accuracy: 0.7689 - 389ms/epoch - 12ms/step
## 32/32 - 0s - loss: 0.4004 - accuracy: 0.7690 - 372ms/epoch - 12ms/step
## loss accuracy
## 0.4004216 0.7690000
## 32/32 - 0s - 240ms/epoch - 8ms/step
##
## y_pred 0 1
## 0 664 121
## 1 111 104
Next we will use the structure/topology of the pre-trained model, but estimate all \(124M\) network parameters, not only the final \(11K\) parameters at the end, as we did earlier.
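The 11K versus 124M figures follow from a simple count: a dense layer with n inputs and k units has n·k weights plus k biases. The sketch below (with a hypothetical helper `dense_params()`) reproduces the trainable counts reported by summary(model3), taking the frozen hub-layer size from that output.

```r
# Parameters of a dense layer: n_in * units weights + units biases
dense_params <- function(n_in, units) n_in * units + units

layer_sizes <- c(128, 64, 32, 16, 1)   # hub output, then the 4 added dense layers
head_params <- mapply(dense_params, head(layer_sizes, -1), layer_sizes[-1])
head_params                            # 8256 2080 528 17
sum(head_params)                       # 10881 trainable parameters (frozen hub)

hub_params <- 124642688                # hub-layer size reported by summary(model3)
hub_params + sum(head_params)          # 124653569 total parameters
```

Setting `trainable = TRUE` on the hub layer moves all of those hub parameters into the trainable set, which is why each epoch of the full fine-tuning below takes much longer.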
# 4. Full-scale Transfer learning using the skeleton of the pre-trained model, but estimating all parameters
model4 <- keras_model_sequential() %>%
layer_hub(
handle = "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1",
input_shape = list(),
dtype = tf$string,
trainable = TRUE # Set to FALSE for simple TL-model retraining, we use TRUE for full-transfer learning
) %>%
# modify default pre-trained model by adding 4 extra layers at the end tuned for our clinical text (medical notes)
layer_dense(units = 64, activation = "sigmoid") %>%
#layer_dropout(rate = 0.5) %>%
layer_dense(units = 32, activation = "sigmoid") %>%
#layer_dropout(rate = 0.5) %>%
layer_dense(units = 16, activation = "sigmoid") %>%
#layer_dropout(rate = 0.5) %>%
layer_dense(units = 1, activation = "sigmoid")
# layer_dense(units = 16, activation = "relu") %>%
# layer_dense(units = 6, activation = "relu") %>%
# layer_dense(units = 1, activation = "sigmoid")
summary(model4)
## Model: "sequential_2"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## keras_layer_2 (KerasLayer) (None, 128) 124642688
## dense_10 (Dense) (None, 64) 8256
## dense_9 (Dense) (None, 32) 2080
## dense_8 (Dense) (None, 16) 528
## dense_7 (Dense) (None, 1) 17
## ================================================================================
## Total params: 124653569 (475.52 MB)
## Trainable params: 124653569 (475.52 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
model4 %>% compile(
optimizer = 'adam',
loss = 'binary_crossentropy',
metrics = list('accuracy')
)
history <- model4 %>% fit(
train_data$clinicalNotes, train_data$surgLabel,
epochs = 10, ### increase epochs for better performance
batch_size = 128
)
## Epoch 1/10
## 32/32 - 26s - loss: 0.7150 - accuracy: 0.4911 - 26s/epoch - 803ms/step
## Epoch 2/10
## 32/32 - 27s - loss: 0.5517 - accuracy: 0.7737 - 27s/epoch - 840ms/step
## Epoch 3/10
## 32/32 - 30s - loss: 0.5283 - accuracy: 0.7737 - 30s/epoch - 922ms/step
## Epoch 4/10
## 32/32 - 30s - loss: 0.4887 - accuracy: 0.7737 - 30s/epoch - 933ms/step
## Epoch 5/10
## 32/32 - 30s - loss: 0.4375 - accuracy: 0.7737 - 30s/epoch - 939ms/step
## Epoch 6/10
## 32/32 - 30s - loss: 0.4043 - accuracy: 0.7737 - 30s/epoch - 952ms/step
## Epoch 7/10
## 32/32 - 30s - loss: 0.3841 - accuracy: 0.7737 - 30s/epoch - 934ms/step
## Epoch 8/10
## 32/32 - 30s - loss: 0.3708 - accuracy: 0.7752 - 30s/epoch - 946ms/step
## Epoch 9/10
## 32/32 - 30s - loss: 0.3612 - accuracy: 0.7732 - 30s/epoch - 927ms/step
## Epoch 10/10
## 32/32 - 32s - loss: 0.3538 - accuracy: 0.7824 - 32s/epoch - 995ms/step
## 32/32 - 3s - loss: 0.4197 - accuracy: 0.7450 - 3s/epoch - 87ms/step
## loss accuracy
## 0.4197085 0.7450000
## 32/32 - 3s - 3s/epoch - 83ms/step
##
## y_pred 0 1
## 0 562 74
## 1 213 151
The final pair of steps includes (1) fine-tuning the model on mini-batches (see dataset_batch()) for 10 (for speed) or more (e.g., 100+, for accuracy and precision) epochs, and (2) evaluating the fitted model. Each epoch is one full pass over all training samples, so fine-tuning involves 10 (or 100+) iterations over the dataset. During fine-tuning, the transfer learner reports the model loss value (optimization measure) and accuracy (fidelity measure) after each epoch; when validation data are supplied (e.g., via validation_split, see also dataset_shuffle()), the same metrics are also reported on the validation set.

# Evaluate the model
# Examine the model performance: mind the trajectories of the loss
# (representing the error; lower values are better) and of the accuracy
# (higher values are better); `history` holds the most recent fit (model4)
library(plotly)
plot_ly(x = ~c(1:history$params$epochs), y = ~history$metrics$loss,
type = "scatter", mode="markers+lines", name="Loss") %>%
add_trace(x = ~c(1:history$params$epochs), y = ~history$metrics$accuracy,
type = "scatter", mode="markers+lines", name="Accuracy") %>%
layout(title="DNN Training Performance", xaxis=list(title="epoch"),
yaxis=list(title="Metric Value"), legend = list(orientation='h'),
hovermode = "x unified")

This simple transfer learning approach achieves a testing accuracy of about 74-78%. More model customization and longer training may further improve the performance of the fine-tuned transfer-learning DNN model. Additional information about R-based tensorflow DNN modeling is available here and here.
Load all the appropriate R/Python packages and set up the RStudio environment.
The same clinical data can be used for multinomial classification, where the outcome is the clinical specialty unit (there are 40 hospital units in this case study) and the input is the given clinical text. Start by defining the specialty labels (clinical units).
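Defining the labels amounts to an integer coding followed by one-hot encoding. The toy vector below is a hypothetical stand-in for the 40 specialties; base R's match() assigns codes in order of first appearance (as unique() does in the code below), diag() builds the one-hot rows, and which.max() decodes a one-hot (or softmax probability) row back to its class, mirroring the apply(y_pred, 1, which.max) step used later.

```r
# Toy stand-in for the 40 specialty names (hypothetical subset)
toy_specialty <- c("Surgery", "Radiology", "Surgery",
                   "Cardiovascular / Pulmonary", "Radiology")

toy_keys <- unique(toy_specialty)               # classes in order of first appearance
labels   <- match(toy_specialty, toy_keys)      # integer codes 1..K
one_hot  <- diag(length(toy_keys))[labels, ]    # one row per note, one column per class
colnames(one_hot) <- toy_keys

decoded <- toy_keys[apply(one_hot, 1, which.max)]  # argmax maps each row back to its class
labels
```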
The prediction of the 40-class labels will depend on the input \(x\) consisting of the string
clinicalNotes, representing the concatenated
transcriptions and descriptions.
In this transfer learning example of multiclass text classification, we will utilize the gnews-swivel-20dim model with text embedding trained on English Google News 130GB corpus.
library(stringi)
dataCT <- read.csv('https://umich.instructure.com/files/21152999/download?download_frd=1', header=T)
dataCT$description <- stri_encode(dataCT$description, "", "UTF-8")
dataCT$transcription <- stri_encode(dataCT$transcription, "", "UTF-8")
# Concatenate Transcriptions and Descriptions into one string/character: clinicalNotes
dataCT$clinicalNotes <- paste(dataCT$description, dataCT$transcription)
# Map each specialty name to an integer code 1..40 (in order of first appearance)
keys <- unique(dataCT$medical_specialty)
medical_specialtyNames <- dataCT$medical_specialty       # keep the original names
values <- seq_along(keys)
convert_specialty <- as.list(setNames(values, keys))      # name -> code lookup list
specialty <- as.numeric(match(dataCT$medical_specialty, keys))  # vectorized lookup
dataCT$medical_specialty <- matrix(specialty,
nrow = length(specialty), ncol = 1)
# Convert labels to categorical one-hot encoding; to_categorical() expects 0-based
# class indices, so the codes 1..40 produce 41 columns
one_hot_SpecialtyLabels <- to_categorical(dataCT$medical_specialty,
num_classes = length(unique(dataCT$medical_specialty))+1)
one_hot_SpecialtyLabels <- one_hot_SpecialtyLabels[, -1] # drop the unused column for class 0
# library(keras)
# labels <- to_categorical
# sum(one_hot_SpecialtyLabels) [1] 4999
num_words <- 10000
max_length <- 300
text_vectorization <- layer_text_vectorization(max_tokens = num_words, output_sequence_length = max_length)
train_set_ind <- sample(nrow(dataCT), floor(nrow(dataCT)*0.8)) # 80:20 split training:testing
train_data <- dataCT[train_set_ind, ]
test_data <- dataCT[-train_set_ind, ]
one_hot_SpecialtyLabels_trainY <- one_hot_SpecialtyLabels[train_set_ind, ]
one_hot_SpecialtyLabels_testY <- one_hot_SpecialtyLabels[-train_set_ind, ]
# input <- layer_input(shape = c(1), dtype = "string") # for raw text input as string, needs to match exp next layer
# output <- input %>%
# text_vectorization() %>%
# layer_embedding(input_dim = num_words + 1, output_dim = 256) %>%
# layer_global_average_pooling_1d() %>%
# layer_dense(units = 256, activation = "relu") %>%
# layer_dropout(0.25) %>%
# layer_dense(units = 128, activation = "relu") %>%
# layer_dropout(0.25) %>%
# layer_dense(units = 64, activation = "relu") %>%
# # layer_dropout(0.25) %>%
# layer_dense(units = length(keys), activation = 'softmax')
# model2 <- keras_model(input, output)
#
# model2 %>% compile(
# loss = 'categorical_crossentropy',
# optimizer = optimizer_sgd(learning_rate = 0.01, decay = 1e-6, momentum = 0.9, nesterov = TRUE),
# metrics = list('accuracy')
# )
#
# history2 <- model2 %>% fit(train_data$clinicalNotes, one_hot_SpecialtyLabels_trainY,
# epochs = 10, batch_size = 512, validation_split = 0.2, verbose=2)
#
# # Evaluate the model2 performance
# results2 <- model2 %>% evaluate(test_data$clinicalNotes, one_hot_SpecialtyLabels_testY, verbose = 2)
# results2
#
# score <- model2 %>% evaluate(test_data$clinicalNotes, one_hot_SpecialtyLabels_testY)
# print(score)
# y_pred <- model2 %>% predict(test_data$clinicalNotes)
# head(apply(y_pred, 1, which.max)) # table(apply(y_pred, 1, which.max))
# # hist(y_pred[,8])
# table(y_pred, test_data$medical_specialty)
#
# ============================================
model3 <- keras_model_sequential() %>%
layer_hub(
handle = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
input_shape = list(),
dtype = tf$string,
trainable = TRUE
) %>%
layer_dense(units = 256, activation = "relu") %>%
layer_dropout(0.25) %>%
layer_dense(units = 128, activation = "relu") %>%
layer_dropout(0.25) %>%
layer_dense(units = 64, activation = "relu") %>%
# layer_dropout(0.25) %>%
layer_dense(units = length(keys), activation = 'softmax')
summary(model3)
## Model: "sequential_3"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## keras_layer_3 (KerasLayer) (None, 20) 400020
## dense_14 (Dense) (None, 256) 5376
## dropout_2 (Dropout) (None, 256) 0
## dense_13 (Dense) (None, 128) 32896
## dropout_1 (Dropout) (None, 128) 0
## dense_12 (Dense) (None, 64) 8256
## dense_11 (Dense) (None, 40) 2600
## ================================================================================
## Total params: 449148 (1.71 MB)
## Trainable params: 449148 (1.71 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
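The Param # column above can be checked by hand. A dense layer with \(n_{in}\) inputs and \(n_{out}\) units has \(n_{in} n_{out}\) weights plus \(n_{out}\) biases, and dropout layers have no parameters. The minimal sketch below reproduces the counts of the dense head of `model3` (the 40 output units correspond to `length(keys)`); the embedding count of 400020 is copied from the summary, not derived.

```r
# Parameters of a fully connected (dense) layer: n_in * n_out weights plus n_out biases
dense_params <- function(n_in, n_out) n_in * n_out + n_out

# Dense head of model3: 20-dim embedding -> 256 -> 128 -> 64 -> 40 classes
n_in  <- c(20, 256, 128, 64)
n_out <- c(256, 128, 64, 40)
head_params <- dense_params(n_in, n_out)
head_params                           # 5376 32896 8256 2600 (dropout adds no parameters)

# Embedding parameters copied from the summary above, not derived here
embedding_params <- 400020
embedding_params + sum(head_params)   # 449148, the reported total
```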
model3 %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_sgd(learning_rate = 0.01, momentum = 0.9, nesterov = TRUE),
metrics = list('accuracy')
)
history3 <- model3 %>% fit(train_data$clinicalNotes, one_hot_SpecialtyLabels_trainY,
epochs = 100, batch_size = 512, validation_split = 0.2, verbose=0)
results3 <- model3 %>% evaluate(test_data$clinicalNotes, one_hot_SpecialtyLabels_testY, verbose = 0)
print(paste0("Mind that the testing-case performance metrics (Loss=", round(results3["loss"], 3),
             " and Accuracy=", round(results3["accuracy"], 3),
             ") of the DNN text classification reflect results of ",
             length(keys), " medical specialties (classes), not a binary classification!"))
## [1] "Mind that the testing-case performance metrics (Loss=2.208 and Accuracy=0.355) of the DNN text classification reflect results of 40 medical specialties (classes), not a binary classification!"
score <- model3 %>% evaluate(test_data$clinicalNotes, one_hot_SpecialtyLabels_testY)
print(score)
## 32/32 - 0s - loss: 2.2077 - accuracy: 0.3550 - 254ms/epoch - 8ms/step
## loss accuracy
## 2.20774 0.35500
y_pred <- model3 %>% predict(test_data$clinicalNotes)  # one probability per class for each note
## 32/32 - 0s - 355ms/epoch - 11ms/step
head(apply(y_pred, 1, which.max))
## [1] 38 38 13 8 8 13
y_pred_class <- apply(y_pred, 1, which.max)  # predicted class = column with the largest probability
# hist(y_pred[,8])
table(y_pred_class, test_data$medical_specialty[,1])
##
## y_pred_class 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 17
## 3 0 0 15 0 0 0 0 10 0 0 0 3 0 0 0 0
## 4 0 0 1 5 0 0 0 0 0 0 0 4 0 0 0 0
## 7 1 0 3 0 0 1 3 0 0 1 0 0 1 0 0 2
## 8 0 0 32 8 2 18 2 197 0 1 0 1 0 4 0 2
## 10 0 0 2 1 0 1 8 0 0 11 0 0 1 0 0 2
## 13 0 0 12 10 0 2 1 0 0 1 4 43 0 2 0 0
## 14 0 0 0 3 0 0 0 0 0 0 0 0 2 0 0 0
## 16 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0
## 19 0 0 1 4 0 0 1 3 1 2 0 2 0 0 2 1
## 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 30 0 0 2 0 0 0 0 1 0 0 0 2 1 0 0 1
## 34 0 0 3 0 0 1 6 2 0 1 0 0 0 0 0 3
## 38 2 2 12 15 1 6 31 1 1 13 2 3 2 1 1 7
##
## y_pred_class 18 19 20 21 22 23 24 25 27 29 30 31 32 33 34 35
## 3 0 0 0 0 0 2 0 0 0 0 1 1 0 0 0 0
## 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 7 0 3 0 3 0 0 0 0 0 0 3 1 0 2 2 0
## 8 16 36 18 1 27 20 9 1 0 4 26 11 0 1 1 0
## 10 0 0 1 0 1 0 1 1 0 0 0 1 1 0 0 3
## 13 0 6 0 0 2 0 5 0 0 1 2 0 0 0 0 0
## 14 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0
## 16 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 19 0 6 1 1 0 0 0 1 0 0 0 0 0 2 2 0
## 27 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 30 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## 34 0 1 0 0 1 0 1 0 0 2 2 1 1 1 8 0
## 38 0 9 2 9 1 0 3 1 1 8 8 3 1 9 7 0
##
## y_pred_class 36 37 38 39 40
## 3 0 0 2 0 0
## 4 0 0 0 0 0
## 7 1 0 14 0 0
## 8 4 6 3 0 0
## 10 0 0 4 0 0
## 13 0 0 2 1 0
## 14 0 0 6 0 0
## 16 0 0 0 1 0
## 19 0 1 4 2 1
## 27 0 0 0 0 0
## 30 0 0 1 0 0
## 34 0 0 0 0 0
## 38 3 0 63 1 0
# DT::datatable(matrix(table(y_pred_class, test_data$medical_specialty[,1]),40,40) )
n_classes <- length(keys)                  # 40 medical specialties
heat <- matrix(0, n_classes, n_classes)    # rows = predicted class, columns = true class
for (i in seq_along(test_data$clinicalNotes)) {
  true_i <- test_data$medical_specialty[i, 1]
  pred_i <- y_pred_class[i]
  heat[pred_i, true_i] <- heat[pred_i, true_i] + 1
}
# plotly maps matrix rows to y and columns to x, so x = true class and y = predicted class
plot_ly(x = ~keys, y = ~keys, z = ~heat, name = "Model Performance",
        hovertemplate = paste('<i>Matching</i>: %{z:.0f}',
                              '<br><b>True</b>: %{x}<br>', '<b>Pred</b>: %{y}'),
        colors = 'Reds', type = "heatmap") %>%
  layout(title = "Predicted Classes vs. True Clinical Units",
         xaxis = list(title = "Actual Class"), yaxis = list(title = "Predicted Class"))
All readers are encouraged to try text-based transfer learning on alternative datasets, e.g., the 50,000 movie reviews (IMDb) dataset. The code skeleton below illustrates the basic pipeline workflow for the binary classification of movie reviews.
# Load Movie Reviews (50K)
# split the entire dataset into a list of 3 objects:
# imdb[[1]]=training_set (first 60% of 'train'), imdb[[2]]=validation_set (last 40% of 'train'), imdb[[3]]=testing_set ('test')
imdb <-
tfds::tfds_load (
"imdb_reviews:1.0.0",
split = list("train[:60%]", "train[-40%:]", "test"),
as_supervised = TRUE
)
# Install keras package if you haven't already
# Load the keras package
# library(keras)
#
# # Load the IMDb dataset
# imdb <- dataset_imdb(num_words = 10000)
#
# # Split the dataset into train, validation, and test sets
# train_split <- 0.6
# validation_split <- 0.4
#
# # Calculate the number of samples for each split
# total_samples <- length(imdb$train$x)[1]
# train_samples <- round(train_split * total_samples)
# validation_samples <- round(validation_split * total_samples)
#
# # Create train, validation, and test sets
# train_dataset <- list(x = imdb$train$x[1:train_samples], y = imdb$train$y[1:train_samples])
# validation_dataset <- list(x = imdb$train$x[(train_samples + 1):(train_samples + validation_samples)], y = imdb$train$y[(train_samples + 1):(train_samples + validation_samples)])
# test_dataset <- imdb$test
#
# # Save train, validation, and test datasets into imdb[[1]], imdb[[2]], and imdb[[3]], respectively
# imdb[[1]] <- train_dataset
# imdb[[2]] <- validation_dataset
# imdb[[3]] <- test_dataset
# imdb <- tfds_load(
# "imdb_reviews:1.0.0",
# split = c("train[:60%]", "train[-40%:]", "test"),
# as_supervised = TRUE
# )
# summary(imdb)
# tfds_load returns a TensorFlow Dataset, an abstraction representing a list
# of elements, in which each element consists of one or more components.
# To access individual elements of a Dataset:
#
# library(tfds)
# library(magrittr)
firstBatch <- imdb[[1]] %>%
dataset_batch(1) %>% # Used to get only the first example
reticulate::as_iterator() %>%
reticulate::iter_next()
str(firstBatch)
## List of 2
## $ :<tf.Tensor: shape=(1), dtype=string, numpy=…>
## $ :<tf.Tensor: shape=(1), dtype=int64, numpy=array([0], dtype=int64)>
# imdb_train_iterator <- as_iterator(imdb[[1]])
#
# # Retrieve the first example from the iterator
# firstBatch <- iter_next(imdb_train_iterator)
# library(magrittr)
# firstBatch <- list(
# x = imdb$train$x[[1]],
# y = imdb$train$y[[1]]
# )
review1 <- as_utf8(as.character(firstBatch[1][[1]]$numpy()[1][[1]])) # get text-review (string)
label1 <- as.numeric(firstBatch[2][[1]]$numpy()) # get binary class (0/1)
embedding_layer <- layer_hub(
handle ="https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1")
embedding_layer(firstBatch[[1]])
## tf.Tensor(
## [[ 9.01966274e-01 -4.83913347e-03 1.17907055e-01 3.81319046e-01
## 6.57222793e-02 -3.01581532e-01 8.90584365e-02 -2.69034863e-01
## -8.51345584e-02 1.08877886e-02 -6.66372627e-02 -3.73063087e-01
## -2.76447266e-01 -1.87254980e-01 5.67507632e-02 9.09779966e-02
## -6.24961555e-02 -3.28687276e-03 -3.08512092e-01 3.78482223e-01
## 7.62880966e-02 1.43733576e-01 -1.12897493e-01 9.59761534e-03
## -2.38938913e-01 2.93743908e-02 7.28663057e-02 -2.48727947e-02
## -8.16893280e-02 6.68320432e-02 -5.62225394e-02 2.47078985e-01
## 1.17681175e-01 3.17581035e-02 2.65932620e-01 -1.37706831e-01
## -1.50708258e-01 -1.63614675e-01 -1.51269153e-01 2.34616160e-01
## -9.12236273e-02 -4.22684886e-02 -1.01224177e-01 -2.12229744e-01
## 6.74503446e-02 1.85163647e-01 3.62982228e-02 -3.50210071e-01
## -5.92576079e-02 -9.54059511e-02 -9.65666175e-02 3.79339904e-02
## -2.36725271e-01 2.67956525e-01 -2.22367734e-01 -1.80506572e-01
## -1.13724798e-01 4.91059460e-02 -1.19525626e-01 -2.27335095e-03
## -1.81468800e-01 -4.74342071e-02 9.61481929e-02 4.93341237e-02
## 2.69693173e-02 2.66610924e-02 -8.21918398e-02 -2.03230649e-01
## 2.25084737e-01 7.74206817e-02 -1.10149167e-01 1.33730099e-01
## 1.08389042e-01 -2.49691661e-02 3.02257799e-02 2.03551911e-02
## -1.39646962e-01 -1.77291587e-01 -1.31853789e-01 1.65671393e-01
## -4.72507323e-04 -9.78293121e-02 -1.64517537e-01 6.93127662e-02
## -7.20646083e-02 -1.01133175e-02 -4.18493431e-03 2.48376504e-01
## 7.00922966e-01 6.45013988e-01 -2.46314004e-01 2.48779714e-01
## 5.55042960e-02 -1.72061652e-01 5.44746453e-03 2.16645315e-01
## 1.24983951e-01 -1.32985115e-02 -9.09600873e-03 8.74783769e-02
## -2.72958595e-02 5.59117980e-02 2.11243659e-01 2.08114520e-01
## 1.86446942e-02 -2.44881704e-01 -2.11568519e-01 6.63717464e-02
## -1.52921677e-01 9.16463733e-02 -1.56010687e-01 4.47210558e-02
## -1.58450484e-01 -1.72194898e-01 -5.40404953e-02 -2.69618005e-01
## 1.23170123e-01 2.13364601e-01 -6.43658787e-02 3.61668468e-02
## 2.14489356e-01 -1.19912423e-01 -4.83419979e-04 2.64609545e-01
## 5.51236942e-02 -3.29729654e-02 3.31326015e-02 2.97882948e-02]], shape=(1, 128), dtype=float32)
# build the complete model
model <- keras_model_sequential() %>%
layer_hub(
handle = "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1",
input_shape = list(),
dtype = tf$string,
trainable = TRUE
) %>%
layer_dense(units = 16, activation = "relu") %>%
layer_dense(units = 8, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
summary(model)
## Model: "sequential_4"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## keras_layer_5 (KerasLayer) (None, 128) 124642688
## dense_17 (Dense) (None, 16) 2064
## dense_16 (Dense) (None, 8) 136
## dense_15 (Dense) (None, 1) 9
## ================================================================================
## Total params: 124644897 (475.48 MB)
## Trainable params: 124644897 (475.48 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
# compile model
model %>%
compile(optimizer="adam", loss="binary_crossentropy", metrics="accuracy")
# model training
history <- model %>%
fit(
imdb[[1]] %>% dataset_shuffle(10000) %>% dataset_batch(512),
epochs = 4, # for convergence, use larger number of epochs (e.g., 20+)
validation_data = imdb[[2]] %>% dataset_batch(512), verbose = 0)
library(plotly)
# plot performance
pl_loss <- plot_ly(x = ~c(1:history$params$epochs), y = ~history$metrics$loss,
type = "scatter", mode="markers+lines", name="Loss") %>%
add_trace(x = ~c(1:history$params$epochs), y = ~history$metrics$val_loss,
type = "scatter", mode="markers+lines", name="Validation Loss") %>%
layout(title="DNN Training/Validation Performance", xaxis=list(title="epoch"),
yaxis=list(title="Metric Value"), legend = list(orientation='h'),
hovermode = "x unified")
pl_acc <- plot_ly(x = ~c(1:history$params$epochs), y = ~history$metrics$accuracy,
type = "scatter", mode="markers+lines", name="Accuracy") %>%
add_trace(x = ~c(1:history$params$epochs), y = ~history$metrics$val_accuracy,
type = "scatter", mode="markers+lines", name="Validation Accuracy") %>%
layout(title="DNN Training/Validation Performance", xaxis=list(title="epoch"),
yaxis=list(title="Metric Value"), legend = list(orientation='h'),
hovermode = "x unified")
subplot(pl_loss, pl_acc, nrows=2, shareX = TRUE, titleX = TRUE)
# model evaluation on testing data
model %>%
evaluate(imdb[[3]] %>% dataset_batch(512), verbose = 0)
## loss accuracy
## 0.3188951 0.8667600
As with the unstructured text-mining example (film reviews) illustrated above, we can also use DNN transfer learning for image classification.
The cross-entropy measure of dissimilarity between two discrete probability distributions \(p\) (true state) and \(q\) (predicted state) with identical support \(X\) is defined as
\[H(p,q) = -\sum _{x_i\in X}{p(x_{i})\log q(x_{i})}.\] For binary outcomes, logistic regression minimizes the log-loss over all training observations, i.e., it minimizes the average cross-entropy in the sample.
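A small numeric illustration, assuming a hypothetical 3-class case with a one-hot true distribution \(p\). In that case \(H(p,q)\) reduces to \(-\log q\) evaluated at the true class, which is essentially the per-case quantity that a categorical cross-entropy loss (as used for `model3` above) averages over cases. The clipping constant `eps` is an assumption added only to keep `log()` finite.

```r
# Cross-entropy H(p, q) = -sum_i p_i * log(q_i) between true p and predicted q
cross_entropy <- function(p, q, eps = 1e-15) {
  stopifnot(length(p) == length(q))
  q <- pmin(pmax(q, eps), 1)   # guard against log(0)
  -sum(p * log(q))
}

# Hypothetical 3-class example: the true class is the 2nd one (one-hot p)
p <- c(0, 1, 0)
q_good <- c(0.05, 0.90, 0.05)   # confident and correct
q_bad  <- c(0.70, 0.10, 0.20)   # confident and wrong
cross_entropy(p, q_good)        # -log(0.9), about 0.105
cross_entropy(p, q_bad)         # -log(0.1), about 2.303
```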
For a sample indexed by \(n = 1, \cdots, N\), the expected (average) loss function is:
\[J(w) ={\frac{1}{N}} \sum _{n=1}^{N}H(p_{n},q_{n})\ =\ -{\frac {1}{N}}\sum _{n=1}^{N}\ {\bigg [}y_{n}\log {\hat {y}}_{n}+(1-y_{n})\log(1-{\hat {y}}_{n}){\bigg ]},\]
where \({\hat {y}}_{n}\equiv g(w \cdot x_{n})=\frac{1}{1+e^{-w \cdot x_{n}}}\) and \(g(z)\) is the logistic function. The logistic loss is also known as the cross-entropy loss or log-loss, and "binary" refers to the situation of binary outcome labels \(y_n\in\{0,1\}\).
Hence, the binary cross-entropy (BCE) is simply
\[H(p,q) = -\sum _{x_i\in X}{p(x_{i})\log q(x_{i})} = -y\log {\hat {y}}-(1-y)\log(1-{\hat {y}}),\] where \(p \in \{ y , 1 - y \}\) and \(q \in \{ \hat {y}, 1 - \hat {y} \}\) represent the probability of the true and predicted binary outcomes, respectively.
High or low BCE values indicate "bad" or "good" model performance, respectively, with a perfect model having \(BCE\approx 0\).
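The base-R sketch below computes the BCE for a few hypothetical prediction vectors to show the "good" vs. "bad" scale, and then checks numerically that the average loss \(J(w)\) of a logistic model, evaluated at the `glm()` logistic-regression estimate, equals the binomial deviance divided by \(2N\) (for 0/1 outcomes the deviance is \(-2\) times the log-likelihood). The clipping constant `eps = 1e-7` and the simulated data are assumptions for illustration.

```r
# Binary cross-entropy for labels y in {0, 1} and predicted probabilities y_hat;
# clipping with a small eps (assumed value) keeps log() finite
bce <- function(y, y_hat, eps = 1e-7) {
  y_hat <- pmin(pmax(y_hat, eps), 1 - eps)
  -(y * log(y_hat) + (1 - y) * log(1 - y_hat))
}

y <- c(1, 0, 1, 0)
mean(bce(y, c(0.999, 0.001, 0.999, 0.001)))  # near-perfect model: about 0.001
mean(bce(y, rep(0.5, 4)))                     # uninformative model: log(2), about 0.693
mean(bce(y, c(0.01, 0.99, 0.01, 0.99)))       # confidently wrong model: about 4.6

# J(w): average BCE of the logistic model y_hat = g(w . x)
g <- function(z) 1 / (1 + exp(-z))
J <- function(w, X, y) mean(bce(y, g(as.vector(X %*% w))))

set.seed(1)                                   # simulated toy data (hypothetical)
N <- 200
X <- cbind(1, rnorm(N))                       # intercept column plus one predictor
y_sim <- rbinom(N, 1, g(as.vector(X %*% c(-0.5, 2))))
fit_glm <- glm(y_sim ~ X[, 2], family = binomial)

# J at the logistic-regression estimate equals deviance / (2N) ...
c(J_at_mle = J(coef(fit_glm), X, y_sim), deviance_over_2N = deviance(fit_glm) / (2 * N))
# ... and is smaller than J at w = (0, 0), which equals log(2)
J(c(0, 0), X, y_sim)
```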
The Sørensen–Dice coefficient (Dice coefficient) is another measure of the similarity between two sets, samples, or distributions. In our case, we apply the Dice coefficient to measure the overlap between the true brain-tumor masks and the DCNN-derived mask estimates (predictions) of the tumor computed from the raw brain images.
| Discrete sets \(X\) and \(Y\) | (Boolean) Binary Data | Probabilities (e.g., quantiles) |
|---|---|---|
| \(D=\frac{2\lvert X\cap Y\rvert}{\lvert X\rvert+\lvert Y\rvert}\), where \(\lvert\cdot\rvert\) is set cardinality | TP = true positives, FP = false positives, FN = false negatives, \(D=\frac{2TP}{2TP+FP+FN}\) | \(D=\frac{2\lvert\mathbf{p}\cdot\mathbf{q}\rvert}{\lVert\mathbf{p}\rVert^{2}+\lVert\mathbf{q}\rVert^{2}}\) |
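For binary masks, the three columns of the table give the same value. The base-R sketch below, using hypothetical 6x6 masks, computes the Dice coefficient via TP/FP/FN counts, via the probability ("soft") form, and via the set form. Returning 1 when both masks are empty is an assumed convention, since the formula is 0/0 in that case.

```r
# Dice coefficient for two binary masks (0/1 matrices of the same size), via TP/FP/FN counts
dice_binary <- function(truth, pred) {
  tp <- sum(truth == 1 & pred == 1)
  fp <- sum(truth == 0 & pred == 1)
  fn <- sum(truth == 1 & pred == 0)
  if (tp + fp + fn == 0) return(1)  # both masks empty: treated as perfect agreement (a convention)
  2 * tp / (2 * tp + fp + fn)
}

# "Soft" Dice for probability maps: 2 * sum(p * q) / (sum(p^2) + sum(q^2))
dice_soft <- function(p, q) 2 * sum(p * q) / (sum(p^2) + sum(q^2))

# Hypothetical 6x6 masks: a 3x3 true lesion and a prediction shifted one column to the right
truth <- matrix(0, 6, 6); truth[2:4, 2:4] <- 1
pred  <- matrix(0, 6, 6); pred[2:4, 3:5]  <- 1

dice_binary(truth, pred)   # TP = 6, FP = 3, FN = 3, so 12 / 18, about 0.667
dice_soft(truth, pred)     # same value for 0/1 masks

# Set form 2 * |intersect(X, Y)| / (|X| + |Y|), using the indices of the positive pixels
X <- which(truth == 1); Y <- which(pred == 1)
2 * length(intersect(X, Y)) / (length(X) + length(Y))
```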
Torch
Deep Convolutional Neural Network (DCNN)
The U-Net, introduced in "U-Net: Convolutional Networks for Biomedical Image Segmentation" and shown in the image below, is an example of a DCNN. The U-shaped CNN (U-Net) consists of successive convolutional layers with max-pooling. During encoding (the left, downhill branch), the U-Net reduces the image resolution (downsampling), whereas the subsequent decoding phase (the right, uphill branch) upsamples the images to produce an output of the same size as the original input. This combination of information analysis (encoding) and synthesis (decoding) facilitates the labeling of each output image pixel, because each decoding layer also receives information from the encoding layer with the matching resolution.
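To make the down- and up-sampling concrete, the sketch below tracks the spatial size and channel counts level by level. It assumes a configuration like the one in the original U-Net paper (64 starting filters, doubled at each of 4 downsampling steps) combined with "same" padding, so that the output keeps the input size; these are illustrative assumptions, not values read from the `unet` package.

```r
# Track spatial size and channels through a U-Net-style encoder/decoder.
# Assumed (illustrative) settings: "same" padding, 2x2 max-pooling down, 2x upsampling up,
# and the number of filters doubling at each downsampling level.
unet_shapes <- function(size = 256, filters = 64, depth = 4) {
  enc_size <- size / 2^(0:depth)       # e.g., 256 128 64 32 16
  enc_filt <- filters * 2^(0:depth)    # e.g., 64 128 256 512 1024 (last level = bottleneck)
  encoder <- data.frame(level = 0:depth, size = enc_size, channels = enc_filt)

  lvl <- depth:1                       # decoder levels, from deepest back to full resolution
  decoder <- data.frame(
    level = lvl - 1,
    size = enc_size[lvl],                 # upsampling restores the matching encoder resolution
    concat_channels = 2 * enc_filt[lvl],  # upsampled features + skip-connection features
    channels = enc_filt[lvl]              # channels after the convolutions at that level
  )
  list(encoder = encoder, decoder = decoder)
}

shapes <- unet_shapes()
shapes$encoder
shapes$decoder   # the last row is back at 256 x 256, ready for the final 1x1 convolution
```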
Each upsampling (decoding) step concatenates the output from the previous layer with that of its counterpart in the compression (encoding) path. The final decoding output is a mask of the same size as the original image, derived by a \(1\times 1\) convolution; no dense layer is needed at the end because this output convolutional layer consists of a single filter. Below we show how to load, train, and use a U-Net for transfer learning in 2D image segmentation. Note that this model has over \(31M\) trainable parameters. An R example of a U-Net model for input-output tensors of shape=c(128,128) is also available; see lines 73-183.
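Why a \(1\times 1\) convolution can replace a dense output layer can be checked with a small base-R sketch: a single \(1\times 1\) filter computes, for every pixel, a weighted sum of that pixel's channels plus a bias, so the output keeps the input's height and width. The 64-channel toy feature map, the random weights, and the sigmoid output are assumptions chosen for illustration.

```r
# A single-filter 1x1 convolution followed by a sigmoid: for every pixel (h, w),
# out[h, w] = sigmoid(sum_c feat[h, w, c] * w_c + b). No dense layer is involved.
conv1x1_sigmoid <- function(feat, w, b) {
  stopifnot(length(dim(feat)) == 3, dim(feat)[3] == length(w))
  z <- apply(feat, c(1, 2), function(v) sum(v * w)) + b   # H x W matrix of linear scores
  1 / (1 + exp(-z))
}

set.seed(42)
feat <- array(rnorm(8 * 8 * 64), dim = c(8, 8, 64))   # toy 8 x 8 feature map with 64 channels
w <- rnorm(64, sd = 0.1)                              # hypothetical weights (64 + 1 bias parameters)
b <- 0
mask_prob <- conv1x1_sigmoid(feat, w, b)
dim(mask_prob)                        # 8 8: same height and width as the input
mask <- (mask_prob > 0.5) * 1         # threshold into a binary mask

# Equivalent view: a (pixels x channels) matrix times the weight vector
z_mat <- matrix(matrix(feat, 8 * 8, 64) %*% w + b, 8, 8)
all.equal(mask_prob, 1 / (1 + exp(-z_mat)))   # TRUE
```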
# If necessary, download the U-Net package, before you load it into R
# remotes::install_github("r-tensorflow/unet")
library(tfdatasets)
library(tfds)
library(tfhub)
library(tfruns)
library(torch)
# torch::install_torch()
library(unet)
library(tibble)
# The unet() call takes additional parameters, e.g., the number of downsampling blocks, the number of filters to start with,
# and the number of classes to identify; ?unet provides details. For instance, we can specify the shape
# of the input images we will be segmenting tumors in: 256*256 3-channel RGB images.
model <- unet(input_shape = c(256, 256, 3))
# to print the model as text output, run:
# model
# Results: # Trainable params: 31,031,745
Let's first download and load the Brain Tumor Imaging dataset. These data come from a 2019 study, "Association of genomic subtypes of lower-grade gliomas with shape features automatically extracted by a deep learning algorithm". The 2D brain MR images, obtained from The Cancer Imaging Archive (TCIA), are paired with 2D tumor masks, which are trivial (empty) for slices without visible tumor and non-trivial for slices containing tumor. The data represent 110 patients with lower-grade glioma and include fluid-attenuated inversion recovery (FLAIR) MRI scans. The MRI data have 3 channels: pre-contrast, FLAIR, and post-contrast. The corresponding tumor masks were obtained by manual delineation on the FLAIR images by board-certified radiologists.
# If you need to start a clean fresh run, remove all old files first! Be careful with this! set eval=T in all R-blocks!
##### First check > list.files("/data/")
##### do.call(file.remove, list(list.files("/data", full.names = TRUE)))
##### unlink("/data/*", recursive=TRUE, force=TRUE)
library(httr)
pathToZip <- tempfile(fileext = ".zip")   # temporary local file for the downloaded archive
url <- "https://umich.instructure.com/files/21813670/download?download_frd=1"
response <- GET(url)
stop_for_status(response)                 # fail early on HTTP errors
content_type <- http_type(response)
print(content_type)
if (content_type %in% c("application/zip", "application/x-zip-compressed")) {
  zip_bytes <- content(response, "raw")
  writeBin(zip_bytes, pathToZip)
} else {
  stop("Unexpected content type received: ", content_type)
}
# download.file("https://umich.instructure.com/files/21813670/download?download_frd=1", pathToZip, mode = "wb")
zip::unzip(pathToZip, files = NULL, exdir = file.path(getwd(), "data"))
library(tibble)
library(rsample)
train_dir <- file.path(getwd(),"data","data")
valid_dir <- file.path(getwd(),"data","mri_valid")
library(magick) # Needed for TIFF --> PNG image conversion and other image processing tasks
Create the necessary directories to store the training and validation imaging data (brain MRIs and tumor masks).
# check if ReadMe file is accessible
# file.rename("/data/ReadMe_TCGA_MRI_Segmentation_Data_Phenotypes.txt", train_dir)
# Import the meta-data
# meta_data <- read.csv(paste0(getwd(),"//data//TCGA_MRI_Segmentation_Data_Phenotypes.csv"))
file_path <- file.path(getwd(), "data", "TCGA_MRI_Segmentation_Data_Phenotypes.csv")
# Read the CSV file
meta_data <- read.csv(file_path)
# note that these are relative file/directory names. To see the complete local path
# tempdir(); getwd()
# Create a validation folder
dir.create(valid_dir, showWarnings = FALSE)  # no warning if the folder already exists
# Check all n=110 patients are accessible
patients <- list.dirs(train_dir, recursive = FALSE)
length(patients)
# Randomly select 20 Patients for validation, remaining 90=110-20 are for training the DNN model
valid_indices <- sample(1:length(patients), 20)
valid_indices
patients[valid_indices] # prints the actual folders where the validation participants' data is
# Extract and Relocate the Validation cases (separate them from training data)
for (i in valid_indices) {
dir.create(file.path(valid_dir, basename(patients[i])))
for (f in list.files(patients[i])) {
file.rename(file.path(train_dir, basename(patients[i]), f), file.path(valid_dir, basename(patients[i]), f))
}
unlink(file.path(train_dir, basename(patients[i])), recursive = TRUE) # clean
}
# Confirm that only 90 patients are left in the training data folder
# list all training data imaging files: list.dirs(train_dir, recursive = FALSE)
length(list.dirs(train_dir, recursive = FALSE))
# and that the 20 validation cases are in the validation folder
length(list.dirs(valid_dir, recursive = FALSE))
# and check validation data
length(list.files(valid_dir, recursive = TRUE)) # [1] 1268
Define data frames containing the file names of all training and validation data.
# Identify the TRAINING and VALIDATION data objects (raw images + tumor masks) as filenames.
# Masks are paired with images by name (assuming masks are named <image>_mask.tif), rather than
# relying on two separately sorted file lists lining up.
list_img_mask_pairs <- function(dir) {
  tifs <- list.files(dir, full.names = TRUE, pattern = "\\.tif$", recursive = TRUE)
  is_mask <- grepl("_mask\\.tif$", tifs)
  img  <- tifs[!is_mask]
  mask <- tifs[is_mask]
  tibble(img = img, mask = mask[match(img, sub("_mask\\.tif$", ".tif", mask))])
}
data_train <- list_img_mask_pairs(train_dir)
data_valid <- list_img_mask_pairs(valid_dir)
(Optionally) convert all 2D TIFF images to PNG RGB format. This may be necessary to ensure that the input images have 3 channels and are correctly interpreted as TensorFlow objects.
# If all training + testing data are in one folder, split them by:
# data <- initial_split(data_train, prop = 0.8)
# Helper: convert one TIFF file to PNG, written next to the original (for easier TF processing downstream)
tif_to_png <- function(x) {
  a <- image_convert(image_read(x), format = "png")
  image_write(a, path = sub("\\.tif$", ".png", x), format = "png")
}
# Convert all Training Data: TIFF images and masks
files_img_tif  <- data_train$img[grepl("\\.tif$", data_train$img)]
data_train_img_png  <- lapply(files_img_tif, tif_to_png)
files_mask_tif <- data_train$mask[grepl("\\.tif$", data_train$mask)]
data_train_mask_png <- lapply(files_mask_tif, tif_to_png)
# Similarly convert all Validation Data
files_valid_img_tif  <- data_valid$img[grepl("\\.tif$", data_valid$img)]
data_valid_img_png   <- lapply(files_valid_img_tif, tif_to_png)
files_valid_mask_tif <- data_valid$mask[grepl("\\.tif$", data_valid$mask)]
data_valid_mask_png  <- lapply(files_valid_mask_tif, tif_to_png)
# Check that the TIF --> PNG conversion worked, inspect one training case
head(list.files(dirname(data_train$img[1])))
# data_valid # check root directory
# Inspect some of the images/masks
# image_info(image_read(data_train_img_png[[3]]))
# image_write(image_read(data_train$img[3]), format = "tiff")
# image_write(image_read(data_train$img[3]), path = paste0(data_train$img[3], ".png"), format = "png")
# a <- image_read(paste0(data_train$img[3], ".png"))
# list.files(train_dir)
# To clean previous file references
# # delete a directory -- must add recursive = TRUE
# unlink("/data", recursive = TRUE); # Clean space # gc(full=T)
Derive a binary class label for each image: cancer (non-trivial tumor mask) or control (empty tumor mask).
# Compute a new binary outcome variable 1=Brain Tumor (mask has at least 1 white pixel), 0=Normal Brain, no white pixels in the mask
pos_neg_diagnosis <- sapply(data_train$mask,
function(x) { value = max(imager::magick2cimg(image_read(x)))
ifelse (value > 0, 1, 0) }
)
table(pos_neg_diagnosis) #; head(data_train)
## pos_neg_diagnosis
## 0 1
## 2164 1153
# pos_neg_diagnosis
# 0 1
# 2046 1103
# Add the normal vs. cancer label to the training and validation datasets
data_train$label <- pos_neg_diagnosis
pos_neg_diagnosis_valid <- sapply(data_valid$mask,
  function(x) { value = max(imager::magick2cimg(image_read(x)))
    ifelse (value > 0, 1, 0) }
)
data_valid$label <- pos_neg_diagnosis_valid
table(pos_neg_diagnosis_valid)
## pos_neg_diagnosis_valid
## 0 1
## 392 220
Next, we ingest the 3-channel (RGB) imaging data and the corresponding tumor masks (binary images) for each participant. The function torch::dataset() allows specifying initialize() and .getitem() methods for complex computable data objects. The first method, initialize(), creates the archive of imaging and mask file names that the second method, .getitem(), uses to iterate over all cases. The .getitem() method returns ordered input-output pairs and performs weighted sampling that favors images with large lesions, which helps when training a DNN on imbalanced classes.
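One simple way to realize lesion-weighted sampling is to draw case indices with probabilities that grow with the mask size. The base-R sketch below assumes a hypothetical weighting rule, \(1+\) (number of tumor pixels), and made-up pixel counts; the actual rule used inside a .getitem() method may differ.

```r
# Weighted sampling that favors images with large lesions (hypothetical toy values).
# Weight of each case = 1 + number of tumor pixels in its mask, so empty masks keep a small chance.
set.seed(2024)
lesion_pixels <- c(0, 0, 0, 0, 0, 0, 120, 450, 0, 30)   # positive-pixel counts of 10 toy masks
weights <- (1 + lesion_pixels) / sum(1 + lesion_pixels)
draws <- sample(seq_along(lesion_pixels), size = 5000, replace = TRUE, prob = weights)

# Empirical draw frequencies: cases 7, 8, and 10 (with lesions) dominate
round(table(factor(draws, levels = seq_along(lesion_pixels))) / 5000, 3)
mean(lesion_pixels[draws] > 0)   # share of draws with a visible lesion, about 0.99
```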
The training sets can be enhanced by data augmentation, a process that expands the set of training images and masks via operations such as flipping, resizing, and rotating, according to user-specified settings.
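For segmentation, each random transform must be applied identically to an image and its mask, otherwise the labels no longer line up with the pixels. The base-R sketch below illustrates this with horizontal flips and 90-degree rotations of toy matrices; the helper names (`flip_h`, `rot90`, `augment_pair`) are hypothetical, and resizing is omitted.

```r
# Apply the same random transform to an image and its mask so the labels stay aligned.
flip_h <- function(m) m[, ncol(m):1, drop = FALSE]          # mirror left-right
rot90  <- function(m) t(m[nrow(m):1, , drop = FALSE])       # rotate 90 degrees clockwise

augment_pair <- function(img, mask, flip = runif(1) < 0.5, k = sample(0:3, 1)) {
  if (flip) { img <- flip_h(img); mask <- flip_h(mask) }
  for (r in seq_len(k)) { img <- rot90(img); mask <- rot90(mask) }   # k quarter turns
  list(img = img, mask = mask)
}

set.seed(7)
img  <- matrix(1:16, 4, 4)            # toy 4 x 4 "image"
mask <- (img > 12) * 1                # toy mask marking the brightest pixels
aug  <- augment_pair(img, mask)
identical(aug$mask, (aug$img > 12) * 1)   # TRUE: the mask still marks the same pixels
```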
Below we use the R torch package (which builds on libtorch, the C++ library underlying PyTorch) to define a brain_dataset method providing a larger augmented training dataset, with new size length(train_ds) ~ 2K, and a larger validation set, with new size length(valid_ds) ~ 1K. In practice, we can use any alternative deep-learning framework for transfer learning, including PyTorch, TensorFlow, Theano, etc.
Note that unet training takes significant computational time; training 20 epochs took a total of 600 compute hours, which translates into a couple of days of computing on a 20-core server. We have provided several precomputed/pre-trained *.pt models on Canvas.
Next, after completing this Part 2, go to Part 3 of the DSPA2 Chapter 14 (Deep Learning, Neural Networks) and finally Part 4 of the DSPA2 Chapter 14 (Deep Learning, Neural Networks), which cover the Torch and Tensorflow Image Pre-processing Image Classification Pipelines.