1 Overview

This BPAD/SOCR case-study is focused on understanding clinical data (spreadsheets), imaging data (MRI), and extraction of imaging-derived biomarkers (morphometric measures). The joint clinical and imaging data can be used for prediction, classification, segmentation, modeling, and forecasting of kidney stones and tumors using imaging and clinical data.

library("tidyverse")
library(plotly)

This kidney case-study can be used for building some simple models (e.g., correlational or linear-models using the meta-data, or imaging models computing the size, volume, diameter, surface area etc. of the kidney mask volumes, which are small, see attached 3D NII.gz kidney stone segmentation-mask volume).

Technical Details

URL
The KiTS21 cohort includes 210 patients who underwent partial or radical nephrectomy for suspected renal malignancy between 2010 and 2020 at M Health Fairview or Cleveland Clinic medical center.
Meta-data: (.csv), includes the following data elements: case_id, age_at_nephrectomy, gender, body_mass_index, comorbidities/myocardial_infarction, comorbidities/congestive_heart_failure, comorbidities/peripheral_vascular_disease, comorbidities/cerebrovascular_disease, comorbidities/dementia, comorbidities/copd, comorbidities/connective_tissue_disease, comorbidities/peptic_ulcer_disease, comorbidities/uncomplicated_diabetes_mellitus, comorbidities/diabetes_mellitus_with_end_organ_damage, comorbidities/chronic_kidney_disease, comorbidities/hemiplegia_from_stroke, comorbidities/leukemia, comorbidities/malignant_lymphoma, comorbidities/localized_solid_tumor, comorbidities/metastatic_solid_tumor, comorbidities/mild_liver_disease, comorbidities/moderate_to_severe_liver_disease, comorbidities/aids, smoking_history, age_when_quit_smoking, pack_years, chewing_tobacco_use, alcohol_use, intraoperative_complications/blood_transfusion, intraoperative_complications/injury_to_surrounding_organ, intraoperative_complications/cardiac_event, hospitalization, ischemia_time, radiographic_size, pathologic_size, malignant, pathology_t_stage, pathology_n_stage, pathology_m_stage, tumor_histologic_subtype, tumor_necrosis, tumor_isup_grade, clavien_surgical_complications, er_visit, readmission, estimated_blood_loss, surgery_type, surgical_procedure, surgical_approach, operative_time, cytoreductive, positive_resection_margins, last_preop_egfr/value, last_preop_egfr/days_before_nephrectomy, first_postop_egfr/value, first_postop_egfr/days_before_nephrectomy, last_postop_egfr/value, last_postop_egfr/days_before_nephrectomy, vital_status, vital_days_after_surgery, voxel_spacing/x_spacing, voxel_spacing/y_spacing, voxel_spacing/z_spacing
Imaging Data:
- Raw MRI images, e.g., volume case 00001. Note that these are very large 3D (solid) volumes, i.e., tensors of approximate dimensions $500^3$ stored in *.nii.gz format, which are very difficult to manipulate and visualize real-time. We’ll demonstrate interactive use by subsampling the 3D volumes to facilitate quick and effective navigation, however, in real studies, working with specialized software and high-resolution volumes is advantageous.
- Imaging MASKS data are available in NII.gz format on GitHub e.g., mask case 00001. The masks can be easily displayed using the SOCR Brain Viewer. Note that each mask (Kidney_mask_3D_vol, segmentation.nii.gz) has 3 intensity levels - background (0), kidney-mask (1), and tumor-mask (2).

2 Meta-Data Import

We’ll start with importing the clinical meta-data, which is available in a .csv file format on Canvas.

Kidney_dataset <- read.csv("https://umich.instructure.com/files/29594707/download?download_frd=1", 
                    header = TRUE)

2.1 Meta-Data Summaries

Below we describe the types of each variable in the clinical meta-data table and provide appropriate feature summaries for these quantitative and qualitative variables.

## Summaries
library(psych)
library(DT)
Hmisc::describe(Kidney_dataset)

## Kidney_dataset 
## 
##  63  Variables      210  Observations
## --------------------------------------------------------------------------------
## case_id 
##        n  missing distinct 
##      210        0      210 
## 
## lowest : case_00000 case_00001 case_00002 case_00003 case_00004
## highest: case_00205 case_00206 case_00207 case_00208 case_00209
## --------------------------------------------------------------------------------
## age_at_nephrectomy 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      210        0       59    0.999    58.35    15.77    35.00    39.80 
##      .25      .50      .75      .90      .95 
##    50.00    61.00    68.00    74.00    77.55 
## 
## lowest :  1 11 12 17 21, highest: 82 83 84 86 90
## --------------------------------------------------------------------------------
## gender 
##        n  missing distinct 
##      210        0        2 
##                         
## Value      female   male
## Frequency      87    123
## Proportion  0.414  0.586
## --------------------------------------------------------------------------------
## body_mass_index 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      210        0      198        1    31.17    7.293    22.05    23.72 
##      .25      .50      .75      .90      .95 
##    26.47    30.07    35.61    40.07    42.87 
## 
## lowest : 16.2  18.92 19.96 20.13 20.16, highest: 45.6  46.62 48.24 48.42 49.61
## --------------------------------------------------------------------------------
## comorbidities.myocardial_infarction 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    201     9
## Proportion 0.957 0.043
## --------------------------------------------------------------------------------
## comorbidities.congestive_heart_failure 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    204     6
## Proportion 0.971 0.029
## --------------------------------------------------------------------------------
## comorbidities.peripheral_vascular_disease 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    201     9
## Proportion 0.957 0.043
## --------------------------------------------------------------------------------
## comorbidities.cerebrovascular_disease 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    206     4
## Proportion 0.981 0.019
## --------------------------------------------------------------------------------
## comorbidities.dementia 
##        n  missing distinct    value 
##      210        0        1    false 
##                 
## Value      false
## Frequency    210
## Proportion     1
## --------------------------------------------------------------------------------
## comorbidities.copd 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    199    11
## Proportion 0.948 0.052
## --------------------------------------------------------------------------------
## comorbidities.connective_tissue_disease 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    209     1
## Proportion 0.995 0.005
## --------------------------------------------------------------------------------
## comorbidities.peptic_ulcer_disease 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    209     1
## Proportion 0.995 0.005
## --------------------------------------------------------------------------------
## comorbidities.uncomplicated_diabetes_mellitus 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    172    38
## Proportion 0.819 0.181
## --------------------------------------------------------------------------------
## comorbidities.diabetes_mellitus_with_end_organ_damage 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    208     2
## Proportion  0.99  0.01
## --------------------------------------------------------------------------------
## comorbidities.chronic_kidney_disease 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    194    16
## Proportion 0.924 0.076
## --------------------------------------------------------------------------------
## comorbidities.hemiplegia_from_stroke 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    209     1
## Proportion 0.995 0.005
## --------------------------------------------------------------------------------
## comorbidities.leukemia 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    209     1
## Proportion 0.995 0.005
## --------------------------------------------------------------------------------
## comorbidities.malignant_lymphoma 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    208     2
## Proportion  0.99  0.01
## --------------------------------------------------------------------------------
## comorbidities.localized_solid_tumor 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    181    29
## Proportion 0.862 0.138
## --------------------------------------------------------------------------------
## comorbidities.metastatic_solid_tumor 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    198    12
## Proportion 0.943 0.057
## --------------------------------------------------------------------------------
## comorbidities.mild_liver_disease 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    206     4
## Proportion 0.981 0.019
## --------------------------------------------------------------------------------
## comorbidities.moderate_to_severe_liver_disease 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    209     1
## Proportion 0.995 0.005
## --------------------------------------------------------------------------------
## comorbidities.aids 
##        n  missing distinct    value 
##      210        0        1    false 
##                 
## Value      false
## Frequency    210
## Proportion     1
## --------------------------------------------------------------------------------
## smoking_history 
##        n  missing distinct 
##      210        0        3 
##                                                           
## Value       current_smoker    never_smoked previous_smoker
## Frequency               34             100              76
## Proportion           0.162           0.476           0.362
## --------------------------------------------------------------------------------
## age_when_quit_smoking 
##        n  missing distinct 
##      188       22       36 
## 
## lowest : 22             24             25             26             29            
## highest: 60             64             71             73             not_applicable
## --------------------------------------------------------------------------------
## pack_years 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      143       67       23    0.658    9.909    16.58        0        0 
##      .25      .50      .75      .90      .95 
##        0        0        8       34       45 
##                                                                             
## Value        0.0   4.2   6.3   8.4  14.7  16.8  18.9  23.1  29.4  33.6  35.7
## Frequency    101     4     2     3     2     1     6     2     7     2     1
## Proportion 0.706 0.028 0.014 0.021 0.014 0.007 0.042 0.014 0.049 0.014 0.007
##                                                           
## Value       37.8  39.9  44.1  48.3  63.0  73.5 113.4 210.0
## Frequency      1     2     3     2     1     1     1     1
## Proportion 0.007 0.014 0.021 0.014 0.007 0.007 0.007 0.007
## 
## For the frequency table, variable is rounded to the nearest 2.1
## --------------------------------------------------------------------------------
## chewing_tobacco_use 
##        n  missing distinct 
##      210        0        2 
##                                                             
## Value      never_or_not_in_last_3mo         quit_in_last_3mo
## Frequency                       209                        1
## Proportion                    0.995                    0.005
## --------------------------------------------------------------------------------
## alcohol_use 
##        n  missing distinct 
##      210        0        3 
##                                                             
## Value           more_than_two_daily never_or_not_in_last_3mo
## Frequency                        13                       95
## Proportion                    0.062                    0.452
##                                    
## Value             two_or_less_daily
## Frequency                       102
## Proportion                    0.486
## --------------------------------------------------------------------------------
## intraoperative_complications.blood_transfusion 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    207     3
## Proportion 0.986 0.014
## --------------------------------------------------------------------------------
## intraoperative_complications.injury_to_surrounding_organ 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    207     3
## Proportion 0.986 0.014
## --------------------------------------------------------------------------------
## intraoperative_complications.cardiac_event 
##        n  missing distinct    value 
##      210        0        1    false 
##                 
## Value      false
## Frequency    210
## Proportion     1
## --------------------------------------------------------------------------------
## hospitalization 
##        n  missing distinct 
##      210        0       15 
## 
## lowest : 0                     1                     10                    11                    2                    
## highest: 6                     7                     8                     9                     died_before_discharge
## --------------------------------------------------------------------------------
## ischemia_time 
##        n  missing distinct 
##      200       10       39 
## 
## lowest : 0              10             11             12             13            
## highest: 5              50             54             8              not_applicable
## --------------------------------------------------------------------------------
## radiographic_size 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      210        0       84    0.999    4.772    3.211    1.678    2.000 
##      .25      .50      .75      .90      .95 
##    2.500    4.000    5.600    9.611   11.455 
## 
## lowest : 1.2  1.4  1.5  1.6  1.66, highest: 13.2 13.9 14.1 15   16.2
## --------------------------------------------------------------------------------
## pathologic_size 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      210        0       77    0.999    4.716    3.217    1.500    1.990 
##      .25      .50      .75      .90      .95 
##    2.400    3.900    5.975    9.200   11.065 
## 
## lowest : 0.8  1    1.2  1.3  1.4 , highest: 13   13.6 15   16   22.5
## --------------------------------------------------------------------------------
## malignant 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency     18   192
## Proportion 0.086 0.914
## --------------------------------------------------------------------------------
## pathology_t_stage 
##        n  missing distinct 
##      210        0        7 
##                                                     
## Value          0    1a    1b    2a    2b     3     4
## Frequency     18    92    41     9     3    44     3
## Proportion 0.086 0.438 0.195 0.043 0.014 0.210 0.014
## --------------------------------------------------------------------------------
## pathology_n_stage 
##        n  missing distinct 
##      210        0        3 
##                             
## Value          0     1     X
## Frequency     90     6   114
## Proportion 0.429 0.029 0.543
## --------------------------------------------------------------------------------
## pathology_m_stage 
##        n  missing distinct 
##      210        0        3 
##                             
## Value          0     1     X
## Frequency     93    17   100
## Proportion 0.443 0.081 0.476
## --------------------------------------------------------------------------------
## tumor_histologic_subtype 
##        n  missing distinct 
##      210        0       12 
## 
## lowest : angiomyolipoma           chromophobe              clear_cell_papillary_rcc clear_cell_rcc           mest                    
## highest: papillary                rcc_unclassified         spindle_cell_neoplasm    urothelial               wilms                   
## --------------------------------------------------------------------------------
## tumor_necrosis 
##        n  missing distinct 
##      187       23        2 
##                       
## Value      false  true
## Frequency    144    43
## Proportion  0.77  0.23
## --------------------------------------------------------------------------------
## tumor_isup_grade 
##        n  missing distinct     Info     Mean      Gmd 
##      172       38        4    0.826    2.238   0.8374 
##                                   
## Value       1.00  1.99  2.98  4.00
## Frequency     26    93    39    14
## Proportion 0.151 0.541 0.227 0.081
## 
## For the frequency table, variable is rounded to the nearest 0.03
## --------------------------------------------------------------------------------
## clavien_surgical_complications 
##        n  missing distinct 
##      209        1        7 
##                                                     
## Value          0     1     2    3a    3b     4     5
## Frequency    149    27    15     6     7     4     1
## Proportion 0.713 0.129 0.072 0.029 0.033 0.019 0.005
## --------------------------------------------------------------------------------
## er_visit 
##        n  missing distinct 
##      208        2        2 
##                       
## Value      false  true
## Frequency    191    17
## Proportion 0.918 0.082
## --------------------------------------------------------------------------------
## readmission 
##        n  missing distinct 
##      208        2        2 
##                       
## Value      false  true
## Frequency    187    21
## Proportion 0.899 0.101
## --------------------------------------------------------------------------------
## estimated_blood_loss 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      209        1       43    0.994      314    339.9       25       50 
##      .25      .50      .75      .90      .95 
##       75      200      400      800     1000 
## 
## lowest :    0    5   10   20   25, highest: 1200 1300 1500 1800 2000
## --------------------------------------------------------------------------------
## surgery_type 
##        n  missing distinct 
##      210        0        3 
##                                                  
## Value      laparoscopic         open      robotic
## Frequency            28           60          122
## Proportion        0.133        0.286        0.581
## --------------------------------------------------------------------------------
## surgical_procedure 
##        n  missing distinct 
##      210        0        2 
##                                                   
## Value      partial_nephrectomy radical_nephrectomy
## Frequency                  140                  70
## Proportion               0.667               0.333
## --------------------------------------------------------------------------------
## surgical_approach 
##        n  missing distinct 
##      210        0        2 
##                                           
## Value      Retroperitoneal Transperitoneal
## Frequency               39             171
## Proportion           0.186           0.814
## --------------------------------------------------------------------------------
## operative_time 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      208        2      151        1      243    106.6    120.4    139.4 
##      .25      .50      .75      .90      .95 
##    174.8    229.0    283.2    372.3    437.8 
## 
## lowest :  44  47  64  91  97, highest: 530 553 586 592 613
## --------------------------------------------------------------------------------
## cytoreductive 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    193    17
## Proportion 0.919 0.081
## --------------------------------------------------------------------------------
## positive_resection_margins 
##        n  missing distinct 
##      210        0        2 
##                       
## Value      false  true
## Frequency    199    11
## Proportion 0.948 0.052
## --------------------------------------------------------------------------------
## last_preop_egfr.value 
##        n  missing distinct 
##      153       57       49 
## 
## lowest : >=90 17   38   41   42  , highest: 86   87   88   89   90  
## --------------------------------------------------------------------------------
## last_preop_egfr.days_before_nephrectomy 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      153       57       54    0.998    23.39    23.93      1.0      2.0 
##      .25      .50      .75      .90      .95 
##      6.0     15.0     36.0     58.2     69.0 
## 
## lowest :  1  2  3  4  5, highest: 75 76 77 83 90
## --------------------------------------------------------------------------------
## first_postop_egfr.value 
##        n  missing distinct 
##      157       53       59 
## 
## lowest : >=90   25     27     28     29    , highest: 88     89     9      90     age<16
## --------------------------------------------------------------------------------
## first_postop_egfr.days_before_nephrectomy 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      157       53      112        1    437.6    526.5     78.0     82.2 
##      .25      .50      .75      .90      .95 
##    103.0    154.0    500.0   1162.2   2078.4 
## 
## lowest :   76   77   78   79   80, highest: 2317 2380 2393 2473 2503
## --------------------------------------------------------------------------------
## last_postop_egfr.value 
##        n  missing distinct 
##      157       53       60 
## 
## lowest : >=90   27     28     30     32    , highest: 88     89     9      90     age<16
## --------------------------------------------------------------------------------
## last_postop_egfr.days_before_nephrectomy 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      157       53      147        1     1048    818.1    149.8    197.6 
##      .25      .50      .75      .90      .95 
##    442.0    932.0   1487.0   2256.8   2426.0 
## 
## lowest :   76   77   93  100  114, highest: 2499 2503 2584 2604 3070
## --------------------------------------------------------------------------------
## vital_status 
##        n  missing distinct 
##      210        0        2 
##                             
## Value      censored     dead
## Frequency       189       21
## Proportion      0.9      0.1
## --------------------------------------------------------------------------------
## vital_days_after_surgery 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      210        0      185        1    956.8    875.2    15.45    23.80 
##      .25      .50      .75      .90      .95 
##   271.25   790.50  1427.50  2288.10  2431.60 
## 
## lowest :    0    2    8    9   11, highest: 2539 2607 2611 2900 3070
## --------------------------------------------------------------------------------
## voxel_spacing.x_spacing 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      210        0      125    0.999   0.7966   0.1216   0.6565   0.6832 
##      .25      .50      .75      .90      .95 
##   0.7183   0.7812   0.8682   0.9631   0.9766 
## 
## lowest : 0.4375   0.451172 0.518    0.546875 0.550781
## highest: 0.970703 0.976562 0.976562 0.984    1.041   
## --------------------------------------------------------------------------------
## voxel_spacing.y_spacing 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      210        0      125    0.999   0.7966   0.1216   0.6565   0.6832 
##      .25      .50      .75      .90      .95 
##   0.7183   0.7812   0.8682   0.9631   0.9766 
## 
## lowest : 0.4375   0.451172 0.518    0.546875 0.550781
## highest: 0.970703 0.976562 0.976562 0.984    1.041   
## --------------------------------------------------------------------------------
## voxel_spacing.z_spacing 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      210        0       10     0.92    3.162     1.93     0.50     0.50 
##      .25      .50      .75      .90      .95 
##     1.25     3.00     5.00     5.00     5.00 
##                                                                       
## Value      0.500 0.995 1.220 1.490 1.985 2.480 2.975 3.740 3.965 5.000
## Frequency     31    18     5     2     4    23    38     1     2    86
## Proportion 0.148 0.086 0.024 0.010 0.019 0.110 0.181 0.005 0.010 0.410
## 
## For the frequency table, variable is rounded to the nearest 0.045
## --------------------------------------------------------------------------------

Kidney_DataDescr <- psych::describe(Kidney_dataset)

library(SmartEDA)
Kidney_DataDescr <- ExpData(Kidney_dataset,  type=2,  fun=c("mean", "median", "var")) %>% datatable()

2.2 Suppress boolean comorbidity columns

At the initial stages, it may be useful to ignore the comorbidity indicator variables (too many Boolean characteristics of comorbid conditions). A deeper analysis may need to explore these interdependencies.

library(dplyr)
Kidney_dataset <- select(Kidney_dataset,  -starts_with("comorbidities."))
Kidney_DataDescr <- ExpData(Kidney_dataset,  type=2,  fun=c("mean", "median", "var")) %>% datatable()

3 EDA

Next, we perform some simple exploratory data analytics (EDA). These graphs are just examples of univariate and multivariate relations.

3.1 Distribution of `radiographic_size` by `surgery_type`

# pl <- ggplot(Kidney_dataset,  aes(x=surgery_type,  y=radiographic_size)) +  
#   geom_violin() + ggtitle("Distribution of radiographic_size by surgery_type") + 
#   labs(x = "Surgery Type",  y = "Tumor Size") + theme(legend.position = "bottom") # + 
#   # geom_text(stat = 'count',  aes(label = ..count..),  vjust = 1)
# ggplotly(pl)

plot_ly(Kidney_dataset,  x = ~surgery_type,  y = ~radiographic_size,  split = ~surgery_type, 
        type = 'violin',  box = list(visible = T),  meanline = list(visible = T)) %>%
  layout(title="Distribution of radiographic_size by surgery_type",  
         xaxis=list(title="Surgery Type"),  yaxis=list(title="Tumor Size"), 
         legend = list(title="Surgery Type",  orientation = 'h'))

3.2 Distribution of `vital_days_after_surgery` by `surgical_approach` and `gender`

plot_ly(Kidney_dataset,  type = 'violin') %>%
  add_trace(x = ~surgical_approach[Kidney_dataset$gender == 'female'],  
            y = ~vital_days_after_surgery[Kidney_dataset$gender == 'female'],  
            legendgroup = 'Yes',  scalegroup = 'Yes',  name = 'Female', 
            side='negative',  box=list(visible=T),  meanline=list(visible=T), 
            color = I("blue"),  points = 'all',   jitter = 0.5,  pointpos = -1) %>%
  add_trace(x = ~surgical_approach[Kidney_dataset$gender == 'male'],  
            y = ~vital_days_after_surgery[Kidney_dataset$gender == 'male'],  
            legendgroup = 'Yes',  scalegroup = 'Yes',  name = 'Male', 
            side='positive',  box=list(visible=T),  meanline=list(visible=T), 
            color = I("yellow"),  points = 'all',   jitter = 0.5,  pointpos = +1) %>%
  layout(title="Distribution of vital_days_after_surgery by surgical_approach and `ender",  
         xaxis=list(title="Surgical Approach"),  yaxis=list(title="Vital days after surgery"), 
         legend = list(title="Surgical Approach",  orientation = 'h')) #,

         #violingap = 1,  violingroupgap = 1,  violinmode = 'overlay')

3.3 Joint 2D+ Distribution of `vital_days_after_surgery` and `body_mass_index`

2D, 3D, and higher dimensional distributions provide more more holistic depiction of the underlying phenomena, see pub 1 and pub 2.

Section 3.3 of the Spacekime/TCIU book shows the use of copula functions to compute the joint distirbution from given marginal densities.

If the marginal components $\{X_i\}$ are independent, computing the joint multivariate distribution involves simply the multiplication of the individual marginal probability distributions, i.e., \[f_{{\bf{X}}}({\bf{x}})=\prod_{i=1}^n{f_{X_i}(x_i)}\ .\]

When there are inter-dependencies between variables, computing the joint probability distribution $f_{{\bf{X}}}({\bf{x}})$ from the corresponding marginal univariate distribution models, $f_{X_i}(x_i)$, requires linking the marginal distributions via connecting copula function models. For a pair of density functions, $f_X (x)$ and $f_Y (y)$, and their corresponding cumulative probability distribution functions, $F_X (x)$ and $F_Y (y)$, the joint bivariate cumulative distribution function model, $F_{XY} (x,y)$ can be expressed via a distribution copula $C(∙,∙)$ function. Similarly, the bivariate copula density function model, $f_{XY} (x,y)$ can be explicated in terms of some appropriate probability density copula function, $c(∙,∙)$.

\[F_{XY} (x,y)=F_X (x)\times F_Y (y)\times \underbrace{C(F_X (x),F_Y (y))}_{distribution\ copula},\]
\[f_{XY} (x,y)=f_X (x)\times f_Y (y)\times \underbrace{c(f_X (x),f_Y (y))}_{density\ copula}.\]

Here is an example illustrating the joint bivariate distribution of vital_days_after_surgery and body_mass_index.

library(extraDistr)
library(pracma)

# marginal domains, # summary(Kidney_dataset$vital_days_after_surgery)
len <- 300
t <- 1:len
x_dom <- seq(0, 3070, length=len) # range of vital_days_after_surgery
y_dom <- seq(16, 50, length=len)  # range of body_mass_index

rho <- 0.3  # initial dependence (correlation), -1 < rho = Corr(t,phi) < 1

# Base function definitions
# f(t), Kidney_dataset$vital_days_after_surgery
pdf_x <- density(Kidney_dataset$vital_days_after_surgery, bw = "sj")
f <- approxfun(pdf_x, rule=1:2)  # density as an abstract function
fMax <- max(abs(pdf_x$y))
F1 <- function (x) {     # CDF as an abstract function
  # message(paste('input to F1 =',x))
  if (x < 1) {  val = 0   } 
  else if (x > len) {  val = 1  } 
  else {  val=integral(f, xmin=x_dom[1]-200, xmax=x_dom[x], 
                       reltol=1e-12, method="Kronrod")
  }
  # val=integral(f, xmin=x_dom[1]-200, xmax=x, reltol=1e-12, method="Kronrod")
  return(val)
}  # F1(100)

# g(phi), Kidney_dataset$body_mass_index
pdf_y <- density(Kidney_dataset$body_mass_index)
g <- approxfun(pdf_y, rule=1:2)  # density as an abstract function
gMax <- max(abs(pdf_y$y))
G1 <- function (y) {
  if (y < 1) { val = 0 }
  else if (y > len) { val = 1 } 
  else { val = integral(g, y_dom[1], y_dom[y], method = "Kronrod") }
  return(val)
}  # G1(100)

# Gauss-copula
# C_{XY}^{Gauss} (x,y)=\frac{1}{\sqrt{1-ρ^2}} \times \exp \left (-\frac{ρ^2 (a^2+b^2 )-2abρ)}{2(1-ρ^2 )} \right ),
# where the pairwise correlation $-1 < \rho < 1$ and
# a(x)= \sqrt{2}\times erf^{-1} (2\times F_X (x)-1),
# b(y)= \sqrt{2}\times erf^{-1} (2\times F_Y (y)-1),
# and the error function is
# $$erf^{-1}(s)= \frac{2}{\sqrt{\pi}}\int_{0}^{\infty}{e^{-t^2}dt}.$$  
CGauss <- function (t=len/2, s=len/2) {
  if (t < 1) { A= -1e+9 }
  else if (t >= len) { A= 1e+9 } 
  else { A = sqrt(2) * erfinv(2*(F1(t))-1) }
  
  if (s < 1) { B= -1e+9 }
  else if (s > len) { B= 1e+9 } 
  else { B = sqrt(2) * erfinv(2*(G1(s))-1) }
  
  # message(paste('A=', A, 'B=', B))
  val = (1/(sqrt(1-rho^2))) * 
             exp(-(rho^2*(A^2+B^2) - 2*A*B*rho)/(2*(1-rho^2)))
  if (is.finite(val)) return (val)   
  else return (0)  
}  # CGauss(1, 17)

# F(t)
fJoint <- function (t=len/2, s=len/2, rho = rho) {
  return ( f(x_dom[t]) * g(y_dom[s]) * CGauss(t, s) )
}  # fJoint(1,18)

### Part 1:  Classical Time Dynamics: $\phi=0$.
# plot(s, f(s)/fMax)
# plot(s, g(s)/gMax)
# # plot(s, 2*g(rep(0, length(s)))/gMax)
# # plot(s, log(F(t=s, phi=0, rho=0.1)))
# plot(s, log(F(t=s, phi=rep(0, length(s)), rho=0.1)))
# plot(s, CGauss(s))

library(plotly)
y1 <- 1000*f(x_dom)   # f: first marginal
y2 <- 10* g(y_dom)   # g: second marginal
y3 <- c()        # Gaussian Copula (. , s=sMax/2)
for (i in 1:len) y3[i]=CGauss(i, 55)   # or y_dom[len/2]
y4 <- c()        # joint density f(t,t)
for (i in 1:len) y4[i] <- 10000*fJoint(t=i, s=55, rho)

plot_ly() %>%
  add_trace(x=~t, y=~y1, type='scatter', mode='markers+lines',
            name="f: first marginal") %>%
  add_trace(x=~t, y=~y2, type='scatter', mode='markers+lines',
            name="g: second marginal", opacity=1) %>%
  add_trace(x=~t, y=~y3, type='scatter', mode='markers+lines',
            name="C_Gauss(t,s): Gaussian Copula", opacity=0.5) %>%
  add_trace(x=~t, y=~y4, type='scatter',
            mode='markers+lines',
            name="f(t,s, rho)): Joint Density") %>%
  layout(title = "Copula-based Modeling of Joint Distribution", 
         xaxis = list(title = "time"), 
         yaxis = list(title = "Functional Value"), # showlegend = FALSE, 
         legend = list(orientation = 'h'),
         scene = list(aspectmode = "manual",
                      aspectratio = list(x=1, y=1)))

Next we can display the raw and smoothed versions of the joint distribution as flat 2D images.

#### Path Kynamics over curves \gamma:\mathbb{R}\to \mathbb{C}
### Part 2: Path Kynamics over curves \gamma:\mathbb{R}\to \mathbb{C}
### Kime tube gamma(k1,k2) on torus   ########## CARTESIAN
#### Parametric curve on torus
# define kime-surface
xsurf <- x_dom[t]    # longitudinal trajectory
ysurf <- y_dom[t]    # phase trajectory

zsurf <- mat<-matrix(0, nrow=len, ncol=len)      # joint density f(t,s)
for (i in 1:len) {
  for (j in 1:len) {
    zsurf[i,j] <- 100000*fJoint(t=i, s=j, rho)
  }
  if (i %% 30 ==0) message(paste0('Progress: ... ', round((100*i)/len, 0), '%'))
}
# image(zsurf)
plot_ly(z= zsurf, type="heatmap") # %>% hide_colorbar()

## Multi-core parallelization
# library(foreach)
# library(doParallel)
# nc = detectCores()
# cl = makeCluster(nc-2, typr = 'FORK')
# registerDoParallel(cl)
# zsurf1 <- mat<-matrix(0, nrow=len, ncol=len)
# zsurf1 <- foreach (i = 1:len, .combine = 'c') %dopar% {
#   for (j in 1:len) {
#     100000*fJoint(t=i, s=j, rho)
#   }
#   # if (i %% 30 ==0) message(paste0('Progress: ... ', round((100*i)/len, 0), '%'))
# }
# stopImplicitCluster()
# stopCluster(cl)
# image(zsurf)

# Clean zsurf, as %> table(is.na(zsurf)) # FALSE  TRUE # 9911    89 
# and table(is.infinite(zsurf))  # FALSE  TRUE  9920    80 
# replace missing values in zsurf by column (MARGIN=2) average
# zsurfSmooth <- sweep(zsurf, MARGIN = 2, 
#                   STATS = apply(zsurf, MARGIN = 2, median, na.rm=TRUE),
#                   FUN =  function(x,s) ifelse(is.na(x), s, x))
# zsurfSmooth2 <- sweep(zsurfSmooth, MARGIN = 2, 
#                   STATS = apply(zsurfSmooth, MARGIN = 2, median, na.rm=TRUE),
#                   FUN =  function(x,s) ifelse(is.infinite(x), s, x))
# image(zsurfSmooth2)

The next plot renders the joint distribution in the 2D Cartesian plane. The display shows the 2D density surface overlayed with a pair of canonical curves corresponding to a fixed vital_days_after_surgery, $t=100$ (orange) and a fixed time body_mass_index, $s=s(=22)$ (green).

# 3D Kime-surface Plot
f <- list(family = "Courier New, monospace", size = 18, color = "black")
xlab <- list(title = "post surgery vitality", titlefont = f)
ylab <- list(title = "BMI", titlefont = f)
zlab <- list(title = "Density surface", titlefont = f)

plot_ly(x=~xsurf, y=~ysurf, z=~t(zsurf), type="surface", opacity=0.95,
        showlegend=FALSE, name="Joint Density-Surface") %>%
    # add vital days after surgery projection curve, for a fixed BMI=22
    add_trace(x=~xsurf, y=ysurf[len/2], z=~zsurf[, len/2], type="scatter3d",
              mode="lines", line=list(width=20), name="BMI=22") %>%
    # add BMI projection curve vital days after surgery=1000
    add_trace(x=xsurf[100], y=~ysurf, z=~zsurf[100,], type="scatter3d",
              mode="lines", line=list(width=20), name="1000 days post surgery") %>%
    hide_colorbar() %>%
    layout(title = "Joint Bivariate Density (Gaussian copula, Rho=0.3)", 
           scene = list(aspectmode = "manual",
                    xaxis = xlab, yaxis = ylab, zaxis = zlab,
                    aspectratio = list(x=1, y=1, z=1))) # 1:1:1 aspect ratio

4 Analytics

Quantitative data analytics (QDA) represent are next in the natural order of extracting actionable information from complex data. Below are several illustrations of using a data-frame to store the clinical kidney data. Specifically, we will fit classical statistical models (e.g., GLM with binary or quantitative outcomes). Mode advanced analytical strategies employ modern ML/AI for clustering, classification, prediction, forecasting, or segmentation of the study participants. Clearly these demonstrations can be significantly expanded to reflect specific hypothesis-driven research inquiries, exploratory studies, confirmatory analytics, or other inferential approaches to extract actionable information of the observed kidney clinical information.

4.1 Logistic Regression - Predicting Patient Survival

The participant survival (censored or dead indicators) are stored in the factor variable Kidney_dataset$vital_status, which takes two possible values (censored=alive or dead).

After randomly splitting the data into two sets (training and testing), we fit a logistic (glm) model using the training data, assess the performance of the model on the independent testing dataset, compute the receiver operating characteristic (ROC) curve, and report a variety of performance metrics, including area under the ROC curve (AUC).

# Model 1 -- may need to split 80:20 the data into training set (for model fitting) and 
# testing set (for independent valuation)
Kidney_dataset$vital_status <- as.factor(Kidney_dataset$vital_status)
table(Kidney_dataset$vital_status)

## 
## censored     dead 
##      189       21

# 80% training + 20% testing
set.seed(1234)
subset_int <- sample(nrow(Kidney_dataset),  floor(nrow(Kidney_dataset)*0.8))  
Kidney_dataset_train <- Kidney_dataset[subset_int,  ]
Kidney_dataset_test  <- Kidney_dataset[-subset_int,  ] 

# Build logistic regression model (based on training set)
Log.Reg.Model <- glm(vital_status ~ age_at_nephrectomy+tumor_isup_grade+estimated_blood_loss+
                       operative_time+pathologic_size+body_mass_index,  
                     data = Kidney_dataset_train,  family = "binomial")

summary(Log.Reg.Model)

## 
## Call:
## glm(formula = vital_status ~ age_at_nephrectomy + tumor_isup_grade + 
##     estimated_blood_loss + operative_time + pathologic_size + 
##     body_mass_index, family = "binomial", data = Kidney_dataset_train)
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)   
## (Intercept)          -5.925453   2.573797  -2.302  0.02132 * 
## age_at_nephrectomy    0.015919   0.023118   0.689  0.49108   
## tumor_isup_grade      1.154501   0.396964   2.908  0.00363 **
## estimated_blood_loss -0.001089   0.001182  -0.921  0.35680   
## operative_time        0.001753   0.003293   0.532  0.59453   
## pathologic_size       0.081326   0.081302   1.000  0.31717   
## body_mass_index      -0.019959   0.052095  -0.383  0.70163   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 98.771  on 136  degrees of freedom
## Residual deviance: 77.690  on 130  degrees of freedom
##   (31 observations deleted due to missingness)
## AIC: 91.69
## 
## Number of Fisher Scoring iterations: 6

# Nagelkerke r-squared
library(fmsb)
library(pscl)
NagelkerkeR2(Log.Reg.Model)

## $N
## [1] 137
## 
## $R2
## [1] 0.2776233

# Other pseudo r-squared
pR2(Log.Reg.Model)

## fitting null model for pseudo-r2

##         llh     llhNull          G2    McFadden        r2ML        r2CU 
## -38.8450735 -49.3853113  21.0804755   0.2134286   0.1426183   0.2776233

# Prediction (using testing data)
Log.Reg.Predict <- predict(Log.Reg.Model,  Kidney_dataset_test,  type = "response")

# Check Accuracy
library(caret)
# Confusion Matrix
confusionMatrix(data = factor(as.factor(Log.Reg.Predict <= 0.02),  labels=c("censored",  "dead")),  
                reference = Kidney_dataset_test$vital_status)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction censored dead
##   censored       27    2
##   dead            2    1
##                                           
##                Accuracy : 0.875           
##                  95% CI : (0.7101, 0.9649)
##     No Information Rate : 0.9062          
##     P-Value [Acc > NIR] : 0.8239          
##                                           
##                   Kappa : 0.2644          
##                                           
##  Mcnemar's Test P-Value : 1.0000          
##                                           
##             Sensitivity : 0.9310          
##             Specificity : 0.3333          
##          Pos Pred Value : 0.9310          
##          Neg Pred Value : 0.3333          
##              Prevalence : 0.9062          
##          Detection Rate : 0.8438          
##    Detection Prevalence : 0.9062          
##       Balanced Accuracy : 0.6322          
##                                           
##        'Positive' Class : censored        
##

# print(paste('Accuracy is ',  1 - MisClassificError))

# Area Under Curve
library(ROCR)

# replace NA predictions to 0 (trivial) probabilities
Log.Reg.Predict <- Log.Reg.Predict %>% replace(is.na(.),  0)

TestDataDependentVar <- Kidney_dataset_test$vital_status
pred <- ROCR::prediction(Log.Reg.Predict,  TestDataDependentVar)
perf <- ROCR::performance(prediction.obj = pred,  measure = "tpr",  x.measure = "fpr")
plot(perf,  col = "black",  lty = 2,  lwd = 2)
abline(a = 0,  b = 1,  col = "red")

# Calculating AUC
auc <- ROCR::performance(prediction.obj = pred,  measure = "auc")
auc <- as.numeric(auc@y.values)
minauc <- (round(auc,  digits = 2))
minauct <- paste(c("min (AUC) = "),  minauc,  sep = "")
legend(0.52,  0.5,  c(minauct,  "\n"),  cex = 1,  box.col = "Red")

5 Imaging data

Now we can bring in the 3D volumetric (MRI) data. We will import and display the 3D imaging data for just one participant. Note that these 3D volumes are rather large (approximate tensor dimensions are $602\times 512\times 512$). The imaging data includes the abdominal 3D high-resolution MRI volume and the corresponding kidney mask and kidney-tumor mask. The imaging data can be used to obtained imaging-derived morphometric measures of the masks. These 3D volumetric masks can capture the overall kidney morphology, anatomy, physiology and to some extend function, along with any expert-delineated kidney-tumors, nephrologist-identified kidney-stones, etc. In practice such derived morphometrics (e.g., mask diameter, volume, surface area, boundary shape, complexity, etc.) can be further associated with the remaining clinical data phenotypes. Such translational studies are commonly referred to as “phenomics studies”.

Let’s explore the raw kidney volume of the first participant by displaying some 2D cardinal-projection panes, specially chosen to cut the 3D abdominal MRI volume through the kidneys.

# load the raw kidney image for the first participant
library(oro.nifti)
Kidney_image_3D <- "https://umich.instructure.com/files/29648559/download?download_frd=1"
Kidney_image_3D_File <- file.path(tempdir(),  "Kidney_image_3D.nii.gz")
download.file(Kidney_image_3D,  dest=Kidney_image_3D_File,  mode = "wb",  quiet=F)
Kidney_image_3D_vol <- readNIfTI(Kidney_image_3D_File,  reorient=FALSE) # oro.nifti::readNIfTI()

# some simple 3D MRI volume descriptors
hist(Kidney_image_3D_vol)

str(Kidney_image_3D_vol)

## Formal class 'nifti' [package "oro.nifti"] with 46 slots
##   ..@ .Data         : num [1:602, 1:512, 1:512] -1024 -1024 -1024 -1024 -1024 ...
##   ..@ sizeof_hdr    : int 348
##   ..@ data_type     : chr ""
##   ..@ db_name       : chr ""
##   ..@ extents       : int 0
##   ..@ session_error : int 0
##   ..@ regular       : chr ""
##   ..@ dim_info      : chr ""
##   ..@ dim_          : int [1:8] 3 602 512 512 1 1 1 1
##   ..@ intent_p1     : num 0
##   ..@ intent_p2     : num 0
##   ..@ intent_p3     : num 0
##   ..@ intent_code   : int 0
##   ..@ datatype      : int 64
##   ..@ bitpix        : int 64
##   ..@ slice_start   : int 0
##   ..@ pixdim        : num [1:8] 1 0.5 0.799 0.799 1 ...
##   ..@ vox_offset    : num 352
##   ..@ scl_slope     : int 1
##   ..@ scl_inter     : int 0
##   ..@ slice_end     : int 0
##   ..@ slice_code    : int 0
##   ..@ xyzt_units    : int 0
##   ..@ cal_max       : num 1393
##   ..@ cal_min       : num -1024
##   ..@ slice_duration: num 0
##   ..@ toffset       : num 0
##   ..@ glmax         : num 1393
##   ..@ glmin         : num -1024
##   ..@ descrip       : chr ""
##   ..@ aux_file      : chr ""
##   ..@ qform_code    : int 0
##   ..@ sform_code    : int 2
##   ..@ quatern_b     : num -0.707
##   ..@ quatern_c     : num 0
##   ..@ quatern_d     : num 0.707
##   ..@ qoffset_x     : num 0
##   ..@ qoffset_y     : num 0
##   ..@ qoffset_z     : num 0
##   ..@ srow_x        : num [1:4] 0 0 -0.799 0
##   ..@ srow_y        : num [1:4] 0 -0.799 0 0
##   ..@ srow_z        : num [1:4] -0.5 0 0 0
##   ..@ intent_name   : chr ""
##   ..@ magic         : chr "n+1"
##   ..@ extender      : int [1:4] 0 0 0 0
##   ..@ reoriented    : logi FALSE
##   ..$ dim: int [1:3] 602 512 512

dim(Kidney_image_3D_vol)  # 602 512 512

## [1] 602 512 512

# Load the tumor mask volume
Kidney_mask_3D <- "https://umich.instructure.com/files/29642168/download?download_frd=1"
Kidney_mask_3D_File <- file.path(tempdir(),  "Kidney_mask_3D.nii.gz")
download.file(Kidney_mask_3D,  dest=Kidney_mask_3D_File,  mode = "wb",  quiet=F)
Kidney_mask_3D_vol <- readNIfTI(Kidney_mask_3D_File,  reorient=FALSE) 
dim(Kidney_mask_3D_vol)  # 602 512 512

## [1] 602 512 512

# vol-dimensions
xdim <- dim(Kidney_image_3D_vol)[1]  # Z
ydim <- dim(Kidney_image_3D_vol)[2]  # Y
zdim <- dim(Kidney_image_3D_vol)[3]  # X

# cross-hairs of 2D cardinal projection planes, mind the negative (reversed) step-sizes
x = 376 #as.integer(xdim/2)   # 602-380
y = 224 # as.integer(ydim/2)
z = 334 # as.integer(zdim/2)

library(oro.nifti)
orthographic(Kidney_image_3D_vol, Kidney_mask_3D_vol, zlim.y=c(0.5, 2.5),
             xyz=c(xdim-x,ydim-y,zdim-z), zlim=range(Kidney_image_3D_vol)*0.9)

# z slice 
# img <- (1200+Kidney_image_3D_vol[as.integer(dim(Kidney_image_3D_vol)[1]/2),,])/10
# mode(img) <- "integer"; str(img)
# plot_ly(z=~img, type="heatmap")
z2D_img <- (1200+Kidney_image_3D_vol[z,,])/10  # 380
mode(z2D_img) <- "integer"
z2D_mask <- Kidney_mask_3D_vol[z,,]
# z2D_mask <- (Kidney_mask_3D_vol[as.integer(dim(Kidney_mask_3D_vol)[1]/2),,])

# vertical flip of 2D arrays (matrices) to show images in standard anatomical space
# z2D_img <- z2D_img[c((dim(z2D_img)[2]):1),,drop = FALSE]
# z2D_mask <- z2D_mask[c((dim(z2D_mask)[2]):1),,drop = FALSE]

# Each mask (Kidney_mask_3D_vol, segmentation.nii.gz) has background (0), kidney-mask (1), and tumor-mask(2)
table(z2D_mask)

## z2D_mask
##      0      1      2 
## 253733   8189    222

#      0      1      2 
# 258312   3657    175 

# zslice <- ggplotly(ggplot() + geom_hline(yintercept = y,  color = "red") + 
#                      geom_vline(xintercept = x,  color = "red") + xlim(0,  xdim) + ylim(0,  ydim)) %>%
#   # add_contour(z = ~(255*z2D_mask),  name="Axial Mask") %>%
#   add_contour(z = ~z2D_img,  name="Axial Image",  opacity=0.3) %>% hide_colorbar()
# # x slice
# x2D_img <- t(Kidney_image_3D_vol[x,  ,  ])
# xslice <- ggplotly(ggplot() + geom_hline(yintercept = z,  color = "red") + 
#                      geom_vline(xintercept = y,  color = "red") + xlim(0,  ydim) + ylim(0,  zdim)) %>%
#   add_contour(z = ~x2D_img,  name="Sagittal Plane")  %>% hide_colorbar()
# # y slice
# y2D_img <- t(Kidney_image_3D_vol[,  y,  ])
# yslice <- ggplotly(ggplot() + geom_hline(yintercept = z,  color = "red") + 
#                      geom_vline(xintercept = x,  color = "red") + xlim(0, xdim) + ylim(0,  zdim)) %>%
#   add_contour(z = ~y2D_img,  name="Coronal Plane")  %>% hide_colorbar()
# # combine all 4 plots:
# subplot(zslice,  xslice,  yslice,  nrows = 3)

# Define crosshairs
hline <- function(y = 0, color = "red") { 
  list(type = "line", x0 = 0, x1 = 1, xref = "paper", y0 = y, y1 = y,
       line = list(color = color, dash="dot"))
}
vline <- function(x = 0, color = "red") { 
  list(type = "line", y0 = 0, y1 = 1, yref = "paper", x0 = x, x1 = x,                                              line = list(color = color, dash="dot"))
}

# Axial View
plot_ly() %>%
    add_trace(z=~z2D_img, type="contour", colorscale = "Greys", name="Axial Image", opacity=0.3) %>%
    add_trace(z=~z2D_mask, type="heatmap", colorscale = "Reds", name="Axial Mask") %>%
    # add_segments(x=x, y=0, xend=x, yend=dim(z2D_img)[1]) %>%
    # add_segments(x=0, y=y, xend=dim(z2D_img)[2], yend=y) %>%
    layout(title="Axial MRI Image and Kidney Mask", scene = list(aspectratio = list(x=1, y=1, z=1))) %>% 
    #, shapes=list(hline(198), vline(168))) 
    hide_colorbar()

# Sagittal View
z2D_img <- (1200+Kidney_image_3D_vol[,,x])/10  
mode(z2D_img) <- "integer"
z2D_mask <- Kidney_mask_3D_vol[,,x]
plot_ly() %>%
  add_trace(z=~z2D_img, type="contour", colorscale = "Greys", name="Sagittal Image", opacity=0.3) %>%
  # add_trace(z=~z2D_mask, type="heatmap", colorscale = "Reds", name="Sagittal Mask") %>%
  layout(title="Sagittal MRI Image and Kidney Mask") %>% hide_colorbar()

  # shapes = list(hline(242), vline(305))) %>% hide_colorbar()

# Coronal View
z2D_img <- (1200+Kidney_image_3D_vol[,y,])/10  
mode(z2D_img) <- "integer"
z2D_mask <- Kidney_mask_3D_vol[,y,]
plot_ly() %>%
  add_trace(z=~z2D_img, type="contour", colorscale = "Greys", name="Coronal Image", opacity=0.3) %>%
  add_trace(z=~z2D_mask, type="heatmap", colorscale = "Reds", name="Coronal Mask") %>%
  layout(title="Coronal MRI Image and Kidney Mask") %>% hide_colorbar()

5.1 Joint analysis of the clinical meta-data and the volumetric imaging data

Let’s illustrate some simple examples of deriving imaging morphometry measures from the 3D imaging volumes. In essence, we will just compute the kidney and tumor volumes. More advanced metrics, such as regional boundary surface measures (e.g., surface area, curvature, fractal dimension, shape index, etc.), may also be computed, see some of the LONI/SOCR team morphometric analysis publications. Once the imaging morphometrics are computed, they can be added to the clinical data spreadsheet and used for subsequent exploratory or quantitative (confirmatory/hypothesis-driven) analytics. Recall that the 3D imaging MASKS data are available in NII.gz format on GitHub e.g., mask case 00001.

# Morphometrics calculating function, an example is the volume of the kidney and tumor
morphometrix <- function (volume) {
  morpho_voxels <- table(volume) # Voxel counts: background, kidney-voxels, and tumor-voxels
  # Volume (mm^3): background, kidney-volume, and tumor volume
  pix_dim <- volume@pixdim[c(2:4)]    # extract the x, y, and z step-sizes
  morpho_volume <- morpho_voxels * prod(pix_dim)  # Volumes: background, kidney-volume, and tumor-volume
  return(morpho_volume)
}

# Test the morphometrix function
morphometrix(Kidney_mask_3D_vol)

## volume
##            0            1            2 
## 49869481.630   474833.982     7265.388

# Download the kidney and tumor mask volume for a single participant
maskDownloader <- function (path) {
  Kidney_mask_3D_File <- file.path(tempdir(),  "Kidney_mask_3D.nii.gz")
  download.file(path,  dest=Kidney_mask_3D_File,  mode = "wb",  quiet=F)
  return(readNIfTI(Kidney_mask_3D_File,  reorient=FALSE))
}

library(stringr)
files <- paste0("https://github.com/neheller/kits19/blob/master/data/case_", 
                str_pad(0:209, width = 5, side = "left", pad = 0), 
                "/segmentation.nii.gz?raw=true")

# https://github.com/neheller/kits19/blob/master/data/case_00209/segmentation.nii.gz?raw=true

# Run the morphometrix for all 210 study participants
for (i in 1:dim(Kidney_dataset)[1]) {  # 1:210 participants
  # Download the mask
  Kidney_mask_3D_vol <- maskDownloader(files[i])
    
  # Compute the kidney-volume and tumor volume (mm^3) and augment the clinical meta-data: Kidney_dataset
  morpho <- morphometrix(Kidney_mask_3D_vol)
  Kidney_dataset$KidneyVolume[i] <- (morpho[2] + morpho[3])  # kidney + tumor volume!
  Kidney_dataset$TumorVolume[i] <- morpho[3]  # just the (smaller) kidney-tumor volume
}

# Save augmented data
write.csv(Kidney_dataset, file="C:/Users/dinov/Desktop/kits21_kidney_dataset_NI_augmented.csv")

# Try some data analytics using the clinical data augmented with imaging-derived morphometric data ...

Try some simple model-based or model-free data analytics using the clinical data augmented with imaging-derived morphometric data.

Biomedical Physics with Applications to Disease (BPAD)

Kidney Imaging-Clinical Case Study

BPAD Team + SOCR

June 29, 2023

1 Overview

2 Meta-Data Import

2.1 Meta-Data Summaries

2.2 Suppress boolean comorbidity columns

3 EDA

3.1 Distribution of `radiographic_size` by `surgery_type`

3.2 Distribution of `vital_days_after_surgery` by `surgical_approach` and `gender`

3.3 Joint 2D+ Distribution of `vital_days_after_surgery` and `body_mass_index`

4 Analytics

4.1 Logistic Regression - Predicting Patient Survival

5 Imaging data

5.1 Joint analysis of the clinical meta-data and the volumetric imaging data

Biomedical Physics with Applications to Disease (BPAD)

Kidney Imaging-Clinical Case Study

BPAD Team + SOCR

June 29, 2023

1 Overview

2 Meta-Data Import

2.1 Meta-Data Summaries

2.2 Suppress boolean comorbidity columns

3 EDA

3.1 Distribution of radiographic_size by surgery_type

3.2 Distribution of vital_days_after_surgery by surgical_approach and gender

3.3 Joint 2D+ Distribution of vital_days_after_surgery and body_mass_index

4 Analytics

4.1 Logistic Regression - Predicting Patient Survival

5 Imaging data

5.1 Joint analysis of the clinical meta-data and the volumetric imaging data

3.1 Distribution of `radiographic_size` by `surgery_type`

3.2 Distribution of `vital_days_after_surgery` by `surgical_approach` and `gender`

3.3 Joint 2D+ Distribution of `vital_days_after_surgery` and `body_mass_index`