Scientific Methods for Health Sciences: HS 853: Homework Projects

Fall 2017: HS 853

Scientific Methods for Health Sciences: Special Topics

Homework 4

Homework 4.1: Let's use the 10th case study, which includes data on youth health risk behavior and associations between victimization, substance use, and suicide attempt among youth in Chicago in 2013. The data elements include: record, age, sex, grade, race4, qn16, qn17, qn19, qn20, qn21, qn24, qn25, qn27, qn28, qn29, qn30, qn33, qn37, qn43, qn45, qn47, qn50, qn51, qn52, qn53, qn54, qn55, qn56, and qn57. Assume there are 4 latent variables (factors): demographics (age, sex, race4, grade), victimization (qn16-qn25), suicide attempt (qn27-qn30), and substance use (qn33-qn57).

Fit a structural equation model (SEM) and a generalized estimating equation (GEE) model to the data, interpret the models, and contrast the corresponding results. Example R code that may be helpful is included below.
myData <- read.csv('https://umich.instructure.com/files/605266/download?download_frd=1')
dim(myData); summary(myData); colnames(myData); head(myData)
# Fit SEM
library(lavaan)
model_sem <- '
# (1) Measurement model: latent variable definitions, i.e., how each latent variable
# is "manifested by" a set of observed (manifest) variables, aka "indicators"
Demo =~ age + sex + race4 + grade
Victim =~ qn16 + qn17 + qn19 + qn20 + qn21 + qn24 + qn25
Suicide =~ qn27 + qn28 + qn29 + qn30
Substance =~ qn33 + qn37 + qn43 + qn45 + qn47 + qn50 + qn51 + qn52 + qn53 + qn54 + qn55 + qn56 + qn57
# (2) Regressions
Victim ~ Demo + Substance
Suicide ~ Victim + Substance
# (3) Residual correlations
qn16 ~~ qn25
qn27 ~~ qn28 + qn29 + qn30
qn33 ~~ qn37 + qn57
'
fit_sem <- sem(model_sem, data=myData, missing = "ML") # full-information ML handles missing values
summary(fit_sem)

# Fit GEE model
library("geepack") # provides geeglm(); attach() is not needed because data= is passed explicitly
fit_gee <- geeglm(grade ~ age + sex + race4 + qn16 + qn25 + qn28 + qn29 + qn55 + qn56 + qn57, data=na.omit(myData), family=gaussian("identity"), id = record, corstr = "exchangeable", scale.fix = TRUE)
# The column labeled Wald in the summary table is the square of the z-statistic.
# The reported p-values are upper-tail probabilities from a chi-squared distribution
# with 1 degree of freedom; they test the null hypothesis that the true parameter value is 0.
summary(fit_gee); anova(fit_gee); extractAIC(fit_gee)
fitMeasures(fit_sem, c("cfi", "rmsea", "srmr"))
# plot SEM model
library(semPlot)
semPaths(fit_sem)
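The comments in the GEE code above state that the Wald column is the squared z-statistic and that the p-values come from a chi-squared distribution with 1 degree of freedom. The following base-R sketch checks that relationship numerically; the z values are made up for illustration and are not taken from the case-study data.

```r
# Hypothetical z-statistics (illustrative only, not estimates from the case study)
z <- c(-2.5, 0.3, 1.96)

# Wald statistic as reported in a geeglm summary table: the squared z-statistic
wald <- z^2

# Upper-tail probability of a chi-squared distribution with 1 df
p_chisq <- pchisq(wald, df = 1, lower.tail = FALSE)

# Equivalent two-sided p-value computed directly from the standard normal
p_norm <- 2 * pnorm(-abs(z))

print(data.frame(z, wald, p_chisq, p_norm))
```

Both p-value columns agree, which is why the Wald test with 1 degree of freedom can be read as a two-sided z-test.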
Homework 4.2: The Longitudinal Study of American Youth (LSAY) study followed a national probability sample of 7th and 10th grade public school students for the last 25 years and is the largest and most comprehensive data set available to examine the factors that contribute to student and young adult interest in and understanding of science and technology. The LSAY is distinctive in its measurement of home, school, peer, and community variables over more than two decades, making it ideal for analyses that seek to understand the interaction between and among these factors.

Fit a linear growth model to the LSAY dataset (CSV format of the data is available here). Report and interpret several latent growth curve models, inspecting the corresponding fit measures: Comparative Fit Index (CFI) and Root Mean Square Error of Approximation (RMSEA).
Which of these 3 models (model1, model2, model3) would we choose as most appropriate?
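To make the two fit measures concrete, here is a minimal base-R sketch of the usual textbook formulas for CFI and RMSEA, computed from the chi-square statistics of a fitted model and its baseline (null) model. The helper name fit_indices and all numeric inputs are hypothetical, and the RMSEA denominator uses n - 1; some software uses n instead, so small differences from reported values are possible.

```r
# Hypothetical helper: textbook CFI and RMSEA from chi-square statistics.
# Assumption: RMSEA uses (n - 1) in the denominator; conventions differ across software.
fit_indices <- function(chisq_model, df_model, chisq_base, df_base, n) {
  # Non-centrality estimates, truncated at zero
  d_model <- max(chisq_model - df_model, 0)
  d_base  <- max(chisq_base - df_base, d_model)

  cfi   <- if (d_base > 0) 1 - d_model / d_base else 1
  rmsea <- sqrt(d_model / (df_model * (n - 1)))
  c(cfi = cfi, rmsea = rmsea)
}

# Made-up inputs for illustration only
example_fit <- fit_indices(chisq_model = 30, df_model = 10,
                           chisq_base = 520, df_base = 20, n = 201)
print(round(example_fit, 3))
```

Higher CFI (closer to 1) and lower RMSEA (closer to 0) indicate better fit, which is the direction to look for when comparing the three candidate models.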

model1 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1*math.0 + 1*math.1 + 1*math.2
s =~ 0*math.0 + 1*math.1 + 2*math.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
math.0 ~ att.0
math.1 ~ att.1
math.2 ~ att.2
'

model2 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1* att.0 + 1* att.1 + 1* att.2
s =~ 0* att.0 + 1* att.1 + 2* att.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
att.0 ~ math.0
att.1 ~ math.1
att.2 ~ math.2
'

model3 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1* att.0 + 1* att.1 + 1* att.2 + math.0 + math.1 + math.2
s =~ 0* att.0 + 1* att.1 + 2* att.2 + math.0 + math.1 + math.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
att.0 ~ math.0
att.1 ~ math.1
att.2 ~ math.2
'
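The fixed loadings in the models above (1, 1, 1 for i and 0, 1, 2 for s) encode a straight-line trajectory: the expected outcome at time t is the intercept plus t times the slope. A small base-R sketch of that idea follows; the latent means are invented for illustration, and the covariates and time-varying predictors of the actual models are ignored here.

```r
# Loading matrix implied by the fixed coefficients in the growth models:
# column i (intercept) is all ones, column s (slope) is the time score 0, 1, 2
Lambda <- cbind(i = c(1, 1, 1), s = c(0, 1, 2))
rownames(Lambda) <- c("t0", "t1", "t2")

# Hypothetical latent means (not estimated from the LSAY data)
latent_means <- c(i = 50, s = 2)

# Model-implied means at the three time points: i + s * t
implied_means <- drop(Lambda %*% latent_means)
print(implied_means)
```

Changing the slope loadings (for example to 0, 1, 3) would correspond to unequally spaced measurement occasions.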

Example R code that may be helpful is included below.
# A clean version of this data is stored as CSV file in the long-format
myData <- read.csv('https://umich.instructure.com/files/611677/download?download_frd=1')
dim(myData); summary(myData); colnames(myData); head(myData)
# library("foreign")
# if you want to transform the data into a wide format
# Long to wide format
myData_wide <- reshape(myData,
  # timevar: the variable in long format that differentiates multiple
  # records from the same group or individual.
  timevar = "yr",
  # idvar: names of one or more variables in long format that identify multiple
  # records from the same group/individual.
  # These variables may also be present in wide format.
  idvar = c("cohort", "school", "weight", "id"),
  direction = "wide"
)
dim(myData_wide); summary(myData_wide); colnames(myData_wide); head(myData_wide)
# Wide variables: id cohort school weight math.0 att.0 math.1 att.1 math.2 att.2
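To see what the long-to-wide reshape above produces, here is a self-contained base-R sketch on a tiny made-up data frame. The column names (id, cohort, school, weight, yr, math, att) are assumed from the reshape call and the wide variable list; the values are invented.

```r
# Toy long-format data: 2 students, 3 yearly records each (values are made up)
toy_long <- data.frame(
  id     = rep(1:2, each = 3),
  cohort = rep(c(1, 2), each = 3),
  school = rep(c(10, 20), each = 3),
  weight = rep(c(0.5, 1.5), each = 3),
  yr     = rep(0:2, times = 2),
  math   = c(50, 52, 55, 60, 61, 63),
  att    = c(3, 3, 4, 2, 3, 3)
)

# Same reshape pattern as for the LSAY data: one row per student,
# with the time-varying columns suffixed by the value of yr
toy_wide <- reshape(toy_long,
  timevar = "yr",
  idvar = c("cohort", "school", "weight", "id"),
  direction = "wide"
)

print(colnames(toy_wide))
print(toy_wide)
```

Each student collapses to a single row, and math and att become math.0, att.0, and so on, which are the names used in the growth model syntax.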
# you can use the growth() function from the lavaan package
library(lavaan)
model1 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1*math.0 + 1*math.1 + 1*math.2
s =~ 0*math.0 + 1*math.1 + 2*math.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
math.0 ~ att.0
math.1 ~ att.1
math.2 ~ att.2
'
fit1 <- growth(model1, data=myData_wide)
summary(fit1); fitMeasures(fit1, c("cfi", "rmsea", "srmr"))
parameterEstimates(fit1)
library(semPlot)
semPaths(fit1, "std", edge.label.cex = 1.0, label.prop = 1.0, curve = 1.5, curvePivot = TRUE, mar = c(3, 5, 5, 5), nCharNodes=5, nCharEdges=5, groups = "latents", edge.width=2.0)
model2 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1* att.0 + 1* att.1 + 1* att.2
s =~ 0* att.0 + 1* att.1 + 2* att.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
att.0 ~ math.0
att.1 ~ math.1
att.2 ~ math.2
'
fit2 <- growth(model2, data=myData_wide)
summary(fit2); fitMeasures(fit2, c("cfi", "rmsea", "srmr"))
parameterEstimates(fit2)
model3 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1* att.0 + 1* att.1 + 1* att.2 + math.0 + math.1 + math.2
s =~ 0* att.0 + 1* att.1 + 2* att.2 + math.0 + math.1 + math.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
att.0 ~ math.0
att.1 ~ math.1
att.2 ~ math.2
'
fit3 <- growth(model3, data=myData_wide)
summary(fit3); fitMeasures(fit3, c("cfi", "rmsea", "srmr"))
parameterEstimates(fit3)
Homework 4.3: Fit ARIMA(p,d,q) models to 3 voxel locations ([X, Y, Z], specifying the 3D spatial coordinates of the voxels of interest) and discuss the models' similarities and/or differences. Recall the Chapter 2 notes (02_Review_QC_MLR_LME_Modeling.docx) on the 4D fMRI hyper-volume.
require(brainR); require(RNifti); require(oro.nifti) # readNIfTI() is provided by oro.nifti
# See examples here: https://cran.r-project.org/web/packages/oro.nifti/vignettes/nifti.pdf
# and here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089470
fMRIURL <- "http://socr.umich.edu/HTML5/BrainViewer/data/fMRI_FilteredData_4D.nii.gz"
fMRIFile <- file.path(tempdir(), "fMRI_FilteredData_4D.nii.gz")
download.file(fMRIURL, destfile=fMRIFile, quiet=FALSE, mode="wb") # binary mode keeps the .nii.gz file intact on Windows
fMRIVolume <- readNIfTI(fMRIFile, reorient=FALSE)
# dimensions: 64 x 64 x 21 x 180 ; 4mm x 4mm x 6mm x 3 sec
fMRIVolDims <- dim(fMRIVolume)
fMRIVolDims ## [1] 64 64 21 180
time_dim <- fMRIVolDims[4]
time_dim ## [1] 180
# To examine the time course of a specific 3D voxel (say the one at x=30, y=30, z=10):
plot(fMRIVolume[30, 30, 10,], type='l', main="Time Series of 3D Voxel \n (x=30, y=30, z=10)", col="blue")
x1 <- c(1:180)
y1 <- loess(fMRIVolume[30, 30, 10,]~ x1, family = "gaussian")
lines(x1, smooth(fMRIVolume[30,30, 10,]), col = "red", lwd = 2)
lines(ksmooth(x1, fMRIVolume[30, 30, 10,], kernel = "normal", bandwidth = 5), col = "green", lwd =3)
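As a starting point for the ARIMA part of this exercise, the sketch below fits a few candidate ARIMA(p,d,q) orders to three series with base R's arima() and keeps the lowest-AIC fit for each. The three series are simulated stand-ins so the code runs without the download; with the real data, each one would be replaced by a voxel time course such as fMRIVolume[x, y, z, ]. The candidate orders and the AIC-based selection are illustrative choices, not a prescribed method.

```r
set.seed(853)
n_time <- 180  # same length as the fMRI time dimension

# Simulated stand-ins for three voxel time courses (assumption: replace each
# with fMRIVolume[x, y, z, ] for the voxel coordinates you choose)
voxel_series <- list(
  voxel_a = 100 + arima.sim(model = list(ar = 0.6), n = n_time),
  voxel_b = 100 + arima.sim(model = list(ma = 0.5), n = n_time),
  voxel_c = 100 + arima.sim(model = list(ar = c(0.5, -0.3)), n = n_time)
)

# Candidate (p, d, q) orders; note that arima() takes order = c(p, d, q)
candidate_orders <- list(c(1, 0, 0), c(2, 0, 0), c(0, 0, 1), c(1, 0, 1))

# For each series, fit every candidate by maximum likelihood and keep the lowest AIC
best_fits <- lapply(voxel_series, function(y) {
  fits <- lapply(candidate_orders, function(ord) arima(y, order = ord, method = "ML"))
  fits[[which.min(sapply(fits, function(f) f$aic))]]
})

# Report the selected order and AIC per voxel; fit$arma stores (p, q, P, Q, period, d, D)
for (name in names(best_fits)) {
  fit <- best_fits[[name]]
  cat(name, ": ARIMA(", paste(fit$arma[c(1, 6, 2)], collapse = ","),
      "), AIC = ", round(fit$aic, 1), "\n", sep = "")
}
```

Comparing the selected orders and the estimated coefficients (coef(best_fits$voxel_a), for example) across voxels gives a basis for discussing similarities and differences between the models.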
