Homework 4
- Homework 4.1: Let’s use the 10th Case-study, which includes data on youth
health risk behavior and associations between victimization, substance use,
and suicide attempt among youth in Chicago in 2013. The data elements include:
record, age, sex, grade, race4, qn16, qn17, qn19, qn20, qn21, qn24, qn25, qn27,
qn28, qn29, qn30, qn33, qn37, qn43, qn45, qn47, qn50, qn51, qn52,
qn53, qn54, qn55, qn56, and qn57. Assume there are 4 latent
variables (factors): demographics (age, sex, race4, grade),
victimization (qn16-qn25), suicide attempt (qn27-qn30), and
substance use (qn33-qn57).
Fit a structural equation model (SEM) and a generalized estimating equation
(GEE) model to the data, interpret the models, and contrast the corresponding
results. Some exemplary R code that may be helpful is included below.
myData <- read.csv('https://umich.instructure.com/files/605266/download?download_frd=1')
dim(myData); summary(myData); colnames(myData); head(myData)
# Fit SEM
library(lavaan)
model_sem <- '
# Latent variable definitions: how the latent variables are "manifested by"
# a set of observed (or manifest) variables, aka "indicators"
# (1) Measurement Model
Demo =~ age + sex + race4 + grade
Victim =~ qn16 + qn17 + qn19 + qn20 + qn21 + qn24 + qn25
Suicide =~ qn27 + qn28 + qn29 + qn30
Substance =~ qn33 + qn37 + qn43 + qn45 + qn47 + qn50 + qn51 + qn52 + qn53 +
    qn54 + qn55 + qn56 + qn57
# (2) Regressions
Victim ~ Demo + Substance
Suicide ~ Victim + Substance
# (3) Residual correlations
qn16 ~~ qn25
qn27 ~~ qn28 + qn29 + qn30
qn33 ~~ qn37 + qn57
'
fit_sem <- sem(model_sem, data=myData, missing = "ML")
summary(fit_sem)
# Fit GEE model
library(geepack)
# No attach() is needed: geeglm() looks up the variables (including id) in 'data'
fit_gee <- geeglm(grade ~ age + sex + race4 + qn16 + qn25 + qn28 + qn29 +
    qn55 + qn56 + qn57, data=na.omit(myData),
    family=gaussian("identity"), id = record, corstr = "exchangeable",
    scale.fix = TRUE)
# The column labeled Wald in the summary table is the square of the z-statistic.
# The reported p-values are upper-tail probabilities from a chi-square distribution
# with 1 degree of freedom; they test the null hypothesis that the true parameter value is 0.
summary(fit_gee); anova(fit_gee); extractAIC(fit_gee)
fitMeasures(fit_sem, c("cfi", "rmsea", "srmr"))
# Plot the SEM model
library(semPlot)
semPaths(fit_sem)
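The relationship between the Wald column and the z-statistic described in the GEE comments can be checked by hand. The following is a minimal sketch in base R; the estimate and standard error are made-up numbers used only for illustration, not output from fit_gee.

```r
# Hypothetical coefficient estimate and robust standard error (illustrative values only)
est <- 0.42
se  <- 0.15

z    <- est / se   # z-statistic
wald <- z^2        # Wald statistic, as reported in the geeglm summary table

# Upper-tail probability from a chi-square distribution with 1 degree of freedom
p_chisq <- pchisq(wald, df = 1, lower.tail = FALSE)

# Equivalent two-sided p-value from the standard normal distribution
p_normal <- 2 * pnorm(-abs(z))

print(c(z = z, wald = wald, p_chisq = p_chisq, p_normal = p_normal))
```

The two p-values agree because the square of a standard normal variable follows a chi-square distribution with 1 degree of freedom.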
- Homework 4.2: The Longitudinal Study of American Youth (LSAY) followed a
national probability sample of 7th and 10th grade public school students for
the last 25 years and is the largest and most comprehensive data set available
to examine the factors that contribute to student and young adult interest in
and understanding of science and technology. The LSAY is distinctive in its
measurement of home, school, peer, and community variables over more than two
decades, making it ideal for analyses that seek to understand the interaction
between and among these factors.
Fit a linear growth model to the LSAY dataset (a CSV version of the data is
available; see the read.csv call in the example code below).
Report and interpret several latent growth curve models, inspecting the
corresponding fit measures, such as the Comparative Fit Index (CFI) and the
Root Mean Square Error of Approximation (RMSEA).
Which of these 3 models (model1, model2, model3) would we choose as most appropriate?
model1 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1*math.0 + 1*math.1 + 1*math.2
s =~ 0*math.0 + 1*math.1 + 2*math.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
math.0 ~ att.0
math.1 ~ att.1
math.2 ~ att.2
'
model2 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1* att.0 + 1* att.1 + 1* att.2
s =~ 0* att.0 + 1* att.1 + 2* att.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
att.0 ~ math.0
att.1 ~ math.1
att.2 ~ math.2
'
model3 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1* att.0 + 1* att.1 + 1* att.2 + math.0 + math.1 + math.2
s =~ 0* att.0 + 1* att.1 + 2* att.2 + math.0 + math.1 + math.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
att.0 ~ math.0
att.1 ~ math.1
att.2 ~ math.2
'
Example R code that may be helpful is included below.
# A clean version of this data is stored as CSV file in the long-format
myData <- read.csv('https://umich.instructure.com/files/611677/download?download_frd=1')
dim(myData); summary(myData); colnames(myData); head(myData)
# Transform the data from long to wide format (base R reshape)
myData_wide <- reshape(myData,
    timevar = "yr",   # the variable in long format that differentiates multiple
                      # records from the same group or individual
    idvar = c("cohort", "school", "weight", "id"),   # one or more variables in long
                      # format that identify multiple records from the same group/individual;
                      # these variables may also be present in wide format
    direction = "wide"
)
dim(myData_wide); summary(myData_wide); colnames(myData_wide); head(myData_wide)
# Wide variables: id cohort school weight math.0 att.0 math.1 att.1 math.2 att.2
# You can use the growth() function from the lavaan package
library(lavaan)
model1 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1*math.0 + 1*math.1 + 1*math.2
s =~ 0*math.0 + 1*math.1 + 2*math.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
math.0 ~ att.0
math.1 ~ att.1
math.2 ~ att.2
'
fit1 <- growth(model1, data=myData_wide)
summary(fit1); fitMeasures(fit1, c("cfi", "rmsea", "srmr"))
parameterEstimates(fit1)
library(semPlot)
semPaths(fit1, "std", edge.label.cex = 1.0, label.prop = 1.0, curve = 1.5, curvePivot = TRUE, mar = c(3, 5, 5, 5), nCharNodes=5,
nCharEdges=5, groups = "latents", edge.width=2.0)
model2 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1* att.0 + 1* att.1 + 1* att.2
s =~ 0* att.0 + 1* att.1 + 2* att.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
att.0 ~ math.0
att.1 ~ math.1
att.2 ~ math.2
'
fit2 <- growth(model2, data=myData_wide)
summary(fit2); fitMeasures(fit2, c("cfi", "rmsea", "srmr"))
parameterEstimates(fit2)
model3 <- '
# Latent variables intercept and slope with fixed coefficients
i =~ 1* att.0 + 1* att.1 + 1* att.2 + math.0 + math.1 + math.2
s =~ 0* att.0 + 1* att.1 + 2* att.2 + math.0 + math.1 + math.2
# regressions
i ~ cohort + school + weight
s ~ cohort + school + weight
# time-varying covariates
att.0 ~ math.0
att.1 ~ math.1
att.2 ~ math.2
'
fit3 <- growth(model3, data=myData_wide)
summary(fit3); fitMeasures(fit3, c("cfi", "rmsea", "srmr"))
parameterEstimates(fit3)
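To choose among fitted growth models, it can help to line up their fit measures side by side. The following is a minimal sketch, assuming the lavaan package is installed; it uses lavaan's built-in Demo.growth example data (indicators t1 to t4) rather than the LSAY data, and the two toy models (intercept-only and linear growth) are illustrative stand-ins for model1, model2, and model3.

```r
library(lavaan)

# Toy models on lavaan's built-in Demo.growth data (NOT the LSAY data)
m_intercept <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4 '
m_linear    <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
                 s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4 '

fits <- list(
  intercept_only = growth(m_intercept, data = Demo.growth),
  linear_growth  = growth(m_linear,    data = Demo.growth)
)

# Collect the same fit measures for every model into one table
measures <- c("cfi", "rmsea", "srmr")
fit_table <- vapply(fits,
                    function(f) as.numeric(fitMeasures(f, measures)),
                    numeric(length(measures)))
rownames(fit_table) <- measures
print(round(fit_table, 3))
```

For the homework, the list would instead hold fit1, fit2, and fit3. As a common rule of thumb, a higher CFI and lower RMSEA and SRMR indicate better fit, but the substantive interpretability of each model should also be weighed.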
- Homework 4.3: Fit ARIMA(p,d,q) models to 3 voxel locations ([X, Y, Z, ],
specifying the 3D spatial coordinates of the voxels of interest) and discuss the models'
similarities and/or differences. Recall the
Chapter 2 notes (02_Review_QC_MLR_LME_Modeling.docx)
on the 4D fMRI hyper-volume.
require(brainR); require(oro.nifti)   # readNIfTI() is provided by oro.nifti
# See examples here: https://cran.r-project.org/web/packages/oro.nifti/vignettes/nifti.pdf
# and here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089470
fMRIURL <- "http://socr.umich.edu/HTML5/BrainViewer/data/fMRI_FilteredData_4D.nii.gz"
fMRIFile <- file.path(tempdir(), "fMRI_FilteredData_4D.nii.gz")
download.file(fMRIURL, destfile=fMRIFile, mode="wb", quiet=FALSE)   # binary download
fMRIVolume <- readNIfTI(fMRIFile, reorient=FALSE)
# dimensions: 64 x 64 x 21 x 180 ; 4mm x 4mm x 6mm x 3 sec
fMRIVolDims <- dim(fMRIVolume); fMRIVolDims
## [1] 64 64 21 180
time_dim <- fMRIVolDims[4]; time_dim
## [1] 180
# To examine the time course of a specific 3D voxel (say the one at x=30, y=30, z=10):
plot(fMRIVolume[30, 30, 10,], type='l',
    main="Time Series of 3D Voxel \n (x=30, y=30, z=10)", col="blue")
x1 <- c(1:180)
y1 <- loess(fMRIVolume[30, 30, 10,]~ x1, family = "gaussian")
lines(x1, smooth(fMRIVolume[30,30, 10,]), col = "red", lwd = 2)
lines(ksmooth(x1, fMRIVolume[30, 30, 10,], kernel = "normal", bandwidth = 5), col = "green", lwd =3)
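As a starting point for the ARIMA fits, the following minimal sketch uses base R's arima() on a simulated series that stands in for a single voxel's 180-point time course. The simulated AR(1) data and the three candidate (p,d,q) orders are assumptions made for illustration; they are not derived from the fMRI volume.

```r
set.seed(1)

# Simulated stand-in for one voxel's time course (NOT real fMRI data)
y <- arima.sim(model = list(ar = 0.6), n = 180)

# Candidate (p, d, q) orders, chosen only for illustration
orders <- list(c(1, 0, 0), c(0, 0, 1), c(1, 0, 1))

# Fit one ARIMA model per candidate order
fits <- lapply(orders, function(o) arima(y, order = o))

# Compare the candidates by AIC (lower is better)
aics <- sapply(fits, AIC)
names(aics) <- sapply(orders, paste, collapse = ",")
print(aics)

# Inspect the model with the smallest AIC
best <- fits[[which.min(aics)]]
print(best)
```

For the homework, y would be replaced by fMRIVolume[x, y, z, ] for each of the three chosen voxels, and the selected orders and coefficient estimates can then be compared across voxels.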