Homework 4

  • Homework 4.1: Use the 10th Case-study, which contains data on youth health risk behavior and the associations between victimization, substance use, and suicide attempts among youth in Chicago in 2013. The data elements include: record, age, sex, grade, race4, qn16, qn17, qn19, qn20, qn21, qn24, qn25, qn27, qn28, qn29, qn30, qn33, qn37, qn43, qn45, qn47, qn50, qn51, qn52, qn53, qn54, qn55, qn56, and qn57. Assume there are 4 latent variables (factors): demographics (age, sex, race4, grade), victimization (qn16-qn25), suicide attempt (qn27-qn30), and substance use (qn33-qn57).

    Fit a structural equation model (SEM) and a generalized estimating equation (GEE) model to the data, interpret the models, and contrast the corresponding results. Some exemplary R code that may be helpful is included below.
    myData <- read.csv('https://umich.instructure.com/files/605266/download?download_frd=1')
    dim(myData); summary(myData); colnames(myData); head(myData)
    # Fit SEM
    library(lavaan)
    model_sem <- '
    # Latent variable definitions: how each latent variable is "manifested by"
    # a set of observed (manifest) variables, aka "indicators"
    # (1) Measurement Model
    Demo =~ age + sex + race4 + grade
    Victim =~ qn16 + qn17 + qn19 + qn20 + qn21 + qn24 + qn25
    Suicide =~ qn27 + qn28 + qn29 + qn30
    Substance =~ qn33 + qn37 + qn43 + qn45 + qn47 + qn50 + qn51 + qn52 + qn53 + qn54 + qn55 + qn56 + qn57
    # (2) Regressions
    Victim ~ Demo + Substance
    Suicide ~ Victim + Substance
    # (3) Residual correlations
    qn16 ~~ qn25
    qn27 ~~ qn28 + qn29 + qn30
    qn33 ~~ qn37 + qn57
    '
    fit_sem <- sem(model_sem, data=myData, missing = "ML")
    summary(fit_sem)

    # Fit GEE model
    library("geepack")  # provides geeglm()
    fit_gee <- geeglm(grade ~ age + sex + race4 + qn16 + qn25 + qn28 + qn29 + qn55 + qn56 + qn57, data=na.omit(myData), family=gaussian("identity"), id = record, corstr = "exchangeable", scale.fix = TRUE)
    # The column labeled Wald in the summary table is the square of the z-statistic.
    # The reported p-values are upper-tail probabilities from a chi-squared distribution
    # with 1 degree of freedom; they test the null hypothesis that the true parameter value is 0.
    summary(fit_gee); anova(fit_gee); extractAIC(fit_gee)
    fitMeasures(fit_sem, c("cfi", "rmsea", "srmr"))
    # plot SEM model
    library(semPlot)
    semPaths(fit_sem)
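
    The relation between the Wald column and the z-statistic noted in the GEE comments above can be checked numerically. The following is only a minimal sketch in base R; the estimate and standard error are hypothetical values chosen for illustration and do not come from the case-study data.

```r
# Hypothetical coefficient estimate and standard error (illustrative values only)
est <- 0.50
se  <- 0.20

z    <- est / se   # z-statistic
wald <- z^2        # Wald statistic: the square of the z-statistic

# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
p_chisq <- pchisq(wald, df = 1, lower.tail = FALSE)

# The same p-value obtained as a two-sided test on the standard normal scale
p_norm <- 2 * pnorm(-abs(z))

print(c(z = z, wald = wald, p_chisq = p_chisq, p_norm = p_norm))
```

    Both p-values agree because the square of a standard normal variable follows a chi-squared distribution with 1 degree of freedom.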


  • Homework 4.2: The Longitudinal Study of American Youth (LSAY) study followed a national probability sample of 7th and 10th grade public school students for the last 25 years and is the largest and most comprehensive data set available to examine the factors that contribute to student and young adult interest in and understanding of science and technology. The LSAY is distinctive in its measurement of home, school, peer, and community variables over more than two decades, making it ideal for analyses that seek to understand the interaction between and among these factors.

    Fit a linear growth model to the LSAY dataset (CSV format of the data is available here). Report and interpret several latent growth curve models, inspecting the corresponding fit measures, such as the Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA).
    Which of these 3 models (model1, model2, model3) would we choose as most appropriate?

    model1 <- '
    # Latent variables intercept and slope with fixed coefficients
    i =~ 1*math.0 + 1*math.1 + 1*math.2
    s =~ 0*math.0 + 1*math.1 + 2*math.2
    # regressions
    i ~ cohort + school + weight
    s ~ cohort + school + weight
    # time-varying covariates
    math.0 ~ att.0
    math.1 ~ att.1
    math.2 ~ att.2
    '


    model2 <- '
    # Latent variables intercept and slope with fixed coefficients
    i =~ 1* att.0 + 1* att.1 + 1* att.2
    s =~ 0* att.0 + 1* att.1 + 2* att.2
    # regressions
    i ~ cohort + school + weight
    s ~ cohort + school + weight
    # time-varying covariates
    att.0 ~ math.0
    att.1 ~ math.1
    att.2 ~ math.2
    '


    model3 <- '
    # Latent variables intercept and slope with fixed coefficients
    i =~ 1* att.0 + 1* att.1 + 1* att.2 + math.0 + math.1 + math.2
    s =~ 0* att.0 + 1* att.1 + 2* att.2 + math.0 + math.1 + math.2
    # regressions
    i ~ cohort + school + weight
    s ~ cohort + school + weight
    # time-varying covariates
    att.0 ~ math.0
    att.1 ~ math.1
    att.2 ~ math.2
    '
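
    The fixed loadings in model1 and model2 (all 1 for the intercept factor i; 0, 1, 2 for the slope factor s) are what make the growth curve linear. A minimal base R sketch of that idea follows. The latent means are hypothetical values, not LSAY estimates, and the covariates in the models are ignored here.

```r
# Fixed loadings of the intercept (i) and slope (s) factors at the three waves
Lambda <- cbind(i = c(1, 1, 1), s = c(0, 1, 2))
rownames(Lambda) <- c("t0", "t1", "t2")

# Hypothetical latent means (illustrative values only, not estimates)
eta <- c(i = 50, s = 3)

# Model-implied mean at each wave: intercept + time * slope
implied_means <- drop(Lambda %*% eta)
print(implied_means)
```

    Under these loadings the implied mean changes by the same amount (the slope mean) between consecutive waves, which is the defining property of a linear trajectory.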


    Example R code that may be helpful is included below.
    # A clean version of this data is stored as a binary Stata (.dta) file in long format
    library("foreign")  # provides read.dta()
    myData <- read.dta('http://www.ats.ucla.edu/stat/data/lsay_long_clean.dta')
    dim(myData); summary(myData); colnames(myData); head(myData)
    # if you want to transform the data into a wide format
    # Long to wide format
    myData_wide <- reshape(myData,
      # timevar: the variable in long format that differentiates multiple
      # records from the same group or individual
      timevar = "yr",
      # idvar: names of one or more variables in long format that identify multiple
      # records from the same group/individual; these may also be present in wide format
      idvar = c("cohort", "school", "weight", "id"),
      direction = "wide"
    )
    dim(myData_wide); summary(myData_wide); colnames(myData_wide); head(myData_wide)
    # Wide variables: id cohort school weight math.0 att.0 math.1 att.1 math.2 att.2
    # you can use the growth() function from the lavaan package
    library(lavaan)
    model1 <- '
    # Latent variables intercept and slope with fixed coefficients
    i =~ 1*math.0 + 1*math.1 + 1*math.2
    s =~ 0*math.0 + 1*math.1 + 2*math.2
    # regressions
    i ~ cohort + school + weight
    s ~ cohort + school + weight
    # time-varying covariates
    math.0 ~ att.0
    math.1 ~ att.1
    math.2 ~ att.2
    '
    fit1 <- growth(model1, data=myData_wide)
    summary(fit1); fitMeasures(fit1, c("cfi", "rmsea", "srmr"))
    parameterEstimates(fit1)
    library(semPlot)
    semPaths(fit1, "std", edge.label.cex = 1.0, label.prop = 1.0, curve = 1.5, curvePivot = TRUE, mar = c(3, 5, 5, 5), nCharNodes=5, nCharEdges=5, groups = "latents", edge.width=2.0)
    model2 <- '
    # Latent variables intercept and slope with fixed coefficients
    i =~ 1* att.0 + 1* att.1 + 1* att.2
    s =~ 0* att.0 + 1* att.1 + 2* att.2
    # regressions
    i ~ cohort + school + weight
    s ~ cohort + school + weight
    # time-varying covariates
    att.0 ~ math.0
    att.1 ~ math.1
    att.2 ~ math.2
    '
    fit2 <- growth(model2, data=myData_wide)
    summary(fit2); fitMeasures(fit2, c("cfi", "rmsea", "srmr"))
    parameterEstimates(fit2)
    model3 <- '
    # Latent variables intercept and slope with fixed coefficients
    i =~ 1* att.0 + 1* att.1 + 1* att.2 + math.0 + math.1 + math.2
    s =~ 0* att.0 + 1* att.1 + 2* att.2 + math.0 + math.1 + math.2
    # regressions
    i ~ cohort + school + weight
    s ~ cohort + school + weight
    # time-varying covariates
    att.0 ~ math.0
    att.1 ~ math.1
    att.2 ~ math.2
    '
    fit3 <- growth(model3, data=myData_wide)
    summary(fit3); fitMeasures(fit3, c("cfi", "rmsea", "srmr"))
    parameterEstimates(fit3)
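
    The long-to-wide reshaping step used in the example code can be tried on a small data frame before applying it to the full LSAY data. This is only a sketch: the toy data frame `long` and its values are hypothetical, and only the column names (id, yr, math, att) mirror the variables listed above.

```r
# Toy long-format data (hypothetical values): two students, three waves each
long <- data.frame(
  id   = rep(1:2, each = 3),
  yr   = rep(0:2, times = 2),
  math = c(50, 53, 57, 61, 62, 66),
  att  = c(3, 4, 4, 2, 3, 3)
)

# One row per student; time-varying columns get the wave appended as a suffix
wide <- reshape(long, timevar = "yr", idvar = "id", direction = "wide")

print(colnames(wide))
print(wide)
```

    The resulting column names (math.0, att.0, math.1, and so on) follow the same pattern as the wide variables referenced in the growth models.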
