Homework 4
				
				
				
				
					- Homework 4.1: Use the 10th Case-study, which includes data on
						youth health risk behavior and the associations between
						victimization, substance use, and suicide attempt among youth in
						Chicago in 2013. The data elements include: record, age, sex,
						grade, race4, qn16, qn17, qn19, qn20, qn21, qn24, qn25, qn27,
						qn28, qn29, qn30, qn33, qn37, qn43, qn45, qn47, qn50, qn51, qn52,
						qn53, qn54, qn55, qn56, and qn57. Assume there are 4 latent
						variables (factors): demographics (age, sex, race4, grade),
						victimization (qn16-qn25), suicide attempt (qn27-qn30), and
						substance use (qn33-qn57).
						
						
 
						Fit a structural equation model (SEM) and a generalized estimating
						equation (GEE) model to the data, interpret the models, and
						contrast the corresponding results. Some example R code that may
						be helpful is included below.
						
						
							library(lavaan)   # sem(), fitMeasures()
							library(geepack)  # geeglm()
							library(semPlot)  # semPaths()

							myData <- read.csv('https://umich.instructure.com/files/605266/download?download_frd=1')
							dim(myData); summary(myData); colnames(myData); head(myData)

							# Fit SEM
							model_sem <- '
							  # Latent variable definitions: how each latent variable is "manifested by"
							  # a set of observed (or manifest) variables, aka "indicators"
							  # (1) Measurement model
							  Demo =~ age + sex + race4 + grade
							  Victim =~ qn16 + qn17 + qn19 + qn20 + qn21 + qn24 + qn25
							  Suicide =~ qn27 + qn28 + qn29 + qn30
							  Substance =~ qn33 + qn37 + qn43 + qn45 + qn47 + qn50 + qn51 + qn52 + qn53 +
							               qn54 + qn55 + qn56 + qn57
							  # (2) Regressions
							  Victim ~ Demo + Substance
							  Suicide ~ Victim + Substance
							  # (3) Residual correlations
							  qn16 ~~ qn25
							  qn27 ~~ qn28 + qn29 + qn30
							  qn33 ~~ qn37 + qn57
							'

							fit_sem <- sem(model_sem, data=myData, missing = "ML")
							summary(fit_sem)

							# Fit GEE model
							# record is resolved inside the data argument, so attach(myData) is not needed
							fit_gee <- geeglm(grade ~ age + sex + race4 + qn16 + qn25 + qn28 + qn29 +
							                    qn55 + qn56 + qn57, data=na.omit(myData),
							                  family=gaussian("identity"), id = record, corstr = "exchangeable",
							                  scale.fix = TRUE)
							# The column labeled Wald in the summary table is the square of the z-statistic.
							# The reported p-values are the upper-tail probabilities from a chi-squared
							# distribution with 1 degree of freedom and test whether the true parameter value ≠ 0.
							summary(fit_gee); anova(fit_gee); extractAIC(fit_gee)
							fitMeasures(fit_sem, c("cfi", "rmsea", "srmr"))
							# Plot the SEM model
							semPaths(fit_sem)
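The relation between the Wald column and the z-statistic noted in the code comments can be checked by hand. The following is a minimal sketch in base R; the estimate and standard error are hypothetical values chosen for illustration, not output from fit_gee.

```r
# Hypothetical coefficient estimate and standard error (illustration only;
# these values are NOT taken from the homework data)
estimate  <- 0.30
std_error <- 0.12

z_stat <- estimate / std_error   # z-statistic
wald   <- z_stat^2               # Wald statistic, as in the geeglm summary table

# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
p_wald <- pchisq(wald, df = 1, lower.tail = FALSE)

# Equivalent two-sided p-value from the standard normal distribution
p_z <- 2 * pnorm(-abs(z_stat))

print(c(z = z_stat, Wald = wald, p_wald = p_wald, p_z = p_z))
print(all.equal(p_wald, p_z))
```

The two p-values agree because the square of a standard normal variable follows a chi-squared distribution with 1 degree of freedom.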
						
						
 
 
					 
					
					- Homework 4.2: The 
						Longitudinal Study of American Youth (LSAY) followed a national probability 
							sample of 7th and 10th grade public school students for the last 25 years. It 
							is the largest and most comprehensive data set available to examine the factors that contribute 
							to student and young adult interest in and understanding of science and technology. 
							The LSAY is distinctive in its measurement of home, school, peer, and community variables 
							over more than two decades, making it ideal for analyses that seek to understand the interaction 
							between and among these factors.
 
							Fit a linear growth model to the LSAY dataset
							(the data is available in CSV format here). 
							Report and interpret several latent growth curve models, inspecting the corresponding fit measures,
							such as the Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA).
							 
							Which of these 3 models (model1, model2, model3) would we choose as most appropriate?
 
							
							model1 <- '
							  # Latent variables intercept and slope with fixed coefficients
							i =~ 1*math.0 + 1*math.1 + 1*math.2
							s =~  0*math.0 + 1*math.1 + 2*math.2
							  # regressions
							i ~  cohort + school + weight     
                                                        
							s ~ cohort + school + weight
							  # time-varying covariates
							math.0 ~ att.0
							math.1 ~ att.1 
							math.2 ~ att.2
							'
							
							
 
							
							model2 <- '
							  # Latent variables intercept and slope with fixed coefficients
							i =~ 1* att.0 + 1* att.1 + 1* att.2
							s =~  0* att.0 + 1* att.1 + 2* att.2
							  # regressions
							i ~  cohort + school + weight          
                                                   
							s ~ cohort + school + weight
							  # time-varying covariates
							att.0 ~ math.0 
							att.1 ~ math.1
							att.2 ~ math.2
							'
							
							
 
							
							model3 <- '
							  # Latent variables intercept and slope with fixed coefficients
							i =~ 1* att.0 + 1* att.1 + 1* att.2 + math.0 + math.1 + math.2
							s =~  0* att.0 + 1* att.1 + 2* att.2 + math.0 + math.1 + math.2
							  # regressions
							i ~  cohort + school + weight       
                                                      
							s ~ cohort + school + weight
							  # time-varying covariates
							att.0 ~ math.0 
							att.1 ~ math.1
							att.2 ~ math.2
							'
							
							
 
							
							Example R code that may be helpful is included below.
							
								# A clean version of these data is stored as a CSV file in long format
								myData <- read.csv('https://umich.instructure.com/files/611677/download?download_frd=1')
								dim(myData); summary(myData); colnames(myData); head(myData)
								
								# Transform the data from long to wide format with base R reshape()
								myData_wide <- reshape(myData, 
									# timevar: the long-format variable that differentiates multiple
									# records from the same group or individual
									timevar = "yr",
									# idvar: one or more long-format variables that identify multiple
									# records from the same group/individual; these variables may
									# also be present in wide format
									idvar = c("cohort", "school", "weight", "id"),
									direction = "wide"
								    )
								dim(myData_wide); summary(myData_wide); colnames(myData_wide); head(myData_wide)
								
								# Wide variables: id cohort school weight math.0 att.0 math.1 att.1 math.2 att.2
								# The models below are fitted with the growth() function from the lavaan package
								library(lavaan)
								
								model1 <- '
 
								  # Latent variables intercept and slope with fixed coefficients
 
								i =~ 1*math.0 + 1*math.1 + 1*math.2
 
								s =~  0*math.0 + 1*math.1 + 2*math.2
 
								  # regressions
 
								i ~  cohort + school + weight  
                                                            
								s ~ cohort + school + weight
 
								  # time-varying covariates
 
								math.0 ~ att.0
 
								math.1 ~ att.1 
 
								math.2 ~ att.2
 
								'
								
								fit1 <- growth(model1, data=myData_wide)
								summary(fit1); fitMeasures(fit1, c("cfi", "rmsea", "srmr"))
									parameterEstimates(fit1)
								library(semPlot)
								semPaths(fit1, "std", edge.label.cex = 1.0, label.prop = 1.0, curve = 1.5, curvePivot = TRUE, mar = c(3, 5, 5, 5), nCharNodes=5, 
									nCharEdges=5, groups = "latents", edge.width=2.0)
						
								model2 <- '
 
								  # Latent variables intercept and slope with fixed coefficients
 
								i =~ 1* att.0 + 1* att.1 + 1* att.2
 
								s =~  0* att.0 + 1* att.1 + 2* att.2
 
								  # regressions
 
								i ~  cohort + school + weight   
                                                           
								s ~ cohort + school + weight
 
								  # time-varying covariates
 
								att.0 ~ math.0 
 
								att.1 ~ math.1
 
								att.2 ~ math.2
 
								'
								
								fit2 <- growth(model2, data=myData_wide)
								summary(fit2); fitMeasures(fit2, c("cfi", "rmsea", "srmr"))
								parameterEstimates(fit2)
																
								model3 <- '
 
								  # Latent variables intercept and slope with fixed coefficients
 
								i =~ 1* att.0 + 1* att.1 + 1* att.2 + math.0 + math.1 + math.2
 
								s =~  0* att.0 + 1* att.1 + 2* att.2 + math.0 + math.1 + math.2
 
								  # regressions
 
								i ~  cohort + school + weight   
                                                           
								s ~ cohort + school + weight
 
								  # time-varying covariates
 
								att.0 ~ math.0 
 
								att.1 ~ math.1
 
								att.2 ~ math.2
 
								'
								
								fit3 <- growth(model3, data=myData_wide)
								summary(fit3); fitMeasures(fit3, c("cfi", "rmsea", "srmr"))
									parameterEstimates(fit3)
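The long-to-wide reshaping step used above is easier to follow on a small example. The sketch below applies the same reshape() call to a toy data frame; toy_long and its values are made up for illustration and only mimic the column names of the LSAY file.

```r
# Toy long-format data: 2 students, 3 time points each (hypothetical values)
toy_long <- data.frame(
  id     = rep(1:2, each = 3),
  cohort = rep(c(1, 2), each = 3),
  school = rep(c(10, 20), each = 3),
  weight = rep(c(1.5, 0.8), each = 3),
  yr     = rep(0:2, times = 2),
  math   = c(50, 55, 61, 47, 49, 54),
  att    = c(3, 4, 4, 2, 3, 3)
)

# Same arguments as in the homework code: one row per student, and one
# column per (measure, time point) pair, named measure.time
toy_wide <- reshape(toy_long,
                    timevar   = "yr",
                    idvar     = c("cohort", "school", "weight", "id"),
                    direction = "wide")

print(dim(toy_long))
print(dim(toy_wide))
print(colnames(toy_wide))
```

Each time-varying measure (math, att) is spread into one column per value of yr, which is the layout that the math.0, math.1, math.2 and att.0, att.1, att.2 terms in the growth models expect.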
							
					 
					- Homework 4.3: Fit ARIMA(p,d,q) models to 3 voxel locations ([X, Y, Z], 
						specifying the 3D spatial coordinates of the voxels of interest) and discuss the models' 
						similarities and/or differences. Recall the 
						Chapter 2 notes (02_Review_QC_MLR_LME_Modeling.docx) 
						on the 4D fMRI hyper-volume.
							library(oro.nifti)  # provides readNIfTI()
							# See examples here: https://cran.r-project.org/web/packages/oro.nifti/vignettes/nifti.pdf
							# and here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089470
							fMRIURL <- "http://socr.umich.edu/HTML5/BrainViewer/data/fMRI_FilteredData_4D.nii.gz"
							fMRIFile <- file.path(tempdir(), "fMRI_FilteredData_4D.nii.gz")
							download.file(fMRIURL, destfile=fMRIFile, quiet=FALSE, mode="wb")
							fMRIVolume <- readNIfTI(fMRIFile, reorient=FALSE)
							# dimensions: 64 x 64 x 21 x 180 ; 4mm x 4mm x 6mm x 3 sec
							fMRIVolDims <- dim(fMRIVolume); fMRIVolDims
							## [1] 64 64 21 180
							time_dim <- fMRIVolDims[4]; time_dim
							## [1] 180
							# To examine the time course of a specific 3D voxel (say the one at x=30, y=30, z=10):
							plot(fMRIVolume[30, 30, 10,], type='l',
							     main="Time Series of 3D Voxel \n (x=30, y=30, z=10)", col="blue")
							x1 <- c(1:180)
							# loess fit of the same series (computed here but not plotted)
							y1 <- loess(fMRIVolume[30, 30, 10,] ~ x1, family = "gaussian")
							# Overlay a running-median smooth (red) and a Gaussian kernel smooth (green)
							lines(x1, smooth(fMRIVolume[30, 30, 10,]), col = "red", lwd = 2)
							lines(ksmooth(x1, fMRIVolume[30, 30, 10,], kernel = "normal", bandwidth = 5),
							      col = "green", lwd = 3)
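As a starting point for the ARIMA fits, here is a minimal sketch that uses only base R (stats::arima). It assumes a simulated series as a stand-in for one voxel's 180-point time course; the simulated data, the candidate orders, and the names voxel_ts, orders, fits, and aic_table are illustrative assumptions, not part of the assignment.

```r
# Simulated stand-in for one voxel's time course (an assumption for
# illustration; with the real data use fMRIVolume[x, y, z, ] instead)
set.seed(1234)
voxel_ts <- arima.sim(model = list(ar = 0.6), n = 180)

# Candidate (p, d, q) orders. d is held at 0 so that the AIC values are
# computed on the same (undifferenced) series and remain comparable.
orders <- list(c(1, 0, 0), c(0, 0, 1), c(1, 0, 1))

# Fit each candidate by maximum likelihood
fits <- lapply(orders, function(ord) arima(voxel_ts, order = ord, method = "ML"))

# Collect the AIC of each fit in a small table
aic_table <- data.frame(
  order = sapply(orders, paste, collapse = ","),
  AIC   = sapply(fits, AIC)
)
print(aic_table)

# Inspect the candidate with the smallest AIC
best_fit <- fits[[which.min(aic_table$AIC)]]
print(best_fit)
```

Repeating these steps for each of the 3 chosen voxels gives one selected order and one set of coefficients per voxel, which can then be compared across locations. If the auto.arima() function from the forecast package is available, it could replace the manual grid.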
						
					 
				
				
				
			 
			
			