Consequences of Egocentric Networks (methods)
Figure: Basic micro-macro model
1 Introduction
In this assignment/tutorial I will demonstrate how to estimate a micro-macro model with the R package Lavaan. The Lavaan website can be found here. During the workgroup I will explain all code. For those of you who don’t attend the workgroups, Google knows way more than I do.
2 Before you start
Before you start, check whether you are running the latest RStudio version (from the Help menu, pick ‘check for updates’) and whether you need to update R.
install.packages("installr")  # first install the package
require(installr)  # then load (activate) the package
updateR()  # run this function to start the update process
Give your script a nice name. Include the author, and the date when you last modified the script. Include a lot of comments in your script! Don’t forget, always start with cleaning up your workspace.
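For example, cleaning up the workspace at the top of your script looks like this:

```r
# clean up: remove all objects from the current workspace
rm(list = ls())
```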
And set your working directory.
# set working directory
setwd("C:\\YOURDIR\\YOURSUBDIR\\YOURSUBSUBDIR\\") #change to your own workdirectory
Install the packages you will need.
3 Data
3.1 Simulate data
If I try to get an understanding of a new method, I usually use a simulated dataset. Then I at least know what the world looks like (I know what I put in, so I know what I should get out of the model). For you guys and gals, it is not necessary to understand the simulation process, but feel free to have a close look.
set.seed(13789876)
# simulate the true network characteristic
LX <- rnorm(1000, 0, 2)
# this network characteristic is latent, not measured. We have six indicators for this latent
# variable: 2 per alter; 3 alters.
# a good indicator
x1 <- alt1_xa <- LX + rnorm(1000, 0, 1)
x2 <- alt2_xa <- LX + rnorm(1000, 0, 1)
x3 <- alt3_xa <- LX + rnorm(1000, 0, 1)
# a messy indicator
alt1_xb <- 0.3 * LX + 0.1 * x1 + rnorm(1000, 0, 1) + 0.1 * x1 * rnorm(1000, 0, 1)
alt2_xb <- 0.3 * LX + 0.1 * x2 + rnorm(1000, 0, 1) + 0.1 * x3 * rnorm(1000, 0, 1)
alt3_xb <- 0.3 * LX + 0.1 * x3 + rnorm(1000, 0, 1) + 0.1 * x3 * rnorm(1000, 0, 1)
# we also have missingness (MCAR)
n1 <- rbinom(1000, 1, 0.95)
n2 <- rbinom(1000, 1, 0.85)
n3 <- rbinom(1000, 1, 0.75)
alt1_xa <- ifelse(n1, alt1_xa, NA)
alt2_xa <- ifelse(n2, alt2_xa, NA)
alt3_xa <- ifelse(n3, alt3_xa, NA)
alt1_xb <- ifelse(n1, alt1_xb, NA)
alt2_xb <- ifelse(n2, alt2_xb, NA)
alt3_xb <- ifelse(n3, alt3_xb, NA)
# let's calculate network size.
ns <- rowSums(cbind(n1, n2, n3))
# simulate two dependent variables to play with. mean alter effect
Y1 <- 5 * LX + rnorm(1000, 0, 5)
# total alter effect
Y2 <- 3 * LX * ns + rnorm(1000, 0, 5)
ID <- 1:length(Y1)
data_wide <- data.frame(ID, Y1, Y2, alt1_xa, alt2_xa, alt3_xa, alt1_xb, alt2_xb, alt3_xb)
data_long <- reshape(data_wide, direction = "long", varying = c("alt1_xa", "alt1_xb", "alt2_xa", "alt2_xb",
"alt3_xa", "alt3_xb"), timevar = "alter", v.names = c("xa", "xb"), times = c("alt1", "alt2", "alt3"),
idvar = "ID")
We have a dataset with two different dependent variables: Y1 and Y2. For each ego we collected information on up to 3 alters. For each alter we have information on two characteristics: xa and xb. Suppose these alter characteristics are indicators of alter’s happiness. We want to know if alter’s happiness is related to ego’s happiness. Oh yeah, we have our data in both long and wide format.
3.2 have a look at your data
Assignment 1: Have a look at your data. What are the percentages of missing data?
Assignment 2: Try to make your own dataset in long format from the dataset in wide format.
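If you want to check your answer to Assignment 1 afterwards, one possible sketch (assuming the data_wide dataframe from the simulation above is in your workspace):

```r
# percentage of missing values per alter indicator
alter_vars <- c("alt1_xa", "alt2_xa", "alt3_xa", "alt1_xb", "alt2_xb", "alt3_xb")
round(colMeans(is.na(data_wide[, alter_vars])) * 100, 1)
```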
4 Naive approaches
4.1 just try to use one indicator of one alter?
This would imply estimating a simple linear OLS model.
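The code itself is not shown, but the Call: line in the output gives it away; the model was estimated along these lines (using data_wide from section 3.1):

```r
# regress ego's outcome on a single indicator of a single alter
summary(lm(Y1 ~ alt1_xa, data = data_wide))
```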
##
## Call:
## lm(formula = Y1 ~ alt1_xa, data = data_wide)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.4016 -4.8055 -0.2507 4.4061 20.3413
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3197 0.2161 1.479 0.139
## alt1_xa 3.9772 0.0982 40.503 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.695 on 958 degrees of freedom
## (40 observations deleted due to missingness)
## Multiple R-squared: 0.6313, Adjusted R-squared: 0.6309
## F-statistic: 1640 on 1 and 958 DF, p-value: < 2.2e-16
4.2 Aggregation method
But obviously we would like to use the information on all alters. One option is an aggregation method: calculate the mean happiness score of the alters and use it to predict ego’s happiness. It is called the aggregation method because we now have one observation per egonet.
Assignment 3: Before you look at the code below, try to calculate the mean happiness score of the alters. Be aware that we have missing values in our dataset.
# aggregation first calculate the mean score of the alters.
data_wide$xam <- rowMeans(cbind(data_wide$alt1_xa, data_wide$alt2_xa, data_wide$alt3_xa), na.rm = TRUE)
summary(lm(Y1 ~ xam, data = data_wide))
##
## Call:
## lm(formula = Y1 ~ xam, data = data_wide)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.2898 -4.0026 -0.1233 4.0629 20.3896
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.19756 0.18942 1.043 0.297
## xam 4.52667 0.09248 48.947 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.984 on 996 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.7063, Adjusted R-squared: 0.7061
## F-statistic: 2396 on 1 and 996 DF, p-value: < 2.2e-16
4.3 Disaggregation method
Another common approach is to disaggregate the data. In this approach we match the score of ego to all individual alters.
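Again reconstructed from the Call: line in the output: with the data in long format each ego-alter pair is a row, and we fit the OLS model on data_long:

```r
# ego's outcome predicted by alter scores, one row per ego-alter pair
summary(lm(Y1 ~ xa, data = data_long))
```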
##
## Call:
## lm(formula = Y1 ~ xa, data = data_long)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.9658 -4.7366 -0.1134 4.6496 22.9888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.09209 0.13263 0.694 0.488
## xa 3.97062 0.06054 65.584 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.742 on 2582 degrees of freedom
## (416 observations deleted due to missingness)
## Multiple R-squared: 0.6249, Adjusted R-squared: 0.6247
## F-statistic: 4301 on 1 and 2582 DF, p-value: < 2.2e-16
5 micro-macro approach
Both approaches do not do justice to the structure of our dataset. We have information on alters (lowest level / alter-level / micro-level) and these determine an egonet characteristic (highest level / ego-level / macro-level). Whereas in the traditional multi-level model we use macro-level variables to predict micro-level variables, we now have micro-level variables predicting macro-level variables. See the Figure below.
Figure: Micro-macro latent variable model with one micro-level variable1
In the literature two approaches are discussed to estimate a micro-macro model: a persons as variables approach and a multi-level approach. The persons as variables approach is - I hope - the easiest to implement. The idea is that the alter scores load on a latent variable at the ego-level. This latent variable has a random component at the ego-level (cf. the random intercept in multi-level models). In a basic model, the latent variable is the (bias-corrected) mean alter-score.
I am using the package Lavaan to estimate the models. In the section ‘alternative approaches’ I will demonstrate the multi-level approach and the syntax to estimate these models with MPlus (within R).
5.1 first quickly estimate the previous models in Lavaan.
Information of one alter only:
library(lavaan)
model1 <- '
Y1 ~ alt1_xa
Y1 ~ 1
Y1 ~~ Y1
'
fit1 <- lavaan(model1, data = data_wide)
summary(fit1)
## lavaan 0.6-7 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 3
##
## Used Total
## Number of observations 960 1000
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y1 ~
## alt1_xa 3.977 0.098 40.545 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y1 0.320 0.216 1.481 0.139
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Y1 44.725 2.041 21.909 0.000
Aggregation method:
library(lavaan)
model1 <- "
Y1 ~ xam
Y1 ~ 1
Y1 ~~ Y1
"
fit1 <- lavaan(model1, data = data_wide)
summary(fit1)
## lavaan 0.6-7 ended normally after 23 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 3
##
## Used Total
## Number of observations 998 1000
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y1 ~
## xam 4.527 0.092 48.996 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y1 0.198 0.189 1.044 0.296
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Y1 35.737 1.600 22.338 0.000
Disaggregation method:
library(lavaan)
model1 <- "
Y1 ~ xa
Y1 ~ 1
Y1 ~~ Y1
"
fit1 <- lavaan(model1, data = data_long)
summary(fit1)
## lavaan 0.6-7 ended normally after 19 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 3
##
## Used Total
## Number of observations 2584 3000
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y1 ~
## xa 3.971 0.061 65.609 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y1 0.092 0.133 0.695 0.487
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Y1 45.420 1.264 35.944 0.000
5.2 Persons as variables approach in Lavaan
The assumption is that the alters are indistinguishable. Thus the alter variables presumably have the same mean, the same variance, and the same loading on the latent variable.
Assignment 4: Test some of these assumptions.
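A descriptive starting point for Assignment 4 (just a sketch; a formal test would compare constrained and unconstrained Lavaan models):

```r
# are the means and variances of the three xa indicators roughly equal?
xa_vars <- data_wide[, c("alt1_xa", "alt2_xa", "alt3_xa")]
sapply(xa_vars, mean, na.rm = TRUE)
sapply(xa_vars, var, na.rm = TRUE)
```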
We work with a latent variable. Hence we have to either fix the variance of the latent variable or fix the factor loading of one of our indicator variables. I have chosen the latter and fixed the loading to “1”. But given the assumptions just mentioned above, all loadings are fixed. The advantage is that a one unit increase in e.g. alt1_xa leads to a one unit increase in our latent variable FX. We are hence able to compare the estimate of the latent variable with the previous estimates we obtained for the alter characteristics. Note, we do not include an exogenous variable at the ego-level yet.
5.2.1 persons as variables, cases listwise deleted
# one individual-level predictor, one latent variable at group level
model2 <- "
FX =~ 1*alt1_xa
FX =~ 1*alt2_xa
FX =~ 1*alt3_xa
alt1_xa ~~ a*alt1_xa
alt2_xa ~~ a*alt2_xa
alt3_xa ~~ a*alt3_xa
FX ~~ FX
Y1 ~~ Y1
Y1 ~ FX
Y1 ~ 1
alt1_xa ~ c*1
alt2_xa ~ c*1
alt3_xa ~ c*1
"
fit <- lavaan(model2, data = data_wide)
summary(fit)
## lavaan 0.6-7 ended normally after 31 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 10
## Number of equality constraints 4
##
## Used Total
## Number of observations 636 1000
##
## Model Test User Model:
##
## Test statistic 7.410
## Degrees of freedom 8
## P-value (Chi-square) 0.493
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## FX =~
## alt1_xa 1.000
## alt2_xa 1.000
## alt3_xa 1.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y1 ~
## FX 5.049 0.122 41.474 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y1 -0.053 0.434 -0.123 0.902
## .alt1_xa (c) 0.034 0.081 0.422 0.673
## .alt2_xa (c) 0.034 0.081 0.422 0.673
## .alt3_xa (c) 0.034 0.081 0.422 0.673
## FX 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .alt1_xa (a) 1.006 0.040 25.219 0.000
## .alt2_xa (a) 1.006 0.040 25.219 0.000
## .alt3_xa (a) 1.006 0.040 25.219 0.000
## FX 3.795 0.232 16.359 0.000
## .Y1 22.934 1.797 12.760 0.000
5.2.2 persons as variables, include cases with missing values on alter-characteristics
A big advantage of the micro-macro model is that we do not have to delete cases listwise. The above estimates are based on respondents who reported having 3 alters. But naturally, not everyone will have a complete network of 3 alters. Thus, let’s tell Lavaan to include those cases as well.
# one individual-level predictor, one latent variable at group level
model2 <- "
FX =~ 1*alt1_xa
FX =~ 1*alt2_xa
FX =~ 1*alt3_xa
alt1_xa ~~ a*alt1_xa
alt2_xa ~~ a*alt2_xa
alt3_xa ~~ a*alt3_xa
FX ~~ FX
Y1 ~~ Y1
Y1 ~ FX
Y1 ~ 1
alt1_xa ~ c*1
alt2_xa ~ c*1
alt3_xa ~ c*1
"
fit <- lavaan(model2, data = data_wide, missing = "fiml", fixed.x = FALSE)
summary(fit)
## lavaan 0.6-7 ended normally after 33 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 10
## Number of equality constraints 4
##
## Number of observations 1000
## Number of missing patterns 8
##
## Model Test User Model:
##
## Test statistic 10.807
## Degrees of freedom 8
## P-value (Chi-square) 0.213
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## FX =~
## alt1_xa 1.000
## alt2_xa 1.000
## alt3_xa 1.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y1 ~
## FX 5.023 0.105 47.620 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y1 0.189 0.349 0.541 0.589
## .alt1_xa (c) 0.007 0.065 0.105 0.916
## .alt2_xa (c) 0.007 0.065 0.105 0.916
## .alt3_xa (c) 0.007 0.065 0.105 0.916
## FX 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .alt1_xa (a) 0.999 0.035 28.169 0.000
## .alt2_xa (a) 0.999 0.035 28.169 0.000
## .alt3_xa (a) 0.999 0.035 28.169 0.000
## FX 3.792 0.188 20.169 0.000
## .Y1 25.944 1.661 15.616 0.000
And there you have it.
6 Assignment
Assignment 5: Re-estimate the micro-macro model but now use the other indicator.
Figure: Micro-macro latent variable model with two micro-level variables
Assignment 6: Re-estimate the micro-macro model but now use both indicators. See the figure above for the intended model. Note, we do not have an exogenous variable at the ego-level yet. We include a covariance between the two indicators at the alter-level, because it may not be reasonable to assume that all of the association between the alter-level indicators is explained by the latent-variable.
Figure: Micro-macro latent variable model with two micro-level variables. Ego-level variable X moderates the impact of the latent variable at the group-level.
Assignment 7 is way too difficult. First, in Lavaan it is not straightforward to estimate interaction effects when the interaction involves a latent variable. Second, the variable size is not normally distributed. This violation of model assumptions leads to all kinds of ‘problems’.
Assignment 7: Test the hypothesis that the larger the core-discussion network, the smaller the influence of each individual alter will be. Test this hypothesis for both dependent variables. Bonus: estimate both dependent variables in one SEM. See the figure above for the intended model.
Assignment 8: Please now use a real dataset. Formulate an interesting hypothesis (and provide a motivation) on how the CDN (i.e. how alters) may influence the “Attitude towards eu-integration” of ego.
7 Real data
Download sn2021_egonetdata_v2.Rdata
Save it in your working directory
If you want to use this data run the following command: load('sn2021_egonetdata_v2.Rdata')
Description of dataset
Subset of the LISS panel data (year 2009 and 2010).
Four ego variables (eu, educ, age, g).
Three confidant variables (educ_a, age_a, g_a).
In the wide data the first number in the label of confidant variables indicates the survey wave, the second number the alter id. For this assignment please use data in wide format.
liss_wide: liss data in a wide dataframe. liss_long: liss data in a long dataframe.
Dependent variables:
‘eu’ Attitude towards eu-integration (0 “EU integration went to far” to 4 “EU integration not far enough”)
Ego control variables:
‘educ’ is educational attainment of ego in years.
‘age’ is the self-reported age of respondents.
‘g’ measures whether respondents are female (1) or male(0).
Alter variables:
‘educ_a’ measures the educational attainment of confidant in years.
‘g_a’ measures the gender of confidants, female (1) or male (0).
‘age_a’ measures the age of confidants in 13 categories. These are
1 younger than 16
2 16 - 20
3 21 - 25
4 26 - 30
5 31 - 35
6 36 - 40
7 41 - 45
8 46 - 50
9 51 - 55
10 56 - 60
11 61 - 65
12 66 - 70
13 71 years or older
8 answers
8.1 calculate network size
data_wide$size <- as.numeric(rowSums(!is.na(cbind(data_wide$alt1_xa, data_wide$alt2_xa, data_wide$alt3_xa))))
table(data_wide$size, useNA = "always")
##
## 0 1 2 3 <NA>
## 2 48 314 636 0
8.2 Assignment 6
I also included the main effect of size. Please note that size is not a normally distributed variable. This may lead to all kinds of estimation problems. See here and here.
# just ignore the non-normality of size
model <- "
#latent variable
FX =~ 1*alt1_xa
FX =~ 1*alt2_xa
FX =~ 1*alt3_xa
FX =~ a*alt1_xb
FX =~ a*alt2_xb
FX =~ a*alt3_xb
#variances
alt1_xa ~~ b*alt1_xa
alt2_xa ~~ b*alt2_xa
alt3_xa ~~ b*alt3_xa
alt1_xb ~~ c*alt1_xb
alt2_xb ~~ c*alt2_xb
alt3_xb ~~ c*alt3_xb
FX ~~ FX
Y1 ~~ Y1
Y2 ~~ Y2
size ~~ size
#covariances
Y1 ~~ Y2
alt1_xa ~~ d*alt1_xb
alt2_xa ~~ d*alt2_xb
alt3_xa ~~ d*alt3_xb
#regression model
Y1 ~ FX + size
Y1 ~ 1
FX ~ size
Y2 ~ FX + size
Y2 ~ 1
#intercepts/means
alt1_xa ~ e*1
alt2_xa ~ e*1
alt3_xa ~ e*1
alt1_xb ~ f*1
alt2_xb ~ f*1
alt3_xb ~ f*1
"
fit1 <- lavaan(model, data = data_wide, missing = "fiml", fixed.x = FALSE)
# declare the size variable to be ordered. We then have to switch the estimation procedure.
data_wide$size2 <- ordered(data_wide$size)
# The code below won't work because if we delete observations with missing values, we don't have any
# variance left in the size variable. A solution would be to use a two-step approach. model <- '
# #latent variable FX =~ 1*alt1_xa FX =~ 1*alt2_xa FX =~ 1*alt3_xa FX =~ a*alt1_xb FX =~ a*alt2_xb FX
# =~ a*alt3_xb #variances alt1_xa ~~ b*alt1_xa alt2_xa ~~ b*alt2_xa alt3_xa ~~ b*alt3_xa alt1_xb ~~
# c*alt1_xb alt2_xb ~~ c*alt2_xb alt3_xb ~~ c*alt3_xb FX ~~ FX Y1 ~~ Y1 Y2 ~~ Y2 size2 ~~ size2
# #covariances Y1 ~~ Y2 alt1_xa ~~ d*alt1_xb alt2_xa ~~ d*alt2_xb alt3_xa ~~ d*alt3_xb #regression
# model Y1 ~ FX + size2 Y1 ~ 1 FX ~ size2 Y2 ~ FX + size2 Y2 ~ 1 #intercepts/means alt1_xa ~ e*1
# alt2_xa ~ e*1 alt3_xa ~ e*1 alt1_xb ~ f*1 alt2_xb ~ f*1 alt3_xb ~ f*1 ' fit2 <- lavaan(model, data
# = data_wide, fixed.x=FALSE, ordered=c('size2'))
summary(fit1)
# summary(fit2)
## lavaan 0.6-7 ended normally after 105 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 30
## Number of equality constraints 12
##
## Number of observations 1000
## Number of missing patterns 8
##
## Model Test User Model:
##
## Test statistic 3027.830
## Degrees of freedom 36
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## FX =~
## alt1_xa 1.000
## alt2_xa 1.000
## alt3_xa 1.000
## alt1_xb (a) 0.407 0.011 36.042 0.000
## alt2_xb (a) 0.407 0.011 36.042 0.000
## alt3_xb (a) 0.407 0.011 36.042 0.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y1 ~
## FX 5.095 0.107 47.609 0.000
## size -0.914 0.341 -2.683 0.007
## FX ~
## size 0.117 0.110 1.070 0.285
## Y2 ~
## FX 7.937 0.137 57.834 0.000
## size -0.755 0.436 -1.733 0.083
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .Y1 ~~
## .Y2 -1.255 1.547 -0.812 0.417
## .alt1_xa ~~
## .alt1_xb (d) 0.094 0.024 3.969 0.000
## .alt2_xa ~~
## .alt2_xb (d) 0.094 0.024 3.969 0.000
## .alt3_xa ~~
## .alt3_xb (d) 0.094 0.024 3.969 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y1 1.003 1.560 0.643 0.520
## .Y2 -0.666 2.295 -0.290 0.772
## .alt1_xa (e) -0.306 0.293 -1.045 0.296
## .alt2_xa (e) -0.306 0.293 -1.045 0.296
## .alt3_xa (e) -0.306 0.293 -1.045 0.296
## .alt1_xb (f) -0.102 0.121 -0.844 0.399
## .alt2_xb (f) -0.102 0.121 -0.844 0.399
## .alt3_xb (f) -0.102 0.121 -0.844 0.399
## size 0.000
## .FX 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .alt1_xa (b) 1.001 0.035 28.771 0.000
## .alt2_xa (b) 1.001 0.035 28.771 0.000
## .alt3_xa (b) 1.001 0.035 28.771 0.000
## .alt1_xb (c) 1.098 0.032 34.764 0.000
## .alt2_xb (c) 1.098 0.032 34.764 0.000
## .alt3_xb (c) 1.098 0.032 34.764 0.000
## .FX 3.675 0.182 20.148 0.000
## .Y1 26.210 1.637 16.008 0.000
## .Y2 31.545 2.600 12.131 0.000
## size 7.028 0.314 22.361 0.000
Please note that the estimated (model-implied) variance of the size variable is way too large. In reality the variance is 0.35. This indicates we have a misspecified model.
8.3 Assignment 7
Unfortunately, in Lavaan it is not possible to include an interaction term involving a latent variable in the structural part of the model. We have to fall back on a two-step approach. For some literature on the differences between a one-step and a two-step approach, see Anderson and Gerbing (1988).2
But here it goes…
# credits where credits are due:
# https://stackoverflow.com/questions/24399353/r-lavaan-coding-latent-variable-interactions
# 1. set up our measurement model
model2 <- "
#latent variable
FX =~ 1*alt1_xa
FX =~ 1*alt2_xa
FX =~ 1*alt3_xa
FX =~ a*alt1_xb
FX =~ a*alt2_xb
FX =~ a*alt3_xb
#variances
alt1_xa ~~ b*alt1_xa
alt2_xa ~~ b*alt2_xa
alt3_xa ~~ b*alt3_xa
alt1_xb ~~ c*alt1_xb
alt2_xb ~~ c*alt2_xb
alt3_xb ~~ c*alt3_xb
FX ~~ FX
"
fit <- lavaan(model2, data = data_wide, missing = "fiml", fixed.x = FALSE)
# 2. extract the predicted values of the cfa and add them to a new dataframe data_wide2
data_wide2 <- data.frame(data_wide, predict(fit))
# 3. create a new variable with the interaction of FX and size
data_wide2$FXsize <- data_wide2$FX * data_wide2$size
# 4. now set up the structural model and add the predefined interaction
model2 <- "
FX ~~ FX
Y1 ~~ Y1
Y2 ~~ Y2
size ~~ 0.35*size #I am fixing the variance to the observed variance.
FXsize ~~ FXsize
#covariances
Y1 ~~ Y2
#regression model
Y1 ~ FX + size + FXsize
Y1 ~ 1
FX ~ size
Y2 ~ FX + size + FXsize
Y2 ~ 1
"
fit <- lavaan(model2, data = data_wide2, missing = "fiml", fixed.x = FALSE)
summary(fit)
## lavaan 0.6-7 ended normally after 72 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 14
##
## Number of observations 1000
## Number of missing patterns 2
##
## Model Test User Model:
##
## Test statistic 22209.404
## Degrees of freedom 6
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y1 ~
## FX 5.165 0.476 10.843 0.000
## size -0.767 0.320 -2.400 0.016
## FXsize -0.062 0.178 -0.349 0.727
## FX ~
## size 0.008 0.022 0.338 0.736
## Y2 ~
## FX 1.062 0.547 1.942 0.052
## size -0.366 0.366 -1.001 0.317
## FXsize 2.577 0.205 12.587 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .Y1 ~~
## .Y2 12.511 1.319 9.485 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y1 2.132 0.849 2.513 0.012
## .Y2 0.602 0.971 0.619 0.536
## .FX 0.000
## size 0.000
## FXsize 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .FX 3.471 0.155 22.354 0.000
## .Y1 34.789 1.557 22.345 0.000
## .Y2 45.446 2.034 22.347 0.000
## size 0.350
## FXsize 24.736 1.106 22.358 0.000
8.4 final model?
Let us assume that, based on the literature, I don’t expect size to affect my latent variable (e.g. network-happiness): I do not expect that people (egos) with larger networks have networks in which the alters are on average happier. Thus I don’t want to include this direct path in my model. I also do not expect that network size by itself is related to my happiness, so I would like to exclude that path from the model as well. I just have very good reasons (ahum) to think that each alter contributes uniquely to my happiness: an additive effect. This implies an interaction between network size and our latent variable. Let us estimate this more ‘theoretical’ model.
model <- "
FX ~~ FX
Y1 ~~ Y1
Y2 ~~ Y2
FXsize ~~ FXsize
#covariances
Y1 ~~ Y2
#regression model
Y1 ~ FX + FXsize
Y1 ~ 1
Y2 ~ FX + FXsize
Y2 ~ 1
"
fit <- lavaan(model, data = data_wide2, missing = "fiml", fixed.x = FALSE)
summary(fit)
## lavaan 0.6-7 ended normally after 61 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 11
##
## Number of observations 1000
## Number of missing patterns 2
##
## Model Test User Model:
##
## Test statistic 3127.833
## Degrees of freedom 3
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y1 ~
## FX 5.107 0.476 10.729 0.000
## FXsize -0.043 0.178 -0.240 0.810
## Y2 ~
## FX 1.032 0.546 1.890 0.059
## FXsize 2.587 0.205 12.652 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .Y1 ~~
## .Y2 12.606 1.324 9.522 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y1 0.146 0.187 0.779 0.436
## .Y2 -0.346 0.214 -1.622 0.105
## FX 0.000
## FXsize 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## FX 3.472 0.155 22.353 0.000
## .Y1 34.985 1.566 22.344 0.000
## .Y2 45.493 2.036 22.347 0.000
## FXsize 24.735 1.106 22.358 0.000
We would conclude that my line of reasoning holds true with respect to Y2 but not with respect to Y1.
9 Alternative Approaches
The above micro-macro models may also be estimated with Mplus, developed by Muthén and Muthén (www.statmodel.com). If you have Mplus installed on your computer, you may use the R package MplusAutomation to estimate Mplus models from within R.
Images adapted from: Bennink, M., Croon, M. A., Kroon, B., & Vermunt, J. K. (2016). Micro–macro multilevel latent class models with multiple discrete individual-level variables. Advances in Data Analysis and Classification, 10(2), 139-154.↩︎
Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological bulletin, 103(3), 411.↩︎