Consequences of Egocentric Networks (methods)

Figure: Basic micro-macro model

1 Introduction

In this assignment/tutorial I will demonstrate how to estimate a micro-macro model with the R package Lavaan. The Lavaan website can be found here. During the workgroup I will explain all the code. For those of you who do not attend the workgroups, Google knows way more than I do.

In the upper left and right corner of the code blocks you will find copy-to-clipboard buttons. Use these buttons to copy the code to your own editor.

2 Before you start

Before you start, check whether you are running the latest version of RStudio (from the Help menu, pick 'Check for Updates') and whether you need to update R.

install.packages("installr")  #you  first install packages
require(installr)  #then you will need to activate packages. 
updateR()  #run the function to start the update process

Give your script a nice name. Include the author and the date when you last modified the script. Include a lot of comments in your script! And don't forget: always start by cleaning up your workspace.

### Author: JOCHEM TOLSMA
### Lastmod: 31-08-2020

# cleanup workspace
rm(list = ls())

And set your working directory.

# set working directory
setwd("C:\\YOURDIR\\YOURSUBDIR\\YOURSUBSUBDIR\\")  #change to your own workdirectory

Install the packages you will need.

# install packages
install.packages("lavaan", dependencies = TRUE)  # to estimate the micro-macro model
install.packages("psych")  # to describe our dataset
install.packages("nlme")  # for the multilevel models

3 Data

3.1 Simulate data

When I try to get an understanding of a new method, I usually use a simulated dataset. Then I at least know what the world looks like (I know what I put in, so I know what I should get out of the model). For you guys and gals it is not necessary to understand the simulation process, but feel free to have a close look.

set.seed(13789876)
# simulate the true network characteristic
LX <- rnorm(1000, 0, 2)
# this network characteristic is latent, not measured. We have six indicators for this latent
# variable: 2 per alter; 3 alters.

# a good indicator
x1 <- alt1_xa <- LX + rnorm(1000, 0, 1)
x2 <- alt2_xa <- LX + rnorm(1000, 0, 1)
x3 <- alt3_xa <- LX + rnorm(1000, 0, 1)

# a messy indicator
alt1_xb <- 0.3 * LX + 0.1 * x1 + rnorm(1000, 0, 1) + 0.1 * x1 * rnorm(1000, 0, 1)
alt2_xb <- 0.3 * LX + 0.1 * x2 + rnorm(1000, 0, 1) + 0.1 * x3 * rnorm(1000, 0, 1)
alt3_xb <- 0.3 * LX + 0.1 * x3 + rnorm(1000, 0, 1) + 0.1 * x3 * rnorm(1000, 0, 1)

# we also have missingness (MCAR)
n1 <- rbinom(1000, 1, 0.95)
n2 <- rbinom(1000, 1, 0.85)
n3 <- rbinom(1000, 1, 0.75)

alt1_xa <- ifelse(n1, alt1_xa, NA)
alt2_xa <- ifelse(n2, alt2_xa, NA)
alt3_xa <- ifelse(n3, alt3_xa, NA)

alt1_xb <- ifelse(n1, alt1_xb, NA)
alt2_xb <- ifelse(n2, alt2_xb, NA)
alt3_xb <- ifelse(n3, alt3_xb, NA)

# let's calculate network size
ns <- rowSums(cbind(n1, n2, n3))

# simulate two dependent variables to play with: mean alter effect
Y1 <- 5 * LX + rnorm(1000, 0, 5)

# total alter effect
Y2 <- 3 * LX * ns + rnorm(1000, 0, 5)

ID <- 1:length(Y1)


data_wide <- data.frame(ID, Y1, Y2, alt1_xa, alt2_xa, alt3_xa, alt1_xb, alt2_xb, alt3_xb)

data_long <- reshape(data_wide, direction = "long", varying = c("alt1_xa", "alt1_xb", "alt2_xa", "alt2_xb", 
    "alt3_xa", "alt3_xb"), timevar = "alter", v.names = c("xa", "xb"), times = c("alt1", "alt2", "alt3"), 
    idvar = "ID")

We have a dataset with two different dependent variables: Y1 and Y2. For each ego we collected information on at most 3 alters. For each alter we have information on two characteristics: xa and xb. Suppose these alter characteristics are indicators of the alter's happiness. We want to know whether alters' happiness is related to ego's happiness. Oh yeah, we have our data in both long and wide format.

3.2 have a look at your data

Assignment 1: Have a look at your data. What are the percentages of missing data?
Assignment 2: Try to make your own dataset in long format from the dataset in wide format.
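
Hint for Assignment 1: the sketch below is one possible starting point; it only uses the data_wide object created above and the psych package we installed earlier. (For Assignment 2, the reshape() call in the simulation code is a useful template.)

# one possible way: percentage of missing values per variable
colMeans(is.na(data_wide)) * 100

# describe the simulated dataset with the psych package
library(psych)
describe(data_wide)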

4 Naive approaches

4.1 just use one indicator of one alter?

This would imply estimating a simple linear OLS model.

# using one alter observation
summary(lm(Y1 ~ alt1_xa, data = data_wide))
## 
## Call:
## lm(formula = Y1 ~ alt1_xa, data = data_wide)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.4016  -4.8055  -0.2507   4.4061  20.3413 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.3197     0.2161   1.479    0.139    
## alt1_xa       3.9772     0.0982  40.503   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.695 on 958 degrees of freedom
##   (40 observations deleted due to missingness)
## Multiple R-squared:  0.6313, Adjusted R-squared:  0.6309 
## F-statistic:  1640 on 1 and 958 DF,  p-value: < 2.2e-16

4.2 Aggregation method

But obviously we would like to use the information on all alters. One option is the aggregation method: calculate the mean happiness score of the alters and use it to predict ego's happiness. It is called the aggregation method because we now have one observation per egonet.

Assignment 3: Before you look at the code below, try to calculate the mean happiness score of the alters. Be aware that we have missing values in our dataset.

# aggregation: first calculate the mean score of the alters


data_wide$xam <- rowMeans(cbind(data_wide$alt1_xa, data_wide$alt2_xa, data_wide$alt3_xa), na.rm = TRUE)
summary(lm(Y1 ~ xam, data = data_wide))
## 
## Call:
## lm(formula = Y1 ~ xam, data = data_wide)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.2898  -4.0026  -0.1233   4.0629  20.3896 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.19756    0.18942   1.043    0.297    
## xam          4.52667    0.09248  48.947   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.984 on 996 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.7063, Adjusted R-squared:  0.7061 
## F-statistic:  2396 on 1 and 996 DF,  p-value: < 2.2e-16

4.3 Disaggregation method

Another common approach is to disaggregate the data. In this approach we match the score of ego to all individual alters.

summary(lm(Y1 ~ xa, data = data_long))
## 
## Call:
## lm(formula = Y1 ~ xa, data = data_long)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.9658  -4.7366  -0.1134   4.6496  22.9888 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.09209    0.13263   0.694    0.488    
## xa           3.97062    0.06054  65.584   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.742 on 2582 degrees of freedom
##   (416 observations deleted due to missingness)
## Multiple R-squared:  0.6249, Adjusted R-squared:  0.6247 
## F-statistic:  4301 on 1 and 2582 DF,  p-value: < 2.2e-16

5 micro-macro approach

Neither approach does justice to the structure of our dataset. We have information on alters (lowest level / alter-level / micro-level) and these determine an egonet characteristic (highest level / ego-level / macro-level). Whereas in the traditional multi-level model we use macro-level variables to predict micro-level outcomes, we now have micro-level variables predicting a macro-level outcome. See the figure below.

Figure: Micro-macro latent variable model with one micro-level variable [1]

In the literature two approaches are discussed to estimate a micro-macro model: a persons-as-variables approach and a multi-level approach. The persons-as-variables approach is - I hope - the easiest to implement. The idea is that the alter scores load on a latent variable at the ego-level. This latent variable has a random component at the ego-level (cf. the random intercept in multi-level models). In a basic model, the latent variable is the (bias-corrected) mean alter-score.
I am using the package Lavaan to estimate the models. In the section 'Alternative approaches' I sketch the multi-level approach and point you to the option of estimating these models with Mplus (within R).

5.1 first quickly estimate the previous models in Lavaan.

Information of one alter only:

library(lavaan)

model1 <- '
  Y1 ~ alt1_xa
  Y1 ~ 1
  Y1 ~~ Y1
  '

fit1 <- lavaan(model1, data = data_wide)
summary(fit1)
## lavaan 0.6-7 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          3
##                                                       
##                                                   Used       Total
##   Number of observations                           960        1000
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Y1 ~                                                
##     alt1_xa           3.977    0.098   40.545    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1                0.320    0.216    1.481    0.139
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1               44.725    2.041   21.909    0.000

Aggregation method:

library(lavaan)
model1 <- "
  Y1 ~ xam
  Y1 ~ 1
  Y1 ~~ Y1
  "

fit1 <- lavaan(model1, data = data_wide)
summary(fit1)
## lavaan 0.6-7 ended normally after 23 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          3
##                                                       
##                                                   Used       Total
##   Number of observations                           998        1000
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Y1 ~                                                
##     xam               4.527    0.092   48.996    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1                0.198    0.189    1.044    0.296
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1               35.737    1.600   22.338    0.000

Disaggregation method:

library(lavaan)
model1 <- "
  Y1 ~ xa
  Y1 ~ 1
  Y1 ~~ Y1
  "

fit1 <- lavaan(model1, data = data_long)
summary(fit1)
## lavaan 0.6-7 ended normally after 19 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          3
##                                                       
##                                                   Used       Total
##   Number of observations                          2584        3000
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Y1 ~                                                
##     xa                3.971    0.061   65.609    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1                0.092    0.133    0.695    0.487
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1               45.420    1.264   35.944    0.000

5.2 Persons as variable approach in Lavaan

The assumption is that the alters are indistinguishable. The alter variables are thus assumed to have the same mean, the same variance, and the same loading on the latent variable.

Assignment 4: Test some of these assumptions.
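
Hint: a quick and informal way to probe these assumptions is to compare the means and variances of the alter indicators, or to test the equality of two of them directly. This is only a sketch; a stricter approach would compare constrained and unconstrained Lavaan models.

# informal check: do the three xa-indicators have similar means and variances?
sapply(data_wide[, c("alt1_xa", "alt2_xa", "alt3_xa")], mean, na.rm = TRUE)
sapply(data_wide[, c("alt1_xa", "alt2_xa", "alt3_xa")], var, na.rm = TRUE)

# simple paired test of equal means for the first two alters (incomplete pairs are dropped)
t.test(data_wide$alt1_xa, data_wide$alt2_xa, paired = TRUE)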

We work with a latent variable. Hence we have to either fix the variance of the latent variable or fix the factor loading of one of our indicator variables. I have chosen the latter and fixed the loading to 1. Given the assumptions mentioned above, all loadings are in fact fixed to 1. The advantage is that a one-unit increase in, for example, alt1_xa corresponds to a one-unit increase in our latent variable FX. We are hence able to compare the estimate of the latent variable with the previous estimates we obtained for the alter characteristics. Note that we do not include an exogenous variable at the ego-level yet.
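
For completeness, the sketch below shows the alternative identification: fix the variance of the latent variable to 1 and estimate a (common) loading freely. The regression estimate of Y1 on FX is then expressed on a different scale. The object names (model_alt, fit_alt) are of course arbitrary.

# alternative identification (sketch): fix the latent variance, estimate a common loading
model_alt <- '
  FX =~ l*alt1_xa + l*alt2_xa + l*alt3_xa
  FX ~~ 1*FX

  alt1_xa ~~ a*alt1_xa
  alt2_xa ~~ a*alt2_xa
  alt3_xa ~~ a*alt3_xa
  Y1 ~~ Y1

  Y1 ~ FX
  Y1 ~ 1
  alt1_xa ~ c*1
  alt2_xa ~ c*1
  alt3_xa ~ c*1
  '

fit_alt <- lavaan(model_alt, data = data_wide)
summary(fit_alt)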

5.2.1 persons as variables, cases listwise deleted

# one individual-level predictor, one latent variable at group level
model2 <- "
  
  FX =~ 1*alt1_xa
  FX =~ 1*alt2_xa
  FX =~ 1*alt3_xa
  
  alt1_xa ~~ a*alt1_xa
  alt2_xa ~~ a*alt2_xa
  alt3_xa ~~ a*alt3_xa
  FX ~~ FX
  Y1 ~~ Y1
  
  Y1 ~ FX
  Y1 ~ 1
  alt1_xa ~ c*1
  alt2_xa ~ c*1
  alt3_xa ~ c*1
"

fit <- lavaan(model2, data = data_wide)
summary(fit)
## lavaan 0.6-7 ended normally after 31 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         10
##   Number of equality constraints                     4
##                                                       
##                                                   Used       Total
##   Number of observations                           636        1000
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                 7.410
##   Degrees of freedom                                 8
##   P-value (Chi-square)                           0.493
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   FX =~                                               
##     alt1_xa           1.000                           
##     alt2_xa           1.000                           
##     alt3_xa           1.000                           
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Y1 ~                                                
##     FX                5.049    0.122   41.474    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1               -0.053    0.434   -0.123    0.902
##    .alt1_xa    (c)    0.034    0.081    0.422    0.673
##    .alt2_xa    (c)    0.034    0.081    0.422    0.673
##    .alt3_xa    (c)    0.034    0.081    0.422    0.673
##     FX                0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .alt1_xa    (a)    1.006    0.040   25.219    0.000
##    .alt2_xa    (a)    1.006    0.040   25.219    0.000
##    .alt3_xa    (a)    1.006    0.040   25.219    0.000
##     FX                3.795    0.232   16.359    0.000
##    .Y1               22.934    1.797   12.760    0.000

5.2.2 persons as variables, include cases with missing values on alter-characteristics

A big advantage of the micro-macro model is that we do not have to delete cases in a listwise manner. The above estimates are based on respondents who reported having 3 alters. But naturally, not everyone will have a complete network of 3 alters. Thus, let's tell Lavaan to include those cases as well, using full information maximum likelihood (missing = "fiml").

# one individual-level predictor, one latent variable at group level
model2 <- "
  
  FX =~ 1*alt1_xa
  FX =~ 1*alt2_xa
  FX =~ 1*alt3_xa
  
  alt1_xa ~~ a*alt1_xa
  alt2_xa ~~ a*alt2_xa
  alt3_xa ~~ a*alt3_xa
  FX ~~ FX
  Y1 ~~ Y1
  
  Y1 ~ FX 
  Y1 ~ 1
  alt1_xa ~ c*1
  alt2_xa ~ c*1
  alt3_xa ~ c*1
"

fit <- lavaan(model2, data = data_wide, missing = "fiml", fixed.x = FALSE)
summary(fit)
## lavaan 0.6-7 ended normally after 33 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         10
##   Number of equality constraints                     4
##                                                       
##   Number of observations                          1000
##   Number of missing patterns                         8
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                10.807
##   Degrees of freedom                                 8
##   P-value (Chi-square)                           0.213
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   FX =~                                               
##     alt1_xa           1.000                           
##     alt2_xa           1.000                           
##     alt3_xa           1.000                           
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Y1 ~                                                
##     FX                5.023    0.105   47.620    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1                0.189    0.349    0.541    0.589
##    .alt1_xa    (c)    0.007    0.065    0.105    0.916
##    .alt2_xa    (c)    0.007    0.065    0.105    0.916
##    .alt3_xa    (c)    0.007    0.065    0.105    0.916
##     FX                0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .alt1_xa    (a)    0.999    0.035   28.169    0.000
##    .alt2_xa    (a)    0.999    0.035   28.169    0.000
##    .alt3_xa    (a)    0.999    0.035   28.169    0.000
##     FX                3.792    0.188   20.169    0.000
##    .Y1               25.944    1.661   15.616    0.000

And there you have it.

6 Assignment

Assignment 5: Re-estimate the micro-macro model but now use the other indicator.

Figure: Micro-macro latent variable model with two micro-level variables

Assignment 6: Re-estimate the micro-macro model but now use both indicators. See the figure above for the intended model. Note that we still do not include an exogenous variable at the ego-level. We include a covariance between the two indicators at the alter-level, because it may not be reasonable to assume that all of the association between the alter-level indicators is explained by the latent variable.

Figure: Micro-macro latent variable model with two micro-level variables. Ego-level variable X moderates the impact of the latent variable at the group-level.


A word of warning: Assignment 7 (below) is way too difficult. First, in Lavaan it is not straightforward to estimate interaction effects when the interaction involves a latent variable. Second, the variable size is not normally distributed. This violation of model assumptions leads to all kinds of 'problems'.

Assignment 7: Test the hypothesis that the larger the core-discussion network, the smaller the influence of each individual alter will be. Test this hypothesis for both dependent variables. Bonus: estimate both dependent variables in one SEM. See the figure above for the intended model.

Assignment 8: Please now use a real dataset. Formulate an interesting hypothesis (and provide a motivation) on how the CDN (i.e., the alters in the core discussion network) may influence the "Attitude towards EU integration" of ego.

7 Real data

Download sn2021_egonetdata_v2.Rdata

Save it in your working directory

If you want to use this data run the following command: load('sn2021_egonetdata_v2.Rdata')

Description of the dataset:
Subset of the LISS panel data (years 2009 and 2010).
Four ego variables (eu, educ, age, g).
Three confidant variables (educ_a, age_a, g_a).
In the wide data, the first number in the label of a confidant variable indicates the survey wave and the second number the alter id. For this assignment please use the data in wide format.

liss_wide: LISS data in a wide data frame. liss_long: LISS data in a long data frame.

Dependent variables:
‘eu’ Attitude towards EU integration (0 "EU integration went too far" to 4 "EU integration not far enough")

Ego control variables:
‘educ’ is educational attainment of ego in years.
‘age’ is the self-reported age of respondents.
‘g’ measures whether respondents are female (1) or male (0).

Alter variables:
‘educ_a’ measures the educational attainment of the confidant in years. ‘g_a’ measures the gender of the confidant, female (1) or male (0).
‘age_a’ measures the age of the confidant in age categories: 1 = younger than 16; 2 = 16-20; 3 = 21-25; 4 = 26-30; 5 = 31-35; 6 = 36-40; 7 = 41-45; 8 = 46-50; 9 = 51-55; 10 = 56-60; 11 = 61-65; 12 = 66-70; 13 = 71 years or older.
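
To get going, the lines below (assuming you saved the file in your working directory) load the data and show the structure of both data frames.

# load the LISS egonet data; the .Rdata file contains liss_wide and liss_long
load("sn2021_egonetdata_v2.Rdata")
str(liss_wide)  # one row per ego
str(liss_long)  # one row per ego-confidant combination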

8 answers

8.1 calculate network size

data_wide$size <- as.numeric(rowSums(!is.na(cbind(data_wide$alt1_xa, data_wide$alt2_xa, data_wide$alt3_xa))))
table(data_wide$size, useNA = "always")
## 
##    0    1    2    3 <NA> 
##    2   48  314  636    0

8.2 Assignment 6

I also included the main effect of size. Please note that size is not a normally distributed variable. This may lead to all kinds of estimation problems. See here and here.

# just ignore the non-normality of size
model <- "
  #latent variable
  FX =~ 1*alt1_xa
  FX =~ 1*alt2_xa
  FX =~ 1*alt3_xa
  
  FX =~ a*alt1_xb
  FX =~ a*alt2_xb
  FX =~ a*alt3_xb
  
  #variances
  alt1_xa ~~ b*alt1_xa
  alt2_xa ~~ b*alt2_xa
  alt3_xa ~~ b*alt3_xa
  
  alt1_xb ~~ c*alt1_xb
  alt2_xb ~~ c*alt2_xb
  alt3_xb ~~ c*alt3_xb
  
  FX ~~ FX
  Y1 ~~ Y1
  Y2 ~~ Y2
  size ~~ size
  
  #covariances
  Y1 ~~ Y2
  alt1_xa ~~ d*alt1_xb
  alt2_xa ~~ d*alt2_xb
  alt3_xa ~~ d*alt3_xb
  
  #regression model
  Y1 ~ FX + size
  Y1 ~ 1
  FX ~ size
  Y2 ~ FX + size
  Y2 ~ 1
   
  
  #intercepts/means
  alt1_xa ~ e*1
  alt2_xa ~ e*1
  alt3_xa ~ e*1
  alt1_xb ~ f*1
  alt2_xb ~ f*1
  alt3_xb ~ f*1
  
"


fit1 <- lavaan(model, data = data_wide, missing = "fiml", fixed.x = FALSE)

# declare the size variable to be ordered. We then have to switch to a different estimation procedure.
data_wide$size2 <- ordered(data_wide$size)

# The code below won't work: with listwise deletion only egos with a complete network of 3 alters
# remain, so there is no variance left in the size variable. A solution would be to use a two-step
# approach.  model <- '
# #latent variable FX =~ 1*alt1_xa FX =~ 1*alt2_xa FX =~ 1*alt3_xa FX =~ a*alt1_xb FX =~ a*alt2_xb FX
# =~ a*alt3_xb #variances alt1_xa ~~ b*alt1_xa alt2_xa ~~ b*alt2_xa alt3_xa ~~ b*alt3_xa alt1_xb ~~
# c*alt1_xb alt2_xb ~~ c*alt2_xb alt3_xb ~~ c*alt3_xb FX ~~ FX Y1 ~~ Y1 Y2 ~~ Y2 size2 ~~ size2
# #covariances Y1 ~~ Y2 alt1_xa ~~ d*alt1_xb alt2_xa ~~ d*alt2_xb alt3_xa ~~ d*alt3_xb #regression
# model Y1 ~ FX + size2 Y1 ~ 1 FX ~ size2 Y2 ~ FX + size2 Y2 ~ 1 #intercepts/means alt1_xa ~ e*1
# alt2_xa ~ e*1 alt3_xa ~ e*1 alt1_xb ~ f*1 alt2_xb ~ f*1 alt3_xb ~ f*1 ' fit2 <- lavaan(model, data
# = data_wide, fixed.x=FALSE, ordered=c('size2'))

summary(fit1)

# summary(fit2)
## lavaan 0.6-7 ended normally after 105 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         30
##   Number of equality constraints                    12
##                                                       
##   Number of observations                          1000
##   Number of missing patterns                         8
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                              3027.830
##   Degrees of freedom                                36
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   FX =~                                               
##     alt1_xa           1.000                           
##     alt2_xa           1.000                           
##     alt3_xa           1.000                           
##     alt1_xb    (a)    0.407    0.011   36.042    0.000
##     alt2_xb    (a)    0.407    0.011   36.042    0.000
##     alt3_xb    (a)    0.407    0.011   36.042    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Y1 ~                                                
##     FX                5.095    0.107   47.609    0.000
##     size             -0.914    0.341   -2.683    0.007
##   FX ~                                                
##     size              0.117    0.110    1.070    0.285
##   Y2 ~                                                
##     FX                7.937    0.137   57.834    0.000
##     size             -0.755    0.436   -1.733    0.083
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .Y1 ~~                                               
##    .Y2               -1.255    1.547   -0.812    0.417
##  .alt1_xa ~~                                          
##    .alt1_xb    (d)    0.094    0.024    3.969    0.000
##  .alt2_xa ~~                                          
##    .alt2_xb    (d)    0.094    0.024    3.969    0.000
##  .alt3_xa ~~                                          
##    .alt3_xb    (d)    0.094    0.024    3.969    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1                1.003    1.560    0.643    0.520
##    .Y2               -0.666    2.295   -0.290    0.772
##    .alt1_xa    (e)   -0.306    0.293   -1.045    0.296
##    .alt2_xa    (e)   -0.306    0.293   -1.045    0.296
##    .alt3_xa    (e)   -0.306    0.293   -1.045    0.296
##    .alt1_xb    (f)   -0.102    0.121   -0.844    0.399
##    .alt2_xb    (f)   -0.102    0.121   -0.844    0.399
##    .alt3_xb    (f)   -0.102    0.121   -0.844    0.399
##     size              0.000                           
##    .FX                0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .alt1_xa    (b)    1.001    0.035   28.771    0.000
##    .alt2_xa    (b)    1.001    0.035   28.771    0.000
##    .alt3_xa    (b)    1.001    0.035   28.771    0.000
##    .alt1_xb    (c)    1.098    0.032   34.764    0.000
##    .alt2_xb    (c)    1.098    0.032   34.764    0.000
##    .alt3_xb    (c)    1.098    0.032   34.764    0.000
##    .FX                3.675    0.182   20.148    0.000
##    .Y1               26.210    1.637   16.008    0.000
##    .Y2               31.545    2.600   12.131    0.000
##     size              7.028    0.314   22.361    0.000

Please note that the estimated (model-implied) variance of the size variable is way too large. In reality the variance is about 0.35. This indicates that we have a misspecified model.
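
You can check the observed variance of the size variable yourself:

# observed variance of network size in the simulated data
var(data_wide$size)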

8.3 Assignment 7

Unfortunately, Lavaan does not let you include an interaction term involving a latent variable in the structural part of the model. We have to fall back on a two-step approach. For some literature on the differences between a one-step and a two-step approach, see Anderson and Gerbing (1988) [2].

But here it goes…

# credits where credits are due:
# https://stackoverflow.com/questions/24399353/r-lavaan-coding-latent-variable-interactions

# 1. set up our measurement model
model2 <- "
  #latent variable
  FX =~ 1*alt1_xa
  FX =~ 1*alt2_xa
  FX =~ 1*alt3_xa
  
  FX =~ a*alt1_xb
  FX =~ a*alt2_xb
  FX =~ a*alt3_xb
  
  #variances
  alt1_xa ~~ b*alt1_xa
  alt2_xa ~~ b*alt2_xa
  alt3_xa ~~ b*alt3_xa
  
  alt1_xb ~~ c*alt1_xb
  alt2_xb ~~ c*alt2_xb
  alt3_xb ~~ c*alt3_xb
  
  FX ~~ FX
  "
fit <- lavaan(model2, data = data_wide, missing = "fiml", fixed.x = FALSE)

# 2. extract the predicted factor scores from the CFA and add them to a new data frame, data_wide2
data_wide2 <- data.frame(data_wide, predict(fit))

# 3. create a new variable with the interaction of FX and size
data_wide2$FXsize <- data_wide2$FX * data_wide2$size

# 4. now set up the structural model and add the precomputed interaction

model2 <- "
  
  FX ~~ FX
  Y1 ~~ Y1
  Y2 ~~ Y2
  size ~~ 0.35*size #I am fixing the variance to the observed variance. 
  FXsize ~~ FXsize
  
  #covariances
  Y1 ~~ Y2
 
  #regression model
  Y1 ~ FX + size + FXsize
  Y1 ~ 1
  FX ~ size
  Y2 ~ FX + size + FXsize
  Y2 ~ 1
   
  
"

fit <- lavaan(model2, data = data_wide2, missing = "fiml", fixed.x = FALSE)
summary(fit)
## lavaan 0.6-7 ended normally after 72 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         14
##                                                       
##   Number of observations                          1000
##   Number of missing patterns                         2
##                                                       
## Model Test User Model:
##                                                        
##   Test statistic                              22209.404
##   Degrees of freedom                                  6
##   P-value (Chi-square)                            0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Y1 ~                                                
##     FX                5.165    0.476   10.843    0.000
##     size             -0.767    0.320   -2.400    0.016
##     FXsize           -0.062    0.178   -0.349    0.727
##   FX ~                                                
##     size              0.008    0.022    0.338    0.736
##   Y2 ~                                                
##     FX                1.062    0.547    1.942    0.052
##     size             -0.366    0.366   -1.001    0.317
##     FXsize            2.577    0.205   12.587    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .Y1 ~~                                               
##    .Y2               12.511    1.319    9.485    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1                2.132    0.849    2.513    0.012
##    .Y2                0.602    0.971    0.619    0.536
##    .FX                0.000                           
##     size              0.000                           
##     FXsize            0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .FX                3.471    0.155   22.354    0.000
##    .Y1               34.789    1.557   22.345    0.000
##    .Y2               45.446    2.034   22.347    0.000
##     size              0.350                           
##     FXsize           24.736    1.106   22.358    0.000

8.4 final model?

Let us assume that, based on the literature, I do not expect that size affects my latent variable (e.g. network-happiness): I do not expect that people (egos) with larger networks have networks in which the alters are on average happier. Thus I do not want to include this direct path in my model. I also do not expect that network size by itself is related to my happiness, so I would like to exclude this path from the model as well. I just have very good reasons (ahum) to think that each alter contributes uniquely to my happiness; there is an additive effect. This implies an interaction between network size and our latent variable. Let us estimate this more 'theoretical' model.

model <- "
  
  FX ~~ FX
  Y1 ~~ Y1
  Y2 ~~ Y2
  FXsize ~~ FXsize
  
  #covariances
  Y1 ~~ Y2
 
  #regression model
  Y1 ~ FX + FXsize
  Y1 ~ 1
  Y2 ~ FX + FXsize
  Y2 ~ 1
   
  
"

fit <- lavaan(model, data = data_wide2, missing = "fiml", fixed.x = FALSE)
summary(fit)
## lavaan 0.6-7 ended normally after 61 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         11
##                                                       
##   Number of observations                          1000
##   Number of missing patterns                         2
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                              3127.833
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Y1 ~                                                
##     FX                5.107    0.476   10.729    0.000
##     FXsize           -0.043    0.178   -0.240    0.810
##   Y2 ~                                                
##     FX                1.032    0.546    1.890    0.059
##     FXsize            2.587    0.205   12.652    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .Y1 ~~                                               
##    .Y2               12.606    1.324    9.522    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Y1                0.146    0.187    0.779    0.436
##    .Y2               -0.346    0.214   -1.622    0.105
##     FX                0.000                           
##     FXsize            0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     FX                3.472    0.155   22.353    0.000
##    .Y1               34.985    1.566   22.344    0.000
##    .Y2               45.493    2.036   22.347    0.000
##     FXsize           24.735    1.106   22.358    0.000

We would conclude that my line of reasoning holds true with respect to Y2 but not with respect to Y1. This is exactly what we built into the simulation: Y1 was generated with a mean alter effect (5 * LX) and Y2 with a total alter effect (3 * LX * ns).

9 Alternative Approaches

The above micro-macro models may also be estimated with Mplus, developed by Muthén and Muthén (www.statmodel.com). If you have Mplus installed on your computer, you may use the R package MplusAutomation to estimate models with Mplus from within R. A multi-level two-step alternative that stays within R is sketched below.
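
The multi-level approach itself can be sketched with the nlme package we installed earlier. The code below is only a minimal two-step sketch, not the full bias-corrected micro-macro estimator: step 1 fits a random-intercept model on the long data to obtain shrunken (empirical Bayes) estimates of each egonet's latent mean, and step 2 regresses the ego-level outcome on those estimates. A proper micro-macro correction would additionally adjust the second step for the unreliability of these estimated group means.

library(nlme)

# step 1: random-intercept model for the alter characteristic (alters nested in egos)
m1 <- lme(xa ~ 1, random = ~1 | ID, data = data_long, na.action = na.omit)

# empirical Bayes (shrunken) estimates of the egonet means; rownames of coef() are the ID values
eb <- coef(m1)
eb <- data.frame(ID = as.numeric(rownames(eb)), xa_eb = eb[, 1])

# step 2: ego-level regression of Y1 on the shrunken egonet means
data_eb <- merge(data_wide, eb, by = "ID")
summary(lm(Y1 ~ xa_eb, data = data_eb))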


[1] Images adapted from: Bennink, M., Croon, M. A., Kroon, B., & Vermunt, J. K. (2016). Micro–macro multilevel latent class models with multiple discrete individual-level variables. Advances in Data Analysis and Classification, 10(2), 139-154.

[2] Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103(3), 411.
