Consequences of Dyads (methods)
Figure 1: Basic APIM model
1 Introduction
In this tutorial I will demonstrate how to estimate an APIM (Actor-Partner Interdependence Model) with the R package Lavaan. The Lavaan website can be found here. During the workgroup I will explain all code. For those of you who don’t attend the workgroups, Google knows way more than I do.
2 Before you start
Before you start, check whether you are running the latest RStudio version (from the Help menu, pick ‘Check for Updates’) and whether you need to update R.
install.packages("installr") #you first install packages
require(installr) #then you will need to activate packages.
updateR() #run the function to start the update process
Give your script a nice name. Include the author, and the date when you last modified the script. Include a lot of comments in your script! Don’t forget: always start by cleaning up your workspace.
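Cleaning up the workspace is typically a one-liner; a minimal sketch:

```r
# remove all objects from the global environment
rm(list = ls())
```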
And set your working directory.
# set working directory
setwd("C:\\YOURDIR\\YOURSUBDIR\\YOURSUBSUBDIR\\") #change to your own workdirectory
Install the packages you will need.
# install packages
install.packages("lavaan", dependencies = TRUE) # to estimate the APIM model
install.packages("psych") # to describe our dataset
install.packages("MASS") # to simulate variables from a multi-variate normal distribution
install.packages("nlme") # for the multilevel models
Q1: Where and how can you find additional information on R packages?
A1:
- Google
- the CRAN page of the package
- the package manual
- the package vignettes, if any
- the literature mentioned in the package documentation
- the package’s dedicated website, if it has one
3 Data
3.1 Simulate data
If I try to get an understanding of a new method, I usually use a simulated dataset. Then I at least know what the world looks like (I know what I put in, so I know what I should get out of the model). For you guys and gals, it is not necessary to understand the simulation process, but feel free to have a close look.
require(MASS)
set.seed(9864) #We set a seed. This way the random numbers we generate will be the same each run and we thus end up with the same dataset. Please note that to be absolutely sure to get the same dataset, we need to run the same R version (and package versions)
# let us start with simulating the age of both partners. Naturally, we introduce some correlation
# between the two.
Sigma <- matrix(c(10, 4, 4, 5), 2, 2)
age <- mvrnorm(n = 1000, mu = c(20, 25), Sigma)
age_W <- age[, 1]
age_M <- age[, 2]
# we do the same for the educational level of both partners.
Sigma <- matrix(c(4, 3, 3, 5), 2, 2)
educ <- mvrnorm(n = 1000, mu = c(10, 8), Sigma)
educ_W <- round(educ[, 1])
educ_M <- round(educ[, 2])
# let us simulate a dyad-level variable: the number of kids.
nkids <- sample.int(n = 5, size = 1000, replace = TRUE)
# introduce some randomness at the dyad or household level.
HH_rand <- rnorm(n = 1000, 0, 6)
# and introduce some unique randomness for each partner
W_rand <- rnorm(n = 1000, 2, 4)
M_rand <- rnorm(n = 1000, 5, 1)
# lets construct a dependent variable for the women. note that the characteristics of ego (the
# women in this case), of alter (the men in this case) and of the dyad/household all have an
# impact on the score of the women.
Y_W <- 0.3 * age_W + 0.7 * educ_W + 0.1 * age_M + 0.6 * educ_M + -2 * nkids + HH_rand + W_rand
# let's do something similar for the men but with different effect sizes.
Y_M <- 0.1 * age_M + 0.7 * educ_M + 0.2 * educ_W + -2 * nkids + HH_rand + M_rand
# Introduce some influence effects.
Sigma <- matrix(c(8, 5, 5, 10), 2, 2)
errort2 <- mvrnorm(n = 1000, mu = c(0, 0), Sigma)
Y_Wt2 <- (mean(Y_W) + 3) + 0.6 * (Y_W - mean(Y_W)) + 0.4 * (Y_M - mean(Y_M)) + 0.15 * age_W + 0.4 * educ_W +
0.1 * age_M + 0.3 * educ_M + -1 * nkids + errort2[, 1]
Y_Mt2 <- (mean(Y_M) + 5) + 0.5 * (Y_W - mean(Y_W)) + 0.7 * (Y_M - mean(Y_M)) + 0.05 * age_M + 0.2 * educ_M +
0.2 * educ_W + -1 * nkids + errort2[, 2]
# and let's put everything together
data <- data.frame(age_M, age_W, educ_M, educ_W, nkids, Y_M, Y_W, Y_Mt2, Y_Wt2)
# add some description to the data
attr(data, "description") <- "This is a simulated dataset to illustrate the use of APIM. The dataset is in wide-format: one row refers to one couple/household. Variables with \"_W\" refer to women,\"_M\" refer to men. Variables with no underscore refer to the couple. The dependent variable is measured at two occassions."
# I don't think the variables need any further description.
Q2: Which functions did we use to simulate data? Please list all used functions. Make it a good habit to write down all new functions you encounter.
A2:
- require
- set.seed
- matrix
- c
- mvrnorm
- round
- sample.int
- rnorm
- data.frame
- attr
Q3: What is the difference between (), {} and [] in R?
A3:
- () follows a function; within () we identify the arguments of the function. Example: function().
- {} is used to combine different lines of code into one block (e.g. in function bodies and if/else statements). Example: {some code here and some code here}.
- [] and [[]] are used to extract elements from objects such as vectors, matrices, arrays and lists. Example: object[1].
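The difference between [] and [[]] is easiest to see with a list; a minimal sketch:

```r
x <- list(a = 1:3, b = "hello")
x[1]    # single brackets return a (sub)list containing the first element
x[[1]]  # double brackets return the first element itself: the vector 1:3
```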
3.2 Describe data
Let’s have a look at our data.
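The code for this step is not echoed in the document; the output below was presumably produced along these lines (assuming the `data` object simulated above):

```r
require(psych)             # for describe()
head(data)                 # first six couples
str(data)                  # variable names and types
summary(data)              # basic summaries per variable
attr(data, "description")  # the description we attached earlier
describe(data)             # extended descriptives from the psych package
```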
## age_M age_W educ_M educ_W nkids Y_M Y_W Y_Mt2 Y_Wt2
## 1 23.65153 17.18029 8 9 4 12.5966983 24.001964 22.827987 30.69627
## 2 27.11747 25.93062 8 10 1 0.1857714 5.201483 9.688008 25.84382
## 3 22.20588 20.02249 8 9 5 0.9591408 12.854358 7.710286 24.28463
## 4 27.48565 18.99072 6 7 3 10.8892384 11.547787 15.247796 29.79648
## 5 28.29219 19.02406 9 12 5 11.3599118 11.441439 18.001653 27.64218
## 6 24.72061 24.40805 8 12 5 6.2244939 19.168225 12.614080 23.46683
## 'data.frame': 1000 obs. of 9 variables:
## $ age_M : num 23.7 27.1 22.2 27.5 28.3 ...
## $ age_W : num 17.2 25.9 20 19 19 ...
## $ educ_M: num 8 8 8 6 9 8 9 10 9 10 ...
## $ educ_W: num 9 10 9 7 12 12 12 8 12 12 ...
## $ nkids : int 4 1 5 3 5 5 5 5 5 4 ...
## $ Y_M : num 12.597 0.186 0.959 10.889 11.36 ...
## $ Y_W : num 24 5.2 12.9 11.5 11.4 ...
## $ Y_Mt2 : num 22.83 9.69 7.71 15.25 18 ...
## $ Y_Wt2 : num 30.7 25.8 24.3 29.8 27.6 ...
## - attr(*, "description")= chr "This is a simulated dataset to illustrate the use of APIM. The dataset is in wide-format: one row refers to one"| __truncated__
## age_M age_W educ_M educ_W nkids Y_M
## Min. :17.01 Min. :10.66 Min. : 1.000 Min. : 3.00 Min. :1.000 Min. :-13.397
## 1st Qu.:23.52 1st Qu.:18.14 1st Qu.: 7.000 1st Qu.: 9.00 1st Qu.:2.000 1st Qu.: 4.465
## Median :25.01 Median :20.22 Median : 8.000 Median :10.00 Median :3.000 Median : 9.151
## Mean :25.01 Mean :20.20 Mean : 8.033 Mean :10.08 Mean :3.078 Mean : 8.980
## 3rd Qu.:26.55 3rd Qu.:22.17 3rd Qu.:10.000 3rd Qu.:11.00 3rd Qu.:4.000 3rd Qu.: 13.773
## Max. :31.67 Max. :30.48 Max. :17.000 Max. :17.00 Max. :5.000 Max. : 28.029
## Y_W Y_Mt2 Y_Wt2
## Min. :-8.637 Min. :-15.552 Min. : 1.074
## 1st Qu.:10.619 1st Qu.: 8.457 1st Qu.:21.805
## Median :16.273 Median : 15.799 Median :27.930
## Mean :16.162 Mean : 15.639 Mean :27.984
## 3rd Qu.:21.741 3rd Qu.: 22.523 3rd Qu.:33.998
## Max. :42.715 Max. : 41.200 Max. :52.909
## [1] "This is a simulated dataset to illustrate the use of APIM. The dataset is in wide-format: one row refers to one couple/household. Variables with \"_W\" refer to women,\"_M\" refer to men. Variables with no underscore refer to the couple. The dependent variable is measured at two occassions."
## vars n mean sd median trimmed mad min max range skew kurtosis se
## age_M 1 1000 25.01 2.27 25.01 25.02 2.25 17.01 31.67 14.66 -0.06 -0.03 0.07
## age_W 2 1000 20.20 3.18 20.22 20.19 3.01 10.66 30.48 19.81 0.05 0.01 0.10
## educ_M 3 1000 8.03 2.27 8.00 8.03 2.97 1.00 17.00 16.00 0.07 0.08 0.07
## educ_W 4 1000 10.08 2.01 10.00 10.09 1.48 3.00 17.00 14.00 -0.01 0.20 0.06
## nkids 5 1000 3.08 1.41 3.00 3.10 1.48 1.00 5.00 4.00 -0.08 -1.28 0.04
## Y_M 6 1000 8.98 6.86 9.15 9.04 6.93 -13.40 28.03 41.43 -0.10 -0.20 0.22
## Y_W 7 1000 16.16 8.12 16.27 16.17 8.27 -8.64 42.71 51.35 -0.03 -0.10 0.26
## Y_Mt2 8 1000 15.64 9.90 15.80 15.62 10.39 -15.55 41.20 56.75 -0.05 -0.25 0.31
## Y_Wt2 9 1000 27.98 8.97 27.93 27.99 9.06 1.07 52.91 51.84 -0.04 -0.16 0.28
4 Check for interdependencies
When you have reached this point in the course, you should know that within social networks observations are not independent. This has theoretical and methodological consequences. A first step is therefore to assess whether our observations are interdependent (covary, or correlate). There are naive and less naive ways to check for interdependence.
For more background information see this page by David A. Kenny.
Also check out paragraph 3.3 of the book by Snijders and Bosker (1999).1
Let’s start with something that pops up in your mind immediately: a correlation. And, for those of you with some experience in multilevel analysis: the intraclass correlation (ICC).
4.1 correlation (wide format)
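The code is not echoed here; the output below can be reproduced with something like this (assuming the wide-format `data` object):

```r
# correlation between men's and women's scores within the same couple
cor.test(data$Y_M, data$Y_W)
```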
##
## Pearson's product-moment correlation
##
## data: data$Y_M and data$Y_W
## t = 50.326, df = 998, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8284633 0.8636065
## sample estimates:
## cor
## 0.8469575
This would indicate a strong and significant correlation. Remember that our data are in wide format. A better way (i.e. the result more closely approaches the ICC) is to calculate the correlation on a dataset in long format. This method is called the double entry method.
4.2 correlation (long format)
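With double entry, every dyad appears twice, once from each partner’s perspective. A sketch that reproduces the output below (the names `var1` and `var2` match the echoed output; the construction itself is an assumption):

```r
# stack the scores so each couple is entered twice:
# once with the woman as 'ego', once with the man as 'ego'
var1 <- c(data$Y_W, data$Y_M)
var2 <- c(data$Y_M, data$Y_W)
cor.test(var1, var2)
```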
##
## Pearson's product-moment correlation
##
## data: var1 and var2
## t = 25.393, df = 1998, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4600764 0.5263813
## sample estimates:
## cor
## 0.4939466
Much lower, but still significant. But what is the correct interdependence? Well, that can be calculated with the ICC. And there are two definitions. See the references above.
4.3 ICC (ANOVA tradition)
$$ ICC = \frac{(MS_B - MS_W)}{(MS_B + MS_W)} $$
where,
$$ MS_B = 2 \cdot VAR(\bar{X}_{dyad}) $$
and
$$ MS_W = \frac{\sum (X_{ego} - X_{alter})^2}{2 \cdot N_{dyads}} $$
Let’s have a go!
MSB <- var((data$Y_M + data$Y_W)/2) * 2
MSW <- (sum((data$Y_M - data$Y_W)^2))/(2 * length(data$Y_W))
ICC_anova <- (MSB - MSW)/(MSB + MSW)
ICC_anova
## [1] 0.4943247
Do you see that the correct ICC is very close to the correlation based on a dataset in long format (double entry method)? Thus in practice, the double entry method is very convenient to check for interdependencies.
4.4 ICC (regression tradition)
Most of you are probably more familiar with definitions of the ICC as provided within textbooks on multi-level analysis, where the intraclass correlation - at least for continuous dependent variables - is defined as the between variance (i.e. the variance in dyad means) divided by the total variance (i.e. the sum of the between and within variance). There is only one problem: we need the variances present in the ‘real population’, whereas in our data we only observe the variances present in our sample. The observed between variance needs to be corrected. Below I will show you how to do that.
# first make a dataset in long format.
dyadmean <- (data$Y_M + data$Y_W)/2
data_long <- rbind(data, data)
data_long$index <- rep(1:2, each = 1000)
data_long$dyad_id <- rep(1:1000, times = 2)
data_long$dyadmean <- c(dyadmean, dyadmean)
head(data_long)
## age_M age_W educ_M educ_W nkids Y_M Y_W Y_Mt2 Y_Wt2 index dyad_id
## 1 23.65153 17.18029 8 9 4 12.5966983 24.001964 22.827987 30.69627 1 1
## 2 27.11747 25.93062 8 10 1 0.1857714 5.201483 9.688008 25.84382 1 2
## 3 22.20588 20.02249 8 9 5 0.9591408 12.854358 7.710286 24.28463 1 3
## 4 27.48565 18.99072 6 7 3 10.8892384 11.547787 15.247796 29.79648 1 4
## 5 28.29219 19.02406 9 12 5 11.3599118 11.441439 18.001653 27.64218 1 5
## 6 24.72061 24.40805 8 12 5 6.2244939 19.168225 12.614080 23.46683 1 6
## dyadmean
## 1 18.299331
## 2 2.693627
## 3 6.906749
## 4 11.218513
## 5 11.400675
## 6 12.696360
# let the first dyad entry refer to the women and the second to the men
data_long$age <- ifelse(data_long$index == 1, data_long$age_W, data_long$age_M)
data_long$educ <- ifelse(data_long$index == 1, data_long$educ_W, data_long$educ_M)
data_long$Y <- ifelse(data_long$index == 1, data_long$Y_W, data_long$Y_M)
# also define the age and educ of the partner
data_long$age_P <- ifelse(data_long$index == 2, data_long$age_W, data_long$age_M)
data_long$educ_P <- ifelse(data_long$index == 2, data_long$educ_W, data_long$educ_M)
data_long$Y_P <- ifelse(data_long$index == 2, data_long$Y_W, data_long$Y_M)
head(data_long)
## age_M age_W educ_M educ_W nkids Y_M Y_W Y_Mt2 Y_Wt2 index dyad_id
## 1 23.65153 17.18029 8 9 4 12.5966983 24.001964 22.827987 30.69627 1 1
## 2 27.11747 25.93062 8 10 1 0.1857714 5.201483 9.688008 25.84382 1 2
## 3 22.20588 20.02249 8 9 5 0.9591408 12.854358 7.710286 24.28463 1 3
## 4 27.48565 18.99072 6 7 3 10.8892384 11.547787 15.247796 29.79648 1 4
## 5 28.29219 19.02406 9 12 5 11.3599118 11.441439 18.001653 27.64218 1 5
## 6 24.72061 24.40805 8 12 5 6.2244939 19.168225 12.614080 23.46683 1 6
## dyadmean age educ Y age_P educ_P Y_P
## 1 18.299331 17.18029 9 24.001964 23.65153 8 12.5966983
## 2 2.693627 25.93062 10 5.201483 27.11747 8 0.1857714
## 3 6.906749 20.02249 9 12.854358 22.20588 8 0.9591408
## 4 11.218513 18.99072 7 11.547787 27.48565 6 10.8892384
## 5 11.400675 19.02406 12 11.441439 28.29219 9 11.3599118
## 6 12.696360 24.40805 12 19.168225 24.72061 8 6.2244939
With this dataset in long format we can calculate the ICC.
# first calculate the between variance of our sample. Note that for this we only need one
# observation per dyad (thus N_dyads = 1000)
S_B <- var(data_long$dyadmean[1:1000])
# within variance
SW <- sum((data_long$Y - data_long$dyadmean)^2)/1000 #we divide by the number of dyads
# We now need to correct the observed between variance to reflect the population between variance.
S_B_E <- S_B - SW/2
ICC_ML <- S_B_E/(S_B_E + SW)
ICC_ML
## [1] 0.4943247
Of course exactly identical to the ICC_anova. But this procedure is quite cumbersome. Thus in practice I personally simply calculate a correlation on the variables of interest in a dataset in long format. Or, alternatively, estimate an empty multi-level model, which also spits out the ICC (after some tweaking). See below.
require(nlme)
# estimate empty model with ML
mlme <- lme(Y ~ 1, data = data_long, random = list(~1 | dyad_id))
summary(mlme)
# Standard deviations are reported instead of variances. extract the variances.
VarCorr(mlme)
# the intercept variance is at the between-level. the residual variances are at the observation /
# within-level. thus based on these numbers we may calculate the ICC ourselves.
varests <- as.numeric(VarCorr(mlme)[1:2])
ICC_MLb <- varests[1]/sum(varests)
ICC_MLb
## Linear mixed-effects model fit by REML
## Data: data_long
## AIC BIC logLik
## 13881.45 13898.25 -6937.723
##
## Random effects:
## Formula: ~1 | dyad_id
## (Intercept) Residual
## StdDev: 5.857116 5.923985
##
## Fixed effects: Y ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 12.5713 0.2277117 1000 55.2071 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.99822956 -0.62468953 -0.08257107 0.61939001 2.77997551
##
## Number of Observations: 2000
## Number of Groups: 1000
## dyad_id = pdLogChol(1)
## Variance StdDev
## (Intercept) 34.30581 5.857116
## Residual 35.09360 5.923985
## [1] 0.4943242
The take home message is that the two observations for each dyad are indeed correlated. Is this a lot? Is this significant?
5 Hypothesis testing
Suppose we wish to test the following hypotheses:
- your own age and education are positively related to your score on Y.
- the age and education of your partner are positively related to your score on Y.
5.1 OLS (pooled)
Prior to this course you would probably have estimated an OLS regression analysis on your complete data in long format.
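The estimation code is not echoed; the call shown in the output below corresponds to the following (the object name `mlm_pooled` is my own):

```r
# pooled OLS on the long-format data: each individual is one row
mlm_pooled <- lm(Y ~ age + educ + age_P + educ_P + nkids, data = data_long)
summary(mlm_pooled)
```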
##
## Call:
## lm(formula = Y ~ age + educ + age_P + educ_P + nkids, data = data_long)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.1773 -4.6337 0.0457 4.4454 23.8309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.30347 1.62515 0.802 0.423
## age -0.19326 0.04776 -4.047 5.39e-05 ***
## educ 1.15410 0.08291 13.920 < 2e-16 ***
## age_P 0.51442 0.04776 10.771 < 2e-16 ***
## educ_P -0.04756 0.08291 -0.574 0.566
## nkids -1.95273 0.10721 -18.214 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.756 on 1994 degrees of freedom
## Multiple R-squared: 0.3438, Adjusted R-squared: 0.3422
## F-statistic: 209 on 5 and 1994 DF, p-value: < 2.2e-16
So, what would you conclude? We have to refute the hypothesis on own age and the hypothesis on partner’s education. Obviously, the above approach does not yet take into account the interdependencies.
5.2 OLS separately for men and women
Perhaps, you would have realized (or theorized) that the impact of own and partner’s age and educ may depend on your sex. Thus you may have estimated the model separately for men and women. There would be nothing wrong with that approach, because with one observation per dyad/household the observations can still be regarded as independent. Thus:…
mlm_W <- lm(Y_W ~ age_W + educ_W + age_M + educ_M + nkids, data = data)
summary(mlm_W)
mlm_M <- lm(Y_M ~ age_M + educ_M + age_W + educ_W + nkids, data = data)
summary(mlm_M)
##
## Call:
## lm(formula = Y_W ~ age_W + educ_W + age_M + educ_M + nkids, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.6319 -4.8446 -0.1661 4.7004 22.8278
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.51260 2.78070 -0.544 0.58659
## age_W 0.26767 0.08627 3.102 0.00197 **
## educ_W 0.89440 0.14891 6.006 2.66e-09 ***
## age_M 0.23414 0.12113 1.933 0.05352 .
## educ_M 0.41215 0.13209 3.120 0.00186 **
## nkids -1.92004 0.15944 -12.042 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.099 on 994 degrees of freedom
## Multiple R-squared: 0.239, Adjusted R-squared: 0.2352
## F-statistic: 62.44 on 5 and 994 DF, p-value: < 2.2e-16
##
##
## Call:
## lm(formula = Y_M ~ age_M + educ_M + age_W + educ_W + nkids, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.2648 -4.2281 0.3472 3.8307 17.9911
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.47847 2.33193 1.063 0.2881
## age_M 0.19849 0.10158 1.954 0.0510 .
## educ_M 0.59445 0.11077 5.366 1e-07 ***
## age_W -0.01770 0.07235 -0.245 0.8067
## educ_W 0.32121 0.12488 2.572 0.0103 *
## nkids -1.98687 0.13371 -14.859 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.953 on 994 degrees of freedom
## Multiple R-squared: 0.2517, Adjusted R-squared: 0.2479
## F-statistic: 66.86 on 5 and 994 DF, p-value: < 2.2e-16
So, what would you conclude? For women we cannot refute any of the hypotheses. For men we have to refute the hypothesis on the impact of partner’s age. Clearly, both the actor effects and partner effects differ for men and women. A disadvantage of the approach above is that, although it treats the interdependencies as a nuisance, it does not yet treat the interdependencies as something of interest. Other disadvantages are that it is hard to test whether partner effects significantly differ for men and women, or to set constraints on specific effects.
5.3 SEM with lavaan
One approach is to estimate both models simultaneously within a SEM framework. This is the APIM for non-longitudinal designs.
require(lavaan)
myModel <- "# regressions
Y_M ~ age_M + educ_M + age_W + educ_W + b1*nkids
Y_W ~ age_M + educ_M + age_W + educ_W + b2*nkids
# constraints
b1 == b2
# variances and covariances
Y_M ~~ Y_M
Y_M ~~ Y_W
Y_W ~~ Y_W
age_M ~~ age_W
educ_M ~~ educ_W
# intercepts
Y_M ~ 1
Y_W ~ 1
"
fit <- sem(myModel, data = data)
summary(fit, standardized = FALSE)
## lavaan 0.6-7 ended normally after 95 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 27
## Number of equality constraints 1
##
## Number of observations 1000
##
## Model Test User Model:
##
## Test statistic 11.028
## Degrees of freedom 9
## P-value (Chi-square) 0.274
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y_M ~
## age_M 0.199 0.101 1.968 0.049
## educ_M 0.595 0.110 5.400 0.000
## age_W -0.018 0.072 -0.245 0.806
## educ_W 0.321 0.124 2.581 0.010
## nkids (b1) -1.983 0.133 -14.923 0.000
## Y_W ~
## age_M 0.232 0.120 1.931 0.054
## educ_M 0.410 0.131 3.124 0.002
## age_W 0.267 0.086 3.110 0.002
## educ_W 0.896 0.148 6.038 0.000
## nkids (b2) -1.983 0.133 -14.923 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .Y_M ~~
## .Y_W 34.242 1.714 19.979 0.000
## age_M ~~
## age_W 4.150 0.264 15.746 0.000
## educ_M ~~
## educ_W 2.990 0.172 17.392 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y_M 2.463 2.316 1.063 0.288
## .Y_W -1.264 2.749 -0.460 0.646
## age_M 25.006 0.072 348.097 0.000
## educ_M 8.033 0.072 112.134 0.000
## age_W 20.201 0.101 200.776 0.000
## educ_W 10.076 0.063 158.954 0.000
## nkids 3.078 0.045 68.966 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Y_M 35.229 1.576 22.361 0.000
## .Y_W 50.102 2.241 22.361 0.000
## age_M 5.160 0.231 22.361 0.000
## educ_M 5.132 0.230 22.361 0.000
## age_W 10.123 0.453 22.361 0.000
## educ_W 4.018 0.180 22.361 0.000
## nkids 1.992 0.089 22.361 0.000
##
## Constraints:
## |Slack|
## b1 - (b2) 0.000
So, what would you conclude now!!?? Except for the estimate referring to the number of children we obtain very similar results (most differences are due to rounding errors). Can we conclude anything with respect to influence versus selection processes?
6 APIM
The key use of APIM models is to assess bidirectional effects within longitudinal designs. Suppose we have measured our outcome variable at two occasions. We formulate our hypotheses:
- your own age and education (measured at time 1) are positively related to your score on Y (measured at time 2), even if we take into account stability in outcome Y. Alternative phrasing: your age and education are positively related to a positive change in your Y.
- your partner’s score on outcome Y (measured at time 1) has an impact on your own score on Y (measured at time 2). Alternative phrasing: your partner’s score is positively related to a positive change in your Y.
- the age and education of your partner (measured at time 1) are positively related to your score on Y (measured at time 2), even if we take into account stability in outcome Y and that your partner’s score on Y impacts your outcome Y.
myModel <- "# regressions
Y_Mt2 ~ Y_M + Y_W + age_M + educ_M + age_W + educ_W + b1*nkids
Y_Wt2 ~ Y_M + Y_W + age_M + educ_M + age_W + educ_W + b2*nkids
# constraints
b1 == b2
# variances and covariances
Y_Mt2 ~~ Y_Mt2
Y_Mt2 ~~ Y_Wt2
Y_M ~~ Y_M + age_M + educ_M + age_W + educ_W + b3*nkids
Y_W ~~ Y_W + age_M + educ_M + age_W + educ_W + b3*nkids
age_M ~~ age_M + age_W + educ_M + educ_W
age_W ~~ age_W + educ_M + educ_W
educ_M ~~ educ_M + educ_W
educ_W ~~ educ_W
# intercepts
Y_M ~ 1
Y_W ~ 1
Y_Mt2 ~ 1
Y_Wt2 ~ 1
"
fit <- sem(myModel, data = data)
summary(fit, standardized = FALSE)
## lavaan 0.6-7 ended normally after 183 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 49
## Number of equality constraints 2
##
## Number of observations 1000
##
## Model Test User Model:
##
## Test statistic 1285.561
## Degrees of freedom 7
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y_Mt2 ~
## Y_M 0.730 0.016 46.529 0.000
## Y_W 0.457 0.014 32.849 0.000
## age_M 0.066 0.052 1.269 0.204
## educ_M 0.130 0.057 2.273 0.023
## age_W 0.086 0.037 2.304 0.021
## educ_W 0.279 0.065 4.283 0.000
## nkids (b1) -1.090 0.062 -17.466 0.000
## Y_Wt2 ~
## Y_M 0.410 0.014 28.542 0.000
## Y_W 0.577 0.013 45.363 0.000
## age_M 0.119 0.047 2.520 0.012
## educ_M 0.195 0.052 3.729 0.000
## age_W 0.194 0.034 5.699 0.000
## educ_W 0.483 0.059 8.126 0.000
## nkids (b2) -1.090 0.062 -17.466 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .Y_Mt2 ~~
## .Y_Wt2 4.662 0.303 15.387 0.000
## Y_M ~~
## age_M -1.966 0.477 -4.123 0.000
## educ_M 2.617 0.469 5.583 0.000
## age_W -6.673 0.711 -9.380 0.000
## educ_W -0.981 0.402 -2.439 0.015
## nkids (b3) -2.204 0.220 -10.020 0.000
## Y_W ~~
## age_M 4.765 0.606 7.864 0.000
## educ_M 2.448 0.562 4.360 0.000
## age_W 10.738 0.916 11.719 0.000
## educ_W 6.036 0.552 10.934 0.000
## nkids (b3) -2.204 0.220 -10.020 0.000
## age_M ~~
## age_W 4.957 0.301 16.459 0.000
## educ_M 0.168 0.164 1.027 0.304
## educ_W 0.335 0.151 2.216 0.027
## educ_M ~~
## age_W -0.211 0.246 -0.856 0.392
## age_W ~~
## educ_W 0.719 0.227 3.173 0.002
## educ_M ~~
## educ_W 2.786 0.169 16.524 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## Y_M 8.980 0.215 41.691 0.000
## Y_W 16.162 0.259 62.433 0.000
## .Y_Mt2 -2.187 1.186 -1.844 0.065
## .Y_Wt2 5.011 1.084 4.621 0.000
## age_M 25.006 0.074 338.847 0.000
## educ_M 8.033 0.070 114.331 0.000
## age_W 20.201 0.111 182.085 0.000
## educ_W 10.076 0.065 155.501 0.000
## nkids 3.078 0.043 71.683 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Y_Mt2 9.182 0.411 22.361 0.000
## Y_M 46.398 2.022 22.944 0.000
## Y_W 67.017 2.924 22.923 0.000
## age_M 5.446 0.243 22.382 0.000
## age_W 12.308 0.544 22.611 0.000
## educ_M 4.937 0.221 22.373 0.000
## educ_W 4.199 0.188 22.375 0.000
## .Y_Wt2 7.629 0.341 22.361 0.000
## nkids 1.844 0.082 22.411 0.000
##
## Constraints:
## |Slack|
## b1 - (b2) 0.000
What are our final conclusions?
We observe clear actor and partner effects:
- Y_W predicts both Y_Wt2 (actor effect) and Y_Mt2 (partner effect)
- Y_M predicts both Y_Mt2 (actor effect) and Y_Wt2 (partner effect)
- Even if we take into account that both a person’s own score and partner’s score predict a person’s own outcome, we also observe that a person’s own characteristic and partner’s characteristic predict how a person’s own outcome develops over time.
- the reported covariances may be the result of both selection and influence processes.
- the reported regression partner-effects are (more likely) the result of influence processes.
In sum,
7 Assignment
Time to practice.
- Test whether partner effects are significantly different from each other. Use the SEM-framework and estimate a model with and without constraints, compare the model fit of the nested models.
- To what extent do actor-variables explain the covariance between the scores on Y? Can we interpret this as a selection effect?
- To what extent do partner-variables explain the covariance between the scores on Y? Can we interpret this as an influence effect?
- To what extent do dyad-variables explain the covariance between the scores on Y? Let us assume the couples got their kids after they got married. Can we interpret this as an influence effect?
- Estimate an APIM model. Thus also include the measurement of Y at time point 2 in your SEM (Y_Wt2 and Y_Mt2).
- Are there significant partner-effects? And which one could we interpret as influence effects?
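For the first item, comparing nested lavaan models usually follows this pattern (a sketch; the model strings and object names `myModel_free`, `myModel_constrained`, `fit_free`, `fit_constrained` are hypothetical):

```r
require(lavaan)
# 'myModel_free' would be the model without the b1 == b2 constraint;
# 'myModel_constrained' the version with it.
fit_free <- sem(myModel_free, data = data)
fit_constrained <- sem(myModel_constrained, data = data)
# chi-square difference test for the nested models
anova(fit_constrained, fit_free)
```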
8 Alternative approaches
8.1 multilevel
We could also estimate APIM models within the multi-level framework. To give you some pointers, see the code below.
# index==1 refers to women.
mlme <- lme(Y ~ 0 + as.factor(index) + age:as.factor(index) + educ:as.factor(index) +
    age_P:as.factor(index) + educ_P:as.factor(index) + nkids,
    data = data_long,
    weights = varIdent(form = ~1 | index),
    random = list(~0 + as.factor(index) | dyad_id),
    method = "REML",
    control = lmeControl(maxIter = 1000, msMaxIter = 1000, niterEM = 500, msMaxEval = 1000))
summary(mlme)
## Linear mixed-effects model fit by REML
## Data: data_long
## AIC BIC logLik
## 12125.1 12214.63 -6046.55
##
## Random effects:
## Formula: ~0 + as.factor(index) | dyad_id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## as.factor(index)1 6.852783 as.()1
## as.factor(index)2 5.658064 0.888
## Residual 1.851593
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | index
## Parameter estimates:
## 1 2
## 1 1
## Fixed effects: Y ~ 0 + as.factor(index) + age:as.factor(index) + educ:as.factor(index) + age_P:as.factor(index) + educ_P:as.factor(index) + nkids
## Value Std.Error DF t-value p-value
## as.factor(index)1 -1.2644776 2.7592459 991 -0.458269 0.6469
## as.factor(index)2 2.4630262 2.3318270 991 1.056265 0.2911
## nkids -1.9829508 0.1336022 999 -14.842201 0.0000
## as.factor(index)1:age 0.2672088 0.0862664 991 3.097486 0.0020
## as.factor(index)2:age 0.1985992 0.1015790 991 1.955121 0.0508
## as.factor(index)1:educ 0.8958841 0.1488877 991 6.017180 0.0000
## as.factor(index)2:educ 0.5945737 0.1107739 991 5.367452 0.0000
## as.factor(index)1:age_P 0.2323626 0.1210941 991 1.918859 0.0553
## as.factor(index)2:age_P -0.0176752 0.0723510 991 -0.244298 0.8071
## as.factor(index)1:educ_P 0.4101990 0.1320552 991 3.106269 0.0019
## as.factor(index)2:educ_P 0.3211152 0.1248796 991 2.571398 0.0103
## Correlation:
## as.()1 as.()2 nkids as.fctr(ndx)1:g as.fctr(ndx)2:g as.fctr(ndx)1:d
## as.factor(index)2 0.823
## nkids -0.191 -0.226
## as.factor(index)1:age -0.013 -0.011 0.011
## as.factor(index)2:age -0.594 -0.723 0.037 -0.468
## as.factor(index)1:educ -0.308 -0.248 -0.021 -0.008 0.019
## as.factor(index)2:educ 0.013 0.017 0.037 0.032 -0.068 -0.537
## as.factor(index)1:age_P -0.726 -0.589 0.031 -0.575 0.815 0.024
## as.factor(index)2:age_P -0.012 -0.014 0.013 0.815 -0.574 -0.007
## as.factor(index)1:educ_P 0.019 0.013 0.031 0.039 -0.055 -0.659
## as.factor(index)2:educ_P -0.250 -0.304 -0.025 -0.007 0.023 0.815
## as.fctr(ndx)2:d as.fctr(ndx)1:g_P as.fctr(ndx)2:g_P as.fctr(ndx)1:d_P
## as.factor(index)2
## nkids
## as.factor(index)1:age
## as.factor(index)2:age
## as.factor(index)1:educ
## as.factor(index)2:educ
## as.factor(index)1:age_P -0.055
## as.factor(index)2:age_P 0.039 -0.468
## as.factor(index)1:educ_P 0.815 -0.068 0.032
## as.factor(index)2:educ_P -0.659 0.019 -0.008 -0.537
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.56653057 -0.33278498 -0.01544818 0.33944831 1.65762870
##
## Number of Observations: 2000
## Number of Groups: 1000
Latest Version: 08-01-2021
Please email any comments to: j.tolsma@ru.nl
Snijders, T.A.B. & Bosker, R.J. (1999). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. London: Sage Publications.↩︎