Consequences of Dyads (methods)
Figure 1: Basic APIM model
1 Introduction
In this tutorial I will demonstrate how to estimate an APIM (Actor-Partner Interdependence Model) with the R package Lavaan. The Lavaan website can be found here. During the workgroup I will explain all code. For those of you who don’t attend the workgroups, Google knows way more than I do.
2 Before you start
Before you start, check whether you are running the latest RStudio version (from the Help menu, pick ‘Check for Updates’) and whether you need to update R.
install.packages("installr") #you first install packages
require(installr) #then you will need to activate packages.
updateR() #run the function to start the update process
Give your script a nice name. Include the author, and the date when you last modified the script. Include a lot of comments in your script! Don’t forget: always start by cleaning up your workspace.
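Cleaning up the workspace is typically a one-liner; a minimal sketch:

```r
# remove all objects from the global environment
rm(list = ls())
```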
And set your working directory.
# set working directory
setwd("C:\\YOURDIR\\YOURSUBDIR\\YOURSUBSUBDIR\\") #change to your own workdirectory
Install the packages you will need.
# install packages
install.packages("lavaan", dependencies = TRUE) # to estimate the APIM model
install.packages("psych") # to describe our dataset
install.packages("MASS") # to simulate variables from a multi-variate normal distribution
install.packages("nlme") # for the multilevel models
Q1: Where and how can you find additional information on R packages?
A1:
- Google
- the CRAN page of the package
- the package manual
- the package vignettes, if any
- the literature mentioned in the package documentation
- the package’s dedicated website, if it has one
3 Data
3.1 Simulate data
If I try to get an understanding of a new method, I usually use a simulated dataset. Then I at least know what the world looks like (I know what I put in, so I know what I should get out of the model). For you guys and gals, it is not necessary to understand the simulation process, but feel free to have a close look.
require(MASS)
set.seed(9864) #We set a seed. This way the random numbers we generate will be the same each run and we thus end up with the same dataset. Please note that to be absolutely sure to get the same dataset, we need to run the same R version (and package versions)
# let us start with simulating the age of both partners. Naturally, we introduce some correlation
# between the two.
Sigma <- matrix(c(10, 4, 4, 5), 2, 2)
age <- mvrnorm(n = 1000, mu = c(20, 25), Sigma)
age_W <- age[, 1]
age_M <- age[, 2]
# we do the same for the educational level of both partners.
Sigma <- matrix(c(4, 3, 3, 5), 2, 2)
educ <- mvrnorm(n = 1000, mu = c(10, 8), Sigma)
educ_W <- round(educ[, 1])
educ_M <- round(educ[, 2])
# let us simulate a dyad-level variable: the number of kids.
nkids <- sample.int(n = 5, size = 1000, replace = TRUE)
# introduce some randomness at the dyad or household level.
HH_rand <- rnorm(n = 1000, 0, 6)
# and introduce some unique randomness for each partner
W_rand <- rnorm(n = 1000, 2, 4)
M_rand <- rnorm(n = 1000, 5, 1)
# lets construct a dependent variable for the women. note that the characteristics of ego (the
# women in this case), of alter (the men in this case) and of the dyad/household all have an
# impact on the score of the women.
Y_W <- 0.3 * age_W + 0.7 * educ_W + 0.1 * age_M + 0.6 * educ_M + -2 * nkids + HH_rand + W_rand
# let's do something similar for the men but with different effect sizes.
Y_M <- 0.1 * age_M + 0.7 * educ_M + 0.2 * educ_W + -2 * nkids + HH_rand + M_rand
# Introduce some influence effects.
Sigma <- matrix(c(8, 5, 5, 10), 2, 2)
errort2 <- mvrnorm(n = 1000, mu = c(0, 0), Sigma)
Y_Wt2 <- (mean(Y_W) + 3) + 0.6 * (Y_W - mean(Y_W)) + 0.4 * (Y_M - mean(Y_M)) + 0.15 * age_W + 0.4 * educ_W +
0.1 * age_M + 0.3 * educ_M + -1 * nkids + errort2[, 1]
Y_Mt2 <- (mean(Y_M) + 5) + 0.5 * (Y_W - mean(Y_W)) + 0.7 * (Y_M - mean(Y_M)) + 0.05 * age_M + 0.2 * educ_M +
0.2 * educ_W + -1 * nkids + errort2[, 2]
# and let's put everything together
data <- data.frame(age_M, age_W, educ_M, educ_W, nkids, Y_M, Y_W, Y_Mt2, Y_Wt2)
# add some description to the data
attr(data, "description") <- "This is a simulated dataset to illustrate the use of APIM. The dataset is in wide-format: one row refers to one couple/household. Variables with \"_W\" refer to women,\"_M\" refer to men. Variables with no underscore refer to the couple. The dependent variable is measured at two occassions."
# I don't think the variables need any further description.
Q2: Which functions did we use to simulate data? Please list all used functions. Make it a good habit to write down all new functions you encounter.
A2:
- require
- set.seed
- matrix
- c
- mvrnorm
- round
- sample.int
- rnorm
- data.frame
- attr
Q3: What is the difference between (), {} and [] in R?
A3:
- () follows a function; within () we identify the arguments of the function. Example: function().
- {} is used to combine different lines of code into one block (e.g. in function bodies and if/else statements). Example: {some code here and some code here}.
- [] and [[]] are used to extract elements from objects such as vectors, matrices, arrays and lists. Example: object[1].
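The difference between [] and [[]] is easiest to see with a list; a minimal sketch:

```r
x <- list(a = 1:3, b = "hello")
x[1]    # single brackets return a (sub)list containing the first element
x[[1]]  # double brackets return the first element itself: the vector 1:3
```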
3.2 Describe data
Let’s have a look at our data.
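The code for this step is not echoed in the document; the output below was presumably produced along these lines (assuming the `data` object simulated above):

```r
require(psych)             # for describe()
head(data)                 # first six couples
str(data)                  # variable names and types
summary(data)              # basic summaries per variable
attr(data, "description")  # the description we attached earlier
describe(data)             # extended descriptives from the psych package
```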
## age_M age_W educ_M educ_W nkids Y_M Y_W Y_Mt2 Y_Wt2
## 1 23.65153 17.18029 8 9 4 12.5966983 24.001964 22.827987 30.69627
## 2 27.11747 25.93062 8 10 1 0.1857714 5.201483 9.688008 25.84382
## 3 22.20588 20.02249 8 9 5 0.9591408 12.854358 7.710286 24.28463
## 4 27.48565 18.99072 6 7 3 10.8892384 11.547787 15.247796 29.79648
## 5 28.29219 19.02406 9 12 5 11.3599118 11.441439 18.001653 27.64218
## 6 24.72061 24.40805 8 12 5 6.2244939 19.168225 12.614080 23.46683
## 'data.frame': 1000 obs. of 9 variables:
## $ age_M : num 23.7 27.1 22.2 27.5 28.3 ...
## $ age_W : num 17.2 25.9 20 19 19 ...
## $ educ_M: num 8 8 8 6 9 8 9 10 9 10 ...
## $ educ_W: num 9 10 9 7 12 12 12 8 12 12 ...
## $ nkids : int 4 1 5 3 5 5 5 5 5 4 ...
## $ Y_M : num 12.597 0.186 0.959 10.889 11.36 ...
## $ Y_W : num 24 5.2 12.9 11.5 11.4 ...
## $ Y_Mt2 : num 22.83 9.69 7.71 15.25 18 ...
## $ Y_Wt2 : num 30.7 25.8 24.3 29.8 27.6 ...
## - attr(*, "description")= chr "This is a simulated dataset to illustrate the use of APIM. The dataset is in wide-format: one row refers to one"| __truncated__
## age_M age_W educ_M educ_W nkids Y_M
## Min. :17.01 Min. :10.66 Min. : 1.000 Min. : 3.00 Min. :1.000 Min. :-13.397
## 1st Qu.:23.52 1st Qu.:18.14 1st Qu.: 7.000 1st Qu.: 9.00 1st Qu.:2.000 1st Qu.: 4.465
## Median :25.01 Median :20.22 Median : 8.000 Median :10.00 Median :3.000 Median : 9.151
## Mean :25.01 Mean :20.20 Mean : 8.033 Mean :10.08 Mean :3.078 Mean : 8.980
## 3rd Qu.:26.55 3rd Qu.:22.17 3rd Qu.:10.000 3rd Qu.:11.00 3rd Qu.:4.000 3rd Qu.: 13.773
## Max. :31.67 Max. :30.48 Max. :17.000 Max. :17.00 Max. :5.000 Max. : 28.029
## Y_W Y_Mt2 Y_Wt2
## Min. :-8.637 Min. :-15.552 Min. : 1.074
## 1st Qu.:10.619 1st Qu.: 8.457 1st Qu.:21.805
## Median :16.273 Median : 15.799 Median :27.930
## Mean :16.162 Mean : 15.639 Mean :27.984
## 3rd Qu.:21.741 3rd Qu.: 22.523 3rd Qu.:33.998
## Max. :42.715 Max. : 41.200 Max. :52.909
## [1] "This is a simulated dataset to illustrate the use of APIM. The dataset is in wide-format: one row refers to one couple/household. Variables with \"_W\" refer to women,\"_M\" refer to men. Variables with no underscore refer to the couple. The dependent variable is measured at two occassions."
## vars n mean sd median trimmed mad min max range skew kurtosis se
## age_M 1 1000 25.01 2.27 25.01 25.02 2.25 17.01 31.67 14.66 -0.06 -0.03 0.07
## age_W 2 1000 20.20 3.18 20.22 20.19 3.01 10.66 30.48 19.81 0.05 0.01 0.10
## educ_M 3 1000 8.03 2.27 8.00 8.03 2.97 1.00 17.00 16.00 0.07 0.08 0.07
## educ_W 4 1000 10.08 2.01 10.00 10.09 1.48 3.00 17.00 14.00 -0.01 0.20 0.06
## nkids 5 1000 3.08 1.41 3.00 3.10 1.48 1.00 5.00 4.00 -0.08 -1.28 0.04
## Y_M 6 1000 8.98 6.86 9.15 9.04 6.93 -13.40 28.03 41.43 -0.10 -0.20 0.22
## Y_W 7 1000 16.16 8.12 16.27 16.17 8.27 -8.64 42.71 51.35 -0.03 -0.10 0.26
## Y_Mt2 8 1000 15.64 9.90 15.80 15.62 10.39 -15.55 41.20 56.75 -0.05 -0.25 0.31
## Y_Wt2 9 1000 27.98 8.97 27.93 27.99 9.06 1.07 52.91 51.84 -0.04 -0.16 0.28
4 Check for interdependencies
When you have reached this point in the course, you should know that within social networks observations are not independent. This has theoretical and methodological consequences. A first step is therefore to assess whether our observations are interdependent (covary, or correlate). There are naive and less naive ways to check for interdependence.
For more background information see this page by David A. Kenny.
Also check out paragraph 3.3 of the book by Snijders and Bosker (1999).1
Let’s start with something that pops up in your mind immediately: a correlation. And, for those of you with some experience in multilevel analysis: the intraclass correlation (ICC).
4.1 correlation (wide format)
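The code is not echoed here; the output below can be reproduced with something like this (assuming the wide-format `data` object):

```r
# correlation between men's and women's scores within the same couple
cor.test(data$Y_M, data$Y_W)
```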
##
## Pearson's product-moment correlation
##
## data: data$Y_M and data$Y_W
## t = 50.326, df = 998, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8284633 0.8636065
## sample estimates:
## cor
## 0.8469575
This would indicate a strong and significant correlation. Remember that our data are in wide format. A better way (i.e. the result more closely approaches the ICC) is to calculate the correlation on a dataset in long format. This method is called the double entry method.
4.2 correlation (long format)
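With double entry, every dyad appears twice, once from each partner’s perspective. A sketch that reproduces the output below (the names `var1` and `var2` match the echoed output; the construction itself is an assumption):

```r
# stack the scores so each couple is entered twice:
# once with the woman as 'ego', once with the man as 'ego'
var1 <- c(data$Y_W, data$Y_M)
var2 <- c(data$Y_M, data$Y_W)
cor.test(var1, var2)
```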
##
## Pearson's product-moment correlation
##
## data: var1 and var2
## t = 25.393, df = 1998, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4600764 0.5263813
## sample estimates:
## cor
## 0.4939466
Much lower, but still significant. But what is the correct interdependence? Well, that can be calculated with the ICC. And there are two definitions. See the references above.
4.3 ICC (ANOVA tradition)
$$ ICC = \frac{(MS_B - MS_W)}{(MS_B + MS_W)} $$
where,
$$ MS_B = 2 \cdot VAR(\bar{X}_{dyad}) $$
and
$$ MS_W = \frac{\sum (X_{ego} - X_{alter})^2}{2 \cdot N_{dyads}} $$
Let’s have a go!
MSB <- var((data$Y_M + data$Y_W)/2) * 2
MSW <- (sum((data$Y_M - data$Y_W)^2))/(2 * length(data$Y_W))
ICC_anova <- (MSB - MSW)/(MSB + MSW)
ICC_anova
## [1] 0.4943247
Do you see that the correct ICC is very close to the correlation based on a dataset in long format (double entry method)? Thus in practice, the double entry method is very convenient to check for interdependencies.
4.4 ICC (regression tradition)
Most of you are probably more familiar with definitions of the ICC as provided within textbooks on multi-level analysis, where the intraclass correlation - at least for continuous dependent variables - is defined as the between variance (i.e. the variance in dyad means) divided by the total variance (i.e. the sum of the between and within variance). There is only one problem: we need the variances present in the ‘real population’, whereas in our data we only observe the variances present in our sample. The observed between variance needs to be corrected. Below I will show you how to do that.
# first make a dataset in long format.
dyadmean <- (data$Y_M + data$Y_W)/2
data_long <- rbind(data, data)
data_long$index <- rep(1:2, each = 1000)
data_long$dyad_id <- rep(1:1000, times = 2)
data_long$dyadmean <- c(dyadmean, dyadmean)
head(data_long)
## age_M age_W educ_M educ_W nkids Y_M Y_W Y_Mt2 Y_Wt2 index dyad_id
## 1 23.65153 17.18029 8 9 4 12.5966983 24.001964 22.827987 30.69627 1 1
## 2 27.11747 25.93062 8 10 1 0.1857714 5.201483 9.688008 25.84382 1 2
## 3 22.20588 20.02249 8 9 5 0.9591408 12.854358 7.710286 24.28463 1 3
## 4 27.48565 18.99072 6 7 3 10.8892384 11.547787 15.247796 29.79648 1 4
## 5 28.29219 19.02406 9 12 5 11.3599118 11.441439 18.001653 27.64218 1 5
## 6 24.72061 24.40805 8 12 5 6.2244939 19.168225 12.614080 23.46683 1 6
## dyadmean
## 1 18.299331
## 2 2.693627
## 3 6.906749
## 4 11.218513
## 5 11.400675
## 6 12.696360
# let the first dyad entry refer to the women and the second to the men
data_long$age <- ifelse(data_long$index == 1, data_long$age_W, data_long$age_M)
data_long$educ <- ifelse(data_long$index == 1, data_long$educ_W, data_long$educ_M)
data_long$Y <- ifelse(data_long$index == 1, data_long$Y_W, data_long$Y_M)
# also define the age and educ of the partner
data_long$age_P <- ifelse(data_long$index == 2, data_long$age_W, data_long$age_M)
data_long$educ_P <- ifelse(data_long$index == 2, data_long$educ_W, data_long$educ_M)
data_long$Y_P <- ifelse(data_long$index == 2, data_long$Y_W, data_long$Y_M)
head(data_long)
## age_M age_W educ_M educ_W nkids Y_M Y_W Y_Mt2 Y_Wt2 index dyad_id
## 1 23.65153 17.18029 8 9 4 12.5966983 24.001964 22.827987 30.69627 1 1
## 2 27.11747 25.93062 8 10 1 0.1857714 5.201483 9.688008 25.84382 1 2
## 3 22.20588 20.02249 8 9 5 0.9591408 12.854358 7.710286 24.28463 1 3
## 4 27.48565 18.99072 6 7 3 10.8892384 11.547787 15.247796 29.79648 1 4
## 5 28.29219 19.02406 9 12 5 11.3599118 11.441439 18.001653 27.64218 1 5
## 6 24.72061 24.40805 8 12 5 6.2244939 19.168225 12.614080 23.46683 1 6
## dyadmean age educ Y age_P educ_P Y_P
## 1 18.299331 17.18029 9 24.001964 23.65153 8 12.5966983
## 2 2.693627 25.93062 10 5.201483 27.11747 8 0.1857714
## 3 6.906749 20.02249 9 12.854358 22.20588 8 0.9591408
## 4 11.218513 18.99072 7 11.547787 27.48565 6 10.8892384
## 5 11.400675 19.02406 12 11.441439 28.29219 9 11.3599118
## 6 12.696360 24.40805 12 19.168225 24.72061 8 6.2244939
With this dataset in long format we can calculate the ICC.
# first calculate the between variance of our sample. Note that for this we only need one
# observation per dyad (thus N_dyads = 1000)
S_B <- var(data_long$dyadmean[1:1000])
# within variance
SW <- sum((data_long$Y - data_long$dyadmean)^2)/1000 #we divide by the number of dyads
# We now need to correct the observed between variance to reflect the population between variance.
S_B_E <- S_B - SW/2
ICC_ML <- S_B_E/(S_B_E + SW)
ICC_ML
## [1] 0.4943247
Of course exactly identical to the ICC_anova. But this procedure is quite cumbersome. Thus in practice I personally simply calculate a correlation on the variables of interest in a dataset in long format. Or, alternatively, estimate an empty multi-level model, which also spits out the ICC (after some tweaking). See below.
require(nlme)
# estimate empty model with ML
mlme <- lme(Y ~ 1, data = data_long, random = list(~1 | dyad_id))
summary(mlme)
# Standard deviations are reported instead of variances. extract the variances.
VarCorr(mlme)
# the intercept variance is at the between-level. the residual variances are at the observation /
# within-level. thus based on these numbers we may calculate the ICC ourselves.
varests <- as.numeric(VarCorr(mlme)[1:2])
ICC_MLb <- varests[1]/sum(varests)
ICC_MLb
## Linear mixed-effects model fit by REML
## Data: data_long
## AIC BIC logLik
## 13881.45 13898.25 -6937.723
##
## Random effects:
## Formula: ~1 | dyad_id
## (Intercept) Residual
## StdDev: 5.857116 5.923985
##
## Fixed effects: Y ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 12.5713 0.2277117 1000 55.2071 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.99822956 -0.62468953 -0.08257107 0.61939001 2.77997551
##
## Number of Observations: 2000
## Number of Groups: 1000
## dyad_id = pdLogChol(1)
## Variance StdDev
## (Intercept) 34.30581 5.857116
## Residual 35.09360 5.923985
## [1] 0.4943242
The take home message is that the two observations for each dyad are indeed correlated. Is this a lot? Is this significant?
5 Hypothesis testing
Suppose we wish to test the following hypotheses:
- your own age and education are positively related to your score on Y.
- the age and education of your partner are positively related to your score on Y.
5.1 OLS (pooled)
Prior to this course you would probably have estimated an OLS regression analysis on your complete data in long format.
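The estimation code is not echoed; the call shown in the output below corresponds to the following (the object name `mlm_pooled` is my own):

```r
# pooled OLS on the long-format data: each individual is one row
mlm_pooled <- lm(Y ~ age + educ + age_P + educ_P + nkids, data = data_long)
summary(mlm_pooled)
```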
##
## Call:
## lm(formula = Y ~ age + educ + age_P + educ_P + nkids, data = data_long)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.1773 -4.6337 0.0457 4.4454 23.8309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.30347 1.62515 0.802 0.423
## age -0.19326 0.04776 -4.047 5.39e-05 ***
## educ 1.15410 0.08291 13.920 < 2e-16 ***
## age_P 0.51442 0.04776 10.771 < 2e-16 ***
## educ_P -0.04756 0.08291 -0.574 0.566
## nkids -1.95273 0.10721 -18.214 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.756 on 1994 degrees of freedom
## Multiple R-squared: 0.3438, Adjusted R-squared: 0.3422
## F-statistic: 209 on 5 and 1994 DF, p-value: < 2.2e-16
So, what would you conclude? We have to refute the hypothesis on own age and the hypothesis on partner’s education. Obviously, the above approach does not yet take into account the interdependencies.
5.2 OLS separately for men and women
Perhaps, you would have realized (or theorized) that the impact of own and partner’s age and educ may depend on your sex. Thus you may have estimated the model separately for men and women. There would be nothing wrong with that approach, because with one observation per dyad/household the observations can still be regarded as independent. Thus:…
mlm_W <- lm(Y_W ~ age_W + educ_W + age_M + educ_M + nkids, data = data)
summary(mlm_W)
mlm_M <- lm(Y_M ~ age_M + educ_M + age_W + educ_W + nkids, data = data)
summary(mlm_M)
##
## Call:
## lm(formula = Y_W ~ age_W + educ_W + age_M + educ_M + nkids, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.6319 -4.8446 -0.1661 4.7004 22.8278
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.51260 2.78070 -0.544 0.58659
## age_W 0.26767 0.08627 3.102 0.00197 **
## educ_W 0.89440 0.14891 6.006 2.66e-09 ***
## age_M 0.23414 0.12113 1.933 0.05352 .
## educ_M 0.41215 0.13209 3.120 0.00186 **
## nkids -1.92004 0.15944 -12.042 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.099 on 994 degrees of freedom
## Multiple R-squared: 0.239, Adjusted R-squared: 0.2352
## F-statistic: 62.44 on 5 and 994 DF, p-value: < 2.2e-16
##
##
## Call:
## lm(formula = Y_M ~ age_M + educ_M + age_W + educ_W + nkids, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.2648 -4.2281 0.3472 3.8307 17.9911
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.47847 2.33193 1.063 0.2881
## age_M 0.19849 0.10158 1.954 0.0510 .
## educ_M 0.59445 0.11077 5.366 1e-07 ***
## age_W -0.01770 0.07235 -0.245 0.8067
## educ_W 0.32121 0.12488 2.572 0.0103 *
## nkids -1.98687 0.13371 -14.859 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.953 on 994 degrees of freedom
## Multiple R-squared: 0.2517, Adjusted R-squared: 0.2479
## F-statistic: 66.86 on 5 and 994 DF, p-value: < 2.2e-16
So, what would you conclude? For women we cannot refute any of the hypotheses. For men we have to refute the hypothesis on the impact of partner’s age. Clearly, both the actor effects and partner effects differ for men and women. A disadvantage of the approach above is that, although it treats the interdependencies as a nuisance, it does not yet treat the interdependencies as something of interest. Other disadvantages are that it is hard to test whether partner effects significantly differ for men and women, or to set constraints on specific effects.
5.3 SEM with lavaan
One approach is to estimate both models simultaneously within a SEM framework. This is the APIM for non-longitudinal designs.
require(lavaan)
myModel <- "# regressions
Y_M ~ age_M + educ_M + age_W + educ_W + b1*nkids
Y_W ~ age_M + educ_M + age_W + educ_W + b2*nkids
# constraints
b1 == b2
# variances and covariances
Y_M ~~ Y_M
Y_M ~~ Y_W
Y_W ~~ Y_W
age_M ~~ age_W
educ_M ~~ educ_W
# intercepts
Y_M ~ 1
Y_W ~ 1
"
fit <- sem(myModel, data = data)
summary(fit, standardized = FALSE)
## lavaan 0.6-7 ended normally after 95 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 27
## Number of equality constraints 1
##
## Number of observations 1000
##
## Model Test User Model:
##
## Test statistic 11.028
## Degrees of freedom 9
## P-value (Chi-square) 0.274
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y_M ~
## age_M 0.199 0.101 1.968 0.049
## educ_M 0.595 0.110 5.400 0.000
## age_W -0.018 0.072 -0.245 0.806
## educ_W 0.321 0.124 2.581 0.010
## nkids (b1) -1.983 0.133 -14.923 0.000
## Y_W ~
## age_M 0.232 0.120 1.931 0.054
## educ_M 0.410 0.131 3.124 0.002
## age_W 0.267 0.086 3.110 0.002
## educ_W 0.896 0.148 6.038 0.000
## nkids (b2) -1.983 0.133 -14.923 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .Y_M ~~
## .Y_W 34.242 1.714 19.979 0.000
## age_M ~~
## age_W 4.150 0.264 15.746 0.000
## educ_M ~~
## educ_W 2.990 0.172 17.392 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .Y_M 2.463 2.316 1.063 0.288
## .Y_W -1.264 2.749 -0.460 0.646
## age_M 25.006 0.072 348.097 0.000
## educ_M 8.033 0.072 112.134 0.000
## age_W 20.201 0.101 200.776 0.000
## educ_W 10.076 0.063 158.954 0.000
## nkids 3.078 0.045 68.966 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Y_M 35.229 1.576 22.361 0.000
## .Y_W 50.102 2.241 22.361 0.000
## age_M 5.160 0.231 22.361 0.000
## educ_M 5.132 0.230 22.361 0.000
## age_W 10.123 0.453 22.361 0.000
## educ_W 4.018 0.180 22.361 0.000
## nkids 1.992 0.089 22.361 0.000
##
## Constraints:
## |Slack|
## b1 - (b2) 0.000
So, what would you conclude now!!?? Except for the estimate referring to the number of children we obtain very similar results (most differences are due to rounding errors). Can we conclude anything with respect to influence versus selection processes?
6 APIM
The key use of APIM models is to assess bidirectional effects within longitudinal designs. Suppose we have measured our outcome variable at two occasions. We formulate our hypotheses:
- your own age and education (measured at time 1) are positively related to your score on Y (measured at time 2), even if we take into account stability in outcome Y. Alternative phrasing: your age and education are positively related to a positive change in your Y.
- your partner’s score on outcome Y (measured at time 1) has an impact on your own score on Y (measured at time 2). Alternative phrasing: your partner’s score is positively related to a positive change in your Y.
- the age and education of your partner (measured at time 1) are positively related to your score on Y (measured at time 2), even if we take into account stability in outcome Y and that your partner’s score on Y impacts your outcome Y.
myModel <- "# regressions
Y_Mt2 ~ Y_M + Y_W + age_M + educ_M + age_W + educ_W + b1*nkids
Y_Wt2 ~ Y_M + Y_W + age_M + educ_M + age_W + educ_W + b2*nkids
# constraints
b1 == b2
# variances and covariances
Y_Mt2 ~~ Y_Mt2
Y_Mt2 ~~ Y_Wt2
Y_M ~~ Y_M + age_M + educ_M + age_W + educ_W + b3*nkids
Y_W ~~ Y_W + age_M + educ_M + age_W + educ_W + b3*nkids
age_M ~~ age_M + age_W + educ_M + educ_W
age_W ~~ age_W + educ_M + educ_W
educ_M ~~ educ_M + educ_W
educ_W ~~ educ_W
# intercepts
Y_M ~ 1
Y_W ~ 1
Y_Mt2 ~ 1
Y_Wt2 ~ 1
"
fit <- sem(myModel, data = data)
summary(fit, standardized = FALSE)
## lavaan 0.6-7 ended normally after 183 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 49
## Number of equality constraints 2
##
## Number of observations 1000
##
## Model Test User Model:
##
## Test statistic 1285.561
## Degrees of freedom 7
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Y_Mt2 ~
## Y_M 0.730 0.016 46.529 0.000
## Y_W 0.457 0.014 32.849 0.000
## age_M 0.066 0.052 1.269 0.204
## educ_M 0.130 0.057 2.273 0.023
## age_W 0.086 0.037 2.304 0.021
## educ_W 0.279 0.065 4.283 0.000
## nkids (b1) -1.090 0.062 -17.466 0.000
## Y_Wt2 ~
## Y_M 0.410 0.014 28.542 0.000
## Y_W 0.577 0.013 45.363 0.000
## age_M 0.119 0.047 2.520 0.012
## educ_M 0.195 0.052 3.729 0.000
## age_W 0.194 0.034 5.699 0.000
## educ_W 0.483 0.059 8.126 0.000
## nkids (b2) -1.090 0.062 -17.466 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .Y_Mt2 ~~
## .Y_Wt2 4.662 0.303 15.387 0.000
## Y_M ~~
## age_M -1.966 0.477 -4.123 0.000
## educ_M 2.617 0.469 5.583 0.000
## age_W -6.673 0.711 -9.380 0.000
## educ_W -0.981 0.402 -2.439 0.015
## nkids (b3) -2.204 0.220 -10.020 0.000
## Y_W ~~
## age_M 4.765 0.606 7.864 0.000
## educ_M 2.448 0.562 4.360 0.000
## age_W 10.738 0.916 11.719 0.000
## educ_W 6.036 0.552 10.934 0.000
## nkids (b3) -2.204 0.220 -10.020 0.000
## age_M ~~
## age_W 4.957 0.301 16.459 0.000
## educ_M 0.168 0.164 1.027 0.304
## educ_W 0.335 0.151 2.216 0.027
## educ_M ~~
## age_W -0.211 0.246 -0.856 0.392
## age_W ~~
## educ_W 0.719 0.227 3.173 0.002
## educ_M ~~
## educ_W 2.786 0.169 16.524 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## Y_M 8.980 0.215 41.691 0.000
## Y_W 16.162 0.259 62.433 0.000
## .Y_Mt2 -2.187 1.186 -1.844 0.065
## .Y_Wt2 5.011 1.084 4.621 0.000
## age_M 25.006 0.074 338.847 0.000
## educ_M 8.033 0.070 114.331 0.000
## age_W 20.201 0.111 182.085 0.000
## educ_W 10.076 0.065 155.501 0.000
## nkids 3.078 0.043 71.683 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Y_Mt2 9.182 0.411 22.361 0.000
## Y_M 46.398 2.022 22.944 0.000
## Y_W 67.017 2.924 22.923 0.000
## age_M 5.446 0.243 22.382 0.000
## age_W 12.308 0.544 22.611 0.000
## educ_M 4.937 0.221 22.373 0.000
## educ_W 4.199 0.188 22.375 0.000
## .Y_Wt2 7.629 0.341 22.361 0.000
## nkids 1.844 0.082 22.411 0.000
##
## Constraints:
## |Slack|
## b1 - (b2) 0.000
What are our final conclusions?
We observe clear actor and partner effects:
- Y_W predicts both Y_Wt2 (actor effect) and Y_Mt2 (partner effect)
- Y_M predicts both Y_Mt2 (actor effect) and Y_Wt2 (partner effect)
- Even if we take into account that both a person’s own score and partner’s score predict a person’s own outcome, we also observe that a person’s own characteristic and partner’s characteristic predict how a person’s own outcome develops over time.
- the reported covariances may be the result of both selection and influence processes.
- the reported regression partner-effects are (more likely) the result of influence processes.
In sum,
7 Assignment
Time to practice.
- Test whether partner effects are significantly different from each other. Use the SEM-framework and estimate a model with and without constraints, compare the model fit of the nested models.
- To what extent do actor-variables explain the covariance between the scores on Y? Can we interpret this as a selection effect?
- To what extent do partner-variables explain the covariance between the scores on Y? Can we interpret this as an influence effect?
- To what extent do dyad-variables explain the covariance between the scores on Y? Let us assume the couples got their kids after they got married. Can we interpret this as an influence effect?
- Estimate an APIM model. Thus also include the measurement of Y at time point 2 in your SEM (Y_Wt2 and Y_Mt2).
- Are there significant partner-effects? And which one could we interpret as influence effects?
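For the first item, comparing nested lavaan models usually follows this pattern (a sketch; the model strings and object names `myModel_free`, `myModel_constrained`, `fit_free`, `fit_constrained` are hypothetical):

```r
require(lavaan)
# 'myModel_free' would be the model without the b1 == b2 constraint;
# 'myModel_constrained' the version with it.
fit_free <- sem(myModel_free, data = data)
fit_constrained <- sem(myModel_constrained, data = data)
# chi-square difference test for the nested models
anova(fit_constrained, fit_free)
```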
8 Alternative approaches
8.1 multilevel
We could also estimate APIM models within the multi-level framework. To give you some pointers, see the code below.
# index==1 refers to women.
mlme <- lme(Y ~ 0 + as.factor(index) + age:as.factor(index) + educ:as.factor(index) +
    age_P:as.factor(index) + educ_P:as.factor(index) + nkids,
    data = data_long,
    weights = varIdent(form = ~1 | index),
    random = list(~0 + as.factor(index) | dyad_id),
    method = "REML",
    control = lmeControl(maxIter = 1000, msMaxIter = 1000, niterEM = 500, msMaxEval = 1000))
summary(mlme)
## Linear mixed-effects model fit by REML
## Data: data_long
## AIC BIC logLik
## 12125.1 12214.63 -6046.55
##
## Random effects:
## Formula: ~0 + as.factor(index) | dyad_id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## as.factor(index)1 6.852783 as.()1
## as.factor(index)2 5.658064 0.888
## Residual 1.851593
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | index
## Parameter estimates:
## 1 2
## 1 1
## Fixed effects: Y ~ 0 + as.factor(index) + age:as.factor(index) + educ:as.factor(index) + age_P:as.factor(index) + educ_P:as.factor(index) + nkids
## Value Std.Error DF t-value p-value
## as.factor(index)1 -1.2644776 2.7592459 991 -0.458269 0.6469
## as.factor(index)2 2.4630262 2.3318270 991 1.056265 0.2911
## nkids -1.9829508 0.1336022 999 -14.842201 0.0000
## as.factor(index)1:age 0.2672088 0.0862664 991 3.097486 0.0020
## as.factor(index)2:age 0.1985992 0.1015790 991 1.955121 0.0508
## as.factor(index)1:educ 0.8958841 0.1488877 991 6.017180 0.0000
## as.factor(index)2:educ 0.5945737 0.1107739 991 5.367452 0.0000
## as.factor(index)1:age_P 0.2323626 0.1210941 991 1.918859 0.0553
## as.factor(index)2:age_P -0.0176752 0.0723510 991 -0.244298 0.8071
## as.factor(index)1:educ_P 0.4101990 0.1320552 991 3.106269 0.0019
## as.factor(index)2:educ_P 0.3211152 0.1248796 991 2.571398 0.0103
## Correlation:
## as.()1 as.()2 nkids as.fctr(ndx)1:g as.fctr(ndx)2:g as.fctr(ndx)1:d
## as.factor(index)2 0.823
## nkids -0.191 -0.226
## as.factor(index)1:age -0.013 -0.011 0.011
## as.factor(index)2:age -0.594 -0.723 0.037 -0.468
## as.factor(index)1:educ -0.308 -0.248 -0.021 -0.008 0.019
## as.factor(index)2:educ 0.013 0.017 0.037 0.032 -0.068 -0.537
## as.factor(index)1:age_P -0.726 -0.589 0.031 -0.575 0.815 0.024
## as.factor(index)2:age_P -0.012 -0.014 0.013 0.815 -0.574 -0.007
## as.factor(index)1:educ_P 0.019 0.013 0.031 0.039 -0.055 -0.659
## as.factor(index)2:educ_P -0.250 -0.304 -0.025 -0.007 0.023 0.815
## as.fctr(ndx)2:d as.fctr(ndx)1:g_P as.fctr(ndx)2:g_P as.fctr(ndx)1:d_P
## as.factor(index)2
## nkids
## as.factor(index)1:age
## as.factor(index)2:age
## as.factor(index)1:educ
## as.factor(index)2:educ
## as.factor(index)1:age_P -0.055
## as.factor(index)2:age_P 0.039 -0.468
## as.factor(index)1:educ_P 0.815 -0.068 0.032
## as.factor(index)2:educ_P -0.659 0.019 -0.008 -0.537
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.56653057 -0.33278498 -0.01544818 0.33944831 1.65762870
##
## Number of Observations: 2000
## Number of Groups: 1000
Latest Version: 08-01-2021
Please email any comments to: j.tolsma@ru.nl
Snijders, T.A.B. & Bosker, R.J. (1999). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. London: Sage Publications.↩︎