Appendix to: Common Practice (misuse of) Moderation in Regression

Simulate Data

We’ll simulate a trivariate data set with two uncorrelated variables, each correlated with a third variable. To start, all variables are centered at 0 (`mu`) and scaled to 1 (the diagonal of `Sigma`):

library(tidyverse)
Sigma <- matrix(c(1.0, 0.6, 0.6,
                  0.6, 1.0, 0.0,
                  0.6, 0.0, 1.0),
                nrow = 3)
data <- MASS::mvrnorm(n = 1000,
                      mu = rep(0, 3),
                      Sigma = Sigma,
                      empirical = TRUE) %>% as.data.frame()

Let’s rescale V2 so that its unstandardized slope when predicting V1 changes (here, it shrinks by a factor of 5):

data <- data %>% 
  mutate(V2 = 5*V2+10)

Let’s look at the correlation matrix (it should be identical to Sigma, since correlations are scale-invariant):

knitr::kable(cor(data))
|   |  V1|  V2|  V3|
|:--|---:|---:|---:|
|V1 | 1.0| 0.6| 0.6|
|V2 | 0.6| 1.0| 0.0|
|V3 | 0.6| 0.0| 1.0|

and the covariance matrix (it should differ from Sigma only in the scale of V2):

knitr::kable(cov(data))
|   |  V1| V2|  V3|
|:--|---:|--:|---:|
|V1 | 1.0|  3| 0.6|
|V2 | 3.0| 25| 0.0|
|V3 | 0.6|  0| 1.0|
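The covariance entries follow directly from the affine transform: cov(V1, 5·V2 + 10) = 5 × 0.6 = 3 and var(5·V2 + 10) = 25 × 1, while the correlation is unchanged. A minimal sketch of this (rebuilding the same `Sigma` as above) in base R:

```r
# Verify that the affine transform V2 -> 5*V2 + 10 scales covariances
# but leaves correlations untouched. With empirical = TRUE, mvrnorm
# makes the *sample* moments match Sigma exactly.
Sigma <- matrix(c(1.0, 0.6, 0.6,
                  0.6, 1.0, 0.0,
                  0.6, 0.0, 1.0), nrow = 3)
X <- MASS::mvrnorm(n = 1000, mu = rep(0, 3), Sigma = Sigma, empirical = TRUE)
X[, 2] <- 5 * X[, 2] + 10

round(cov(X)[1, 2], 3)  # 3: covariance picks up the scale factor (5 * 0.6)
round(cor(X)[1, 2], 3)  # 0.6: correlation is scale-free
```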

Fit lavaan model

library(lavaan)
my_model <- '
V1 ~ a*V2 + b*V3
diff := a - b
'
fit <- sem(my_model, data = data)

If diff is computed on the standardized coefficients, we expect it to be 0.
If diff is computed on the unstandardized coefficients, we expect it to be non-zero (here, 0.12 − 0.60 = −0.48).
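These expectations can be checked without lavaan at all. The sketch below rebuilds the same data and fits plain `lm()` on the raw and on the standardized variables; because `empirical = TRUE` fixes the sample moments, the results are exact rather than approximate:

```r
# Rebuild the simulated data as above (MASS assumed installed).
Sigma <- matrix(c(1.0, 0.6, 0.6,
                  0.6, 1.0, 0.0,
                  0.6, 0.0, 1.0), nrow = 3)
data <- as.data.frame(MASS::mvrnorm(1000, rep(0, 3), Sigma, empirical = TRUE))
data$V2 <- 5 * data$V2 + 10

# Unstandardized slopes: 0.12 and 0.60, so diff = -0.48.
b_raw <- coef(lm(V1 ~ V2 + V3, data = data))
unname(b_raw["V2"] - b_raw["V3"])

# Standardized slopes: both 0.60, so diff = 0.
b_std <- coef(lm(V1 ~ V2 + V3, data = as.data.frame(scale(data))))
round(unname(b_std["V2"] - b_std["V3"]), 10)
```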

summary(fit, standardized = TRUE)
## lavaan 0.6-5 ended normally after 13 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          3
##                                                       
##   Number of observations                          1000
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   V1 ~                                                                  
##     V2         (a)    0.120    0.003   35.857    0.000    0.120    0.600
##     V3         (b)    0.600    0.017   35.857    0.000    0.600    0.600
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .V1                0.280    0.013   22.361    0.000    0.280    0.280
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     diff             -0.480    0.017  -28.128    0.000   -0.480    0.000

We can see that the Estimate of diff is non-zero, implying that it was computed on the unstandardized coefficients. But we also see that the Std.all of diff is 0, implying that it was computed on the standardized coefficients.
What about the significance test?

parameterEstimates(fit)[7, ] %>% knitr::kable()

|   |lhs  |op |rhs |label |   est|        se|         z| pvalue|  ci.lower|  ci.upper|
|:--|:----|:--|:---|:-----|-----:|---------:|---------:|------:|---------:|---------:|
|7  |diff |:= |a-b |diff  | -0.48| 0.0170646| -28.12843|      0| -0.513446| -0.446554|
standardizedSolution(fit)[7, ] %>% knitr::kable()

|   |lhs  |op |rhs | est.std|        se|     z|    pvalue|   ci.lower|  ci.upper|
|:--|:----|:--|:---|-------:|---------:|-----:|---------:|----------:|---------:|
|7  |diff |:= |a-b |       0| 0.0236643| 1e-07| 0.9999999| -0.0463812| 0.0463812|

We can see that we get two different \(z\)-tests, depending on the type of estimates used. The test results returned from summary() match those from parameterEstimates(), and are thus based on the unstandardized coefficients.
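The \(z\)-test for a defined parameter like diff is a Wald test on a − b. As a sketch of where that statistic comes from, the same quantity can be computed from `lm()` and its coefficient covariance matrix (note the caveat: `lm()` uses an n − p denominator for its standard errors, so they differ slightly from lavaan’s ML-based ones):

```r
# Wald z-test for diff = a - b via the delta method on lm() output.
Sigma <- matrix(c(1.0, 0.6, 0.6,
                  0.6, 1.0, 0.0,
                  0.6, 0.0, 1.0), nrow = 3)
data <- as.data.frame(MASS::mvrnorm(1000, rep(0, 3), Sigma, empirical = TRUE))
data$V2 <- 5 * data$V2 + 10

fit_lm <- lm(V1 ~ V2 + V3, data = data)
est <- coef(fit_lm)["V2"] - coef(fit_lm)["V3"]
V   <- vcov(fit_lm)
# Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)
se  <- sqrt(V["V2", "V2"] + V["V3", "V3"] - 2 * V["V2", "V3"])
z   <- est / se
c(est = unname(est), se = unname(se), z = unname(z))  # z close to lavaan's -28.128
```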
