Everything You Always Wanted to Know About ANOVA*

(*But Were Afraid to Ask)

Analysis of variance (ANOVA) is a statistical procedure, developed by R. A. Fisher, for analyzing the relationship between a continuous outcome (dependent variable) and categorical predictors (independent variables). The procedure produces a linear model, which can be used to estimate the conditional and marginal means of the outcome; in the presence of multiple factors, the model will include all interaction terms between them.

ANOVA is part of a wider family of statistical procedures that includes ANCOVA (which incorporates continuous predictors) and Analysis of Deviance (which allows for non-continuous outcomes1). All of these procedures produce an ANOVA table (or an ANOVA-like table) that summarizes the relationship between the model and the outcome by partitioning the variation in the outcome into components uniquely attributable to different sources, according to the law of total variance. Essentially, each of the model’s terms is represented by a line in the ANOVA table that answers the question: how much of the variation in \(Y\) can be attributed to the variation in \(X\)?2 Where applicable, each source of variance has an accompanying test statistic (often F), sometimes called the omnibus test, which indicates the significance of the variance attributable to that term, often accompanied by some measure of effect size.

In this post I will demonstrate the various types of ANOVA tables, and how R does ANOVA: what the defaults are, and how to produce alternatives.

Along the way, I hope to illustrate the applicability of ANOVA tables to other types of models - besides the classical case of a maximal model (all main effects and interactions) with strictly categorical predictors and a continuous outcome.


Here are some assumptions I make about you, the reader, in this post:

  1. You’re familiar with the ideas of multi-factor ANOVAs (what a main effect is, what interactions are…).
  2. You know some R - how to fit a linear model, how to wrangle some data.
  3. You are IID and normally distributed.

Let’s dive right in!

ANOVAs in R

Toy Data
d <- tibble::tribble(
  ~id, ~group,  ~X, ~Z,       ~Rx, ~condition,     ~Y,
   1L,   "Gb", 102, 1L, "Placebo",       "Ca", 584.07,
   2L,   "Ga",  52, 1L, "Placebo",       "Cb", 790.29,
   3L,   "Gb", 134, 2L, "Dose100",       "Ca", 875.76,
   4L,   "Gb", 128, 3L, "Dose100",       "Cb", 848.37,
   5L,   "Ga",  78, 1L, "Dose250",       "Ca", 270.42,
   6L,   "Gb", 150, 2L, "Dose250",       "Cb", 999.87,
   7L,   "Ga",  73, 1L, "Placebo",       "Ca",  364.1,
   8L,   "Ga",  87, 7L, "Placebo",       "Cb", 420.84,
   9L,   "Gb", 115, 6L, "Dose100",       "Ca", 335.78,
  10L,   "Gb", 113, 4L, "Dose100",       "Cb",    627,
  11L,   "Gc", 148, 3L, "Dose250",       "Ca", 607.79,
  12L,   "Gc",  82, 3L, "Dose250",       "Cb", 329.32,
  13L,   "Ga", 139, 1L, "Placebo",       "Ca", 335.56,
  14L,   "Ga",  65, 2L, "Placebo",       "Cb", 669.04,
  15L,   "Gb", 139, 1L, "Dose100",       "Ca", 405.04,
  16L,   "Gc",  96, 1L, "Dose100",       "Cb", 367.15,
  17L,   "Gb",  50, 5L, "Dose250",       "Ca",  27.37,
  18L,   "Gc",  90, 2L, "Dose250",       "Cb", 468.69,
  19L,   "Ga",  90, 2L, "Placebo",       "Ca", 584.67,
  20L,   "Ga", 116, 2L, "Placebo",       "Cb", 277.71,
  21L,   "Gb",  78, 2L, "Dose100",       "Ca", 266.01,
  22L,   "Gb",  60, 1L, "Dose100",       "Cb",   0.04,
  23L,   "Gc", 112, 4L, "Dose250",       "Ca", 593.25,
  24L,   "Ga",  63, 4L, "Dose250",       "Cb", 512.26,
  25L,   "Ga",  89, 1L, "Placebo",       "Ca", 635.57,
  26L,   "Ga",  97, 2L, "Placebo",       "Cb", 468.69,
  27L,   "Gc",  76, 3L, "Dose100",       "Ca", 514.66,
  28L,   "Gb",  83, 1L, "Dose100",       "Cb", 264.87,
  29L,   "Gc",  84, 4L, "Dose250",       "Ca", 220.34,
  30L,   "Gb",  88, 1L, "Dose250",       "Cb", 216.54
  )
d$id <- factor(d$id)
d$group <- factor(d$group)
d$Rx <- factor(d$Rx, levels = c("Placebo", "Dose100", "Dose250"))
d$condition <- factor(d$condition)
dplyr::glimpse(d)
#> Rows: 30
#> Columns: 7
#> $ id        <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1~
#> $ group     <fct> Gb, Ga, Gb, Gb, Ga, Gb, Ga, Ga, Gb, Gb, Gc, Gc, Ga, Ga, Gb, ~
#> $ X         <dbl> 102, 52, 134, 128, 78, 150, 73, 87, 115, 113, 148, 82, 139, ~
#> $ Z         <int> 1, 1, 2, 3, 1, 2, 1, 7, 6, 4, 3, 3, 1, 2, 1, 1, 5, 2, 2, 2, ~
#> $ Rx        <fct> Placebo, Placebo, Dose100, Dose100, Dose250, Dose250, Placeb~
#> $ condition <fct> Ca, Cb, Ca, Cb, Ca, Cb, Ca, Cb, Ca, Cb, Ca, Cb, Ca, Cb, Ca, ~
#> $ Y         <dbl> 584.07, 790.29, 875.76, 848.37, 270.42, 999.87, 364.10, 420.~
m <- lm(Y ~ group + X, data = d)

This is a multiple regression model with a covariable X and a 3-level factor group. We can summarize the results in a coefficient table (aka a “regression” table):

summary(m)
#> 
#> Call:
#> lm(formula = Y ~ group + X, data = d)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -372.51 -166.47  -53.67  134.57  451.16 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)  118.608    145.557   0.815  0.42256   
#> groupGb     -102.591     94.853  -1.082  0.28937   
#> groupGc      -92.384    107.302  -0.861  0.39712   
#> X              4.241      1.504   2.820  0.00908 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 218.8 on 26 degrees of freedom
#> Multiple R-squared:  0.2383, Adjusted R-squared:  0.1504 
#> F-statistic: 2.711 on 3 and 26 DF,  p-value: 0.06556

Or we can produce an ANOVA table:

anova(m)
#> Analysis of Variance Table
#> 
#> Response: Y
#>           Df  Sum Sq Mean Sq F value   Pr(>F)   
#> group      2    8783    4391  0.0918 0.912617   
#> X          1  380471  380471  7.9503 0.009077 **
#> Residuals 26 1244265   47856                    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

While the group term had 2 parameters in the coefficient table, it now has a single test with a Df of 2. This omnibus test can be thought of as representing the total significance of the two parameters combined!

Note that the model being summarized here is neither completely factorial (X is a continuous covariable), nor maximal (the group:X interaction is missing)!
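To make the "combined" part concrete: for an OLS model, the omnibus F for a term is a joint Wald test on that term's coefficients. A minimal sketch with simulated data (the variables g, x and y here are made up for illustration):

```r
set.seed(1)
g <- factor(rep(c("a", "b", "c"), each = 10))
x <- rnorm(30)
y <- as.numeric(g) + rnorm(30)

fit <- lm(y ~ x + g)  # g entered last, so its sequential SS is its unique SS

# Joint Wald F for the two g dummies, built from the coefficients and
# their covariance matrix:
b <- coef(fit)[c("gb", "gc")]
V <- vcov(fit)[c("gb", "gc"), c("gb", "gc")]
F_wald <- drop(t(b) %*% solve(V) %*% b) / 2

# ...is identical to the omnibus F for g in the ANOVA table:
F_anova <- anova(fit)["g", "F value"]
all.equal(F_wald, F_anova)
```

Because g is entered last here, its sequential SS is also its unique SS, so the two statistics coincide exactly.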

By default, R calculates type 1 sums of squares (SS) - also called sequential SS, because each term is attributed the portion of the variation in \(Y\) (represented by its SS) that has not already been attributed to any of the PREVIOUS terms! Thus, in our example, the effect of X represents only what X explains on top of what group has already explained - the variance attributed to X is strictly the variance uniquely attributable to X, controlling for group. The effect of group, however, does not represent its unique contribution to \(Y\)’s variance, but its total contribution.

This means that although the following models have the same terms, they will produce different type 1 ANOVA tables because those terms are in a different order:

anova(lm(Y ~ group + X, data = d))
#> Analysis of Variance Table
#> 
#> Response: Y
#>           Df  Sum Sq Mean Sq F value   Pr(>F)   
#> group      2    8783    4391  0.0918 0.912617   
#> X          1  380471  380471  7.9503 0.009077 **
#> Residuals 26 1244265   47856                    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(Y ~ X + group, data = d))
#> Analysis of Variance Table
#> 
#> Response: Y
#>           Df  Sum Sq Mean Sq F value  Pr(>F)  
#> X          1  325745  325745  6.8067 0.01486 *
#> group      2   63509   31754  0.6635 0.52353  
#> Residuals 26 1244265   47856                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can recreate the ANOVA table above by building a sequence of models, and comparing them (see [Judd, McClelland, & Ryan, 2017](https://doi.org/10.4324/9781315744131)):

m0 <- lm(Y ~ 1, data = d) # Intercept-only model
m1 <- lm(Y ~ group, data = d)

anova(m0, m1, m)
#> Analysis of Variance Table
#> 
#> Model 1: Y ~ 1
#> Model 2: Y ~ group
#> Model 3: Y ~ group + X
#>   Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
#> 1     29 1633519                                
#> 2     27 1624737  2      8783 0.0918 0.912617   
#> 3     26 1244265  1    380471 7.9503 0.009077 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m) # Same SS values
#> Analysis of Variance Table
#> 
#> Response: Y
#>           Df  Sum Sq Mean Sq F value   Pr(>F)   
#> group      2    8783    4391  0.0918 0.912617   
#> X          1  380471  380471  7.9503 0.009077 **
#> Residuals 26 1244265   47856                    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Simultaneous Sum of Squares

There is also type 2 SS - also called simultaneous SS, because each term is attributed with a portion of the variation in \(Y\) that is not attributable to any of the other terms in the model - its unique contribution while controlling for the other terms. Type 2 SS can be obtained with the Anova() function from the {car} package:

car::Anova(m, type = 2)
#> Anova Table (Type II tests)
#> 
#> Response: Y
#>            Sum Sq Df F value   Pr(>F)   
#> group       63509  2  0.6635 0.523533   
#> X          380471  1  7.9503 0.009077 **
#> Residuals 1244265 26                    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can recreate the above ANOVA table by building two sequences of models, and comparing them:

m_sans_X <- lm(Y ~ group, data = d)
m_sans_group <- lm(Y ~ X, data = d)

anova(m_sans_group, m) # Same SS as the type 2 test for group
#> Analysis of Variance Table
#> 
#> Model 1: Y ~ X
#> Model 2: Y ~ group + X
#>   Res.Df     RSS Df Sum of Sq      F Pr(>F)
#> 1     28 1307774                           
#> 2     26 1244265  2     63509 0.6635 0.5235
anova(m_sans_X,     m) # Same SS as the type 2 test for X
#> Analysis of Variance Table
#> 
#> Model 1: Y ~ group
#> Model 2: Y ~ group + X
#>   Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
#> 1     27 1624737                                
#> 2     26 1244265  1    380471 7.9503 0.009077 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because the order of terms is usually of little importance, type 1 tests are rarely used in practice…

Unfortunately, they are R’s default…

Adding Interactions

Things get a bit more complicated when interactions are involved, as type 2 SS treat interactions differently than main effects:

m_int <- lm(Y ~ group * X, data = d)

Each main-effect term is attributed the variance (its SS) that is unique to it and not attributable to any of the other main effects (simultaneously, as we’ve already seen), but without accounting for the variance attributable to interactions. The SS of the interaction term, on the other hand, represents its unique variance after accounting for the underlying main effects (sequentially). So we get the unique contribution of each main effect when controlling only for the other main effects, and the unique contribution of the interaction when controlling for the already-included combined contribution of the main effects.

car::Anova(m_int, type = 2)
#> Anova Table (Type II tests)
#> 
#> Response: Y
#>           Sum Sq Df F value    Pr(>F)    
#> group      63509  2  1.3555 0.2768607    
#> X         380471  1 16.2410 0.0004884 ***
#> group:X   682026  2 14.5566 7.246e-05 ***
#> Residuals 562240 24                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can again recreate this type 2 ANOVA table with model comparisons3:

anova(m_sans_group, m)     # Same SS as the type 2 test for group
#> Analysis of Variance Table
#> 
#> Model 1: Y ~ X
#> Model 2: Y ~ group + X
#>   Res.Df     RSS Df Sum of Sq      F Pr(>F)
#> 1     28 1307774                           
#> 2     26 1244265  2     63509 0.6635 0.5235
anova(m_sans_X,     m)     # Same SS as the type 2 test for X
#> Analysis of Variance Table
#> 
#> Model 1: Y ~ group
#> Model 2: Y ~ group + X
#>   Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
#> 1     27 1624737                                
#> 2     26 1244265  1    380471 7.9503 0.009077 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m,            m_int) # Same SS as the type 2 test for group:X  
#> Analysis of Variance Table
#> 
#> Model 1: Y ~ group + X
#> Model 2: Y ~ group * X
#>   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
#> 1     26 1244265                                  
#> 2     24  562240  2    682026 14.557 7.246e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In designs of higher order, each “order” is tested in a similar simultaneous-sequential manner. E.g., in a 3-way design, all main effects (1st order) are tested simultaneously (accounting for one another), then all 2-way interactions (2nd order) are tested simultaneously (accounting for the main effects and one another), and then the 3-way interaction is tested (accounting for all main effects and 2-way interactions).
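As a sketch of this simultaneous-sequential logic, here is the type 2 test for one 2-way interaction in a simulated, slightly unbalanced 3-way design, recreated by model comparison (A, B, C and y are made-up variables):

```r
set.seed(2)
d3 <- data.frame(A = factor(rep(1:2, 12)),
                 B = factor(rep(1:2, each = 2, times = 6)),
                 C = factor(rep(1:3, each = 8)),
                 y = rnorm(24))
d3 <- d3[-1, ]  # drop a row to unbalance the design

# Type 2 SS for A:B conditions on every term that does not contain A:B --
# all main effects and the other 2-way interactions, but not A:B:C:
sans_AB <- lm(y ~ (A + B + C)^2 - A:B, data = d3)
with_AB <- lm(y ~ (A + B + C)^2,       data = d3)

cmp <- anova(sans_AB, with_AB)
t2  <- car::Anova(lm(y ~ A * B * C, data = d3), type = 2)
all.equal(cmp$`Sum of Sq`[2], t2["A:B", "Sum Sq"])
```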

Type 3

There is another type of simultaneous SS - the type 3 test, which treats interactions and main effects equally: the SS for each main effect or interaction is calculated as its unique contribution that is not attributable to any of the other effects in the model - main effects or interactions. So the effect of X is its unique contribution while controlling both for group and for group:X!

However, recall that we previously saw that these methods actually produce omnibus tests for the combined effect of the parameters of each term. In the m_int model, the parameters labeled X, groupGb, and groupGc no longer represent parameters of the main effects - instead they are parameters of simple (i.e., conditional) effects!

summary(m_int)
#> 
#> Call:
#> lm(formula = Y ~ group * X, data = d)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -370.18  -67.15   33.68  111.35  172.14 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   833.130    173.749   4.795 6.99e-05 ***
#> groupGb     -1308.898    233.218  -5.612 8.90e-06 ***
#> groupGc      -752.718    307.747  -2.446   0.0222 *  
#> X              -4.041      1.942  -2.081   0.0482 *  
#> groupGb:X      13.041      2.419   5.390 1.55e-05 ***
#> groupGc:X       7.731      3.178   2.432   0.0228 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 153.1 on 24 degrees of freedom
#> Multiple R-squared:  0.6558, Adjusted R-squared:  0.5841 
#> F-statistic: 9.146 on 5 and 24 DF,  p-value: 5.652e-05

  • X is the slope of X when group=Ga (when both dummy variables are fixed at 0, as Ga is the reference level)
  • groupGb is the difference between group=Ga and group=Gb when X=0
  • groupGc is the difference between group=Ga and group=Gc when X=0

(Pay attention to the “when” - this is what makes them conditional.)
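A quick way to convince yourself of this conditionality, using simulated data (g, x and y are made up): with treatment coding, the x coefficient of an interaction model equals the slope fit within the reference group alone.

```r
set.seed(3)
g <- factor(rep(c("a", "b"), each = 15))
x <- rnorm(30)
y <- ifelse(g == "b", 2 * x, -x) + rnorm(30)

fit_int <- lm(y ~ g * x)

# The "x" coefficient is the simple slope within the reference level "a" --
# identical to fitting the model in that subgroup alone:
fit_a <- lm(y ~ x, subset = g == "a")
all.equal(unname(coef(fit_int)["x"]), unname(coef(fit_a)["x"]))
```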

We can see that changing the reference group changes the test for X:

d$group <- relevel(d$group, ref = "Gb")
m_int2 <- lm(Y ~ group * X, data = d)

car::Anova(m_int, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: Y
#>             Sum Sq Df F value    Pr(>F)    
#> (Intercept) 538630  1 22.9922 6.994e-05 ***
#> group       738108  2 15.7536 4.269e-05 ***
#> X           101495  1  4.3325   0.04823 *  
#> group:X     682026  2 14.5566 7.246e-05 ***
#> Residuals   562240 24                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova(m_int2, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: Y
#>             Sum Sq Df F value    Pr(>F)    
#> (Intercept) 219106  1  9.3528  0.005402 ** 
#> group       738108  2 15.7536 4.269e-05 ***
#> X           910646  1 38.8722 1.918e-06 ***
#> group:X     682026  2 14.5566 7.246e-05 ***
#> Residuals   562240 24                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

How can we resolve this?

Centering

By centering our predictors! Centering transforms our data in such a way that 0 represents the overall mean. When this is done, conditioning on 0 is the same as conditioning on the overall mean - that is, looking at the main effect!

For covariables this is easy enough:

d$X_c <- d$X - mean(d$X) # or scale(d$X, center = TRUE, scale = FALSE)

But how do we center a factor??

The answer is: use some type of orthogonal coding, for example contr.sum() (effects coding). This makes the coefficients harder to interpret4, but we’re not looking at those anyway!

contrasts(d$group) <- contr.sum
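For the curious, this is the contrast matrix contr.sum assigns to a 3-level factor - each column sums to zero, which is what makes the intercept the (unweighted) grand mean:

```r
contr.sum(3)
#>   [,1] [,2]
#> 1    1    0
#> 2    0    1
#> 3   -1   -1
```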

Now when looking at type 3 tests, the main effects terms actually are main effects!

m_int3 <- lm(Y ~ group*X_c, data = d)
car::Anova(m_int3, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: Y
#>              Sum Sq Df  F value    Pr(>F)    
#> (Intercept) 4743668  1 202.4902 3.401e-13 ***
#> group         19640  2   0.4192   0.66231    
#> X_c          143772  1   6.1371   0.02067 *  
#> group:X_c    682026  2  14.5566 7.246e-05 ***
#> Residuals    562240 24                       
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Remember: ABC - Always Be Centering (your predictors) - type 3 ANOVA tables make little sense without centering5!

Recreate the Type 3 ANOVA

Unfortunately, we can’t just build a model without an interaction term and use it to recreate the type 3 ANOVA. Instead, we need to actually build the model matrix (i.e., the design matrix), and drop the columns of each term in turn:

mm <- model.matrix(m_int2)
head(mm)
#>   (Intercept) groupGa groupGc   X groupGa:X groupGc:X
#> 1           1       0       0 102         0         0
#> 2           1       1       0  52        52         0
#> 3           1       0       0 134         0         0
#> 4           1       0       0 128         0         0
#> 5           1       1       0  78        78         0
#> 6           1       0       0 150         0         0

The type 3 test for a term is equal to the comparison between the full model and a model without the parameters associated with that term:

m_sans_group <- lm(Y ~ mm[,-(2:3)], data = d)
m_sans_X <-     lm(Y ~ mm[,-4],     data = d)
m_sans_int <-   lm(Y ~ mm[,-(5:6)], data = d)

anova(m_sans_group, m_int2) # Same SS as type 3 for group
#> Analysis of Variance Table
#> 
#> Model 1: Y ~ mm[, -(2:3)]
#> Model 2: Y ~ group * X
#>   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
#> 1     26 1300348                                  
#> 2     24  562240  2    738108 15.754 4.269e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m_sans_X,     m_int2) # Same SS as type 3 for X
#> Analysis of Variance Table
#> 
#> Model 1: Y ~ mm[, -4]
#> Model 2: Y ~ group * X
#>   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
#> 1     25 1472886                                  
#> 2     24  562240  1    910646 38.872 1.918e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m_sans_int,   m_int2) # Same SS as type 3 for group:X
#> Analysis of Variance Table
#> 
#> Model 1: Y ~ mm[, -(5:6)]
#> Model 2: Y ~ group * X
#>   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
#> 1     26 1244265                                  
#> 2     24  562240  2    682026 14.557 7.246e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Compare to:

car::Anova(m_int2, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: Y
#>             Sum Sq Df F value    Pr(>F)    
#> (Intercept) 219106  1  9.3528  0.005402 ** 
#> group       738108  2 15.7536 4.269e-05 ***
#> X           910646  1 38.8722 1.918e-06 ***
#> group:X     682026  2 14.5566 7.246e-05 ***
#> Residuals   562240 24                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Okay, that’s some ugly stuff. Let us never look at that again.

Type 2 vs Type 3

As mentioned above, the distinction between types 2 and 3 comes from how they estimate main effects in the presence of interactions.

Let’s look at the following factorial design:

m_factorial <- lm(Y ~ condition * group, data = d,
                   # Another way to specify effects coding:
                   contrasts = list(condition = contr.sum,
                                    group = contr.sum))

car::Anova(m_factorial, type = 2)
#> Anova Table (Type II tests)
#> 
#> Response: Y
#>                  Sum Sq Df F value Pr(>F)
#> condition         12048  1  0.1840 0.6718
#> group              7165  2  0.0547 0.9469
#> condition:group   41204  2  0.3146 0.7330
#> Residuals       1571485 24
car::Anova(m_factorial, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: Y
#>                  Sum Sq Df F value    Pr(>F)    
#> (Intercept)     5858845  1 89.4773 1.439e-09 ***
#> condition          3452  1  0.0527    0.8203    
#> group              8913  2  0.0681    0.9344    
#> condition:group   41204  2  0.3146    0.7330    
#> Residuals       1571485 24                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

But where do these differences between type 2 and 3 come from?

Type 2 SS looks at the SS between the means of group, collapsing across the levels of condition - a weighted mean, with each cell weighted by its size. So the marginal mean of the first group is estimated as:

\(\bar{Y}_{1.} = \frac{\sum{Y_{1.}}}{N_{1.}} = \frac{\sum{Y_{11}}+\sum{Y_{12}}}{N_{11}+N_{12}}\)

Type 3 SS, however, looks at the SS between the unweighted means of group - each level of condition contributes equally, regardless of cell size. So the marginal mean of the first group is estimated as:

\(\bar{Y}_{1.} = \frac{\frac{\sum{Y_{11}}}{N_{11}} + \frac{\sum{Y_{12}}}{N_{12}}}{2}\)

This makes type 3 SS invariant to the cell frequencies!
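A toy numeric sketch of the two estimators (the numbers are made up), for one group observed in two conditions with unequal cell sizes:

```r
y11 <- c(10, 12)            # condition 1 (n = 2)
y12 <- c(20, 20, 20, 20)    # condition 2 (n = 4)

weighted   <- sum(c(y11, y12)) / length(c(y11, y12))  # type 2 style
unweighted <- (mean(y11) + mean(y12)) / 2             # type 3 style

weighted    # 17   - pulled toward the larger cell
unweighted  # 15.5 - each cell counts equally
```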

But as we will soon see, this need not always be the case…


A lot has been said about type 2 vs type 3. I will not go into the weeds here, but it is important to note that

  1. Most statistical software (SAS, Stata, SPSS, …) defaults to type 3 SS with orthogonal factor coding (though covariables are usually not mean-centered by default) (see Langsrud, 2003). This makes R inconsistent: as we’ve seen, it defaults to type 1 SS and treatment coding.
  2. Often in factorial designs, any imbalance in the design is incidental, so it is often beneficial to have a method that is invariant to such imbalances. (Though this may not be true if the data is observational.)
  3. Coefficient tables give results that are analogous to type 3 SS when all terms are covariables:
m_covs <- lm(Y ~ X * Z, data = d)

car::Anova(m_covs, type = 2)
#> Anova Table (Type II tests)
#> 
#> Response: Y
#>            Sum Sq Df F value  Pr(>F)  
#> X          324676  1  6.7657 0.01513 *
#> Z            2430  1  0.0506 0.82373  
#> X:Z         57640  1  1.2011 0.28315  
#> Residuals 1247704 26                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova(m_covs, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: Y
#>              Sum Sq Df F value Pr(>F)
#> (Intercept)   83130  1  1.7323 0.1996
#> X              5866  1  0.1222 0.7294
#> Z             59917  1  1.2486 0.2740
#> X:Z           57640  1  1.2011 0.2831
#> Residuals   1247704 26
summary(m_covs) # same p-values as type 3
#> 
#> Call:
#> lm(formula = Y ~ X * Z, data = d)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -384.97 -153.72  -23.98  141.28  422.79 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  366.352    278.348   1.316    0.200
#> X              1.014      2.900   0.350    0.729
#> Z           -112.681    100.843  -1.117    0.274
#> X:Z            1.175      1.072   1.096    0.283
#> 
#> Residual standard error: 219.1 on 26 degrees of freedom
#> Multiple R-squared:  0.2362, Adjusted R-squared:  0.1481 
#> F-statistic:  2.68 on 3 and 26 DF,  p-value: 0.06772

Note once again that the model being summarized here is not factorial at all - both Z and X are continuous covariables!

Balanced vs. Unbalanced Data

The distinction between type 1, 2 and 3 SS is only relevant when there is some dependency between the predictors (i.e., some collinearity). In our example, we can see that group and X are somewhat collinear (VIF / tolerance are not exactly 1):

performance::check_collinearity(m)
#> # Check for Multicollinearity
#> 
#> Low Correlation
#> 
#>   Term  VIF Increased SE Tolerance
#>  group 1.08         1.04      0.92
#>      X 1.08         1.04      0.92
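As a sketch of what that VIF value measures (with made-up simulated variables x, z and y): the VIF of a predictor is \(1/(1-R^2)\) from regressing it on the model's other predictors, and tolerance is its reciprocal.

```r
set.seed(4)
x <- rnorm(50)
z <- 0.5 * x + rnorm(50)
y <- x + z + rnorm(50)
fit <- lm(y ~ x + z)

# VIF for x, computed by hand from the regression of x on z:
r2    <- summary(lm(x ~ z))$r.squared
vif_x <- 1 / (1 - r2)

# Matches the packaged computation:
all.equal(unname(car::vif(fit)["x"]), vif_x)
```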

In a factorial design, we might call this dependence / collinearity among our predictors an “unbalanced design” (the number of observations differs between cells), and when the predictors are completely independent we would call this a “balanced design” (equal number of observations in all cells).

Let’s look at two examples:

Balanced data

We can see that Rx and condition are balanced:

table(d$Rx, d$condition)
#>          
#>           Ca Cb
#>   Placebo  5  5
#>   Dose100  5  5
#>   Dose250  5  5
chisq.test(d$Rx, d$condition)$statistic # Chisq is exactly 0
#> X-squared 
#>         0

And so type 1, 2 and 3 ANOVA tables are identical:

contrasts(d$Rx) <- contr.sum
contrasts(d$condition) <- contr.sum

m_balanced <- lm(Y ~ condition * Rx, data = d)

anova(m_balanced)
#> Analysis of Variance Table
#> 
#> Response: Y
#>              Df  Sum Sq Mean Sq F value Pr(>F)
#> condition     1   13666   13666  0.2162 0.6461
#> Rx            2   41379   20690  0.3273 0.7240
#> condition:Rx  2   61444   30722  0.4860 0.6210
#> Residuals    24 1517030   63210
car::Anova(m_balanced, type = 2)
#> Anova Table (Type II tests)
#> 
#> Response: Y
#>               Sum Sq Df F value Pr(>F)
#> condition      13666  1  0.2162 0.6461
#> Rx             41379  2  0.3273 0.7240
#> condition:Rx   61444  2  0.4860 0.6210
#> Residuals    1517030 24
car::Anova(m_balanced, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: Y
#>               Sum Sq Df  F value    Pr(>F)    
#> (Intercept)  6422803  1 101.6112 4.204e-10 ***
#> condition      13666  1   0.2162    0.6461    
#> Rx             41379  2   0.3273    0.7240    
#> condition:Rx   61444  2   0.4860    0.6210    
#> Residuals    1517030 24                       
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Unbalanced data

However, condition and group are NOT balanced:

table(d$group, d$condition)
#>     
#>      Ca Cb
#>   Gb  6  6
#>   Ga  5  6
#>   Gc  4  3
chisq.test(d$group, d$condition)$statistic # Chisq is NOT 0
#> Warning in chisq.test(d$group, d$condition): Chi-squared approximation may be
#> incorrect
#> X-squared 
#> 0.2337662

And so type 1, 2 and type 3 ANOVA tables are NOT identical (recall how type 2 and 3 estimate marginal means differently in the presence of interactions):

anova(m_factorial)
#> Analysis of Variance Table
#> 
#> Response: Y
#>                 Df  Sum Sq Mean Sq F value Pr(>F)
#> condition        1   13666   13666  0.2087 0.6519
#> group            2    7165    3582  0.0547 0.9469
#> condition:group  2   41204   20602  0.3146 0.7330
#> Residuals       24 1571485   65479
car::Anova(m_factorial, type = 2)
#> Anova Table (Type II tests)
#> 
#> Response: Y
#>                  Sum Sq Df F value Pr(>F)
#> condition         12048  1  0.1840 0.6718
#> group              7165  2  0.0547 0.9469
#> condition:group   41204  2  0.3146 0.7330
#> Residuals       1571485 24
car::Anova(m_factorial, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: Y
#>                  Sum Sq Df F value    Pr(>F)    
#> (Intercept)     5858845  1 89.4773 1.439e-09 ***
#> condition          3452  1  0.0527    0.8203    
#> group              8913  2  0.0681    0.9344    
#> condition:group   41204  2  0.3146    0.7330    
#> Residuals       1571485 24                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The larger the deviation from perfect balance (i.e., independence), the larger the differences between the types will be.

Why Does This Happen?

We’ve seen that types 1, 2 and 3 all attribute variance in \(Y\) to the model’s terms by partialling out the model’s other terms in some way - sequentially, simultaneously, or some mix of both.

However, if the predictors in the model are independent (as in a balanced design, with zero collinearity), then regardless of the order of their inclusion in the model, all of the variance attributable to some term A is unique to A - none of it is also attributable to any of the other terms in the model, and vice versa. There is nothing to partial out, so the order does not matter!

This also means that because the SS returned by type 2 and 3 ANOVA tables represent each term’s uniquely attributable variation in \(Y\), when the design is not balanced (i.e., there is collinearity in the data) the SS in the ANOVA table will not sum to the total SS - some overlap (some non-unique variation) is not represented in the table. This is where type 1 ANOVA tables shine: their sequential nature means the SS in the table will always sum to the total SS!
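Here is a small simulated sketch of both claims (x, z and y are made-up variables):

```r
set.seed(5)
x <- rnorm(100)
z <- x + rnorm(100, sd = 0.2)  # strongly collinear with x
y <- x + rnorm(100)
fit <- lm(y ~ x + z)

ss_total <- sum((y - mean(y))^2)
ss1 <- anova(fit)$`Sum Sq`                 # type 1: x, z, residuals
ss2 <- car::Anova(fit, type = 2)$`Sum Sq`  # type 2: x, z, residuals

all.equal(sum(ss1), ss_total)  # type 1 partitions the total SS exactly
sum(ss2) < ss_total            # TRUE here - the shared portion is left out
```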

Let’s look at an extreme example of collinearity:

Collinear data
# (no seed is set here, so the exact values below will vary from run to run)
d2 <- MASS::mvrnorm(
  n = 100,
  mu = rep(0, 3),
  Sigma = matrix(c(1, 0.99, 0.4,
                   0.99, 1, 0.41,
                   0.4, 0.41, 1), nrow = 3)
)

d2 <- data.frame(d2)
colnames(d2) <- c("X", "Z", "Y")
m_collinear <- lm(Y ~ X + Z, data = d2)

performance::check_collinearity(m_collinear)
#> # Check for Multicollinearity
#> 
#> High Correlation
#> 
#>  Term   VIF Increased SE Tolerance
#>     X 47.02         6.86      0.02
#>     Z 47.02         6.86      0.02

Looking at a type 1 ANOVA table, we can see that X accounts for a significant amount of the variation in Y, but that Z does not add anything significant on top of X:

anova(m_collinear)
#> Analysis of Variance Table
#> 
#> Response: Y
#>           Df Sum Sq Mean Sq F value    Pr(>F)    
#> X          1 16.927 16.9265  22.664 6.743e-06 ***
#> Z          1  0.000  0.0000   0.000    0.9999    
#> Residuals 97 72.444  0.7468                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

However, were we to look at a type 2 ANOVA table, we might get the impression that neither X nor Z contributes to the model:

car::Anova(m_collinear, type = 2)
#> Anova Table (Type II tests)
#> 
#> Response: Y
#>           Sum Sq Df F value Pr(>F)
#> X          0.360  1  0.4822 0.4891
#> Z          0.000  1  0.0000 0.9999
#> Residuals 72.444 97

This demonstrates the importance of always interpreting type 2 and 3 ANOVA tables in light of any collinearity that might exist between your predictors. Remember: ABC - Always Be mindful of Collinearity (okay, that one was a bit of a stretch).

ANOVA Made Easy

We can use the lm() -> car::Anova() method to conduct a proper ANOVA on a maximal factorial design. However, making sure that our factors are orthogonally coded is a pain in the @$$.

d$group <- factor(d$group)
d$condition <- factor(d$condition)
contrasts(d$group) <- contr.sum
contrasts(d$condition) <- contr.sum
m_lm <- lm(Y ~ group * condition, d)
car::Anova(m_lm, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: Y
#>                  Sum Sq Df F value    Pr(>F)    
#> (Intercept)     5858845  1 89.4773 1.439e-09 ***
#> group              8913  2  0.0681    0.9344    
#> condition          3452  1  0.0527    0.8203    
#> group:condition   41204  2  0.3146    0.7330    
#> Residuals       1571485 24                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Thankfully, we have the afex package which turns all of that mess into something much more palatable:

afex::aov_car(Y ~ group * condition + Error(id), data = d)
#> Anova Table (Type 3 tests)
#> 
#> Response: Y
#>            Effect    df      MSE    F  ges p.value
#> 1           group 2, 24 65478.53 0.07 .006    .934
#> 2       condition 1, 24 65478.53 0.05 .002    .820
#> 3 group:condition 2, 24 65478.53 0.31 .026    .733
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

Much easier!
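afex also offers a column-name interface, aov_ez(), which can be handier when your factors already live in a data frame. A minimal sketch on simulated toy data (d_toy and its columns are hypothetical stand-ins for the post's d):

```r
set.seed(7)
d_toy <- data.frame(
  id        = factor(1:30),
  group     = gl(3, 10, labels = c("a", "b", "c")),
  condition = gl(2, 5, 30, labels = c("ctl", "trt")),
  Y         = rnorm(30)
)

# Same Type 3 ANOVA as aov_car(), but specified by column names;
# afex again sets the sum-to-zero contrasts for us.
afex::aov_ez(id = "id", dv = "Y", data = d_toy,
             between = c("group", "condition"))
```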

Other Types of Models

So far we’ve seen how ANOVA tables are applied to factorial and non-factorial linear OLS models. However, the idea of per-term omnibus tests can be extended to many other types of models. For models where SS cannot be calculated, analogous methods based on deviance or likelihood are used instead (read more in the car::Anova() docs). Here are some examples:

GLMs

Logistic

m_logistic <- glm(condition ~ group * X_c, data = d,
                  family = binomial())
car::Anova(m_logistic, type = 2)
#> Analysis of Deviance Table (Type II tests)
#> 
#> Response: condition
#>           LR Chisq Df Pr(>Chisq)
#> group      0.15679  2     0.9246
#> X_c        0.75134  1     0.3861
#> group:X_c  1.10225  2     0.5763

Poisson

m_poisson <- glm(Z ~ group * X_c, data = d,
                 family = poisson())
car::Anova(m_poisson, type = 3)
#> Analysis of Deviance Table (Type III tests)
#> 
#> Response: Z
#>           LR Chisq Df Pr(>Chisq)
#> group      0.88074  2     0.6438
#> X_c        0.04370  1     0.8344
#> group:X_c  0.11357  2     0.9448

Ordinal

m_ordinal <- ordinal::clm(group ~ X_c * condition, data = d)
anova(m_ordinal)
#> Type I Analysis of Deviance Table with Wald chi-square tests
#> 
#>               Df  Chisq Pr(>Chisq)
#> X_c            1 0.4469     0.5038
#> condition      1 0.1584     0.6907
#> X_c:condition  1 0.6129     0.4337

Note that none of the models summarized here with an ANOVA table has a continuous (conditionally normal) outcome, and not all of their predictors are categorical. We might have been inclined to summarize these models with a coefficient table, but it is equally valid to present an ANOVA table!

(G)LMMs

m_mixed <- lmerTest::lmer(Y ~ X_c * group + (group | Z), data = d)
anova(m_mixed, type = 2)
#> Type II Analysis of Variance Table with Satterthwaite's method
#>           Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
#> X_c       290878  290878     1 23.645 14.4210 0.0008951 ***
#> group       7198    3599     2  9.520  0.1784 0.8393243    
#> X_c:group 582071  291036     2 21.938 14.4288 0.0001001 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Also for GLMMs:

m_mixed2 <- lme4::glmer(condition ~ X_c * group + (1 | Z), data = d,
                        family = binomial())
car::Anova(m_mixed2, type = 3)
#> Analysis of Deviance Table (Type III Wald chisquare tests)
#> 
#> Response: condition
#>              Chisq Df Pr(>Chisq)
#> (Intercept) 0.0886  1     0.7660
#> X_c         1.1917  1     0.2750
#> group       0.0829  2     0.9594
#> X_c:group   0.9972  2     0.6074

Concluding Remarks

Although we might be more inclined to summarize our model with an ANOVA table when our model contains categorical predictors, especially when interactions are involved, we’ve seen that ANOVA tables do not require any special data (factorial, balanced, normal outcome), and can be used as an alternative to a coefficient table.

Regardless of how we summarize our model - with a coefficient table or with an ANOVA table, using type 1, 2 or 3 SS, with orthogonal or treatment coding, with centered or uncentered covariates - the underlying model is the same, and will produce the same estimated simple effects, marginal means and contrasts. That is, the method we use to summarize our model has no bearing on whatever follow-up analyses we may wish to carry out (using emmeans of course! Check out the materials from my R course: Analysis of Factorial Designs).6
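As a quick check of this equivalence, here is a base-R sketch (toy data and names of my own) comparing the default treatment coding with orthogonal sum-to-zero coding:

```r
set.seed(1)
d2 <- data.frame(g = gl(2, 10), x = rnorm(20))
d2$y <- d2$x + rnorm(20)

m_trt <- lm(y ~ g * x, data = d2)  # default treatment (dummy) coding
m_sum <- lm(y ~ g * x, data = d2,
            contrasts = list(g = contr.sum))  # sum-to-zero coding

# The coefficients differ, but both codings span the same model space,
# so the fitted values (and hence all predictions, marginal means,
# contrasts, and R^2) are identical:
all.equal(fitted(m_trt), fitted(m_sum))
```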

I hope you now have a fuller grasp of what goes on behind the scenes when producing ANOVA tables, how the different types of ANOVA tables work, when they should be used, and how to interpret their results. ANOVA tables are a powerful tool that can be applied not only to factorial data coupled with an OLS model, but also to a wide variety of (generalized) linear (mixed) regression models.


  1. via the generalized linear model framework↩︎

  2. What we might consider “unexplained” variance is also attributed to some source; e.g. to variation between subjects.↩︎

  3. While the SS are the same, the test statistics are different - this is because car::Anova() uses the total error term of the full model for all of the tests.↩︎

  4. You might instead use contr.helmert().↩︎

  5. Honestly, coefficient tables also make little sense without centering↩︎

  6. It will also not alter the model’s predictions or \(R^2\).↩︎
