Introduction to two-way ANOVA and expansion to an N-way ANOVA

Let’s start with when to use a two-way ANOVA and how it differs from a one-way ANOVA. Hopefully you already know when to use a one-way ANOVA; if not, a one-way ANOVA should be used when you have 1 categorical independent variable (IV) with 3+ categories or groups and 1 continuous dependent variable (DV); this is a 1-factor design. A quick digression: if a one-way ANOVA is conducted when the IV has only 2 groups, the finding will be the same as that of a two-sample independent t-test.
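
As a quick check of that equivalence, here is a minimal sketch with made-up numbers (not the crop data used later): with exactly 2 groups, the one-way ANOVA’s F statistic is the square of the t statistic, and the p-values match.

```python
import numpy as np
from scipy import stats

# Two made-up groups purely for illustration
group1 = [23.1, 25.4, 24.8, 26.0, 24.2]
group2 = [27.5, 28.1, 26.9, 29.3, 28.0]

f_stat, f_p = stats.f_oneway(group1, group2)   # one-way ANOVA
t_stat, t_p = stats.ttest_ind(group1, group2)  # independent t-test

# With exactly 2 groups, F = t^2 and the p-values are identical
print(np.isclose(f_stat, t_stat**2), np.isclose(f_p, t_p))  # True True
```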

The two-way ANOVA is an extension of the one-way ANOVA and should be used when you have 2 categorical IVs, each with 2+ groups, and 1 continuous DV; this is a multi-factor design, specifically a 2-factor design, because there are 2 IVs. In the ANOVA framework, IVs are often called factors and each category/group within an IV is called a level. Just as with a one-way ANOVA, a two-way ANOVA tests whether there is a difference between the means, but it does not tell you which groups differ. To get this information, one has to conduct post-hoc testing.

The logic that I am about to explain can be used to expand from a two-way ANOVA to an N-way ANOVA, where N is any number of factors. As a warning, it’s about to get complex, so take the time to understand everything.

To talk us through this, imagine a study with 2 IVs that each have 2 levels (groups/categories) and a continuous DV (outcome). We have independent variable A with 2 levels (A1 and A2), independent variable B with 2 levels (B1 and B2), and our DV, which is most commonly annotated as Y. We want to see the effects of A on Y, the effects of B on Y, and to test whether there is an interaction effect of A and B on Y. With this design, the following hypotheses are being tested:

  • Factor A x Factor B interactions:
    • H0 : there is no interaction
    • Ha : there is an interaction
  • Main effect of factor A:
    • H0 : μ1 = μ2
    • Ha : Not all of the means (μ) are equal
  • Main effect of factor B:
    • H0 : μ1 = μ2
    • Ha : Not all of the means (μ) are equal

With a multi-factor model, the significance of the interaction should be tested first. If the interaction is significant, the main effects of factor A or factor B are not really interpretable by themselves, since a significant interaction indicates that the effect of factor A depends on the level of factor B, and vice versa. If the interaction effect between factor A and factor B is not significant, then you remove the interaction from the model and test for significant main effects on their own.

Expansion to N-way ANOVA

This model can be expanded to any number of factors. For every combination of factors in the model, an interaction effect should be tested.

A 3-way ANOVA example: a study is testing the effects of the type of fertilizer, the amount of water, and the amount of sun on the mean crop yield. One should test the highest-order interaction, which includes all factors (fertilizer*water*sun), each lower-order interaction (fertilizer*water, fertilizer*sun, water*sun), and the main effects. With this design, the following hypotheses are being tested:

  • fertilizer * water * sun interactions:
    • H0 : there is no interaction
    • Ha : there is an interaction
  • fertilizer * water interactions:
    • H0 : there is no interaction
    • Ha : there is an interaction
  • fertilizer * sun interactions:
    • H0 : there is no interaction
    • Ha : there is an interaction
  • water * sun interactions:
    • H0 : there is no interaction
    • Ha : there is an interaction
  • Main effect of fertilizer:
    • H0 : μ1 = μ2
    • Ha : Not all of the means (μ) are equal
  • Main effect of water:
    • H0 : μ1 = μ2
    • Ha : Not all of the means (μ) are equal
  • Main effect of sun:
    • H0 : μ1 = μ2
    • Ha : Not all of the means (μ) are equal

If the highest-order interaction (fertilizer * water * sun) is significant, the lower-order interactions matter less, because the significant three-way interaction indicates that the effect of each factor depends on the levels of the other factors. Finding out which combinations of levels differ from other combinations would require post-hoc testing. If the highest-order interaction is not significant, you remove it from the model and re-run it, looking at the lower-order interactions, and so forth.

This 3-way ANOVA should highlight how complex the designs can get very quickly. Hopefully this logic is clear and you can apply it to any number of factors in your N-way model.

Data used in this example

This data is fictional; a study was conducted to test the effect of 2 new fertilizers and the amount of water on the mean crop yield. If you wish to follow along, you can import the required libraries and download the data using the following code:

import pandas
import researchpy as rp
import seaborn as sns

import statsmodels.api as sm
from statsmodels.formula.api import ols
import statsmodels.stats.multicomp


df = pandas.read_csv("https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/crop_yield.csv")

Now let’s take a look at the grand mean crop yield (the mean crop yield not split by any sub-group), the mean crop yield by each factor, and the mean crop yield by the factors grouped together.

rp.summary_cont(df['Yield'])

   Variable     N   Mean        SD        SE    95% Conf. Interval
0     Yield  20.0  29.04  4.230516  0.945972  27.060058  31.019942
rp.summary_cont(df.groupby(['Fert'])['Yield'])

       N   Mean        SD        SE    95% Conf. Interval
Fert
A     10  30.90  3.283968  1.038482  28.864576  32.935424
B     10  27.18  4.394390  1.389628  24.456329  29.903671

rp.summary_cont(df.groupby(['Water'])['Yield'])

       N   Mean        SD        SE    95% Conf. Interval
Water
High  10  30.82  3.244756  1.026082  28.808879  32.831121
Low   10  27.26  4.495974  1.421752  24.473367  30.046633

rp.summary_cont(df.groupby(['Fert', 'Water'])['Yield'])

            N   Mean        SD        SE    95% Conf. Interval
Fert Water
A    High   5  31.80  3.146427  1.407125  29.042036  34.557964
     Low    5  30.00  3.512834  1.570987  26.920866  33.079134
B    High   5  29.84  3.374611  1.509172  26.882023  32.797977
     Low    5  24.52  3.791042  1.695406  21.197005  27.842995
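
If you don’t want the researchpy dependency, the same group means can be reproduced with plain pandas. A minimal sketch with a tiny made-up frame (same column layout as crop_yield.csv, but not the real values):

```python
import pandas as pd

# Tiny made-up frame mirroring the columns of crop_yield.csv
df = pd.DataFrame({
    'Fert':  ['A', 'A', 'B', 'B'],
    'Water': ['High', 'Low', 'High', 'Low'],
    'Yield': [32.0, 30.0, 30.0, 24.0],
})

print(df['Yield'].mean())                             # grand mean
print(df.groupby('Fert')['Yield'].mean())             # by fertilizer
print(df.groupby(['Fert', 'Water'])['Yield'].mean())  # by cell
```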

2-way ANOVA Example

You could calculate the ANOVA by hand, but that’s unnecessary because statsmodels already has good support for it. In this example, I will use Type II sums of squares. There are 3 types of sums of squares that should be considered when conducting an ANOVA; by default, Python’s statsmodels and R use Type I, whereas SAS tends to use Type III. The differences between the types of sums of squares are beyond this page’s scope, but you should research them to decide which type to use for your study.

    # Fits the model with the interaction term
    # This will also automatically include the main effects for each factor
    model = ols('Yield ~ C(Fert)*C(Water)', df).fit()

    # Seeing if the overall model is significant
    print(f"Overall model F({model.df_model: .0f},{model.df_resid: .0f}) = {model.fvalue: .3f}, p = {model.f_pvalue: .4f}")
Overall model F( 3, 16) = 4.112, p = 0.0243

Excellent, the overall model is significant. Now we need to check the assumptions of the ANOVA: normality and homogeneity of variance. Statsmodels already provides model diagnostics in the model summary table. The summary table also outputs information that we don’t necessarily care about when coming from the ANOVA framework, so we can ignore that part and just look at the diagnostic information at the bottom of the table.

model.summary()

                            OLS Regression Results
==============================================================================
Dep. Variable:                  Yield   R-squared:                       0.435
Model:                            OLS   Adj. R-squared:                  0.330
Method:                 Least Squares   F-statistic:                     4.112
Date:                Mon, 03 Sep 2018   Prob (F-statistic):             0.0243
Time:                        15:12:16   Log-Likelihood:                -50.996
No. Observations:                  20   AIC:                             110.0
Df Residuals:                      16   BIC:                             114.0
Df Model:                           3
Covariance Type:            nonrobust
================================================================================================
                                  coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------
Intercept                      31.8000      1.549     20.527      0.000      28.516      35.084
C(Fert)[T.B]                   -1.9600      2.191     -0.895      0.384      -6.604       2.684
C(Water)[T.Low]                -1.8000      2.191     -0.822      0.423      -6.444       2.844
C(Fert)[T.B]:C(Water)[T.Low]   -3.5200      3.098     -1.136      0.273     -10.088       3.048
==============================================================================
Omnibus:                        3.427   Durbin-Watson:                   2.963
Prob(Omnibus):                  0.180   Jarque-Bera (JB):                1.319
Skew:                          -0.082   Prob(JB):                        0.517
Kurtosis:                       1.752   Cond. No.                         6.85
==============================================================================

The Durbin-Watson test detects the presence of autocorrelation, and the Omnibus and Jarque-Bera tests both check the assumption of normality of the residuals. The Condition Number assesses multicollinearity; values over 20 are indicative of multicollinearity. Homogeneity of variance is not reported directly in this table and can be checked separately, for example with Levene’s test. The model passes the assumption checks, which is excellent. Now let’s take a look at the ANOVA table.
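
If you prefer explicit tests over the summary diagnostics, scipy can check both assumptions directly: Shapiro-Wilk for normality of the residuals and Levene’s test for homogeneity of variance across the cells. A sketch with a small made-up stand-in for the crop data (the real df would be used the same way):

```python
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols

# Made-up stand-in with the same structure as the crop data
data = pd.DataFrame({
    'Fert':  ['A', 'A', 'A', 'B', 'B', 'B'] * 2,
    'Water': ['High', 'Low'] * 6,
    'Yield': [31.0, 29.5, 33.0, 28.0, 25.5, 27.0,
              32.5, 30.0, 31.5, 29.0, 24.0, 26.5],
})
model = ols('Yield ~ C(Fert) * C(Water)', data).fit()

# Normality of the residuals (Shapiro-Wilk)
w, p_norm = stats.shapiro(model.resid)

# Homogeneity of variance across the four Fert-Water cells (Levene)
groups = [g['Yield'].values for _, g in data.groupby(['Fert', 'Water'])]
lev, p_var = stats.levene(*groups)

print(f"Shapiro-Wilk p = {p_norm:.3f}, Levene p = {p_var:.3f}")
```

Non-significant p-values on both tests are consistent with the assumptions holding.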

# Creates the ANOVA table
res = sm.stats.anova_lm(model, typ=2)
res

                   sum_sq    df         F    PR(>F)
C(Fert)            69.192   1.0  5.766000  0.028847
C(Water)           63.368   1.0  5.280667  0.035386
C(Fert):C(Water)   15.488   1.0  1.290667  0.272656
Residual          192.000  16.0       NaN       NaN

The interaction term is not significant. This indicates that there is no interaction effect between the type of fertilizer and the amount of water on the mean crop yield. Since it is not significant, the interaction term is removed from the model, which then needs to be re-run so we can look at the main effects of each factor independently.

# Fits the model
model2 = ols('Yield ~ C(Fert) + C(Water)', df).fit()

print(f"Overall model F({model2.df_model: .0f},{model2.df_resid: .0f}) = {model2.fvalue: .3f}, p = {model2.f_pvalue: .4f}")
Overall model F( 2, 17) = 5.430, p = 0.0150

Excellent, the model is still significant. Again, let’s look at the model’s summary table to check the assumptions.

model2.summary()

                            OLS Regression Results
==============================================================================
Dep. Variable:                  Yield   R-squared:                       0.390
Model:                            OLS   Adj. R-squared:                  0.318
Method:                 Least Squares   F-statistic:                     5.430
Date:                Mon, 03 Sep 2018   Prob (F-statistic):             0.0150
Time:                        15:18:08   Log-Likelihood:                -51.772
No. Observations:                  20   AIC:                             109.5
Df Residuals:                      17   BIC:                             112.5
Df Model:                           2
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          32.6800      1.353     24.153      0.000      29.825      35.535
C(Fert)[T.B]       -3.7200      1.562     -2.381      0.029      -7.016      -0.424
C(Water)[T.Low]    -3.5600      1.562     -2.279      0.036      -6.856      -0.264
==============================================================================
Omnibus:                        1.169   Durbin-Watson:                   2.736
Prob(Omnibus):                  0.557   Jarque-Bera (JB):                0.820
Skew:                          -0.081   Prob(JB):                        0.664
Kurtosis:                       2.022   Cond. No.                         3.19
==============================================================================

Just as with before, the model without the interaction term passes the assumption check. Now let’s look at this model’s ANOVA table.

# Creates the ANOVA table
res2 = sm.stats.anova_lm(model2, typ=2)
res2

           sum_sq    df         F    PR(>F)
C(Fert)    69.192   1.0  5.669070  0.029228
C(Water)   63.368   1.0  5.191895  0.035887
Residual  207.488  17.0       NaN       NaN

Each factor has an independent, significant effect on the mean crop yield. While it’s good to know whether there is a statistically significant effect of some intervention on the outcome, it’s just as important to know the size of the effect the intervention has on the outcome. Statistical significance does not always translate into a large effect in the real world. This is important to consider because everything costs money and time. To see this, we can calculate the effect size. The following code uses the ANOVA table produced by statsmodels and appends the effect size measures eta-squared (η2) and omega-squared (ω2).

# Calculating effect size
def anova_table(aov):
    aov['mean_sq'] = aov['sum_sq'] / aov['df']

    # eta-squared: SS_effect / SS_total
    aov['eta_sq'] = aov[:-1]['sum_sq'] / sum(aov['sum_sq'])

    # omega-squared: (SS_effect - df_effect * MS_error) / (SS_total + MS_error);
    # the residual (error) row must be the last row of the table
    aov['omega_sq'] = (aov[:-1]['sum_sq'] - (aov[:-1]['df'] * aov['mean_sq'].iloc[-1])) / (sum(aov['sum_sq']) + aov['mean_sq'].iloc[-1])

    cols = ['sum_sq', 'mean_sq', 'df', 'F', 'PR(>F)', 'eta_sq', 'omega_sq']
    return aov[cols]

anova_table(res2)

           sum_sq    mean_sq    df         F    PR(>F)    eta_sq  omega_sq
C(Fert)    69.192  69.192000   1.0  5.669070  0.029228  0.203477  0.161778
C(Water)   63.368  63.368000   1.0  5.191895  0.035887  0.186350  0.145244
Residual  207.488  12.205176  17.0       NaN       NaN       NaN       NaN

ω2 is a better measure of effect size than η2 since it is unbiased: it takes the degrees of freedom into account, whereas η2 does not. Side note: η2 and R2 are the same thing in the ANOVA framework. Each factor, fertilizer and water, accounts for roughly 15-16% of the variance in the mean crop yield by ω2.
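
The two formulas behind those columns are η2 = SS_effect / SS_total and ω2 = (SS_effect − df_effect × MS_error) / (SS_total + MS_error). Plugging the table’s values for fertilizer into them by hand reproduces the output:

```python
# Values taken from the ANOVA table above
ss_fert, ss_water, ss_resid = 69.192, 63.368, 207.488
df_fert, df_resid = 1, 17

ms_resid = ss_resid / df_resid
ss_total = ss_fert + ss_water + ss_resid

eta_sq = ss_fert / ss_total                                        # SS_effect / SS_total
omega_sq = (ss_fert - df_fert * ms_resid) / (ss_total + ms_resid)  # df-adjusted

print(round(eta_sq, 6), round(omega_sq, 6))  # 0.203477 0.161778
```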

Knowing that the fertilizer used and the amount of water applied each have a statistically significant effect on the mean crop yield should lead you to ask the following questions: “is there a difference between the types of fertilizer, and is there a difference between the amounts of water used?” We can answer these questions with post-hoc testing.

Post-hoc Testing

There are a few different methods of post-hoc testing for finding differences between the groups of a factor. I will show how to use Tukey’s HSD. We have to test for differences for each factor separately.

mc = statsmodels.stats.multicomp.MultiComparison(df['Yield'], df['Fert'])
mc_results = mc.tukeyhsd()
print(mc_results)

Multiple Comparison of Means - Tukey HSD, FWER=0.05
===============================================
group1 group2 meandiff   lower    upper   reject
-----------------------------------------------
     A      B    -3.72  -7.3647  -0.0753    True
-----------------------------------------------

There is a statistically significant difference in the mean crop yield between fertilizers A and B; fertilizer A yields a significantly higher crop than fertilizer B.

mc = statsmodels.stats.multicomp.MultiComparison(df['Yield'], df['Water'])
mc_results = mc.tukeyhsd()
print(mc_results)

Multiple Comparison of Means - Tukey HSD, FWER=0.05
===============================================
group1 group2 meandiff   lower    upper   reject
-----------------------------------------------
  High    Low    -3.56  -7.2436   0.1236   False
-----------------------------------------------

There is not a statistically significant difference in the mean crop yield between the high and low amounts of water.
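
Had the interaction been significant, the post-hoc comparisons would instead be run across the factor combinations. A sketch of one way to do that (made-up stand-in data; with the real df you would build the combined label the same way), using `pairwise_tukeyhsd` on a combined Fert-Water cell label:

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up stand-in data with the same structure as the crop study
data = pd.DataFrame({
    'Fert':  ['A'] * 6 + ['B'] * 6,
    'Water': (['High'] * 3 + ['Low'] * 3) * 2,
    'Yield': [32.0, 31.5, 31.9, 30.2, 29.8, 30.0,
              29.9, 29.7, 29.8, 24.6, 24.4, 24.5],
})

# One label per Fert-Water cell, then Tukey HSD over all 4 cells
data['Cell'] = data['Fert'] + '-' + data['Water']
res_tukey = pairwise_tukeyhsd(data['Yield'], data['Cell'])
print(res_tukey)  # 6 pairwise comparisons (4 choose 2)
```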