
## Introduction to two-way ANOVA and expansion to an N-way ANOVA

Let’s start with when to use a two-way ANOVA and how it differs from a one-way ANOVA. Hopefully you already know when to use a one-way ANOVA; if not, a one-way ANOVA should be used if you have 1 categorical independent variable (IV) with 3+ categories or groups and 1 continuous dependent variable (DV); this is a 1-factor design. A quick digression: if a one-way ANOVA is conducted when the IV has only 2 groups, it produces the same finding as a two-sample independent t-test.
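To see that digression concretely, here is a small sketch with made-up numbers (scipy assumed available): with only 2 groups, the one-way ANOVA F statistic is the square of the pooled-variance t statistic, and the p-values match.

```python
# Hypothetical two-group data; with 2 groups, one-way ANOVA F equals t^2
from scipy import stats

group1 = [4.1, 5.0, 6.2, 5.5, 4.8]
group2 = [6.9, 7.4, 6.1, 7.8, 7.0]

f_stat, f_p = stats.f_oneway(group1, group2)
t_stat, t_p = stats.ttest_ind(group1, group2)  # pooled-variance t-test

print(abs(f_stat - t_stat**2) < 1e-8)  # True: F = t^2
print(abs(f_p - t_p) < 1e-8)           # True: identical p-values
```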

The two-way ANOVA is an extension of the one-way ANOVA and should be used if you have 2 categorical IVs, each with 2+ groups, and 1 continuous DV; this is a multi-factor design, specifically a 2-factor design, because there are 2 IVs. In the ANOVA framework, IVs are often called factors and each category/group within an IV is called a level. Just as with a one-way ANOVA, a two-way ANOVA tests whether there is a difference between the means, but it does not tell you which groups differ. To get this information, one has to conduct post-hoc testing.

The logic that I am about to explain can be used to expand from a two-way ANOVA to an N-way ANOVA, where N is any number of factors. As a warning, it’s about to get complex, so take the time to understand everything.

To talk us through this, imagine a study with 2 IVs that each have 2 levels (groups/categories) and a continuous DV (outcome). We have independent variable A with 2 levels (A1 and A2), independent variable B with 2 levels (B1 and B2), and our DV, which is most commonly annotated as Y. We want to see the effect of A on Y, the effect of B on Y, and to test whether there is an interaction effect of A and B on Y. With this design, the following hypotheses are being tested:

- Factor A x Factor B interaction:
  - H_{0}: there is no interaction
  - H_{a}: there is an interaction
- Main effect of factor A:
  - H_{0}: μ_{1} = μ_{2}
  - H_{a}: Not all of the means (μ) are equal
- Main effect of factor B:
  - H_{0}: μ_{1} = μ_{2}
  - H_{a}: Not all of the means (μ) are equal
With a multi-factor model, the significance of the interaction should be tested as the first step. If the interaction is significant, the main effects of factor A and factor B are not really interpretable by themselves, since the significant interaction indicates that the effect of factor A depends on the level of factor B, and vice versa. If the interaction effect between factor A and factor B is not significant, then you remove the interaction from the model and test for significant main effects by themselves.

### Expansion to N-way ANOVA

This model can be expanded to any number of factors. For every combination of factors that is in the model, an interaction effect should be tested.

A 3-way ANOVA example: a study is testing the effects of the type of fertilizer, the amount of water, and the amount of sun on the mean crop yield. One should test the highest-order interaction, which includes all factors (fertilizer*water*sun), each lower-order interaction (fertilizer*water, fertilizer*sun, water*sun), and the main effects. With this design, the following hypotheses are being tested:

- fertilizer * water * sun interaction:
  - H_{0}: there is no interaction
  - H_{a}: there is an interaction
- fertilizer * water interaction:
  - H_{0}: there is no interaction
  - H_{a}: there is an interaction
- fertilizer * sun interaction:
  - H_{0}: there is no interaction
  - H_{a}: there is an interaction
- water * sun interaction:
  - H_{0}: there is no interaction
  - H_{a}: there is an interaction
- Main effect of fertilizer:
  - H_{0}: μ_{1} = μ_{2}
  - H_{a}: Not all of the means (μ) are equal
- Main effect of water:
  - H_{0}: μ_{1} = μ_{2}
  - H_{a}: Not all of the means (μ) are equal
- Main effect of sun:
  - H_{0}: μ_{1} = μ_{2}
  - H_{a}: Not all of the means (μ) are equal

If the highest-order interaction (fertilizer * water * sun) is significant, the other interactions do not matter as much, because the significant three-way interaction indicates that the effect of each factor depends on the levels of the other factors. To find out which combinations of levels differ from other combinations would require post-hoc testing. If the highest-order interaction is not significant, you remove it from the model and re-run it, looking at the lower-level interactions, and so forth.

This 3-way ANOVA should highlight how complex these designs can get very quickly. Hopefully the logic is clear and you can apply it to any number of factors in your N-way model.

##### Data used in this example

This data is fictional; a study was conducted to test the effect of 2 new fertilizers and the amount of water on the mean crop yield. If you wish to follow along, you can import the required libraries and download the data using the following code:

```python
import pandas
import researchpy as rp
import seaborn as sns
import statsmodels.api as sm
from statsmodels.formula.api import ols
import statsmodels.stats.multicomp

df = pandas.read_csv("https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/crop_yield.csv")
```

Now let’s take a look at the grand mean crop yield (the mean crop yield not split by any sub-group), as well as the mean crop yield by each factor, and by the factors grouped together.

```python
rp.summary_cont(df['Yield'])
```

| | Variable | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|---|
| 0 | Yield | 20.0 | 29.04 | 4.230516 | 0.945972 | 27.060058 | 31.019942 |

```python
rp.summary_cont(df.groupby(['Fert'])['Yield'])
```

| Fert | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|
| A | 10 | 30.90 | 3.283968 | 1.038482 | 28.864576 | 32.935424 |
| B | 10 | 27.18 | 4.394390 | 1.389628 | 24.456329 | 29.903671 |

```python
rp.summary_cont(df.groupby(['Water'])['Yield'])
```

| Water | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|
| High | 10 | 30.82 | 3.244756 | 1.026082 | 28.808879 | 32.831121 |
| Low | 10 | 27.26 | 4.495974 | 1.421752 | 24.473367 | 30.046633 |

```python
rp.summary_cont(df.groupby(['Fert', 'Water'])['Yield'])
```

| Fert | Water | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|---|
| A | High | 5 | 31.80 | 3.146427 | 1.407125 | 29.042036 | 34.557964 |
| A | Low | 5 | 30.00 | 3.512834 | 1.570987 | 26.920866 | 33.079134 |
| B | High | 5 | 29.84 | 3.374611 | 1.509172 | 26.882023 | 32.797977 |
| B | Low | 5 | 24.52 | 3.791042 | 1.695406 | 21.197005 | 27.842995 |

## 2-way ANOVA Example

You could calculate the ANOVA by hand, but that’s unnecessary because statsmodels has good support for it already. In this example, I will use Type II sums of squares. There are 3 types of sums of squares that should be considered when conducting an ANOVA; by default, Python (statsmodels) and R use Type I, whereas SAS tends to use Type III. The differences between the types of sums of squares are out of this page’s scope, but you should research the differences to decide which type you should use for your study.

```python
# Fits the model with the interaction term;
# this will also automatically include the main effects for each factor
model = ols('Yield ~ C(Fert)*C(Water)', df).fit()

# Seeing if the overall model is significant
print(f"Overall model F({model.df_model: .0f},{model.df_resid: .0f}) = {model.fvalue: .3f}, p = {model.f_pvalue: .4f}")
```

Excellent, the overall model is significant. Now we need to check the assumptions of the ANOVA: normality and homogeneity of variance. Statsmodels already provides model diagnostics in its summary table. The summary table also outputs information that we don’t necessarily care about within the ANOVA framework, so we can ignore that and just look at the diagnostic information at the bottom of the table.

```python
model.summary()
```

| Dep. Variable: | Yield | R-squared: | 0.435 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.330 |
| Method: | Least Squares | F-statistic: | 4.112 |
| Date: | Mon, 03 Sep 2018 | Prob (F-statistic): | 0.0243 |
| Time: | 15:12:16 | Log-Likelihood: | -50.996 |
| No. Observations: | 20 | AIC: | 110.0 |
| Df Residuals: | 16 | BIC: | 114.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 31.8000 | 1.549 | 20.527 | 0.000 | 28.516 | 35.084 |
| C(Fert)[T.B] | -1.9600 | 2.191 | -0.895 | 0.384 | -6.604 | 2.684 |
| C(Water)[T.Low] | -1.8000 | 2.191 | -0.822 | 0.423 | -6.444 | 2.844 |
| C(Fert)[T.B]:C(Water)[T.Low] | -3.5200 | 3.098 | -1.136 | 0.273 | -10.088 | 3.048 |

| Omnibus: | 3.427 | Durbin-Watson: | 2.963 |
|---|---|---|---|
| Prob(Omnibus): | 0.180 | Jarque-Bera (JB): | 1.319 |
| Skew: | -0.082 | Prob(JB): | 0.517 |
| Kurtosis: | 1.752 | Cond. No. | 6.85 |

The Durbin-Watson test detects the presence of autocorrelation, the Jarque-Bera and Omnibus tests check the assumption of normality of the residuals, and the Condition Number assesses multicollinearity; Condition Number values over 20 are indicative of multicollinearity. The model passes the assumption checks, which is excellent. Now let’s take a look at the ANOVA table.
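If you would rather test the assumptions directly instead of reading them off the summary table, scipy provides Shapiro-Wilk (normality of residuals) and Levene (homogeneity of variance across cells) tests. A minimal sketch on synthetic data with the same column names (not the article's dataset, so the p-values here are illustrative only):

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.formula.api import ols

# Synthetic 2x2 data mimicking the article's column names
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    'Fert': np.repeat(['A', 'B'], 10),
    'Water': np.tile(np.repeat(['High', 'Low'], 5), 2),
})
demo['Yield'] = rng.normal(30, 4, 20)

m = ols('Yield ~ C(Fert) * C(Water)', demo).fit()

w, p_norm = stats.shapiro(m.resid)  # normality of residuals
cells = [g['Yield'].values for _, g in demo.groupby(['Fert', 'Water'])]
stat, p_var = stats.levene(*cells)  # equal variances across the 4 cells

print(round(p_norm, 4), round(p_var, 4))
```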

```python
# Creates the ANOVA table
res = sm.stats.anova_lm(model, typ=2)
res
```

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(Fert) | 69.192 | 1.0 | 5.766000 | 0.028847 |
| C(Water) | 63.368 | 1.0 | 5.280667 | 0.035386 |
| C(Fert):C(Water) | 15.488 | 1.0 | 1.290667 | 0.272656 |
| Residual | 192.000 | 16.0 | NaN | NaN |

The interaction term is not significant. This indicates that there is no interaction effect between the type of fertilizer and the amount of water on the mean crop yield. Since it is not significant, the interaction term is removed from the model, which is then re-run so we can look at the main effects of each factor independently.

```python
# Fits the model without the interaction term
model2 = ols('Yield ~ C(Fert) + C(Water)', df).fit()
print(f"Overall model F({model2.df_model: .0f},{model2.df_resid: .0f}) = {model2.fvalue: .3f}, p = {model2.f_pvalue: .4f}")
```

Excellent, the model is still significant. Again, let’s look at the model’s summary table to check the assumptions.

```python
model2.summary()
```

| Dep. Variable: | Yield | R-squared: | 0.390 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.318 |
| Method: | Least Squares | F-statistic: | 5.430 |
| Date: | Mon, 03 Sep 2018 | Prob (F-statistic): | 0.0150 |
| Time: | 15:18:08 | Log-Likelihood: | -51.772 |
| No. Observations: | 20 | AIC: | 109.5 |
| Df Residuals: | 17 | BIC: | 112.5 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 32.6800 | 1.353 | 24.153 | 0.000 | 29.825 | 35.535 |
| C(Fert)[T.B] | -3.7200 | 1.562 | -2.381 | 0.029 | -7.016 | -0.424 |
| C(Water)[T.Low] | -3.5600 | 1.562 | -2.279 | 0.036 | -6.856 | -0.264 |

| Omnibus: | 1.169 | Durbin-Watson: | 2.736 |
|---|---|---|---|
| Prob(Omnibus): | 0.557 | Jarque-Bera (JB): | 0.820 |
| Skew: | -0.081 | Prob(JB): | 0.664 |
| Kurtosis: | 2.022 | Cond. No. | 3.19 |

Just as before, the model without the interaction term passes the assumption checks. Now let’s look at this model’s ANOVA table.

```python
# Creates the ANOVA table
res2 = sm.stats.anova_lm(model2, typ=2)
res2
```

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(Fert) | 69.192 | 1.0 | 5.669070 | 0.029228 |
| C(Water) | 63.368 | 1.0 | 5.191895 | 0.035887 |
| Residual | 207.488 | 17.0 | NaN | NaN |

Each factor has an independent significant effect on the mean crop yield. While it’s good to know whether there is a statistically significant effect of some intervention on the outcome, it’s just as important to know the size of that effect: statistical significance does not always translate into a large effect in the real world. This is important to consider because everything costs money and time. To see this, we can calculate the effect size. The following code uses the ANOVA table produced by statsmodels and appends the effect size measures eta-squared (η^{2}) and omega-squared (ω^{2}).

```python
# Calculating effect sizes (eta-squared and omega-squared) from the ANOVA table
def anova_table(aov):
    aov['mean_sq'] = aov['sum_sq'] / aov['df']
    ms_resid = aov['mean_sq'].iloc[-1]  # residual mean square is the last row
    aov['eta_sq'] = aov[:-1]['sum_sq'] / sum(aov['sum_sq'])
    aov['omega_sq'] = (aov[:-1]['sum_sq'] - (aov[:-1]['df'] * ms_resid)) / (sum(aov['sum_sq']) + ms_resid)
    cols = ['sum_sq', 'mean_sq', 'df', 'F', 'PR(>F)', 'eta_sq', 'omega_sq']
    return aov[cols]

anova_table(res2)
```

| | sum_sq | mean_sq | df | F | PR(>F) | eta_sq | omega_sq |
|---|---|---|---|---|---|---|---|
| C(Fert) | 69.192 | 69.192000 | 1.0 | 5.669070 | 0.029228 | 0.203477 | 0.161778 |
| C(Water) | 63.368 | 63.368000 | 1.0 | 5.191895 | 0.035887 | 0.186350 | 0.145244 |
| Residual | 207.488 | 12.205176 | 17.0 | NaN | NaN | NaN | NaN |

ω^{2} is a better measure of effect size since it is less biased in its calculation: it takes the degrees of freedom into account, whereas η^{2} does not. As a side note, η^{2} and R^{2} are the same thing in the ANOVA framework. Each factor, fertilizer and water, has a small effect on the mean crop yield.
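For reference, the formulas implemented in the `anova_table` function above are the standard definitions:

```latex
\eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}},
\qquad
\omega^2 = \frac{SS_{\text{effect}} - df_{\text{effect}} \cdot MS_{\text{residual}}}{SS_{\text{total}} + MS_{\text{residual}}}
```

The subtraction of `df_effect * MS_residual` in the ω^{2} numerator is exactly the degrees-of-freedom correction that makes it less biased than η^{2}.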

Knowing that the fertilizer used and the amount of water each have a statistically significant effect on the mean crop yield should lead you to ask the following question: “is there a difference between the types of fertilizer, and is there a difference between the amounts of water used?” We can answer this question with post-hoc testing.

## Post-hoc Testing

There are a few different methods of post-hoc testing for finding differences between the groups of a factor. I will show how to use Tukey’s HSD. We have to test for differences for each factor separately.

```python
mc = statsmodels.stats.multicomp.MultiComparison(df['Yield'], df['Fert'])
mc_results = mc.tukeyhsd()
print(mc_results)
```

| group1 | group2 | meandiff | lower | upper | reject |
|---|---|---|---|---|---|
| A | B | -3.72 | -7.3647 | -0.0753 | True |

There is a statistically significant difference in mean crop yield between fertilizers A and B; fertilizer A produces a significantly higher crop yield than fertilizer B.

```python
mc = statsmodels.stats.multicomp.MultiComparison(df['Yield'], df['Water'])
mc_results = mc.tukeyhsd()
print(mc_results)
```

| group1 | group2 | meandiff | lower | upper | reject |
|---|---|---|---|---|---|
| High | Low | -3.56 | -7.2436 | 0.1236 | False |

There is not a statistically significant difference in the mean crop yield between the high and low amounts of water.