What is it?

Welch’s t-test is a parametric univariate test for a significant difference between the means of two unrelated groups. It is an alternative to the independent t-test when the assumption of equal variances is violated.

The hypothesis being tested is:

  • Null hypothesis (H0): μ1 = μ2, which translates to the mean of group 1 is equal to the mean of group 2
  • Alternative hypothesis (HA): μ1 ≠ μ2, which translates to the mean of group 1 is not equal to the mean of group 2

If the p-value is less than the chosen significance level, most commonly 0.05, one can reject the null hypothesis.

Welch’s t-test Assumptions

Like every inferential statistical test, Welch’s t-test has assumptions. For the test results to be valid, the data must meet the following:

  • The independent variable (IV) is categorical, and two of its levels (groups) are being compared
  • The dependent variable (DV) is continuous, measured on an interval or ratio scale
  • The dependent variable should be approximately normally distributed within each of the two groups

If any of these assumptions are violated then another test should be used.

Data used in this example

The data used in this example is from Kaggle.com and was posted by the user Web IR. The link to the data set is here. The data set contains the sepal and petal length and width of various floral species. We will be testing to see if there is a significant difference in petal length (the “petal_length” variable) between two levels of the “species” variable, Iris-setosa and Iris-virginica.

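Let’s import pandas as pd, load the data, and then take a look at what we will be working with!

import pandas as pd

df = pd.read_csv("Iris_Data.csv")

# Summary statistics of petal length for each species
df.groupby("species")["petal_length"].describe()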



species          count   mean       std  min  25%   50%    75%  max
Iris-setosa       50.0  1.464  0.173511  1.0  1.4  1.50  1.575  1.9
Iris-versicolor   50.0  4.260  0.469911  3.0  4.0  4.35  4.600  5.1
Iris-virginica    50.0  5.552  0.551895  4.5  5.1  5.55  5.875  6.9

To make the code in the next steps a bit cleaner to read, I will create 2 data frames that are subsets of the original data where each data frame only contains data for a respective flower species.

setosa = df[(df['species'] == 'Iris-setosa')]
virginica = df[(df['species'] == 'Iris-virginica')]


Welch’s t-test Example

The first thing we need to do is import scipy.stats as stats and then test our assumptions. We can test the assumption of normality for each group using stats.shapiro(). Unfortunately, the output is not labeled: the first value in the tuple is the W test statistic, and the second value is the p-value.

from scipy import stats

stats.shapiro(setosa['petal_length'])
(0.9549458622932434, 0.05464918911457062)


stats.shapiro(virginica['petal_length'])
(0.9621862769126892, 0.10977369546890259)
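Since the tuple returned by stats.shapiro() is unlabeled, one option is to unpack it into named variables and print them with labels. The following is only a minimal sketch: the helper name labeled_shapiro and the synthetic sample are assumptions made for illustration, standing in for the petal_length columns.

```python
import numpy as np
from scipy import stats

def labeled_shapiro(sample, label):
    """Run the Shapiro-Wilk test and print the result with labels."""
    w, p = stats.shapiro(sample)
    print(f"{label}: W = {w:.4f}, p-value = {p:.4f}")
    return w, p

# Synthetic stand-in data (assumption): 50 draws from a normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(loc=1.46, scale=0.17, size=50)

w, p = labeled_shapiro(sample, "synthetic sample")
```

With the real data, the same helper could be called as labeled_shapiro(setosa['petal_length'], "Iris-setosa").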


Neither of the variables of interest violates the assumption of normality at the 0.05 level, so we can continue with our analysis plan. To conduct a Welch’s t-test, one needs to use the stats.ttest_ind() function while passing equal_var=False.

stats.ttest_ind(setosa['petal_length'], virginica['petal_length'], equal_var = False)
Ttest_indResult(statistic=-49.965703359355636, pvalue=9.7138670616970964e-50)


The p-value is far below 0.05, so the result is statistically significant and one can reject the null hypothesis in support of the alternative.
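As a sanity check, the test statistic can be reproduced from the summary statistics alone, since Welch's t is the difference in means divided by the unpooled standard error. The sketch below is illustrative rather than part of the original analysis: it copies the means, standard deviations, and counts from the summary table above (so it inherits their rounding), and the variable names are my own.

```python
import math
from scipy import stats

# Summary statistics copied from the summary table above (rounded values).
m1, s1, n1 = 1.464, 0.173511, 50   # Iris-setosa
m2, s2, n2 = 5.552, 0.551895, 50   # Iris-virginica

# Welch's t: difference in means over the unpooled standard error.
se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_manual = (m1 - m2) / se

# SciPy's summary-statistics variant of the same test.
t_scipy, p_scipy = stats.ttest_ind_from_stats(
    m1, s1, n1, m2, s2, n2, equal_var=False
)

print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}, p = {p_scipy:.3g}")
```

Both values should agree with the statistic reported by stats.ttest_ind() to within rounding of the inputs.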

Another piece of information you will need to report is the degrees of freedom (DoF). At the time of writing there was no built-in method for this (newer SciPy releases may expose the degrees of freedom on the result object, so check the documentation for your version). Below are 2 functions that will give you what you need. The first only calculates the Welch-Satterthwaite DoF and prints it. The second conducts the Welch’s test, calculates the DoF, and prints all the needed information. Note that the DoF does not depend on whether the test is one- or two-tailed.

def welch_dof(x, y):
    # Welch-Satterthwaite degrees of freedom; pandas' Series.var() is the sample variance (ddof=1)
    dof = (x.var()/x.size + y.var()/y.size)**2 / ((x.var()/x.size)**2 / (x.size-1) + (y.var()/y.size)**2 / (y.size-1))
    print(f"Welch-Satterthwaite Degrees of Freedom= {dof:.4f}")

welch_dof(setosa['petal_length'], virginica['petal_length'])
Welch-Satterthwaite Degrees of Freedom= 58.5928
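
For reference, the expression inside welch_dof is the Welch-Satterthwaite approximation. Written out with notation introduced here for illustration (s_i^2 for the sample variance and n_i for the size of group i), it reads:

```latex
\nu \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
     {\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}
```

Each term maps directly onto the code: x.var()/x.size is s_1^2/n_1 and y.var()/y.size is s_2^2/n_2.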


def welch_ttest(x, y):
    ## Welch-Satterthwaite Degrees of Freedom ##
    dof = (x.var()/x.size + y.var()/y.size)**2 / ((x.var()/x.size)**2 / (x.size-1) + (y.var()/y.size)**2 / (y.size-1))
    # equal_var=False requests Welch's test instead of the pooled-variance t-test
    t, p = stats.ttest_ind(x, y, equal_var = False)
    print("\n",
          f"Welch's t-test= {t:.4f}", "\n",
          f"p-value = {p:.4f}", "\n",
          f"Welch-Satterthwaite Degrees of Freedom= {dof:.4f}")

welch_ttest(setosa['petal_length'], virginica['petal_length'])
Welch’s t-test= -49.9657
p-value = 0.0000
Welch-Satterthwaite Degrees of Freedom= 58.5928


Welch’s t-test Interpretation

The current study aimed to test whether there was a significant difference in petal length between the floral species Setosa and Virginica. Setosa has a shorter petal length (M= 1.464 units, SD= 0.174 units) than Virginica (M= 5.552 units, SD= 0.552 units). Welch’s t-test was selected to analyze the data because Levene’s test for homogeneity of variances indicated unequal variances between groups (F= 39.977, p< 0.0001). The difference in petal length between the two species is statistically significant (Welch’s t(58.593)= -49.966, p< 0.0001).
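
The Levene’s test cited in the write-up is not shown in the walkthrough. A rough sketch of how one might run it with stats.levene() follows. The two samples are synthetic stand-ins (an assumption) with roughly the same means and spreads as the summary table, so the output will not reproduce the F value reported above. Which centering option the reported value used is also not stated in this article.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins (assumption) for the two petal_length columns:
# similar means and spreads to the summary table, but not the real data.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=1.46, scale=0.17, size=50)
group_b = rng.normal(loc=5.55, scale=0.55, size=50)

# Levene's test; center="median" is SciPy's default (the Brown-Forsythe variant).
levene_stat, levene_p = stats.levene(group_a, group_b, center="median")
print(f"Levene's test: statistic = {levene_stat:.3f}, p-value = {levene_p:.4g}")
```

With the real data, the call would be stats.levene(setosa['petal_length'], virginica['petal_length']). A small p-value suggests unequal variances, which is the situation Welch’s t-test is designed for.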
