The independent t-test is also called the two-sample t-test, Student’s t-test, or unpaired t-test. It’s a univariate test that tests for a significant difference between the means of two independent (unrelated) groups.

The hypothesis being tested is:

  • Null hypothesis (H0): μ1 = μ2, which translates to the population mean of group 1 is equal to the population mean of group 2
  • Alternative hypothesis (HA): μ1 ≠ μ2, which translates to the population mean of group 1 is not equal to the population mean of group 2
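
For reference, the test statistic behind these hypotheses can be written out. This is the standard pooled-variance (equal variance) form. The symbols used here are notation introduced for this sketch and do not come from the example's code: the sample mean, sample standard deviation, and size of group i are written as \bar{x}_i, s_i, and n_i.

```latex
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad
df = n_1 + n_2 - 2
```

Under the null hypothesis, t follows a t distribution with df degrees of freedom, which is where the p-value comes from.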

To determine if there is a significant difference between two groups, one must set a significance level (alpha) beforehand; if the test produces a p-value that is less than alpha, the test is said to be significant, meaning there is a statistically significant difference between the two means. In that case one rejects the null hypothesis in favor of the alternative. Most commonly alpha is set to 0.05.
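
The decision rule above can be sketched in a few lines: fix the threshold first (called alpha here), run the test, then compare the resulting p-value against it. The two small groups below are made-up numbers used only for illustration; they are not part of the iris data.

```python
from scipy import stats

alpha = 0.05  # threshold chosen before looking at the data

# Hypothetical measurements for two independent groups (illustration only)
group_a = [5.1, 4.9, 5.6, 5.8, 5.3, 5.0, 5.4]
group_b = [4.2, 4.6, 4.4, 4.9, 4.1, 4.5, 4.3]

# Independent t-test assuming equal variances (the scipy default)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Reject the null hypothesis only if the p-value falls below alpha
reject_null = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.5f}, reject H0: {reject_null}")
```

Setting alpha before running the test matters; picking it after seeing the p-value defeats the purpose of the threshold.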

Independent T-test Assumptions

Like every test, this inferential statistical test has assumptions. The assumptions that the data must meet in order for the test results to be valid are:

  • The samples are independently and randomly drawn
  • The distribution of the residuals between the two groups should follow the normal distribution
  • The variances between the two groups are equal

If any of these assumptions are violated then another test should be used. The dependent variable (the outcome being measured) should be continuous, that is, measured on an interval or ratio scale.

Data used in this example

The data used in this example is from and was posted by the user Web IR. The link to the data set is here or can be downloaded from our GitHub to follow along in this example – code will be shown later.

The data set contains the sepal length and width, as well as the petal length and width, of various floral species. I will test to see if there is a significant difference in the sepal width (the variable “sepal_width”) between the species Iris-setosa and Iris-versicolor (two values of the variable “species”).

Let’s import the required libraries, the data, and take a look at the data!

# Importing the required libraries
import pandas as pd
import matplotlib.pyplot as plt  # used for the plots later in the example

# Two different methods of conducting a t-test
import researchpy as rp
from scipy import stats

df = pd.read_csv("")

It’s always a good idea to get a feel for the data and look at some general descriptive statistics. Pandas has a built-in .describe() method which gives good information, and Researchpy also has a method that provides good descriptive statistics. Some information from these two methods overlaps, but the .summary_cont() method from Researchpy also calculates the standard error and the confidence intervals. I will demonstrate each below.

# Showing descriptive statistics from pandas.describe()
df.groupby("species")["sepal_width"].describe()

species          count   mean       std  min    25%  50%    75%  max
Iris-setosa       50.0  3.418  0.381024  2.3  3.125  3.4  3.675  4.4
Iris-versicolor   50.0  2.770  0.313798  2.0  2.525  2.8  3.000  3.4
Iris-virginica    50.0  2.974  0.322497  2.2  2.800  3.0  3.175  3.8

# Showing descriptive statistics from researchpy.summary_cont()
rp.summary_cont(df.groupby("species")["sepal_width"])

species           N   Mean        SD        SE  95% Conf.  Interval
Iris-setosa      50  3.418  0.381024  0.053885   3.311313  3.524687
Iris-versicolor  50  2.770  0.313798  0.044378   2.682136  2.857864
Iris-virginica   50  2.974  0.322497  0.045608   2.883701  3.064299

Ignoring the species Iris-virginica, since it is not of interest in this example, there is a difference in average sepal width between the setosa and versicolor species. The standard deviations are small for both groups, and the 95% confidence intervals do not overlap, so there should be a significant difference between the two. But let’s not get ahead of ourselves.
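
To make the SE and confidence interval columns less of a black box, here is a minimal sketch that rebuilds them for Iris-setosa from N, the mean, and the SD. It assumes a t-based interval with N - 1 degrees of freedom. Libraries can differ in the critical value they use, so the bounds may not match a given table to the last decimal.

```python
from math import sqrt
from scipy import stats

# Summary statistics for Iris-setosa, copied from the tables above
n, mean, sd = 50, 3.418, 0.381024

se = sd / sqrt(n)                       # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)   # two-sided 95% critical value
ci_lower = mean - t_crit * se
ci_upper = mean + t_crit * se

print(f"SE = {se:.6f}, 95% CI = ({ci_lower:.6f}, {ci_upper:.6f})")
```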

To make the code a bit cleaner to read in the rest of the example, I will create two data frames that are subsets of the original data, where each data frame only contains data for one flower species. The indexes should be reset so the residuals can be calculated later in the example.

# Subset each species and reset the index so rows line up when subtracting later
setosa = df[df['species'] == 'Iris-setosa'].reset_index(drop=True)

versicolor = df[df['species'] == 'Iris-versicolor'].reset_index(drop=True)

Assumption Check

Before the t-test can be conducted, the assumptions of the t-test need to be checked to see if the t-test results can be trusted.

Homogeneity of variances

First I will check for homogeneity of variances. To do this, I will use Levene’s test for homogeneity of variance which is the stats.levene() method from scipy.stats. Full documentation can be found here.

stats.levene(setosa['sepal_width'], versicolor['sepal_width'])
LeveneResult(statistic=0.66354593329432332, pvalue=0.41728596812962038)

The test is not significant (p ≈ 0.417), so there is no evidence that the variances differ and we can proceed. If the test were significant, a viable alternative would be to conduct Welch’s t-test.
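
For completeness, here is a sketch of what the Welch alternative would look like. Because the data file is not bundled here, it uses stats.ttest_ind_from_stats() with the summary statistics from the descriptive tables instead of the raw columns; that substitution is my own choice for the sketch, not the method used in this example.

```python
from scipy import stats

# Summary statistics copied from the descriptive tables in this example
welch = stats.ttest_ind_from_stats(
    mean1=3.418, std1=0.381024, nobs1=50,   # Iris-setosa
    mean2=2.770, std2=0.313798, nobs2=50,   # Iris-versicolor
    equal_var=False,  # False requests Welch's t-test (unequal variances)
)

print(f"Welch t = {welch.statistic:.4f}, p = {welch.pvalue:.3e}")
```

With the raw data loaded, the equivalent call would be stats.ttest_ind(setosa['sepal_width'], versicolor['sepal_width'], equal_var=False). Because both groups have the same size, the Welch t statistic equals the pooled one here; only the degrees of freedom change.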

Normal distribution of residuals

Next, the assumption of normality needs to be tested. The residuals need to be normally distributed. To calculate the residuals between the groups, subtract the values of one group from the values of the other group.

diff = setosa['sepal_width'] - versicolor['sepal_width']

Checking for normality can be done visually or with a formal test. Visually, one can use a p-p plot, a q-q plot, or a histogram, and/or one can use the Shapiro-Wilk test to formally test for normality. To test for normality formally, use stats.shapiro(), which is from the scipy.stats library that was imported. Full documentation on this method can be found here.

First let’s check for normality visually with a p-p plot and a histogram plot.

# Probability plot of the residuals against the normal distribution
stats.probplot(diff, plot= plt)
plt.title("Sepal Width P-P Plot")
plt.savefig("Sepal Width Residuals Probability Plot.png")
plt.close()

[Figure: p-p plot of the sepal width residuals]

If you are unfamiliar with how to read a p-p or q-q plot, the dots should fall on the red line. If the dots are not on the red line, it’s an indication that there is deviation from normality. Some deviation from normality is fine, as long as it’s not severe.

The p-p plot shows that the data maintains normality. Let’s take a look at the histogram next.

diff.plot(kind= "hist", title= "Sepal Width Residuals")
plt.xlabel("Difference in sepal width (cm)")
plt.savefig("Residuals Plot of Sepal Width.png")

[Figure: histogram of the sepal width residuals]

Between the two, I prefer to use a p-p or q-q plot over the histogram when visually checking for normality. I find it easier to tell where, if any, deviations are present. Now for the formal test on normality.

stats.shapiro(diff)
(0.9859335422515869, 0.8108891248703003)

The output is not labeled, but the first value is the W test statistic and the second value is the p-value. Since the p-value is not significant (p ≈ 0.811), there is no evidence that the residuals deviate from a normal distribution.
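
The check above is run on the differences between the two groups. A commonly used alternative is to run the Shapiro-Wilk test on each group separately. The sketch below shows that pattern on synthetic, normally distributed stand-ins generated with roughly the same means and standard deviations as the two species; the numbers it prints are therefore not results for the real iris data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for the two groups (NOT the real iris measurements)
samples = {
    "setosa-like": rng.normal(loc=3.418, scale=0.381, size=50),
    "versicolor-like": rng.normal(loc=2.770, scale=0.314, size=50),
}

# Run the Shapiro-Wilk test on each group on its own
shapiro_results = {}
for name, values in samples.items():
    w_stat, p_value = stats.shapiro(values)
    shapiro_results[name] = (w_stat, p_value)
    print(f"{name}: W = {w_stat:.4f}, p = {p_value:.4f}")
```

With the real data loaded, the same loop would take setosa['sepal_width'] and versicolor['sepal_width'] in place of the synthetic arrays.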

The data met all the assumptions for the t-test which indicates the results can be trusted and the t-test is an appropriate test to be used.

Independent t-test example

I will demonstrate how to conduct the independent t-test using methods from scipy.stats and from researchpy.

Independent t-test using scipy.stats

To conduct the independent t-test using scipy.stats, use the stats.ttest_ind() method. Full documentation can be found here.

stats.ttest_ind(setosa['sepal_width'], versicolor['sepal_width'])
Ttest_indResult(statistic=9.2827725555581111, pvalue=4.3622390160102143e-15)

stats.ttest_ind() does not report some useful information, such as the degrees of freedom, the difference between the group means, and a measure of effect size. All of this can be calculated manually. However, this information is provided by the method from researchpy.
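
As a rough illustration of the manual route, the sketch below derives the mean difference, degrees of freedom, t statistic, and Cohen's d from the summary statistics shown earlier. It assumes the pooled-variance form of the test and the pooled-SD definition of Cohen's d; the variable names are my own.

```python
from math import sqrt

# Summary statistics copied from the descriptive tables in this example
n1, mean1, sd1 = 50, 3.418, 0.381024   # Iris-setosa
n2, mean2, sd2 = 50, 2.770, 0.313798   # Iris-versicolor

mean_diff = mean1 - mean2              # difference between the group means
dof = n1 + n2 - 2                      # degrees of freedom

# Pooled standard deviation (equal variances assumed)
pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / dof)

# t statistic and Cohen's d (mean difference in pooled-SD units)
t_stat = mean_diff / (pooled_sd * sqrt(1 / n1 + 1 / n2))
cohens_d = mean_diff / pooled_sd

print(f"diff = {mean_diff:.4f}, df = {dof}, t = {t_stat:.4f}, d = {cohens_d:.4f}")
```

The t statistic computed this way agrees with the scipy output above to four decimal places.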

Independent t-test using researchpy

To conduct the independent t-test using researchpy, use the researchpy.ttest() method. Full documentation for this method can be found here. This method outputs the returned information in two dataframes. For a cleaner output and presentation, I will assign the descriptive statistics table (the first table returned) and the test results (second table returned) as separate objects.

descriptives, results = rp.ttest(setosa['sepal_width'], versicolor['sepal_width'])


      Variable      N   Mean        SD        SE  95% Conf.  Interval
0  sepal_width   50.0  3.418  0.381024  0.053885   3.309714  3.526286
1  sepal_width   50.0  2.770  0.313798  0.044378   2.680820  2.859180
2     combined  100.0  3.094  0.476057  0.047606   2.999540  3.188460

This descriptive statistics table matches the one produced earlier, apart from slightly different confidence interval bounds and the added combined row. However, this method uses the column name of the variable, which happens to be exactly the same for both dataframes in this example, making it not as clean as the table produced earlier. Now let’s see the results table.


   Independent t-test                                    results
0  Difference (sepal_width - sepal_width) =               0.6480
1  Degrees of freedom =                                  98.0000
2  t =                                                    9.2828
3  Two side test p value =                                0.0000
4  Mean of sepal_width > mean of sepal_width p va...      1.0000
5  Mean of sepal_width < mean of sepal_width p va...      0.0000
6  Cohen's d =                                            1.8566
7  Hedge's g =                                            1.8423
8  Glass's delta =                                        1.7007
9  r =                                                    0.6840

The results are the same using both methods: there is a significant difference in the sepal width between the floral species setosa and versicolor. Using researchpy, we also get one-sided p-values, the degrees of freedom, and a few effect size measures.
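
To give a feel for where the remaining effect size measures come from, here is a sketch using common textbook definitions. These formulas happen to reproduce the values in the results table, but I am not claiming they are how researchpy computes them internally. In particular, Glass's delta below uses the standard deviation of the first group (Iris-setosa), which is an assumption on my part.

```python
from math import sqrt

# Values copied from the tables in this example
mean_diff = 0.648        # difference between the group means
sd_setosa = 0.381024     # SD of the first group (Iris-setosa)
t_stat, dof = 9.2828, 98
cohens_d = 1.8566

# Hedges' g: Cohen's d with a small-sample bias correction
hedges_g = cohens_d * (1 - 3 / (4 * dof - 1))

# Glass's delta: mean difference scaled by one group's SD
glass_delta = mean_diff / sd_setosa

# Effect size r derived from t and the degrees of freedom
r_effect = sqrt(t_stat**2 / (t_stat**2 + dof))

print(f"g = {hedges_g:.4f}, delta = {glass_delta:.4f}, r = {r_effect:.4f}")
```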

Interpretation of Results

The purpose of the current study was to test if there is a significant difference in the sepal width between the floral species Iris-setosa and Iris-versicolor. Iris-setosa’s average sepal width (M = 3.418, SD = 0.381) is wider and has slightly larger variation than Iris-versicolor’s (M = 2.770, SD = 0.314). An independent t-test was used to test for a difference. There is a statistically significant difference between the sepal width of Iris-setosa and Iris-versicolor (t(98) = 9.283, p < 0.0001, r = 0.684).
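
The p-value reported in the write-up can be double-checked from the t statistic and the degrees of freedom alone. This is a small sketch using the survival function of the t distribution from scipy.stats; the variable names are my own.

```python
from scipy import stats

t_stat, dof = 9.2828, 98  # values reported in the results above

# Two-sided p-value: probability of a |t| at least this large under H0
p_two_sided = 2 * stats.t.sf(abs(t_stat), dof)

print(f"p = {p_two_sided:.3e}")
```

The result is far below 0.0001, which is why the write-up reports the p-value as p < 0.0001 rather than as an exact number.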


