Norfeldt Norfeldt - 9 months ago 172
Python Question

Perform 2 sample t-test

I have a the mean, std dev and n of sample 1 and sample 2 - samples are taken from the sample population, but measured by different labs.

n is different for sample 1 and sample 1 and I want to do a weighted (take n into account) two-tailed t-test.

I tried using the scipy.stat module by creating my numbers with

, since it only takes data and not stat values like mean and std dev (is there any way to use these values directly). But it didn't work since the data arrays has to be of equal size.

Any help on how to get the p-value would be highly appreciated.


If you have the original data as arrays a and b, you can use scipy.stats.ttest_ind with the argument equal_var=False:

t, p = ttest_ind(a, b, equal_var=False)

If you have only the summary statistics of the two data sets, you can calculate the t value using scipy.stats.ttest_ind_from_stats (added to scipy in version 0.16) or from the formula (

The following script shows the possibilities.

from __future__ import print_function

import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr


# Create sample data.
a = np.random.randn(40)
b = 4*np.random.randn(50)

# Use scipy.stats.ttest_ind.
t, p = ttest_ind(a, b, equal_var=False)
print("ttest_ind:            t = %g  p = %g" % (t, p))

# Compute the descriptive statistics of a and b.
abar = a.mean()
avar = a.var(ddof=1)
na = a.size
adof = na - 1

bbar = b.mean()
bvar = b.var(ddof=1)
nb = b.size
bdof = nb - 1

# Use scipy.stats.ttest_ind_from_stats.
t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                              bbar, np.sqrt(bvar), nb,
print("ttest_ind_from_stats: t = %g  p = %g" % (t2, p2))

# Use the formulas directly.
tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
pf = 2*stdtr(dof, -np.abs(tf))

print("formula:              t = %g  p = %g" % (tf, pf))

The output:

ttest_ind:            t = -1.5827  p = 0.118873
ttest_ind_from_stats: t = -1.5827  p = 0.118873
formula:              t = -1.5827  p = 0.118873