Norfeldt - 1 year ago 385
Python Question

# Perform 2 sample t-test

I have the mean, std dev and n of sample 1 and sample 2. The samples are taken from the same population, but measured by different labs.

n is different for sample 1 and sample 2, and I want to do a weighted (taking n into account) two-tailed t-test.

I tried using the `scipy.stats` module by creating my numbers with `np.random.normal`, since it only takes data and not summary statistics like mean and std dev (is there any way to use these values directly?). But it didn't work, since the data arrays have to be of equal size.

Any help on how to get the p-value would be highly appreciated.

If you have the original data as arrays `a` and `b`, you can use `scipy.stats.ttest_ind` with the argument `equal_var=False`:

```python
from scipy.stats import ttest_ind

t, p = ttest_ind(a, b, equal_var=False)
```
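To illustrate what `equal_var=False` changes, here is a minimal sketch on made-up data (the seed, sample sizes, means and scales are assumptions chosen only for illustration). It runs `ttest_ind` on two arrays of different lengths, once with the default pooled-variance test and once with Welch's test.

```python
import numpy as np
from scipy.stats import ttest_ind

# Made-up data: two samples with different sizes and different spreads.
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=40)
b = rng.normal(loc=0.5, scale=4.0, size=50)

# Default: Student's t-test, which pools the two variances.
t_pooled, p_pooled = ttest_ind(a, b)

# equal_var=False: Welch's t-test, which keeps the variances separate.
t_welch, p_welch = ttest_ind(a, b, equal_var=False)

print("pooled: t = %g  p = %g" % (t_pooled, p_pooled))
print("welch:  t = %g  p = %g" % (t_welch, p_welch))
```

Both calls accept arrays of unequal length. If an equal-size error shows up, one possible cause (not confirmed by the question) is that a paired test such as `scipy.stats.ttest_rel` was used instead.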

If you have only the summary statistics of the two data sets, you can compute the t and p values with `scipy.stats.ttest_ind_from_stats` (added to SciPy in version 0.16) or directly from the formula for Welch's t-test (http://en.wikipedia.org/wiki/Welch%27s_t_test).
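For reference, the formula behind that link can be written out as follows. The symbols are notation picked for this sketch: $\bar{x}_a, \bar{x}_b$ are the sample means, $s_a^2, s_b^2$ the sample variances (computed with `ddof=1`), $n_a, n_b$ the sample sizes, and $T_\nu$ a Student t variable with $\nu$ degrees of freedom. The sample sizes enter through the standard error and the degrees of freedom, which is how n gets "taken into account".

```latex
t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}}
\qquad
\nu = \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^2}
           {\dfrac{s_a^4}{n_a^2\,(n_a - 1)} + \dfrac{s_b^4}{n_b^2\,(n_b - 1)}}
\qquad
p = 2\,P\!\left(T_\nu \le -|t|\right)
```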

The following script shows the possibilities.

```python
from __future__ import print_function

import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr

np.random.seed(1)

# Create sample data.
a = np.random.randn(40)
b = 4*np.random.randn(50)

# Use scipy.stats.ttest_ind.
t, p = ttest_ind(a, b, equal_var=False)
print("ttest_ind:            t = %g  p = %g" % (t, p))

# Compute the descriptive statistics of a and b.
abar = a.mean()
avar = a.var(ddof=1)
na = a.size
adof = na - 1

bbar = b.mean()
bvar = b.var(ddof=1)
nb = b.size
bdof = nb - 1

# Use scipy.stats.ttest_ind_from_stats.
t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                              bbar, np.sqrt(bvar), nb,
                              equal_var=False)
print("ttest_ind_from_stats: t = %g  p = %g" % (t2, p2))

# Use the formulas directly.
tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
pf = 2*stdtr(dof, -np.abs(tf))

print("formula:              t = %g  p = %g" % (tf, pf))
```

The output:

```
ttest_ind:            t = -1.5827  p = 0.118873
ttest_ind_from_stats: t = -1.5827  p = 0.118873
formula:              t = -1.5827  p = 0.118873
```
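For the situation in the question, where only the mean, standard deviation and n from each lab are known, no arrays need to be generated at all. The sketch below wraps `ttest_ind_from_stats` in a small helper; the name `welch_from_summary` and all the numbers are hypothetical, and the standard deviations are assumed to be sample standard deviations (`ddof=1`).

```python
from scipy.stats import ttest_ind_from_stats


def welch_from_summary(mean1, std1, n1, mean2, std2, n2):
    """Two-tailed Welch t-test from summary statistics (hypothetical helper).

    std1 and std2 are assumed to be sample standard deviations (ddof=1).
    """
    result = ttest_ind_from_stats(mean1, std1, n1,
                                  mean2, std2, n2,
                                  equal_var=False)
    return result.statistic, result.pvalue


# Made-up summary statistics for two labs, for illustration only.
t_stat, p_value = welch_from_summary(10.2, 1.5, 12, 11.1, 2.3, 20)
print("t = %g  p = %g" % (t_stat, p_value))
```

The returned p-value is two-tailed, and the differing sample sizes are handled by the test itself, so no extra weighting step is needed.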