Norfeldt - 1 year ago

Python Question

I have the mean, standard deviation, and n of sample 1 and sample 2. The samples are taken from the same population but measured by different labs.

n is different for sample 1 and sample 2, and I want to do a weighted (taking n into account) two-tailed t-test.

I tried using the `scipy.stats` module by creating my numbers with `np.random.normal`.

Any help on how to get the p-value would be highly appreciated.

Answer

If you have the original data as arrays `a` and `b`, you can use `scipy.stats.ttest_ind` with the argument `equal_var=False`:

```
from scipy.stats import ttest_ind
t, p = ttest_ind(a, b, equal_var=False)
```
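The `equal_var` choice matters when the two samples differ in size and spread. A minimal sketch, using made-up normally distributed data (the seed, sizes, and scales are arbitrary assumptions), compares the pooled-variance Student's t-test (`equal_var=True`) with Welch's t-test (`equal_var=False`) and checks the Student statistic against the pooled-variance formula:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Hypothetical data: different sizes and very different spreads.
a = rng.normal(loc=0.0, scale=1.0, size=40)
b = rng.normal(loc=0.5, scale=4.0, size=50)

# equal_var=True: classic Student's t-test with a pooled variance.
t_student, p_student = ttest_ind(a, b, equal_var=True)
# equal_var=False: Welch's t-test, no equal-variance assumption.
t_welch, p_welch = ttest_ind(a, b, equal_var=False)

# Student's t statistic recomputed from the pooled-variance formula.
na, nb = a.size, b.size
sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
t_pooled = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

print("Student: t = %g p = %g" % (t_student, p_student))
print("Welch:   t = %g p = %g" % (t_welch, p_welch))
```

With unequal sizes and variances the two statistics differ, so Welch's version is the appropriate one when the labs' variances cannot be assumed equal.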

If you have only the summary statistics (mean, standard deviation, and size) of the two data sets, you can calculate the t value and p-value using `scipy.stats.ttest_ind_from_stats` (added to scipy in version 0.16) or from the formula for Welch's t-test (http://en.wikipedia.org/wiki/Welch%27s_t_test).
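Written out, Welch's statistic, the Welch-Satterthwaite degrees of freedom, and the two-tailed p-value are as follows, where $\bar{a}$, $s_a^2$, $n_a$ are the mean, sample variance (ddof=1), and size of the first sample (likewise for $b$), and $F_\nu$ is the CDF of Student's t distribution with $\nu$ degrees of freedom. These are the same expressions the script evaluates in its "formulas directly" step.

```latex
t = \frac{\bar{a} - \bar{b}}{\sqrt{s_a^2/n_a + s_b^2/n_b}}

\nu = \frac{\left(s_a^2/n_a + s_b^2/n_b\right)^2}
           {\dfrac{s_a^4}{n_a^2\,(n_a - 1)} + \dfrac{s_b^4}{n_b^2\,(n_b - 1)}}

p = 2\,F_\nu\!\left(-\lvert t \rvert\right)
```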

The following script shows all three approaches; they produce the same result.

```
from __future__ import print_function
import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr
np.random.seed(1)
# Create sample data.
a = np.random.randn(40)
b = 4*np.random.randn(50)
# Use scipy.stats.ttest_ind.
t, p = ttest_ind(a, b, equal_var=False)
print("ttest_ind: t = %g p = %g" % (t, p))
# Compute the descriptive statistics of a and b.
abar = a.mean()
avar = a.var(ddof=1)
na = a.size
adof = na - 1
bbar = b.mean()
bvar = b.var(ddof=1)
nb = b.size
bdof = nb - 1
# Use scipy.stats.ttest_ind_from_stats.
t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
bbar, np.sqrt(bvar), nb,
equal_var=False)
print("ttest_ind_from_stats: t = %g p = %g" % (t2, p2))
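# Use the formulas directly (Welch's t-test).
tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
# Welch-Satterthwaite approximation of the degrees of freedom.
dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
# stdtr(dof, x) is the Student's t CDF; doubling the lower tail gives a two-tailed p-value.
pf = 2*stdtr(dof, -np.abs(tf))
print("formula: t = %g p = %g" % (tf, pf))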
```

The output:

```
ttest_ind: t = -1.5827 p = 0.118873
ttest_ind_from_stats: t = -1.5827 p = 0.118873
formula: t = -1.5827 p = 0.118873
```
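For the situation in the question, where each lab reports only a mean, a standard deviation, and n, the formula step can be wrapped in a small function. This is a sketch: `welch_from_stats` is a hypothetical helper name, the lab numbers are invented placeholders, and the standard deviations are assumed to be sample standard deviations (ddof=1), as `ttest_ind_from_stats` expects. The result is compared with `ttest_ind_from_stats`:

```python
import numpy as np
from scipy.special import stdtr
from scipy.stats import ttest_ind_from_stats


def welch_from_stats(mean1, std1, n1, mean2, std2, n2):
    """Two-tailed Welch's t-test from summary statistics.

    Hypothetical helper: std1 and std2 are assumed to be sample
    standard deviations computed with ddof=1.
    """
    v1 = std1**2 / n1  # squared standard error of mean1
    v2 = std2**2 / n2  # squared standard error of mean2
    t = (mean1 - mean2) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    # Two-tailed p-value from the Student's t CDF.
    p = 2 * stdtr(dof, -np.abs(t))
    return t, dof, p


# Hypothetical summary statistics reported by two labs.
lab1 = dict(mean=10.2, std=1.1, n=25)
lab2 = dict(mean=9.6, std=1.9, n=40)

t, dof, p = welch_from_stats(lab1["mean"], lab1["std"], lab1["n"],
                             lab2["mean"], lab2["std"], lab2["n"])
t_sp, p_sp = ttest_ind_from_stats(lab1["mean"], lab1["std"], lab1["n"],
                                  lab2["mean"], lab2["std"], lab2["n"],
                                  equal_var=False)
print("helper:               t = %g dof = %g p = %g" % (t, dof, p))
print("ttest_ind_from_stats: t = %g p = %g" % (t_sp, p_sp))
```

The helper also returns the degrees of freedom, which `ttest_ind_from_stats` does not report, so it can be handy when that value needs to be shown alongside the p-value.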