bcf - 1 year ago 207
Python Question

# Scipy Non-central Chi-Squared Random Variable

Consider a sum of

`n`
squared iid normal random variables
`S = sum (Z^2(mu, sig^2))`
. According to this question,
`S / sig^2`
has a noncentral chi-squared distribution with degrees of freedom =
`n`
and non-centrality parameter =
`n*mu^2`
.

However, compare generating
`N`
of these variables
`S`
by summing squared normals with generating
`N`
noncentral chi-squared random variables directly using
`scipy.ncx2`
:

``````import numpy as np
from scipy.stats import ncx2, chi2
import matplotlib.pyplot as plt

n = 1000  # number of normals in sum
N_MC = 100000  # number of trials

mu = 0.05
sig = 0.3

### Generate sums of squared normals ###
Z = np.random.normal(loc=mu, scale=sig, size=(N_MC, n))
S = np.sum(Z**2, axis=1)

### Generate non-central chi2 RVs directly ###
dof = n
non_centrality = n*mu**2
NCX2 = sig**2 * ncx2.rvs(dof, non_centrality, size=N_MC)
# NCX2 = sig**2 * chi2.rvs(dof, size=N_MC)  # for mu = 0.0

### Plot histos ###
fig, ax = plt.subplots()
ax.hist(S, bins=50, label='S')
ax.hist(NCX2, bins=50, label='NCX2', alpha=0.7)
ax.legend()
plt.show()
``````

This results in the histograms

I believe the mathematics is correct; could the discrepancy be a bug in the
`ncx2`
implementation? Setting
`mu = 0`
and using
`scipy.chi2`
looks much better:

The problem is in the second sentence of the question: "`S / sig^2` has a noncentral chi-squared distribution with degrees of freedom = `n` and non-centrality parameter = `n*mu^2`." That non-centrality parameter is not correct. It should be `n*(mu/sig)^2`.

The standard definition of the noncentral chi-squared distribution is that it is the sum of the squares of normal variates that have mean mu and standard deviation 1. You are computing `S` using normal variates with standard deviation `sig`. Let's write that distribution as `N(mu, sig**2)`. By using the location-scale properties of the normal distribution, we have

``````N(mu, sig**2) = mu + sig*N(0, 1) = sig*(mu/sig + N(0,1)) = sig*N(mu/sig, 1)
``````

So summing the squares of variates from `N(mu, sig**2)` is equivalent to summing the squares of `sig*N(mu/sig, 1)`. That gives `sig**2` times a noncentral chi-squared variate with noncentrality `mu/sig`.

If you change the line where `non_centrality` is computed to

``````non_centrality = n*(mu/sig)**2
``````

the histograms line up as you expect.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download