bcf bcf - 2 months ago 7x
Python Question

Scipy Non-central Chi-Squared Random Variable

Consider a sum of

squared iid normal random variables
S = sum (Z^2(mu, sig^2))
. According to this question,
S / sig^2
has a noncentral chi-squared distribution with degrees of freedom =
and non-centrality parameter =

However, compare generating
of these variables
by summing squared normals with generating
noncentral chi-squared random variables directly using

import numpy as np
from scipy.stats import ncx2, chi2
import matplotlib.pyplot as plt

n = 1000 # number of normals in sum
N_MC = 100000 # number of trials

mu = 0.05
sig = 0.3

### Generate sums of squared normals ###
Z = np.random.normal(loc=mu, scale=sig, size=(N_MC, n))
S = np.sum(Z**2, axis=1)

### Generate non-central chi2 RVs directly ###
dof = n
non_centrality = n*mu**2
NCX2 = sig**2 * ncx2.rvs(dof, non_centrality, size=N_MC)
# NCX2 = sig**2 * chi2.rvs(dof, size=N_MC) # for mu = 0.0

### Plot histos ###
fig, ax = plt.subplots()
ax.hist(S, bins=50, label='S')
ax.hist(NCX2, bins=50, label='NCX2', alpha=0.7)

This results in the histograms
comparison of distros

I believe the mathematics is correct; could the discrepancy be a bug in the
implementation? Setting
mu = 0
and using
looks much better:
distros good


The problem is in the second sentence of the question: "S / sig^2 has a noncentral chi-squared distribution with degrees of freedom = n and non-centrality parameter = n*mu^2." That non-centrality parameter is not correct. It should be n*(mu/sig)^2.

The standard definition of the noncentral chi-squared distribution is that it is the sum of the squares of normal variates that have mean mu and standard deviation 1. You are computing S using normal variates with standard deviation sig. Let's write that distribution as N(mu, sig**2). By using the location-scale properties of the normal distribution, we have

N(mu, sig**2) = mu + sig*N(0, 1) = sig*(mu/sig + N(0,1)) = sig*N(mu/sig, 1)

So summing the squares of variates from N(mu, sig**2) is equivalent to summing the squares of sig*N(mu/sig, 1). That gives sig**2 times a noncentral chi-squared variate with noncentrality mu/sig.

If you change the line where non_centrality is computed to

non_centrality = n*(mu/sig)**2

the histograms line up as you expect.