Amir - 7 months ago 14
Python Question

Efficient way of computing Kullback–Leibler divergence in Python

I have to compute the Kullback-Leibler Divergence (KLD) between thousands of discrete probability vectors. Currently I am using the following code but it's way too slow for my purposes. I was wondering if there is any faster way to compute KL Divergence?

``````import numpy as np
import scipy.stats as sc

#n is the number of data points
kld = np.zeros(n, n)
for i in range(0, n):
for j in range(0, n):
if(i != j):
kld[i, j] = sc.entropy(distributions[i, :], distributions[j, :])
``````

Update:

``````kld = sc.entropy(distrib.T[:,:,None], distrib.T[:,None,:])
``````

Scipy's `stats.entropy` in its default sense invites inputs as 1D arrays giving us a scalar, which is being done in the listed question. Internally this function also allows `broadcasting`, which we can abuse in here for a vectorized solution.

From the `docs` -

scipy.stats.entropy(pk, qk=None, base=None)

If only probabilities pk are given, the entropy is calculated as S = -sum(pk * log(pk), axis=0).

If qk is not None, then compute the Kullback-Leibler divergence S = sum(pk * log(pk / qk), axis=0).

In our case, we are doing these entropy calculations for each row against all rows, performing sum reductions to have a scalar at each iteration with those two nested loops. Thus, the output array would be of shape `(M,M)`, where `M` is the number of rows in input array.

Now, the catch here is that `stats.entropy()` would sum along `axis=0`, so we will feed it two versions of `distributions`, both of whom would have the rowth-dimension brought to `axis=0` for reduction along it and the other two axes interleaved - `(M,1)` & `(1,M)` to give us a `(M,M)` shaped output array using `broadcasting`.

Thus, a vectorized and much more efficient way to solve our case would be -

``````from scipy import stats
kld = stats.entropy(distributions.T[:,:,None], distributions.T[:,None,:])
``````

Runtime tests and verify -

``````In [15]: def entropy_loopy(distrib):
...:     n = distrib.shape[0] #n is the number of data points
...:     kld = np.zeros((n, n))
...:     for i in range(0, n):
...:         for j in range(0, n):
...:             if(i != j):
...:                 kld[i, j] = stats.entropy(distrib[i, :], distrib[j, :])
...:     return kld
...:

In [16]: distrib = np.random.randint(0,9,(100,100)) # Setup input

In [17]: out = stats.entropy(distrib.T[:,:,None], distrib.T[:,None,:])

In [18]: np.allclose(entropy_loopy(distrib),out) # Verify
Out[18]: True

In [19]: %timeit entropy_loopy(distrib)
1 loops, best of 3: 800 ms per loop

In [20]: %timeit stats.entropy(distrib.T[:,:,None], distrib.T[:,None,:])
10 loops, best of 3: 104 ms per loop
``````