aromatvanili - 1 year ago 230

Python Question

I have some statistics over some properties like:

`1st iter : p1:10 p2:0 p3:12 p4:33 p5:0.17 p6:ok p8:133 p9:89`

`2nd iter : p1:43 p2:1 p6:ok p8:12 p9:33`

`3rd iter : p1:14 p2:0 p3:33 p5:0.13 p9:2`

...

(p1 -> number of tries, p2 -> try done well, p3..pN -> properties of try).

I need to calculate the amount of information of each property.

After quantization (e.g. to 10 levels), which puts all input numbers on the same scale, the input file starts to look like:

`p0: 4 3 2 4 5 5 6 7`

`p3: 4 5 3 3`

`p4: 5 3 3 2 1 2 3`

...

Where `p(0) = funct(p1,p2)`. Not every input line has every `pK`, so `len(pk) <= len(p0)`.

Now I know how to calculate entropy of each property via Shannon entropy for each line. I need to calculate mutual information from here.

The open part is the calculation of the joint entropy for the mutual information `I(p0,pK)`.

I'm calculating entropy for one element like this:

```
import numpy as np

def entropy(x):
    # x is a NumPy array of quantized levels
    probs = [np.mean(x == c) for c in set(x)]
    return np.sum([-p * np.log2(p) for p in probs])
```

So, for the joint entropy, do I need to combine `product`, `x`, `zip(p0,pk)` and `set(x)` in some way?

Answer

I'm assuming that you want to calculate the mutual information between `p1` and each of `p2`, `p3`, ... in turn.

1) Calculate `H(X)` as the entropy of `p1` with:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

where each `x` is one of the values occurring in `p1` and `p(x)` is its relative frequency.

2) Calculate `H(Y)` as the entropy of `pK` with the same equation, with each `x` now being a value from `pK`.

3) Create a new collection of pairs out of `p1` and `pK`:

```
pairs = list(zip(p1, pK))  # a list, so it can be measured and iterated more than once
```

Note that `zip` pairs elements by position and stops at the shorter sequence. If the values at a given position do not come from the same iteration, you should probably fill in the missing data first (for example with `0`s or with values from the previous iteration).
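One way to keep the pairs aligned is to build both columns from the per-iteration records and fill the gaps before zipping. This is only a minimal sketch, assuming each iteration is available as a dict of already-quantized values; the names `aligned_columns`, `records` and `fill` are hypothetical and not taken from the question:

```python
def aligned_columns(records, key_x, key_y, fill=0):
    """Return two equal-length lists, one value per iteration.

    Missing values are replaced by `fill` (an assumption; carrying the
    previous value forward would be another option).
    """
    xs = [rec.get(key_x, fill) for rec in records]
    ys = [rec.get(key_y, fill) for rec in records]
    return xs, ys


# Hypothetical quantized data: p3 is missing in the second iteration.
records = [
    {"p0": 4, "p3": 4},
    {"p0": 3},
    {"p0": 2, "p3": 5},
]
p0, p3 = aligned_columns(records, "p0", "p3")
pairs = list(zip(p0, p3))
print(pairs)  # [(4, 4), (3, 0), (2, 5)]
```

Keep in mind that a fill value such as `0` acts as an extra level in the distribution, so it influences the entropy estimates.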

4) Calculate the joint entropy `H(X,Y)` using:

$$H(X,Y) = -\sum_{x} \sum_{y} p(x,y) \log_2 p(x,y)$$

Note that the sum runs over the whole Cartesian product of the distinct values in `p1` and `pK`, with each probability `p(x,y)` estimated from the `pairs` collection as the fraction of pairs equal to `(x, y)`. To iterate over that Cartesian product use `for xy in itertools.product(set(p1), set(pK)): ...`.

5) Then you get the mutual information between `p1` and `pK` as:

$$I(X;Y) = H(X) + H(Y) - H(X,Y)$$
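The five steps can be put together in a few lines. The following is a rough sketch rather than a definitive implementation: it assumes `xs` and `ys` are plain Python lists that are already aligned by iteration, measures entropy in bits, and uses the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y). The helper names (`entropy_bits`, `marginal_probs`, `joint_probs`, `mutual_information`) are made up for illustration:

```python
import itertools
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal_probs(values):
    """Relative frequency of each distinct value in a list."""
    n = len(values)
    return [values.count(v) / n for v in set(values)]


def joint_probs(xs, ys):
    """Relative frequency of every combination of distinct values."""
    pairs = list(zip(xs, ys))  # aligned observations
    n = len(pairs)
    return [pairs.count(xy) / n for xy in itertools.product(set(xs), set(ys))]


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned lists."""
    h_x = entropy_bits(marginal_probs(xs))
    h_y = entropy_bits(marginal_probs(ys))
    h_xy = entropy_bits(joint_probs(xs, ys))
    return h_x + h_y - h_xy


x = [0, 0, 1, 1]
print(mutual_information(x, x))             # 1.0, a column shares all its information with itself
print(mutual_information(x, [0, 1, 0, 1]))  # 0.0, the two columns are independent
```

With short columns the estimated probabilities are noisy, so the resulting value should be read as an estimate.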

Using numpy capabilities you can calculate joint entropy as presented here:

```
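import numpy as np

def entropy(X, Y):
    # X and Y are equal-length NumPy arrays, aligned by iteration
    probs = []
    for c1 in set(X):
        for c2 in set(Y):
            # relative frequency of the pair (c1, c2)
            probs.append(np.mean(np.logical_and(X == c1, Y == c2)))
    # zero-probability pairs contribute nothing to the sum
    return np.sum([-p * np.log2(p) for p in probs if p > 0])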
```

where `if p > 0` is consistent with the definition of entropy:

"In the case of $p(x_i) = 0$ for some $i$, the value of the corresponding summand $0 \log_b(0)$ is taken to be 0."
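Because zero-probability summands are dropped, the sum over the Cartesian product reduces to a sum over the pairs that actually occur. The sketch below checks this on a small made-up sample; the function names and the data are illustrative assumptions, and `collections.Counter` is used here only as a convenient way to count observed pairs:

```python
import itertools
import math
from collections import Counter


def joint_entropy_product(xs, ys):
    """Sum over every combination of distinct values, skipping p == 0."""
    pairs = list(zip(xs, ys))
    n = len(pairs)
    total = 0.0
    for xy in itertools.product(set(xs), set(ys)):
        p = pairs.count(xy) / n
        if p > 0:
            total -= p * math.log2(p)
    return total


def joint_entropy_observed(xs, ys):
    """Sum only over the pairs that actually occur in the data."""
    counts = Counter(zip(xs, ys))
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Made-up quantized columns of equal length.
xs = [4, 3, 2, 4, 5, 5]
ys = [4, 5, 3, 3, 4, 4]
print(math.isclose(joint_entropy_product(xs, ys),
                   joint_entropy_observed(xs, ys)))  # True
```

The second form touches each observation once, which may matter when the number of distinct levels is large.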

If you don't want to use `numpy`, then a version without it might look something like:

```
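import itertools
import math

def entropyPart(p):
    # by convention, 0 * log(0) is taken to be 0
    if not p:
        return 0
    return -p * math.log2(p)

def entropy(X, Y):
    # materialize the pairs so they can be counted repeatedly
    pairs = list(zip(X, Y))
    probs = []
    # iterate over combinations of distinct values only
    for pair in itertools.product(set(X), set(Y)):
        probs.append(1.0 * sum([p == pair for p in pairs]) / len(pairs))
    return sum([entropyPart(p) for p in probs])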
```

Source (Stackoverflow)