alvas alvas - 28 days ago 18
Python Question

Softmax function - python

From Udacity's deep learning class, the softmax of y_i is simply its exponential divided by the sum of the exponentials over the whole Y vector:

S(y_i) = e ^ (y_i) / sum_j(e ^ (y_j))

Where S(y_i) is the softmax function of y_i, e is the exponential, and j runs over the columns of the input vector Y.

I've tried the following:

import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    # Shift every score by the max before exponentiating
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))


which returns:

[ 0.8360188 0.11314284 0.05083836]


And the suggested solution was:

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)


It produces the same output as the first implementation, even though the first implementation explicitly takes the difference between each column and the max and then divides by the sum.

Can someone show mathematically why? Is one correct and the other one wrong?

Are the implementations similar in terms of code and time complexity? Which is more efficient?

Answer

They're both correct, but yours has a term that is mathematically redundant.

You start with

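e ^ (x - max(x)) / sum(e ^ (x - max(x)))

By using the fact that a^(b - c) = (a^b)/(a^c) we have

= (e ^ x / e ^ max(x)) / sum(e ^ x / e ^ max(x))

Since e ^ max(x) is the same constant in every term of the sum, it factors out of the sum and cancels with the one in the numerator:

= (e ^ x / e ^ max(x)) / (sum(e ^ x) / e ^ max(x))

= e ^ x / sum(e ^ x)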

Which is what the suggested solution computes. You could replace max(x) with any constant and it would cancel out in the same way.
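The cancellation can also be checked numerically. The following is only a minimal sketch using the standard library on a 1-D list; the helper name softmax_shifted and its shift parameter are made up for this illustration and are not part of either implementation in the question.

```python
import math

def softmax_shifted(scores, shift=0.0):
    """Softmax of a 1-D list of scores, subtracting `shift` from each score first.

    `softmax_shifted` and `shift` are hypothetical names used for illustration.
    """
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [3.0, 1.0, 0.2]

plain = softmax_shifted(scores)                      # no shift, like the suggested solution
by_max = softmax_shifted(scores, shift=max(scores))  # shift by max(x), like the question's version
by_other = softmax_shifted(scores, shift=-7.5)       # an arbitrary constant

# All three agree up to floating-point rounding.
for a, b, c in zip(plain, by_max, by_other):
    assert math.isclose(a, b, rel_tol=1e-12)
    assert math.isclose(a, c, rel_tol=1e-12)

print(plain)
```

One caveat: the cancellation is exact only in real arithmetic. In floating point, subtracting max(x) keeps every exponent at or below 0, so the exponential cannot overflow for large scores, which is a common reason to keep the shift even though it is mathematically redundant.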
