alvas alvas - 7 months ago 127
Python Question

Softmax function - python

From Udacity's deep learning class, the softmax of y_i is simply the exponential of y_i divided by the sum of the exponentials over the whole Y vector:

$$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$

where S(y_i) is the softmax function of y_i, e is the exponential, and j indexes the columns of the input vector Y.

I've tried the following:

import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))  # shift by the max before exponentiating
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

which returns:

[ 0.8360188 0.11314284 0.05083836]

And the suggested solution was:

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

It produces the same output as the first implementation, even though the first implementation explicitly takes the difference between each column and the max before exponentiating and dividing by the sum.

Can someone show mathematically why? Is one correct and the other one wrong?

Are the implementations similar in terms of code and time complexity? Which is more efficient?


They're both correct, but yours has a mathematically unnecessary term.

You start with

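e ^ (x - max(x)) / sum(e ^ (x - max(x)))

By using the fact that a^(b - c) = (a^b)/(a^c) we have

= e ^ x / (e ^ max(x) * sum(e ^ x / e ^ max(x)))

Since e ^ max(x) is the same constant in every term, it factors out of the sum:

= e ^ x / (e ^ max(x) * sum(e ^ x) / e ^ max(x))

= e ^ x / sum(e ^ x)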

Which is what the suggested solution computes. You could replace max(x) with any constant and it would cancel out.
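The cancellation can also be checked numerically. The sketch below is only an illustration: the names softmax_shifted and softmax_plain and the shift parameter are made up for this example and are not part of the course code. It also shows one practical point the algebra does not cover: in floating point, np.exp overflows for large inputs, so the shifted version behaves better there even though the two are mathematically equal.

```python
import numpy as np

def softmax_shifted(x, shift=None):
    """Softmax computed after subtracting a constant (default: max(x))."""
    x = np.asarray(x, dtype=float)
    if shift is None:
        shift = np.max(x)
    e_x = np.exp(x - shift)
    return e_x / e_x.sum()

def softmax_plain(x):
    """Softmax computed directly from the definition."""
    x = np.asarray(x, dtype=float)
    return np.exp(x) / np.sum(np.exp(x), axis=0)

scores = [3.0, 1.0, 0.2]

# Subtracting max(x), or any other constant, leaves the result unchanged.
same_as_plain = np.allclose(softmax_shifted(scores), softmax_plain(scores))
same_for_any_shift = np.allclose(softmax_shifted(scores, shift=-7.5),
                                 softmax_plain(scores))

# For large scores the plain version overflows: exp(1000) is inf and
# inf / inf is nan. The shifted version only exponentiates values <= 0.
big = [1000.0, 1001.0, 1002.0]
with np.errstate(over="ignore", invalid="ignore"):
    plain_big = softmax_plain(big)
shifted_big = softmax_shifted(big)

print(same_as_plain, same_for_any_shift)
print(plain_big)
print(shifted_big)
```

So for 1-D inputs like scores the two give the same answer, and subtracting the max mainly matters when the scores are large enough for np.exp to overflow.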