marcman - 4 months ago 60

C++ Question

I was looking through the code of Caffe's SigmoidCrossEntropyLoss layer and the docs, and I'm a bit confused. The docs list the loss function as the logit loss (I'd replicate it here, but without LaTeX the formula would be difficult to read; it's at the very top of the docs page).

However, the code itself (`Forward_cpu(...)`) shows something different:

```
Dtype loss = 0;
for (int i = 0; i < count; ++i) {
  loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
      log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
}
top[0]->mutable_cpu_data()[0] = loss / num;
```

Is it because this is accounting for the sigmoid function having already been applied to the input?

However, even so, the `(input_data[i] >= 0)` terms confuse me: each one is just a comparison that evaluates to `1` or `0`.

Can someone please explain this to me?

Answer

The `SigmoidCrossEntropyLoss` layer in Caffe takes the two steps that would otherwise be applied to `input_data` one after the other (`Sigmoid` + `CrossEntropy`) and combines them into one piece of code:

```
Dtype loss = 0;
for (int i = 0; i < count; ++i) {
  loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
      log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
}
top[0]->mutable_cpu_data()[0] = loss / num;
```

In fact, whether or not `input_data[i] >= 0`, the above code is mathematically equivalent to the following:

```
Dtype loss = 0;
for (int i = 0; i < count; ++i) {
  loss -= input_data[i] * (target[i] - 1) -
      log(1 + exp(-input_data[i]));
}
top[0]->mutable_cpu_data()[0] = loss / num;
```

This second version comes straight from the math: apply `Sigmoid` to `input_data`, plug the result into `CrossEntropy`, and simplify.
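For reference, a sketch of that simplification, writing x for `input_data[i]` and t for `target[i]`. The symbols and the bracket notation [x >= 0] (1 if true, 0 otherwise) are shorthand introduced here, not taken from the Caffe docs:

```latex
\begin{aligned}
\sigma(x) &= \frac{1}{1 + e^{-x}}, \qquad
\log \sigma(x) = -\log\left(1 + e^{-x}\right), \qquad
\log\bigl(1 - \sigma(x)\bigr) = -x - \log\left(1 + e^{-x}\right) \\[4pt]
t \log \sigma(x) + (1 - t) \log\bigl(1 - \sigma(x)\bigr)
  &= x\,(t - 1) - \log\left(1 + e^{-x}\right) \\[4pt]
\text{for } x < 0:\quad
x\,(t - 1) - \log\left(1 + e^{-x}\right)
  &= x\,t - \log\left(e^{x}\left(1 + e^{-x}\right)\right)
   = x\,t - \log\left(1 + e^{x}\right) \\[4pt]
\text{both cases together:}\quad
t \log \sigma(x) + (1 - t) \log\bigl(1 - \sigma(x)\bigr)
  &= x\,\bigl(t - [x \ge 0]\bigr) - \log\left(1 + e^{-|x|}\right)
\end{aligned}
```

The loss is the negative of this quantity summed over i, which is why the loop uses `loss -=`. The exponent `input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)` in Caffe's code is just -|x| written without a branch.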

**But the first piece of code (the one Caffe uses) is more numerically stable and far less likely to overflow: its exponent is never positive, so it never has to compute a huge `exp(-input_data[i])` when `input_data[i]` is negative with a large absolute value. That's why you see that code in Caffe.**
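To make the difference concrete, here is a minimal sketch of the two per-element formulas as standalone functions. The names `naive_loss` and `stable_loss` are hypothetical helpers for illustration, not part of Caffe, and `double` is used here in place of Caffe's `Dtype`:

```cpp
#include <cmath>

// Hypothetical helper (not Caffe code): per-element loss from the
// straightforward formula, -(x * (t - 1) - log(1 + exp(-x))).
// exp(-x) overflows to infinity when x is a large negative number.
double naive_loss(double x, double t) {
  return -(x * (t - 1.0) - std::log(1.0 + std::exp(-x)));
}

// Hypothetical helper (not Caffe code): per-element loss in the form used
// by the Forward_cpu snippet above. The exponent equals -|x|, so exp()
// returns a value in (0, 1] and cannot overflow.
double stable_loss(double x, double t) {
  const double nonneg = (x >= 0) ? 1.0 : 0.0;
  return -(x * (t - nonneg) -
           std::log(1.0 + std::exp(x - 2.0 * x * nonneg)));
}
```

For moderate inputs such as `x = 2.0` or `x = -2.0` the two functions agree up to rounding. For `x = -1000.0` and `t = 1.0`, `naive_loss` evaluates `exp(1000)`, which overflows, so the result is infinite, while `stable_loss` returns 1000. With `float` instead of `double` the overflow sets in much earlier, since `exp` already exceeds the `float` range for arguments around 89.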