13 Custom loss
Following the paper, I originally used binary cross-entropy as the loss function of the AutoEncoder. At this point, however, I had the idea of modifying it to better suit my data.
After some thought, my hypothesis was:
- The dataset is sparse: there are only a few “1” values compared to the “0” values.
- This problem might be exacerbated for the more uncommon events, which are even sparser.
- Binary cross-entropy should, in theory, handle class imbalance, but maybe not on data this sparse.
I then tried to think of a possible solution:
- The loss comes from values that are 0 but should be 1, never the other way around
- Maybe the loss does not penalize the case where a 0 should be a 1 strongly enough
- Which function penalizes values near 0 but not near 1? The logarithm is a good candidate
- I first tried taking the negative log of the synthetic data alone, and obviously everything degenerated to 1 (see the short derivation after this list)
- I then thought of comparing the synthetic and real data: if they are the same, the loss should be 0
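To spell out why the first attempt degenerated (this short derivation and its notation are mine): if the synthetic values \( \hat{x}_i \in [0, 1] \) come from a sigmoid output, then minimizing
\[ -\ln\big(\operatorname{mean}(\hat{x})\big) \]
on its own is achieved by pushing every \( \hat{x}_i \) towards 1, regardless of the real data. A second term that anchors the synthetic mean to the real mean is therefore needed.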
My final solution was:
\[ -\ln\big(\operatorname{mean}(\text{synthetic} + e^{-10})\big) + \ln\big(\operatorname{mean}(\text{real})\big) \]
This formula penalizes the model when the mean of the synthetic data is lower than the mean of the real data, and the penalty grows especially large when the mean of the synthetic data is near 0.
There are more refined approaches to this problem, but this one seemed to work.
(Note: I needed to add the \( e^{-10} \) term to make the loss computation robust. Without it, whenever the synthetic data in a batch was all 0, the logarithm would be undefined and the loss could not be computed.)
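For concreteness, here is a minimal sketch of how this penalty could be implemented; the original code is not shown, so PyTorch, the function name `sparsity_penalty`, and the assumption that both tensors hold values in [0, 1] are mine:

```python
import torch

def sparsity_penalty(synthetic: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
    """Custom penalty: -ln(mean(synthetic + e^-10)) + ln(mean(real)).

    The e^-10 term keeps the logarithm finite when every synthetic
    value in the batch is 0, as noted above.
    """
    eps = torch.exp(torch.tensor(-10.0))  # ~4.5e-5, avoids log(0) on the synthetic side
    # Positive when the synthetic mean is below the real mean; grows quickly
    # as the synthetic mean approaches 0. As in the formula above, real.mean()
    # is assumed to be non-zero.
    return -torch.log((synthetic + eps).mean()) + torch.log(real.mean())
```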
Using this formula as a penalty in addition to the binary cross-entropy, I was able to get the loss of the model with maximum size 4096 down to around 200. The model now partially encodes and decodes some of the variables correctly. These are in general the more common events; the rarer events remain stuck at 0.
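As a usage sketch of the combined loss (again assuming PyTorch; the sum reduction and the unweighted addition of the two terms are my assumptions, the text only says the penalty is added to the binary cross-entropy):

```python
import torch.nn.functional as F

def total_loss(synthetic: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy reconstruction loss plus the custom penalty."""
    bce = F.binary_cross_entropy(synthetic, real, reduction="sum")
    return bce + sparsity_penalty(synthetic, real)
```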
This is still not sufficient, so I tried different approaches in the next sections.