A GAN can have two loss functions: one for generator training and one for discriminator training. How can two loss functions work together to reflect a distance measure between probability distributions?
In the loss schemes we’ll look at here, the generator and discriminator losses derive from a single measure of distance between probability distributions. In both of these schemes, however, the generator can affect only one term in the distance measure: the term that reflects the distribution of the fake data. So during generator training, we drop the other term, which reflects the distribution of the real data.
The generator and discriminator losses look different in the end, even though they derive from a single formula.
In the paper that introduced GANs, the generator tries to minimize the following function while the discriminator tries to maximize it:

E_x[log(D(x))] + E_z[log(1 - D(G(z)))]
In this function:
- D(x) is the discriminator’s estimate of the probability that real data instance x is real.
- E_x is the expected value over all real data instances.
- G(z) is the generator’s output when given noise z.
- D(G(z)) is the discriminator’s estimate of the probability that a fake instance is real.
- E_z is the expected value over all random inputs to the generator (in effect, the expected value over all generated fake instances G(z)).
- The formula derives from the cross-entropy between the real and generated distributions.
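To make the formula concrete, here is a minimal numeric sketch of the minimax value. The discriminator outputs below are made-up numbers chosen for illustration, not values from a trained model:

```python
import numpy as np

# Hypothetical discriminator outputs (probabilities that each input is real).
d_real = np.array([0.9, 0.8, 0.95])  # D(x) on real instances
d_fake = np.array([0.2, 0.1, 0.3])   # D(G(z)) on generated instances

# The quantity the discriminator maximizes and the generator minimizes:
# E_x[log(D(x))] + E_z[log(1 - D(G(z)))], with expectations estimated
# as sample means over the mini-batch.
value = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
print(value)
```

With a confident discriminator like this one (high D(x) on real data, low D(G(z)) on fakes), both log terms are close to zero, so the value is near its maximum of 0; a discriminator that guesses badly drives the value strongly negative.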
The generator can’t directly affect the log(D(x)) term in the function, so, for the generator, minimizing the loss is equivalent to minimizing log(1 - D(G(z))).