Variational autoencoders (VAEs) use the autoencoder architecture to encode a latent space that can be used for generative tasks such as image generation.
Unlike most autoencoders, which are "deterministic" models that encode each training sample as a single fixed vector of latent values, VAEs are "probabilistic" models that encode the latent space as a range of possibilities. By sampling and interpolating within this range of encoded possibilities, VAEs can synthesize new data samples that, while original, still resemble the training data.
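For illustration, the following sketch shows what interpolation might look like in practice, assuming a trained VAE decoder that maps latent vectors to data samples. The `decoder` argument, helper name, and step count are hypothetical, not part of any specific library:

```python
import torch

def interpolate(decoder, z_start, z_end, steps=8):
    """Decode evenly spaced points on the line between two latent vectors.

    Each intermediate point decodes to a plausible sample that blends
    the characteristics of the two endpoints.
    """
    alphas = torch.linspace(0.0, 1.0, steps)
    zs = torch.stack([(1 - a) * z_start + a * z_end for a in alphas])
    return decoder(zs)  # batch of `steps` synthesized samples
```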
To enable the generation of completely new data samples (rather than simply re-creating or combining samples from training data), the latent space must exhibit 2 types of regularity:
- Continuity: Nearby points in latent space should yield similar content when decoded.
- Completeness: Any point sampled from the latent space should yield meaningful content when decoded.
A simple way to enforce continuity and completeness in latent space is to constrain it to follow a normal (Gaussian) distribution. Therefore, VAEs encode 2 different vectors for each latent attribute of training data: a vector of means, “μ,” and a vector of standard deviations, “σ.” In essence, the vector of means defines the center of the range of possibilities for each latent variable, while the vector of standard deviations defines the expected spread around that center.
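The sketch below, written in PyTorch, illustrates this two-vector encoding along with the "reparameterization trick" commonly used to sample from the encoded distribution while keeping training differentiable. All module names and dimensions are illustrative assumptions (e.g., sized for flattened 28×28 images), not a reference implementation:

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    """Maps each input to a vector of means and a vector of (log) variances."""
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)       # vector of means, μ
        self.log_var = nn.Linear(hidden_dim, latent_dim)  # log σ², for numerical stability

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.mu(h), self.log_var(h)

def reparameterize(mu, log_var):
    """Sample z = μ + σ·ε with ε ~ N(0, I), keeping the sampling step differentiable."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps
```

Predicting log σ² rather than σ directly is a common design choice: it lets the network output any real number while guaranteeing that the recovered standard deviation is positive.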
VAEs accomplish this by adding a second loss term alongside reconstruction loss: Kullback-Leibler divergence (KL divergence). More specifically, the VAE is trained to minimize the KL divergence between the distribution it learns over the latent space and a standard Gaussian distribution, while simultaneously minimizing reconstruction loss.
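As a concrete illustration, here is a minimal sketch of the combined objective in PyTorch. The function name `vae_loss` and the choice of binary cross-entropy for the reconstruction term (appropriate when inputs are normalized to [0, 1]) are assumptions for this example; the KL term uses the standard closed-form divergence between a diagonal Gaussian and a standard Gaussian:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, log_var):
    """Combined VAE objective: reconstruction loss plus KL divergence.

    The KL term is the closed-form divergence between the encoder's
    diagonal Gaussian N(μ, σ²) and a standard Gaussian N(0, I):
        KL = -0.5 * Σ (1 + log σ² - μ² - σ²)
    """
    recon_loss = F.binary_cross_entropy(recon_x, x, reduction="sum")
    kl_div = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon_loss + kl_div
```

Minimizing the KL term pulls every encoded distribution toward N(0, I), which is what produces the continuity and completeness described above; the reconstruction term counterbalances it so that distinct inputs remain distinguishable in latent space.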