Generative Adversarial Networks

Github repo: GANs-Gallery

Generator vs Discriminator

FID: Fréchet Inception Distance

measures the distance between the real and generated distributions, each modeled as a Gaussian over Inception feature embeddings

KID: Kernel Inception Distance

squared MMD between Inception features (unbiased estimator, better suited to small sample counts)

IS: Inception Score

given a set of generated images $\{x_1, x_2, \dots, x_N\}$, the inception score is defined as:
$$\mathrm{IS} = \exp\!\left(\frac{1}{N}\sum_{i=1}^{N} D_{\mathrm{KL}}\big(p(y \mid x_i)\,\|\,p(y)\big)\right)$$
where $p(y) = \frac{1}{N}\sum_{i=1}^N p(y \mid x_i)$
high score ⇒ individual images are sharply classified + classes are diverse (peaky $p(y \mid x)$ + flat $p(y)$)
low score ⇒ blurry images / low diversity (flat $p(y \mid x)$ + peaky $p(y)$)
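The score can be sketched in a few lines of NumPy (a minimal sketch; `p_yx` is assumed to be the N×C matrix of Inception softmax outputs for the generated set):

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """IS = exp( mean_i KL( p(y|x_i) || p(y) ) ) for a (N, C) matrix of class posteriors."""
    p_y = p_yx.mean(axis=0, keepdims=True)                               # marginal p(y)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)   # per-sample KL
    return float(np.exp(kl.mean()))

# sharp + diverse predictions -> high score (the max equals the number of classes)
print(inception_score(np.eye(4)))              # ≈ 4.0
# uniform predictions -> lowest possible score
print(inception_score(np.full((4, 4), 0.25)))  # ≈ 1.0
</imports>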

LPIPS: Learned Perceptual Image Patch Similarity

weighted average of distances between intermediate feature maps of a pretrained vision network, with weights fit to human perceptual judgments

G_EMA:

exponential moving average of the Generator weights
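The update itself is tiny (a plain-Python sketch over a name → tensor/float parameter dict; the EMA copy is the one you sample from at eval time):

```python
def update_g_ema(ema_params, current_params, decay=0.999):
    """G_EMA update: ema <- decay * ema + (1 - decay) * current, per parameter."""
    for name, value in current_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
```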

overfitting heuristics:

0 ⇒ not overfitting
1 ⇒ overfitting

Losses

non-saturating loss: G maximizes log D(G(z)) instead of minimizing log(1 − D(G(z))), which keeps G's gradients strong early in training when D confidently rejects fakes
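In logits form this is just a softplus (a minimal PyTorch sketch):

```python
import torch
import torch.nn.functional as F

def g_nonsaturating_loss(fake_logits):
    # -log sigmoid(D(G(z))) == softplus(-logits): gradient stays large when D
    # rejects fakes, unlike the saturating log(1 - D(G(z))) form
    return F.softplus(-fake_logits).mean()

def d_logistic_loss(real_logits, fake_logits):
    # -log sigmoid(D(x)) - log(1 - sigmoid(D(G(z))))
    return F.softplus(-real_logits).mean() + F.softplus(fake_logits).mean()
```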

LSGAN: Least Squares GAN

minimizes Pearson's $\chi^2$ divergence

Wasserstein GAN

minimizes the 1-Wasserstein distance
enforcing $D$ to be 1-Lipschitz through weight clipping

RaGAN: Relativistic average GAN (D scores how much more realistic real samples are than the average fake, and vice versa)

Regularizers

WGAN-GP

WGAN + a gradient penalty (softly enforcing 1-Lipschitz)
interpolating $\hat{x}$ samples along the real–fake path
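The interpolation-and-penalty step can be sketched as (a minimal PyTorch version; `critic` is any callable returning one scalar per sample):

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP: penalize the critic's gradient norm deviating from 1,
    evaluated at random interpolates x_hat on the real-fake path."""
    eps_shape = (real.size(0),) + (1,) * (real.dim() - 1)
    eps = torch.rand(eps_shape, device=real.device)          # per-sample mix ratio
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad, = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)
    grad_norm = grad.flatten(1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```

`create_graph=True` matters: the penalty itself must be differentiable so the critic's optimizer can backprop through it.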

R1, R2

in practice R1 has stuck around, while R2 (the same penalty taken on fake samples) turned out to be less stable
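R1 is simpler than the WGAN-GP penalty since there is no interpolation; it is just the squared gradient norm on real samples (a minimal PyTorch sketch):

```python
import torch

def r1_penalty(discriminator, real_images, gamma=10.0):
    """R1: (gamma / 2) * E[ ||grad_x D(x)||^2 ], computed on real samples only."""
    real_images = real_images.detach().requires_grad_(True)  # track grads w.r.t. inputs
    logits = discriminator(real_images)
    grad, = torch.autograd.grad(logits.sum(), real_images, create_graph=True)
    return (gamma / 2.0) * grad.flatten(1).pow(2).sum(dim=1).mean()
```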

path length penalty

penalizing the deviation of the Jacobian norm $\|J_w^\top y\|$ (output image w.r.t. the intermediate latent $w$, along random image-space directions $y$) from a running EMA of its own mean
efficient estimate / alternative: a directional derivative (vector–Jacobian product) instead of the full Jacobian
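A StyleGAN2-flavored sketch of this regularizer (the VJP below is the cheap directional-derivative estimate; `latents` must be the requires-grad intermediate `w` codes that produced `fake_images`, and the EMA decay is illustrative):

```python
import math
import torch

def path_length_penalty(fake_images, latents, pl_mean, decay=0.01):
    """PPL sketch: pull ||J_w^T y|| toward a running mean of itself."""
    # random image-space direction, scaled so the expected norm is resolution-free
    y = torch.randn_like(fake_images) / math.sqrt(fake_images[0].numel())
    grad, = torch.autograd.grad((fake_images * y).sum(), latents, create_graph=True)
    lengths = grad.pow(2).sum(dim=1).sqrt()                  # ||J_w^T y|| per sample
    new_pl_mean = pl_mean + decay * (lengths.mean().item() - pl_mean)  # EMA update
    penalty = (lengths - new_pl_mean).pow(2).mean()
    return penalty, new_pl_mean
```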

Training GANs with limited data (ADA): paper

overfitting in GANs:
apply augmentations with probability p to real and fake images alike
invertible augmentations: invertible in the sense that the underlying distribution is still learnable
p < .8 ⇒ augmentation leaks unlikely to happen
best observed transformations for small datasets:
pixel blitting
geometric transforms
color transforms
Adaptive Discriminator Augmentation
r_t & r_v: measuring overfitting ⇒ used to adapt p during training
target r_t ≈ .6 gave consistently good results
evaluate every N steps ⇒
define p update speed
update p
clamp to [0, 1]
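The p-update loop above can be sketched as follows (the step size and target are in the spirit of ADA but the exact constants here are illustrative, not the paper's schedule):

```python
def update_ada_p(p, r_t, target=0.6, n_images_seen=4 * 64, speed_imgs=500_000):
    """Adjust the augmentation probability p toward the overfitting target.

    r_t above target -> D is overfitting -> augment more; below -> augment less.
    Step size is chosen so p can traverse [0, 1] in `speed_imgs` images.
    """
    step = n_images_seen / speed_imgs
    p = p + step if r_t > target else p - step
    return min(max(p, 0.0), 1.0)   # clamp to [0, 1]
```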
Evaluation
PA-GANs: progressive augmentation
WGANs: using Wasserstein distance + gradient penalty ⇒ enforcing the Lipschitz constraint on D
KID is more informative than FID when training on a small dataset

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium: paper

main points
Generator lr = a(n), Discriminator lr = b(n), with D on the faster timescale:
$$\sum_{n=0}^{\infty} a(n) = \infty,\quad \sum_{n=0}^{\infty} b(n) = \infty,\quad \sum_{n=0}^{\infty} a(n)^2 < \infty,\quad \sum_{n=0}^{\infty} b(n)^2 < \infty,\quad \frac{a(n)}{b(n)} \to 0 \ \text{as } n \to \infty$$
note: D should be updated more frequently / with careful steps; G learns through D's gradients, so D should stay "near-optimal"
evaluate FID every 1K Discriminator steps
Lipschitz continuity is assumed (use ELU or other smooth ReLU variants, or rely on weight decay for smoothing)
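In practice the two timescales just mean two optimizers with different learning rates; the 1e-4 / 4e-4 values below are common TTUR defaults, not prescribed by the note above, and the `Linear` modules stand in for the real networks:

```python
import torch
from torch import nn

G = nn.Linear(64, 64)   # stand-in for the Generator
D = nn.Linear(64, 1)    # stand-in for the Discriminator

# D on the faster timescale (larger lr), G on the slower one,
# so G always learns against a near-optimal critic
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.0, 0.9))
```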

Wasserstein GAN w/ Gradient Penalty

Wasserstein distance:

1-Wasserstein distance (a.k.a earth mover's distance, how dramatic)
Kantorovich–Rubinstein dual form
parametrizing f as a neural net
enforcing 1-Lipschitz on f
clamping weights to [-c, c]
gradient penalty
Generator objective function
D is called the critic here (making it sound fancy)

unpaired image-to-image translation using CycleGAN: paper

StyleGAN's core innovations (super duper cool):

Generator:

mapping network: latent space disentanglement
starting from a learned initial `canvas`
Noise injection in style blocks
Modulated Convolution & Style vectors
Equalized Linear layer
Equalized Convolution
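Two of these pieces in compact PyTorch sketches (equalized lr and modulated convolution, assuming standard shapes; the grouped-conv trick folds the batch into channels so each sample gets its own styled weights):

```python
import math
import torch
from torch import nn
import torch.nn.functional as F

class EqualizedLinear(nn.Module):
    """Equalized lr: weights init at N(0, 1); the He constant is applied at
    forward time instead of at init, so all weights share one update scale."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim))
        self.bias = nn.Parameter(torch.zeros(out_dim))
        self.scale = 1.0 / math.sqrt(in_dim)

    def forward(self, x):
        return F.linear(x, self.weight * self.scale, self.bias)

def modulated_conv2d(x, weight, style, demodulate=True, eps=1e-8):
    """Scale the conv weights per-sample by the style vector, then demodulate
    so output feature magnitudes stay roughly unit."""
    b, in_ch, h, w_dim = x.shape
    out_ch, _, kh, kw = weight.shape
    w = weight[None] * style[:, None, :, None, None]      # (B, out, in, kh, kw)
    if demodulate:
        d = (w.pow(2).sum(dim=(2, 3, 4)) + eps).rsqrt()   # per sample, per out-ch
        w = w * d[:, :, None, None, None]
    x = x.reshape(1, b * in_ch, h, w_dim)                 # grouped-conv trick
    w = w.reshape(b * out_ch, in_ch, kh, kw)
    out = F.conv2d(x, w, padding=kh // 2, groups=b)
    return out.reshape(b, out_ch, h, w_dim)
```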

Discriminator

batch std
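The batch-std trick appends one statistic as an extra feature map so D can detect a collapsed (low-diversity) minibatch; a minimal sketch of the simplest variant:

```python
import torch

def minibatch_stddev(x, eps=1e-8):
    """Append the mean per-feature std across the batch as one constant map."""
    std = (x.var(dim=0, unbiased=False) + eps).sqrt().mean()   # one scalar
    feat = std.expand(x.size(0), 1, x.size(2), x.size(3))      # broadcast as a map
    return torch.cat([x, feat], dim=1)
```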

Concurrent CUDA streams during training:

aiming to maximize device usage; luckily, multiple penalties / losses can be computed independently, with few entanglements

To compute:

G loss
D loss
Gradient Penalty
R1 Penalty
Path Length Penalty
stream 1:
- fake images
- fake logits x
- G loss
- D loss

stream 2:
- real logits x
- R1 penalty
- Gradient Penalty
- Path Length Penalty

legend:
- → METRIC: waited for an event
- METRIC →: a computation is waiting
- METRIC x: needed to compute another metric
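A minimal sketch of the dispatch (hypothetical helper; it launches independent branches on separate CUDA streams and falls back to sequential execution on CPU):

```python
import torch

def run_on_streams(branch_fns):
    """Launch independent loss/penalty branches on separate CUDA streams
    so their kernels can overlap; sequential fallback when CUDA is absent."""
    if torch.cuda.is_available():
        streams = [torch.cuda.Stream() for _ in branch_fns]
        results = []
        for fn, s in zip(branch_fns, streams):
            with torch.cuda.stream(s):
                results.append(fn())
        torch.cuda.synchronize()      # join all streams before consuming results
        return results
    return [fn() for fn in branch_fns]
```

Real code also needs `torch.cuda.Event` waits wherever one stream's tensor feeds another (the → / x markers in the legend above); a blanket `synchronize()` is the blunt version of that.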