August 4th, 2024

The Path to StyleGAN2 – Implementing the Progressive Growing GAN

The article outlines an implementation of the Progressive Growing GAN (PGGAN) for generating high-resolution images, focusing on a two-phase approach and the key concepts that enhance training efficiency and image quality.


The article discusses the implementation of the Progressive Growing GAN (PGGAN), which serves as a foundation for the StyleGAN2 architecture. Traditional Generative Adversarial Networks (GANs) struggle to generate high-resolution images due to difficulties in assessing the performance of the generator and discriminator models. The PGGAN addresses these issues by progressively increasing the resolution of images during training, allowing the network to learn the overall structure before focusing on finer details.

The author plans to implement the PGGAN in two phases: first, a simplified version (the Gulrajani case) that generates 64x64 images, and then a full version that generates 256x256 images. Key concepts to be explored include the growing scheme, minibatch standard deviation, equalized learning rate, pixel normalization, and the WGAN-GP loss function. The article also outlines the architecture of the generator model, detailing the layers and their functions, and emphasizes the importance of starting with low-resolution images to facilitate learning. The author notes that while the original paper aimed for 1024x1024 images, they will limit their implementation to 256x256 for efficiency.
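The article's code is not reproduced in this summary, but a minimal PyTorch sketch of three of those building blocks (pixel normalization, minibatch standard deviation, and an equalized-learning-rate convolution) might look roughly like the following; the class names, epsilon value, and He-initialization gain are assumptions, not the author's exact implementation.

```python
# Illustrative sketches of three PGGAN building blocks; names and constants
# are assumptions, not the article's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PixelNorm(nn.Module):
    """Pixelwise feature vector normalization: scale each pixel's channel
    vector to roughly unit length."""
    def __init__(self, eps: float = 1e-8):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (N, C, H, W); normalize over the channel dimension.
        return x * torch.rsqrt(x.pow(2).mean(dim=1, keepdim=True) + self.eps)


class MinibatchStdDev(nn.Module):
    """Append the average per-feature standard deviation across the batch as
    one extra feature map, giving the discriminator a variety signal."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        # Std of every feature over the batch, reduced to a single scalar.
        std = x.std(dim=0, unbiased=False).mean()
        extra = std.view(1, 1, 1, 1).expand(n, 1, h, w)
        return torch.cat([x, extra], dim=1)


class EqualizedConv2d(nn.Module):
    """Equalized learning rate: store weights at unit scale and multiply by
    the He constant at runtime, so all layers train at a similar speed."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int, padding: int = 1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.scale = (2.0 / (in_ch * kernel_size ** 2)) ** 0.5  # He constant
        self.padding = padding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.weight * self.scale, self.bias,
                        padding=self.padding)
```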

- The Progressive Growing GAN improves image generation quality by gradually increasing resolution (see the fade-in sketch after this list).

- The implementation will be done in two phases: a simplified version generating 64x64 images and a full version for 256x256 images.

- Key concepts include minibatch standard deviation, equalized learning rate, and pixel normalization.

- The architecture of the generator model is designed to learn overall structure before fine details.

- The author aims to optimize training time and memory usage by using smaller image sizes.
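Since none of the article's growing code appears in this summary, the fade-in step below is only a sketch of the idea: when a new, higher-resolution block is added, its output is blended with the upsampled output of the previous stage while alpha ramps from 0 to 1. The names new_block, to_rgb_old, and to_rgb_new are placeholders, not identifiers taken from the article.

```python
# Illustrative fade-in step for growing the generator by one resolution level.
# new_block, to_rgb_old, and to_rgb_new are placeholder modules.
import torch
import torch.nn.functional as F


def fade_in_step(x_low: torch.Tensor,
                 new_block: torch.nn.Module,
                 to_rgb_old: torch.nn.Module,
                 to_rgb_new: torch.nn.Module,
                 alpha: float) -> torch.Tensor:
    """Blend the newly added high-resolution path with the upsampled output
    of the previous stage; alpha=0 keeps the old path, alpha=1 the new one."""
    upsampled = F.interpolate(x_low, scale_factor=2, mode="nearest")
    old_rgb = to_rgb_old(upsampled)             # skip path through the old toRGB
    new_rgb = to_rgb_new(new_block(upsampled))  # path through the new block
    return (1.0 - alpha) * old_rgb + alpha * new_rgb
```

In the original paper the discriminator fades its new block in symmetrically, with alpha increased gradually over a fixed number of training images before the next resolution is introduced.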

1 comment
By @godelski - 9 months
I know GANs aren't all the rage now, but if you're interested in ML, they should not be overlooked.

We still use GANs a lot. They're way faster than diffusion models. Good luck getting a diffusion model to perform upscaling and denoising on a real-time video call. I'm sure we'll get there, but right now you can do this with a GAN on cheap consumer hardware. You don't need a 4080; DLSS was released with the 20-series cards. GANs are just naturally computationally cheaper, but yeah, they do have trade-offs (though that's arguable, since ML goes through hype phases where everyone jumps ship from one thing to another and few revisit. When revisits happen, the older methods tend to be competitive. See "ResNet Strikes Back" for even CNNs vs. ViTs. But there's more nuance here).

There is a reason your upscaling model is a GAN. Sure, diffusion can do this too. But why is everyone using ESRGAN? There's a reason for this.

Also, I think it is important to remember that a GAN is really a technique, not just a way of generating images. You have one model generating things and another model telling you whether an output is good or not. LLM people... does this sound familiar?

To the author: I think it is worth pointing to Tero Karras's NVIDIA page. This group defined the status quo of GANs; you'll find that the vast majority of GAN research builds off their work, and quite a large portion of it consists of literal forks of their code. Though a fair amount of this is due to the great optimization they did, with custom CUDA kernels (this is not the limiting compute factor in diffusion). https://research.nvidia.com/person/tero-karras