GAN Series Part 1: Introduction to GANs

Humans, the most intelligent beings to have walked the earth, and possibly the universe, have been watching the advent of Artificial Intelligence with curiosity, awe and a bit of suspicion, as AI progressively conquered fields that humans once considered their forte. Many a sophisticated task that previously required subject matter experts to spend their valuable hours is now taken care of by AI, at significantly superior quality and pace. Tasks like cancer detection, which requires an experienced medical practitioner to go through histopathology images and figure out whether a cell is cancerous, or predicting stock market swings, once the preserve of market experts, are now discharged by well-refined AI engines. Machines surpassed humans in detecting objects in images long ago, and in many complicated games AI players are already unbeatable by human players. As machines kept getting better at solving problems involving classification, detection, prediction and the like, creativity was considered their Achilles heel, as creating something using intelligence –be it a painting, a symphony or even an imaginatively written passage– has always been considered a metier of humans.

This scenario changed dramatically with the emergence of Generative Adversarial Networks (GANs) in 2014, showcasing the creative flair of machines. As the technology proliferated, machines acquired the ability to be creative, that is, to generate new data including music, paintings and text. A perfect example at this point would be the ‘This Person Does Not Exist’ website, which uses StyleGAN2, a type of GAN, to create stupefyingly realistic facial images of people who do not actually exist!

Images of some people who do not exist, generated by StyleGAN2, taken from ‘This Person Does Not Exist’

In this series, we will look closer at these Artificially Intelligent artists, starting with an introduction to GANs in general in this part.

How GANs work

As the name suggests, Generative Adversarial Networks generate data by pitting two neural networks –the Generator and the Discriminator– against each other. The Generator generates data and the Discriminator evaluates it: the Discriminator tries to figure out whether a given sample was produced by the Generator or drawn from the original data distribution. Initially the Discriminator can easily distinguish generated data from real data, but each time it does so, the Generator learns from the feedback and produces better data, which is harder for the Discriminator to discriminate. This goes on for many iterations, until the Discriminator is no longer able to differentiate the data generated by the Generator from real data. This is how GANs generate data from scratch.
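The adversarial loop described above can be sketched on a toy problem. Everything below is illustrative, not from any paper: the "real" data is a 1-D Gaussian centred at 4, the Generator is a linear map of noise, the Discriminator is logistic regression, and both are trained with hand-written gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator: G(z) = a*z + b, starts far from the real data (mean 0).
a, b = 1.0, 0.0
# Discriminator: D(x) = sigmoid(w*x + c), starts with no opinion.
w, c = 0.0, 0.0

lr, batch = 0.05, 64
for step in range(2000):
    z = rng.standard_normal(batch)
    real = 4.0 + 0.5 * rng.standard_normal(batch)   # samples from N(4, 0.5)
    fake = a * z + b                                # Generator's samples

    # --- Discriminator update: push D(real) -> 1 and D(fake) -> 0 ---
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w -= lr * (np.mean((d_real - 1) * real) + np.mean(d_fake * fake))
    c -= lr * (np.mean(d_real - 1) + np.mean(d_fake))

    # --- Generator update: push D(fake) -> 1, i.e. fool the Discriminator ---
    d_fake = sigmoid(w * fake + c)
    g_common = (d_fake - 1) * w        # gradient of -log D(fake) w.r.t. x
    a -= lr * np.mean(g_common * z)    # chain rule through x = a*z + b
    b -= lr * np.mean(g_common)

# After training, the Generator's output mean should sit near the real mean of 4.
print(np.mean(a * rng.standard_normal(2000) + b))
```

Note the two alternating updates: the Discriminator is trained on a mixed batch of real and fake samples, and the Generator is then trained purely on the Discriminator's feedback, never seeing the real data directly.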

Some GAN Architectures

Let’s quickly look at some of the different types of GANs. We will cover these in detail through this series, but for now a basic level of understanding shall suffice.

Deep Convolutional GAN (DCGAN)

Composed of convolutional layers, with max pooling layers replaced by strided convolutions, DCGANs are one of the most popular and successful GAN network designs. The simplicity of the design is key to its success, and DCGANs are used in applications such as style transfer.
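To illustrate the design point above, here is a minimal NumPy sketch (ours, not from the DCGAN paper) showing that a 3×3 convolution with stride 2 halves the spatial resolution just like pooling would, but with learnable weights instead of a fixed max operation.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_stride2(img, kernel):
    """Valid (no padding) 2-D convolution with stride 2."""
    kh, kw = kernel.shape
    out_h = (img.shape[0] - kh) // 2 + 1
    out_w = (img.shape[1] - kw) // 2 + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output pixel is a weighted sum of a 3x3 patch,
            # and the stride-2 step skips every other position.
            out[i, j] = np.sum(img[2*i:2*i+kh, 2*j:2*j+kw] * kernel)
    return out

img = rng.standard_normal((8, 8))       # a toy 8x8 feature map
kernel = rng.standard_normal((3, 3))    # in a DCGAN this would be learned
print(conv2d_stride2(img, kernel).shape)  # (3, 3) -- downsampled
```

Because the kernel is learned, the network itself decides how to downsample, which is the key difference from a fixed max-pooling layer.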

Conditional GAN (cGAN)

In cGANs, both the Generator and Discriminator networks receive some additional conditioning input, like a label or class of the image, giving more control over the output produced by the GAN.
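The simplest way to picture this conditioning is as extra dimensions appended to each network's input. The sketch below (illustrative dimensions, not from the cGAN paper) shows a one-hot class label concatenated to the Generator's noise vector.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, noise_dim = 10, 100   # e.g. ten digit classes, 100-d noise

def one_hot(label, n=num_classes):
    v = np.zeros(n)
    v[label] = 1.0
    return v

z = rng.standard_normal(noise_dim)
label = 3                                      # ask for class 3 specifically
g_input = np.concatenate([z, one_hot(label)])  # Generator sees noise + label
print(g_input.shape)  # (110,)
```

The Discriminator receives the same label alongside the image, so it learns to reject samples that are realistic but of the wrong class; this is what gives the extra control over the output.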

Stacked GAN (StackGAN)

StackGANs generate 256×256 photo-realistic images from text descriptions. A StackGAN has two stages: Stage-I generates a low-resolution image of the object from the text description, and Stage-II creates a high-resolution image out of Stage-I’s output. In this way, the challenging problem of generating images from text descriptions is managed by decomposing it into simpler sub-problems.
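The two-stage decomposition can be sketched at the level of data flow. The stubs below are ours, not the paper's code: they only model the shapes (a 64×64 low-resolution image from Stage-I, upsampled and refined to 256×256 by Stage-II), with the real deep networks replaced by placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_generator(text_embedding, noise):
    # Stub: the real Stage-I network conditions on the text embedding and
    # noise to produce a rough 64x64 image of the described object.
    return rng.random((64, 64, 3))

def stage2_generator(low_res_image, text_embedding):
    # Stub: the real Stage-II network re-reads the text and refines the
    # image; here a naive 4x nearest-neighbour upsample stands in for it.
    return np.kron(low_res_image, np.ones((4, 4, 1)))

text = rng.standard_normal(1024)   # stand-in for an embedded text description
z = rng.standard_normal(100)       # noise vector
low = stage1_generator(text, z)
high = stage2_generator(low, text)
print(low.shape, high.shape)  # (64, 64, 3) (256, 256, 3)
```

The point of the sketch is the pipeline shape: rather than one network mapping text straight to 256×256 pixels, each stage solves an easier problem at its own resolution.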

Discover Cross-Domain Relations with Generative Adversarial Networks (DiscoGAN)

DiscoGAN is a GAN that tries to discover the relations between different domains, which the network then uses to transfer style from one domain to another. An example would be transferring the style of handbag images to images of shoes.

Some Interesting Applications Of GANs

Now let’s discuss some of the applications of GANs. We will not be able to cover all of them, but we will go through some of the most relevant ones.

Text to Image Translation

In the 2016 paper titled ‘StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks’, Han Zhang, et al. exhibited the ability of GANs to generate realistic images of textually described objects like flowers, birds and so on.

Image to Image Translation

This is basically a pixel-to-pixel approach, as demonstrated by Phillip Isola, et al. in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks.” The applications of image-to-image translation are plenty: day-to-night translation of photos, translating black and white pictures to colour, generating coloured images from sketches, to name a few.

Generating Frontal View of Face

Rui Huang, et al. in their 2017 paper “Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis” illustrated the ability of GANs to generate a frontal view of a face from photos taken at an angle, which could be used as input to a face identification system.

Music Generation

In their 2017 paper, “MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment”, Hao-Wen Dong, et al. presented three models for music generation using GANs.

Text Generation

Md. Akmal Haidar, et al. in their 2019 paper titled “TextKD-GAN: Text Generation using Knowledge Distillation and Generative Adversarial Networks”, introduced a method using knowledge distillation for text generation using GANs.

Image Inpainting

Image inpainting is the restoration of missing regions of an image in a manner where the observer is unable to perceive that the image has undergone restoration. Deepak Pathak, et al. in their 2016 paper titled “Context Encoders: Feature Learning by Inpainting” demonstrated how GANs can accomplish this.

These are just a few of the applications of GANs; the actual extent of the horizon of possibilities that GANs offer is much bigger. Through this series on GANs, we will learn more about the marvellous applications of GANs, with some code to try out.

See you in our next post.
