Stable Diffusion Text-to-Image Generation - Computer Vision

  • Tech Stack: PyTorch, Hugging Face Transformers, Hugging Face Diffusers, Computer Vision, Deep Learning, Natural Language Processing
  • GitHub URL: Project Link

A text-to-image model generates an image from scratch based on a text description. Stable Diffusion is an open-source text-to-image latent diffusion model developed by researchers and engineers from CompVis, Stability AI, and LAION. It was trained on 512x512 images from a subset of the LAION-5B database and uses a frozen CLIP ViT-L/14 text encoder to condition generation on text prompts. With its 860M-parameter UNet and 123M-parameter text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB of VRAM.
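To put "relatively lightweight" in rough numbers, the sketch below estimates how much memory the weights alone would occupy, using only the two parameter counts quoted above. The bytes-per-parameter values (4 for fp32, 2 for fp16) are assumptions about how the weights are stored, and the estimate ignores the image autoencoder, activations, and framework overhead, so it is a lower bound rather than a measurement.

```python
# Parameter counts quoted in the text above; the image autoencoder is not counted.
UNET_PARAMS = 860_000_000
TEXT_ENCODER_PARAMS = 123_000_000


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


total = UNET_PARAMS + TEXT_ENCODER_PARAMS

# Assumed storage formats: 4 bytes per parameter (fp32), 2 bytes (fp16).
fp32_gib = weight_memory_gib(total, 4)
fp16_gib = weight_memory_gib(total, 2)

print(f"total parameters: {total:,}")
print(f"fp32 weights: {fp32_gib:.2f} GiB")
print(f"fp16 weights: {fp16_gib:.2f} GiB")
```

The weights account for only part of the 10GB figure; the remainder is taken up by intermediate activations during denoising and by the parts of the model not counted here.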

The provided Colab notebook implements Stable Diffusion and demonstrates its usage with the Hugging Face Diffusers library. It showcases results from prompting the model with famous faces, including:

  • "A still of Boris Johnson in Game of Thrones"
  • "Emma Watson as a pirate from Pirates of the Caribbean"
  • "A still of Mark Zuckerberg in Lord of the Rings"
  • "Boris Johnson as the Hulk"
  • "Emma Watson as Batman"
  • "A still of Mark Zuckerberg in Avatar"
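A minimal sketch of how prompts like these could be run through the Diffusers `StableDiffusionPipeline` is shown below. It is not the notebook's actual code: the checkpoint name `CompVis/stable-diffusion-v1-4`, the output directory, the fixed seed, and the helper names `slugify` and `generate_images` are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: the notebook's exact checkpoint is not stated in this write-up.
MODEL_ID = "CompVis/stable-diffusion-v1-4"

PROMPTS = [
    "A still of Boris Johnson in Game of Thrones",
    "Emma Watson as a pirate from Pirates of the Caribbean",
    "A still of Mark Zuckerberg in Lord of the Rings",
    "Boris Johnson as the Hulk",
    "Emma Watson as Batman",
    "A still of Mark Zuckerberg in Avatar",
]


def slugify(prompt: str) -> str:
    """Turn a prompt into a filesystem-friendly file stem (hypothetical helper)."""
    return re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")


def generate_images(prompts, out_dir="outputs", seed=0):
    """Generate one image per prompt and return the saved file paths."""
    # Heavy imports stay local so the helpers above work without torch installed.
    import torch
    from diffusers import StableDiffusionPipeline

    # Use half precision on a GPU to reduce memory; fall back to fp32 on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    pipe = pipe.to(device)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for prompt in prompts:
        # Re-seed for every prompt so each image is reproducible on its own.
        generator = torch.Generator(device=device).manual_seed(seed)
        image = pipe(prompt, generator=generator).images[0]
        path = out / f"{slugify(prompt)}.png"
        image.save(path)
        paths.append(path)
    return paths
```

Calling `generate_images(PROMPTS)` in a GPU-backed Colab session would download the weights on first use and write one PNG per prompt. The model import is deferred into the function so that the prompt list and file-naming helper can be inspected without a GPU.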