
Generate Anime with LoRA and Diffusion Model

Updated on December 23, 2024

Introduction

Welcome, fellow anime enthusiasts! Picture this: you're surrounded by posters of your favorite anime characters, and suddenly, inspiration strikes! You want to create your own anime masterpiece right now, but how can you do it?

With LoRA and diffusion models, we will learn how to create amazing anime characters!

In this article, we will learn how to generate anime images with the most talked-about duo: LoRA and Stable Diffusion (SD). We will cover the basics of LoRA, why it is used, and also get an overview of SD.

Prerequisites

  • Basic Knowledge: Familiarity with deep learning concepts and stable diffusion models.
  • Environment Setup: Access to a Python environment with libraries like PyTorch, Diffusers, and Transformers installed.
  • Hardware: A GPU with sufficient VRAM (e.g., an NVIDIA GPU with 16 GB or more).
  • Pre-trained Model: A Stable Diffusion model checkpoint.
  • LoRA Files: Low-Rank Adaptation (LoRA) files fine-tuned for anime generation.
  • Datasets: Optional, for custom fine-tuning if needed.

What makes Anime portraits so special

Anime characters are like old friends; we never get tired of them. They come in all shapes and sizes, from fierce warriors to quirky high school students to BTS characters. They are very special to us. But what makes them truly special is their ability to capture our hearts and whisk us away to fantastical worlds where anything is possible.

Previously, creating custom anime-style artwork required several things, most notably artistic talent. Now, with Stable Diffusion, it is possible to take advantage of the AI revolution and make our own artwork with little to no training.


What is LoRA

As we scale up to larger models, full fine-tuning, which typically involves retraining all of the model's parameters, becomes tedious and costly, both financially and computationally. Low-Rank Adaptation, or LoRA, was developed to address this challenge.

LoRA works by freezing the pre-trained model weights and introducing trainable rank decomposition matrices into every layer of the Transformer architecture. This approach significantly reduces the number of trainable parameters required for downstream tasks.

One of the examples in the original research paper highlights that, compared to fine-tuning GPT-3 175B with Adam, LoRA can decrease the number of trainable parameters by a factor of 10,000 and reduce GPU memory requirements by a factor of three.

Furthermore, despite training far fewer parameters, LoRA demonstrates comparable or superior model quality to full fine-tuning on various architectures such as RoBERTa, DeBERTa, GPT-2, and GPT-3. Moreover, LoRA achieves higher training throughput and, unlike adapter-based approaches, does not incur additional inference latency.
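
To make the idea concrete, here is a minimal PyTorch sketch of the low-rank update: a frozen weight matrix is combined with a trainable product of two small matrices of rank r. The class name, rank, and scaling here are illustrative assumptions, not the API of any particular library.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (W0 + B @ A)."""
    def __init__(self, in_features, out_features, rank=8, alpha=8.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # freeze the pre-trained weights
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # rank x d_in
        self.B = nn.Parameter(torch.zeros(out_features, rank))        # d_out x rank, zero-initialized
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the low-rank correction; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")  # 12,288 instead of the 589,824 in the full weight matrix

Only A and B are updated during fine-tuning, which is why the number of trainable parameters drops so dramatically while the original weights remain untouched.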

We have a detailed article on “Training a LoRA model for Stable Diffusion XL”, and we highly recommend reading it as a prerequisite to better understand the model.

Overview of Stable Diffusion and the model used

Stable Diffusion is a generative artificial intelligence (generative AI) model that uses diffusion technology and a latent space to generate photorealistic images. The model can run on a CPU, but it works much better if you have a GPU. Essentially, diffusion models add Gaussian noise to encode an image. Subsequently, they use a noise predictor along with a reverse diffusion process to reconstruct the original image.

The main components of Stable Diffusion include a variational autoencoder, forward and reverse diffusion, a noise predictor, and text conditioning.

In a variational autoencoder, there are two main components: an encoder and a decoder. The encoder compresses a large 512x512 pixel image into a smaller 64x64 representation in a latent space that’s easier to handle. Later, the decoder reconstructs this compressed representation back into a full-size 512x512 pixel image.
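
As a rough illustration of this compression, here is a minimal sketch using the diffusers library; the VAE checkpoint name is an assumption (one commonly used Stable Diffusion autoencoder), and the random tensor stands in for a real image.

import torch
from diffusers import AutoencoderKL

# Load a Stable Diffusion VAE (checkpoint name is illustrative).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to("cuda")

image = torch.randn(1, 3, 512, 512, dtype=torch.float16, device="cuda")  # stand-in for a real 512x512 image

with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()  # compress into the latent space
    decoded = vae.decode(latents).sample              # reconstruct back to pixel space

print(latents.shape)  # torch.Size([1, 4, 64, 64]) -> 8x smaller in each spatial dimension
print(decoded.shape)  # torch.Size([1, 3, 512, 512])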

Forward diffusion involves gradually adding Gaussian noise to an image until it’s completely obscured by random noise. During training, all images undergo this process, although it’s typically used only for image-to-image conversions later on.

Reverse diffusion is the opposite process, essentially undoing the forward diffusion step by step. For instance, if you train the model with images of cats and dogs, the reverse diffusion process would tend to reconstruct either a cat or a dog, with little in between. In practice, training involves vast amounts of images and uses prompts to create diverse and unique outputs.

A noise predictor, implemented as a U-Net model, plays a crucial role in denoising images. U-Net models, originally designed for biomedical image segmentation, are employed to estimate the noise in the latent space and subtract it from the image. This process is repeated for a specified number of steps, gradually reducing noise according to user-defined parameters. The noise predictor is influenced by conditioning prompts, which guide the final image generation.

Text conditioning is a common form of conditioning, where textual prompts are used to guide the image generation process. Each word in the prompt is tokenized by a CLIP tokenizer and embedded into a 768-value vector by the CLIP text encoder. Up to 75 tokens can be used in a prompt. Stable Diffusion utilizes these prompts by feeding the resulting embeddings from the text encoder to the U-Net noise predictor. By setting the seed of the random number generator, different images can be generated in the latent space.
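
The sketch below shows this tokenization and embedding step with the transformers library. The checkpoint name is an assumption: it is the CLIP model used by Stable Diffusion v1-style models, not necessarily the encoder inside every SD variant (SDXL, for example, uses two text encoders).

import torch
from transformers import CLIPTokenizer, CLIPTextModel

# CLIP checkpoint used by Stable Diffusion v1-style models (illustrative choice).
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "1girl, green hair, sweater, looking at viewer"
tokens = tokenizer(prompt, padding="max_length", max_length=77, return_tensors="pt")

with torch.no_grad():
    embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(embeddings.shape)  # torch.Size([1, 77, 768]) -> one 768-value vector per token (75 prompt tokens plus start/end)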

In this demo we have used Pastel Anime LoRA for SDXL, a high-resolution Low-Rank Adaptation model for Stable Diffusion XL. The model was fine-tuned with a learning rate of 1e-5 over 1,300 global steps with a batch size of 24, using a dataset of high-quality anime-style images. Derived from Animagine XL, this model, much like other anime-style Stable Diffusion models, generates images using Danbooru tags.

Demo

Before we start, we will do a quick check of the available GPU:

!nvidia-smi


1. Install the necessary packages and modules to run the model:

!pip install diffusers --upgrade
!pip install invisible_watermark transformers accelerate safetensors
!pip install -U peft

2. Import the libraries:

import torch
from torch import autocast
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

3. Specify the base model for generating images, the LoRA model, and the safetensors filename:

base_model = "Linaqruf/animagine-xl"
lora_model_id = "Linaqruf/pastel-anime-xl-lora"
lora_filename = "pastel-anime-xl.safetensors"

4. Next, we will initialize a Stable Diffusion XL pipeline with the specified configuration. We load the pre-trained model and specify the torch data type to be used for the model's computations. Using float16 helps reduce memory usage and speed up computation, especially on GPUs.

pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
    )

5. Update the scheduler of the Stable Diffusion XL pipeline and then move the pipeline object to the GPU for accelerated computation:

pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to('cuda')

6. Load the LoRA weights:

pipe.load_lora_weights(lora_model_id, weight_name=lora_filename)
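
Optionally, you can control how strongly the LoRA influences the output by fusing it into the base weights with a chosen scale. The value below is an illustrative assumption to tune for your own results:

# Optional: fuse the LoRA into the pipeline with a reduced influence (scale value is illustrative).
pipe.fuse_lora(lora_scale=0.7)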

7. Use the model to produce captivating anime creations:

prompt = "face focus, cute, masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=12,
    target_size=(1024,1024),
    original_size=(4096,4096),
    num_inference_steps=50
    ).images[0]

image.save("anime_girl.png")


We highly encourage our readers to unleash their creativity when providing prompts for image generation.
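
For reproducible results, or to explore controlled variations of the same prompt, you can pass a seeded random number generator to the pipeline. The seed value here is an arbitrary example:

# Re-run the same prompt with a fixed seed so the result can be reproduced (seed is arbitrary).
generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=12,
    num_inference_steps=50,
    generator=generator
).images[0]

image.save("anime_girl_seed42.png")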

Conclusion

In this article, we explored how to generate anime characters using LoRA and Stable Diffusion. Stable Diffusion's ability to generate images with fine details, varied styles, and controlled attributes makes it a valuable tool for numerous applications, including art, design, and entertainment.

As research and development in Gen AI continues to progress, we anticipate further innovations and refinements in these models. Stable Diffusion along with LoRA will undoubtedly reshape the landscape of image synthesis and push the boundaries of creativity and expression. These novel approaches will no doubt revolutionize how we perceive and interact with digital imagery in the years to come.

We hope you enjoyed reading the article!
