Tutorial

How to generate the highest quality AI images with Fooocus

Published on October 4, 2024

Technical Evangelist // AI Arcanist

How to generate the highest quality AI images with Fooocus

Introduction

In the past few weeks, we have seen in an explosion in popularity for the latest Stable Diffusion application Fooocus. Fooocus is a Gradio based image generating software that has been designed by the notable open-source developer Illyasviel, who also brought us ControlNet. It offers a novel approach to the image synthesis pipeline as an alternative to popular pipelines like AUTOMATIC1111’s Stable Diffusion Web UI or MidJourney.

In this article, we will start with a brief expose on the features and capabilities of this new platform. We intend to highlight the differences and advancements it offers in comparison to some of the other tools we have showcased in the past on this blog, and present an argument for why this tool should be added to your image synthesis toolset. Afterwards, we will start our demo, wherein we will walk through the steps needed to set the application and to begin generating images. Readers can expect to finish this blog with a full understanding of the intricacies of the Fooocus applications variety of useful settings and built-in features.

Prerequisites

  • GPU Requirements: Ensure you have a powerful GPU (NVIDIA recommended) for faster rendering. A minimum of 8GB VRAM is suggested for optimal performance.
  • Pre-trained Models: Download the relevant pre-trained models (e.g., Stable Diffusion variants) for your preferred art style or application.
  • Prompt Mastery: Learn to write detailed and descriptive prompts to guide the model for precise image generation.
  • Optional AI Upscaling: For better final image quality, use AI upscaling tools like ESRGAN after generation.

These basics will help you set up Fooocus for high-quality AI image generation.

What does the Fooocus application offer to Stable Diffusion users?

To start, the first thing to know about Fooocus is its commitment to abstracting away many of the complicated settings required to make high quality generated images. They outline this in their tech list on the github page, but let’s go through each of these improvements here.

  • Style - the Fooocus V2 style system allows for MidJourney style prompt expansion to alter and improve the text representation using GPT-2 at low cost. This prompt expansion allows for simplified inputs to generate far more varied and complex outputs, and the motifs can be mixed and matched to the users preference
  • Native refiner swapping inside one single k-sampler - this allows for seamless carry over of the base model’s parameters to the refiner by maintaining the sampler. This is also used in the AUTOMATIC1111 Web UI
  • Negative image aspect ratio (ADM) guidance - the highest resolutions of Stable Diffusion XL reportedly do not come with cross-attention, so the Fooocus implementation modifies the positive and negative signals to compensate for the resultant lack of classifier-free guidance contrast for the highest resolution outputs
  • Self-Attention Guidance - following the research shown here, the Fooocus development team has implemented a form of Self-Attention Guidance at a low setting. Along with the negative ADM guidance, this helps mitigate a smoothing effect that is commonly reported in image outputs with SD XL
  • Automatic implementation of the “sd_xl_offset_example-lora_1.0.safetensors” LoRA model at a strength of .1. Their experiments determined that values below .5 were always better than those without
  • Sampler parameters are finely adjusted and optimized by the developers
  • Implemented a hard resolution setting based on the optimal aspect ratios for image synthesis with SD XL

These cumulative features make for an extremely low level of coding knowledge to run the image generation after setup. Now that we have discussed what makes the platform so nice to use, let’s walk through setting up the UI before comparing weighing its pros and cons.

Fooocus demo

Setup

To run the demo, we first need to install conda however.

In the terminal, we are going to begin pasting in everything needed to run the notebook. Once we are done, we can just click on the shared Gradio link to get started generating images.

First, paste in the following to the terminal:


wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Follow the instructions in the terminal to complete the install by answering yes to each question when prompted. When that’s complete, close the terminal using the trash bin icon in the terminal window on the left, and then open a new one. This will complete the Miniconda installation.

Afterwards, we need to paste in another set of installs to complete setup:


conda env create -f environment.yaml
conda activate fooocus
pip install -r requirements_versions.txt

These will install everything needed to run Fooocus in the Notebook. Additionally, it’s worth mentioning that the application will download a Stable Diffusion XL model from HuggingFace for us to use on launch. When we paste in the final snippet, we will see that occur first before the application itself is launched. Be sure to use the public link, so that we are able to access the link from our local machine’s browser. The process to complete the launch and install may take a couple minutes.


python entry_with_update.py --listen --share

Using the Fooocus application

Now, we can begin actually synthesizing new images. This will share a lot of similarities with familiar Stable Diffusion and MidJourney pipelines, but has some obvious differences in implementation, some of which we covered above.

The first thing we want to do is show a quick test of the basic image generation using the base settings. To do this, all we need to do is enter in a prompt and hit ‘Generate’. Fooocus will have automatically downloaded a merged model, namely “juggernautXL_version6Rundiffusion.safetensors”, that is capable of handling a wide variety of both realistic and artistic medium outputs. Above, we can see an example generation using the Fooocus web ui’s basic input. The default settings will generate two images at a resolution of 1152x896 (a 9:7 ratio), and we can watch the diffusion process occur in real-time.

image

A comparison of three prompts across the different performance values

From here, we can begin to look at different advanced settings for the Fooocus by clicking the toggle at the bottom of the screen. The advanced settings will be displayed on the right, in convenient Gradio tabs. The first, and probably most important tab is the ‘Setting’. Here, we can see one of the first things abstracted away from a typical pipeline: the Performance settings. These are pre-set and optimized to run a different number of diffusion steps to achieve different qualities of images at different speeds. Speed has 30 steps, Quality has 60, and Extreme Speed will run for 8. While the application seems to have the capability of using other K samplers, it seems to be hard defaulted to use “dpmpp_2m_sde_gpu” for all performances.

image

Next, we have the resolutions. These are all optimized resolutions to run Stable Diffusion XL models in particular. They included this hard restriction on image sizing to further optimize users outputs; images generated at suboptimal resolutions are far more likely to appear strange. Above is an example generated with a 704x1408 resolution.

image

Finally, we have the negative prompt. This functions as a sort of opposite to our regular prompt, and any tokens we include will be discounted as much as possible by the model. For example, we may use a negative prompt to remove unwanted traits from our generated images or try to mitigate some of the inborn problems Stable Diffusion has with certain objects. In the example above, we used the same seed as we had in the previous generation, but added in an additional negative prompt to try and alter our output a bit. You can assign the seed by filling the field left by un-toggling the Random box below the negative prompt field.

image

In the next tab in the advanced options, there is “Style”. These are the GPT2 enhancements provided by the model to expand our prompts. Try out different styles to see what sort of effect they can have on the final outputs. We particularly recommend using the ones above for all generations.

image

The next tab is the model tab, and it is probably the most important of them all because it let’s us switch out main checkpoint and LoRAs. It has nice sliders that allow us to easily adjust the weights of any additional models, allowing for a more simplistic way to blend the traits of two LoRAs . The application will also automatically download and allocate a weight of on .1 to the ‘sd_xl_offset_example-lora’, which we can choose to remove if we so choose. If we download a model during a running session, the refresh all files button at the bottom will allow us to refresh the available model lists.

image

The final tab, advanced, has our Guidance Scale and Image Sharpness sliders. The guidance scale controls the impact the prompt has on our final input, and works best in a range of 4-10 in our experience. The image sharpness value will sharpen images, but increase the uncanny valley effect if raised too high. We suggest leaving it alone unless adjustment is particularly needed.

“a scenic texas sunset”

In addition to the advanced options toggle, there is also the input image toggle right beside it. This is how we can do Image-to-Image generation in Fooocus. Illyasviel, the creator of this project, was also the creator of the popular ControlNet’s. They have combined Image-to-Image with a robust ControlNet system to automatically use ControlNet to guide inputted generations. It comes with three options - Upscale or Variation, Image Prompt, and Inpaint or Outpaint. Let’s look at what manipulations we can make to the scene above using this prompt:

New prompt: “a scenic sunset in hawaii”

  • Upscale or Variation: there are 5 settings: vary (subtle), vary (strong), upscale (1.5x), upscale (2x), and upscale (fast 2x). We can use the first two to introduce additional variation to our image with our prompts, as shown above with a subtle variation. The strong variation will make even more changes, and the upscale options will make our images higher resolutions

image

  • Image Prompt: this is comparable to Image-to-Image modeling with Stable diffusion, but instead of a single image, we can input multiple images. This allows us to vary our outputs using the inputted prompt. Check out the example above where we added a tree to our output using a different tree as input to influence the final result.

image

  • Inpaint or Outpaint: inpainting allows us to add images to a masked region of the inputted image, and outpainting lets us functionally extend the edges of an image to predict what would be outside the “camera” aperture. We can use it to add new things to our images, like adding this lifeguard stand

We can even use all three together to great effect! Be sure to test out all of these to see how they fit into your workflow, as using existing images as bases or editing them to work can be much more efficient and effective than generating wholly new images.

Closing Thoughts

We have walked through everything that makes Fooocus such a great tool for Stable Diffusion, but how does it stack up against competition? There are many great tools out there for generating images, like the AUTOMATIC1111 Web UI, the ComfyUI, MidJourney, PixArt Alpha, Dalle 3, and more, so we think its important to focus on where Fooocus differs from previous iterations of the text-to-image web platform. Primarily, Fooocus is a great tool for low coders who do not want to dig into the intricacies of learning a complicated system like the A1111 Web UI or the Comfy UI, but it also offers higher levels of versatility and applicability than closed source applications like MidJourney. For these reasons, we recommend Fooocus to users new to coding and Stable Diffusion, while more experienced coders should stick to the Fast Stable Diffusion implementations of the Web UI and ComfyUI. Look out for the self attention improvements to hit those platforms in the near future, as well, because, while it cannot beat out the clever fixes in the A1111 resolutions fixer, it handles a lot of problems for the user that improve the quality of life to use it significantly.

Thanks for reading, and be sure to check out our other articles on Stable Diffusion and image generation!

Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.

Learn more about our products

About the authors
Default avatar

Technical Evangelist // AI Arcanist

Still looking for an answer?

Ask a questionSearch for more help

Was this helpful?
 
Leave a comment


This textbox defaults to using Markdown to format your answer.

You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!

Limited Time: Introductory GPU Droplet pricing.

Get simple AI infrastructure starting at $2.99/GPU/hr on-demand. Try GPU Droplets now!

Join the Tech Talk
Success! Thank you! Please check your email for further details.

Please complete your information!

Become a contributor for community

Get paid to write technical tutorials and select a tech-focused charity to receive a matching donation.

DigitalOcean Documentation

Full documentation for every DigitalOcean product.

Resources for startups and SMBs

The Wave has everything you need to know about building a business, from raising funding to marketing your product.

Get our newsletter

Stay up to date by signing up for DigitalOcean’s Infrastructure as a Newsletter.

New accounts only. By submitting your email you agree to our Privacy Policy

The developer cloud

Scale up as you grow — whether you're running one virtual machine or ten thousand.

Get started for free

Sign up and get $200 in credit for your first 60 days with DigitalOcean.*

*This promotional offer applies to new accounts only.