In the past few weeks, we have seen an explosion in popularity for the latest Stable Diffusion application, Fooocus. Fooocus is a Gradio-based image generation tool designed by the notable open-source developer lllyasviel, who also brought us ControlNet. It offers a novel approach to the image synthesis pipeline as an alternative to popular tools like AUTOMATIC1111’s Stable Diffusion Web UI or MidJourney.
In this article, we will start with a brief overview of the features and capabilities of this new platform. We intend to highlight the differences and advancements it offers compared to some of the other tools we have showcased in the past on this blog, and present an argument for why it deserves a place in your image synthesis toolset. Afterwards, we will start our demo, where we walk through the steps needed to set up the application and begin generating images. Readers can expect to finish this blog with a full understanding of the Fooocus application’s variety of useful settings and built-in features.
These basics will help you set up Fooocus for high-quality AI image generation.
To start, the first thing to know about Fooocus is its commitment to abstracting away many of the complicated settings required to make high-quality generated images. The developers outline these in the tech list on the GitHub page, but let’s go through each of the improvements here.
Together, these features mean that very little coding knowledge is required to run image generation after setup. Now that we have discussed what makes the platform so pleasant to use, let’s walk through setting up the UI before weighing its pros and cons.
To run the demo, however, we first need to install conda.
In the terminal, we are going to begin pasting in everything needed to run the notebook. Once we are done, we can just click on the shared Gradio link to get started generating images.
First, paste the following into the terminal:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Follow the instructions in the terminal to complete the install by answering yes to each question when prompted. When that’s complete, close the terminal using the trash bin icon in the terminal window on the left, and then open a new one. This will complete the Miniconda installation.
Afterwards, we need to clone the Fooocus repository and run another set of installs from inside it to complete setup.
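If the repository has not been cloned yet, the commands below (using the project’s GitHub URL at the time of writing) will fetch it and move into the project directory, which is where the environment.yaml and requirements_versions.txt files used in the next step live:

git clone https://github.com/lllyasviel/Fooocus.git
cd Fooocus

With the repository in place, paste in the following: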
conda env create -f environment.yaml
conda activate fooocus
pip install -r requirements_versions.txt
These commands will install everything needed to run Fooocus in the Notebook. Additionally, it’s worth mentioning that the application will download a Stable Diffusion XL model from HuggingFace for us to use on launch. When we paste in the final snippet, we will see that download occur before the application itself launches. Be sure to use the public link, so that we can access the UI from our local machine’s browser. The launch and install process may take a couple of minutes.
python entry_with_update.py --listen --share
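As an aside, the launch script accepts a few optional flags beyond --listen and --share. At the time of writing, the repository documents preset launches, which swap in different default models and styles; for example, the following (shown here as an illustration, not a required step) starts the UI with the realistic preset:

python entry_with_update.py --listen --share --preset realistic

If we omit --share, the UI remains reachable only at the local address exposed by --listen, which is handy when we do not want a public Gradio link.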
Now, we can begin actually synthesizing new images. This will share a lot of similarities with familiar Stable Diffusion and MidJourney pipelines, but has some obvious differences in implementation, some of which we covered above.
The first thing we want to do is show a quick test of the basic image generation using the base settings. To do this, all we need to do is enter a prompt and hit ‘Generate’. Fooocus will have automatically downloaded a merged model, namely “juggernautXL_version6Rundiffusion.safetensors”, which is capable of handling a wide variety of both realistic and artistic outputs. Above, we can see an example generation using the Fooocus web UI’s basic input. The default settings will generate two images at a resolution of 1152x896 (a 9:7 ratio), and we can watch the diffusion process occur in real time.
A comparison of three prompts across the different performance values
From here, we can begin to look at Fooocus’s advanced settings by clicking the toggle at the bottom of the screen. The advanced settings will be displayed on the right, in convenient Gradio tabs. The first, and probably most important, tab is ‘Setting’. Here, we can see one of the first things abstracted away from a typical pipeline: the Performance settings. These presets are optimized to run a different number of diffusion steps to achieve different qualities of images at different speeds: Speed runs 30 steps, Quality runs 60, and Extreme Speed runs only 8. While the application seems capable of using other K samplers, it appears to be hard defaulted to use “dpmpp_2m_sde_gpu” for all performance modes.
Next, we have the resolutions. These are all resolutions optimized for running Stable Diffusion XL models in particular. This hard restriction on image sizing further optimizes users’ outputs; images generated at suboptimal resolutions are far more likely to contain strange artifacts. Above is an example generated at a 704x1408 resolution.
Finally, we have the negative prompt. This functions as a sort of opposite to our regular prompt: any tokens we include will be discounted as much as possible by the model. For example, we may use a negative prompt to remove unwanted traits from our generated images or to mitigate some of the inborn problems Stable Diffusion has with certain objects. In the example above, we used the same seed as in the previous generation, but added a negative prompt to alter our output a bit. You can set the seed manually by un-ticking the Random checkbox below the negative prompt field and filling in the seed field it reveals.
In the next tab in the advanced options, there is “Style”. The default ‘Fooocus V2’ style applies the GPT-2 based prompt expansion to enrich our prompts, while the other styles append curated keyword templates. Try out different styles to see what sort of effect they have on the final outputs. We particularly recommend using the ones shown above for all generations.
The next tab is the model tab, and it is probably the most important of them all because it lets us switch out our main checkpoint and LoRAs. It has nice sliders that allow us to easily adjust the weights of any additional models, providing a simpler way to blend the traits of multiple LoRAs. The application will also automatically download and assign a weight of 0.1 to the ‘sd_xl_offset_example-lora’, which we can remove if we choose. If we download a model during a running session, the ‘refresh all files’ button at the bottom will refresh the available model lists.
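If we want to try other checkpoints or LoRAs, they just need to land in Fooocus’s model folders before we hit that refresh button. Assuming the stock folder layout (checkpoints under models/checkpoints and LoRAs under models/loras inside the Fooocus directory), commands like the following work; the URLs here are placeholders for whatever model we actually want to download:

cd Fooocus
wget -P models/checkpoints https://example.com/some_sdxl_checkpoint.safetensors
wget -P models/loras https://example.com/some_style_lora.safetensors

Once the downloads finish, clicking ‘refresh all files’ should make the new entries appear in the model dropdowns.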
The final tab, Advanced, has our Guidance Scale and Image Sharpness sliders. The guidance scale controls the impact the prompt has on our final output, and in our experience works best in a range of 4-10. The image sharpness value will sharpen images, but increases the uncanny valley effect if raised too high. We suggest leaving it alone unless adjustment is particularly needed.
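For reference, the guidance scale here is the familiar classifier-free guidance (CFG) weight: at each denoising step the model blends its unconditional and prompt-conditioned predictions roughly as prediction = uncond + scale * (cond - uncond), so higher values pull the sample harder toward the prompt, at the cost of flexibility and, eventually, oversaturated or distorted results.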
“a scenic texas sunset”
In addition to the advanced options toggle, there is also the input image toggle right beside it. This is how we can do Image-to-Image generation in Fooocus. lllyasviel, the creator of this project, was also the creator of the popular ControlNet. They have combined Image-to-Image with a robust ControlNet system that automatically guides generations from inputted images. It comes with three options - Upscale or Variation, Image Prompt, and Inpaint or Outpaint. Let’s look at what manipulations we can make to the scene above using this prompt:
New prompt: “a scenic sunset in hawaii”
We can even use all three together to great effect! Be sure to test out all of these to see how they fit into your workflow, as using existing images as bases or editing them to work can be much more efficient and effective than generating wholly new images.
We have walked through everything that makes Fooocus such a great tool for Stable Diffusion, but how does it stack up against the competition? There are many great tools out there for generating images, like the AUTOMATIC1111 Web UI, ComfyUI, MidJourney, PixArt Alpha, DALL-E 3, and more, so we think it’s important to focus on where Fooocus differs from previous iterations of the text-to-image web platform. Primarily, Fooocus is a great tool for low-coders who do not want to dig into the intricacies of a complicated system like the A1111 Web UI or ComfyUI, but it also offers a higher level of versatility and applicability than closed-source applications like MidJourney. For these reasons, we recommend Fooocus to users new to coding and Stable Diffusion, while more experienced coders may prefer to stick with the Fast Stable Diffusion implementations of the Web UI and ComfyUI. Keep an eye out for Fooocus’s self-attention and other quality-of-life improvements making their way to those platforms in the near future as well: while Fooocus cannot beat out the clever fixes in the A1111 resolution fixer, it handles so many problems for the user automatically that it significantly improves the overall experience.
Thanks for reading, and be sure to check out our other articles on Stable Diffusion and image generation!