Have you ever tried generating images using AI?
If you have, you know the key to a good image is a good detailed prompt.
And I am bad at such detailed visual prompting, so I rely on large language models to generate the detailed prompt and then use that to generate great images.
Here are some examples of the prompts and images I was able to generate:
Prompt: Create a stunning aerial view of Bengaluru, India with the city name written in bold, golden font across the top of the image, with the city skyline and Nandi Hills visible in the background.
Prompt: Design an image of the iconic Vidhana Soudha building in Bengaluru, India, with the city name written in a modern, sans-serif font at the bottom of the image, in a sleek and minimalist style.
Prompt: Generate an image of a bustling street market in Bengaluru, India, with the city name written in a playful, cursive font above the scene, in a warm and inviting style.
To achieve these results, we use the Flux.1-schnell model for image generation and the Llama 3.1-8B-Instruct model for prompt generation. Both are hosted on a single H100 machine with the help of MIG (more on this later).
This blog is not another image generation tutorial. Our goal is to create a scalable, secure, and globally accessible (and affordable) GenAI architecture.
Imagine a business case where a global e-commerce platform requires rapid image customization for users, or a content platform delivering on-demand AI-generated text across continents.
Such a setup poses many challenges for a developer. For example:
This presentation should give you a starting point for solving each of these problems.
Before proceeding with the demo, make sure you have the following:
To make this possible, we designed a distributed architecture using DigitalOcean’s infrastructure. We start with a Global Load Balancer (GLB) to manage incoming requests, ensuring that users from any region experience minimal latency.
Next, we have lightweight image generation apps deployed in key regions (London, New York, and Sydney), each with its own cache and ready to connect to our GPU resources as needed. Finally, all these components communicate securely over VPC Peering, channeling complex tasks back to our powerhouse H100 GPU in Toronto, where the prompt and image generation magic happens.
The Image Generation app is a simple Python Flask application with three main components:
Detect Location Section: This component makes a dummy request from the browser to the server to determine the user’s location (City and Country) and identify which server region is handling the request. This information is displayed to the user and helps optimize prompt and image generation, as we’ll explain in more detail later.
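Below is a minimal sketch of what such an endpoint could look like. The route name, the SERVER_REGION environment variable, and the use of ipinfo.io for the geo-IP lookup are illustrative assumptions, not the exact demo code.

# location_detect.py - sketch of the "Detect Location" endpoint (illustrative only)
import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
SERVER_REGION = os.environ.get("SERVER_REGION", "unknown")  # e.g. "London"

@app.route("/detect-location")
def detect_location():
    # The browser's "dummy" request lands here; resolve the caller's IP first.
    client_ip = request.headers.get("X-Forwarded-For", request.remote_addr).split(",")[0].strip()
    # Hypothetical geo-IP lookup; swap in whatever service you prefer.
    geo = requests.get(f"https://ipinfo.io/{client_ip}/json", timeout=5).json()
    return jsonify({
        "city": geo.get("city"),
        "country": geo.get("country"),
        "server_region": SERVER_REGION,  # which regional app answered the request
    })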
Prompts Dropdown Section: After determining the user’s location, the app first checks a cache for pre-existing prompts associated with that location. If suitable prompts are found in the cache, they are immediately displayed in a dropdown menu, allowing the user to choose a prompt for image generation. If no cached prompts are available, the app sends a request to the LLM (Large Language Model) to generate new prompts, which are then cached for future use and populated in the dropdown for the user to select.
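A minimal sketch of this cache-then-LLM logic is shown below. The GPU server's private IP, the cache file, and the prompt template are placeholders for illustration; the vLLM container exposes an OpenAI-compatible chat completions endpoint on port 8000.

# prompt_cache.py - sketch of the prompts dropdown logic (placeholders, not the demo code)
import json
import os
import requests

LLM_URL = "http://<GPU_SERVER_PRIVATE_IP>:8000/v1/chat/completions"  # vLLM over the VPC
CACHE_FILE = "prompt_cache.json"

def get_prompts(city, country):
    cache = json.load(open(CACHE_FILE)) if os.path.exists(CACHE_FILE) else {}
    key = f"{city},{country}"
    if key in cache:  # cached prompts are returned immediately
        return cache[key]
    # Cache miss: ask Llama 3.1 for detailed image prompts for this location.
    body = {
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user",
                      "content": f"Write 3 detailed image-generation prompts about {city}, {country}. "
                                 "Return one prompt per line."}],
        "max_tokens": 300,
    }
    reply = requests.post(LLM_URL, json=body, timeout=60).json()
    prompts = [p for p in reply["choices"][0]["message"]["content"].splitlines() if p.strip()]
    cache[key] = prompts
    json.dump(cache, open(CACHE_FILE, "w"))  # cache for future requests
    return prompts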
Generated Image Section: When the user selects a prompt, the app first checks if an image generated from that specific prompt is already cached on disk. If a cached image exists, it is loaded directly from disk, ensuring a faster response time. If no cached image is available, the app makes an API call to generate a new image, which is then cached for future requests and displayed to the user.
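The disk-cache logic can be sketched as below. The endpoint path follows the OpenAI Images API that openedai-images-flux serves on port 5005; the cache directory and file naming are illustrative assumptions.

# image_cache.py - sketch of the generated image logic (placeholders, not the demo code)
import base64
import hashlib
import os
import requests

IMAGE_API_URL = "http://<GPU_SERVER_PRIVATE_IP>:5005/v1/images/generations"  # Flux over the VPC
CACHE_DIR = "image_cache"

def get_image(prompt):
    os.makedirs(CACHE_DIR, exist_ok=True)
    # One file per prompt, keyed by a hash of the prompt text.
    path = os.path.join(CACHE_DIR, hashlib.sha256(prompt.encode()).hexdigest() + ".png")
    if os.path.exists(path):  # cached image: serve straight from disk
        return path
    resp = requests.post(IMAGE_API_URL,
                         json={"prompt": prompt, "size": "1024x1024",
                               "response_format": "b64_json"},
                         timeout=120).json()
    with open(path, "wb") as f:  # cache for future requests
        f.write(base64.b64decode(resp["data"][0]["b64_json"]))
    return path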
The MIG (Multi-Instance GPU) is a feature of NVIDIA GPUs, such as the H100, that allows a single physical GPU to be partitioned into multiple independent instances. Each instance, called a MIG slice, operates as a fully isolated GPU with its own compute, memory, and bandwidth resources.
This approach not only optimizes GPU utilization but also enables us to deploy both the image generation and prompt generation models side by side.
Spin Up the GPU Droplet: Begin by creating a GPU Droplet on DigitalOcean with a single H100 GPU, using the pre-built OS image for ML development.
Enable MIG on the H100 GPU: Once the GPU Droplet is up and running, enable MIG (Multi-Instance GPU) mode on the H100. MIG allows the GPU to be split into multiple, smaller GPU instances, each isolated from the others. This isolation is crucial for running different models in parallel without interference.
sudo nvidia-smi -i 0 -mig 1
Choose the MIG Profile and Create Instances: With MIG enabled, select a profile that suits the requirements of each model you plan to run.
nvidia-smi mig -lgip # will list all the profiles
For example:
sudo nvidia-smi mig -cgi 9,9 -C
This command creates two MIG instances with a profile that provides 40GB of memory each, which should be sufficient for both models. The command below lists all MIG instance IDs:
nvidia-smi -L
Set Up Docker Containers on Each MIG Instance: For each MIG instance, run a separate Docker container with the respective model.
This setup involves downloading and running two Docker containers for the models: one for image generation (Flux.1-schnell) and one for prompt generation (Llama 3.1 via vLLM).
Deploy Flux.1-schnell for Image Generation: We used the matatonic image and code to deploy a Docker container for Flux.1-schnell.
# Clone the repo
git clone https://github.com/matatonic/openedai-images-flux
cd openedai-images-flux
# Copy Config file
cp config.default.json config/config.json
# Run the docker image
sudo docker run -d \
-e HUGGING_FACE_HUB_TOKEN="<HF_READ_TOKEN>" \
--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=<MIG_INSTANCE_ID> \
-v ./config:/app/config \
-v ./models:/app/models \
-v ./lora:/app/lora \
-v ./models/hf_home:/root/.cache/huggingface \
-p 5005:5005 \
ghcr.io/matatonic/openedai-images-flux
Deploy Llama 3.1 using vLLM for Prompt Generation: Download and run the Docker container for the prompt generation model using vLLM.
sudo docker run -d --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=<MIG_INSTANCE_ID> \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-e HUGGING_FACE_HUB_TOKEN="<HF_READ_TOKEN>" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model meta-llama/Meta-Llama-3.1-8B-Instruct
Model Access and Communication: Once both containers are running, the image generation model is served on port 5005 and the prompt generation model on port 8000. Each exposes an OpenAI-compatible API that the regional apps call over the private network.
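Because both containers speak the OpenAI API, the regional apps can also use the standard openai Python client pointed at the GPU server's private IP. The snippet below is a sketch with placeholder addresses; the ports match the docker runs above.

# model_access.py - sketch of reaching both models over the VPC (placeholder IP)
from openai import OpenAI

GPU_HOST = "<GPU_SERVER_PRIVATE_IP>"

llm = OpenAI(base_url=f"http://{GPU_HOST}:8000/v1", api_key="not-needed")     # vLLM
imager = OpenAI(base_url=f"http://{GPU_HOST}:5005/v1", api_key="not-needed")  # Flux

# Quick smoke test: one chat completion and one image generation.
chat = llm.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello from Toronto."}],
)
print(chat.choices[0].message.content)

image = imager.images.generate(prompt="A skyline of Bengaluru at sunset",
                               response_format="b64_json")
print(len(image.data[0].b64_json), "bytes of base64 image data")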
We set up the lightweight apps in regional locations (London, New York, and Sydney) to handle user requests locally, caching frequently accessed prompts and images for faster responses.
# Login to docker registry
sudo docker login registry.digitalocean.com
# Pull the container
sudo docker pull registry.digitalocean.com/<cr_name>/city-image-generator:v2
# Run the Container
sudo docker run -d -p 80:80 registry.digitalocean.com/<cr_name>/city-image-generator:v2
VPC Peering ensures secure, low-latency communication between the regional app instances and the GPU server in Toronto over a private network. Set up peering between the default VPC for each region (e.g., London, New York, Sydney, and Toronto), then verify connectivity with ping or curl from the regional app servers to the GPU server.
The GLB distributes incoming user requests to the closest regional app instance, optimizing latency and improving user experience.
With all components in place, your architecture is fully operational, enabling scalable and secure GenAI services. The DigitalOcean GUI simplifies much of the setup, allowing you to focus on optimizing the performance and functionality of your AI models.
This demo is a practical starting point for businesses/developers exploring distributed GenAI solutions, such as e-commerce platforms generating personalized content or AI-driven content platforms serving global audiences. By leveraging DigitalOcean’s products, this setup demonstrates how to balance scalability, security, and cost efficiency in deploying cutting-edge AI services.
You can read more about the technologies used in this demo here: