With the increasing demand for multilingual communication, real-time audio translation is rapidly gaining attention. In this tutorial, you will learn to deploy a real-time audio translation application using OpenAI APIs on Open WebUI, all hosted on a powerful GPU Droplet from DigitalOcean.
DigitalOcean’s GPU Droplets, powered by NVIDIA H100 GPUs, offer significant performance for AI workloads, making them ideal for fast and efficient real-time audio translation. Let’s get started.
1. Create a New Project - You will need to create a new project from the cloud control panel and tie it to a GPU Droplet.
2. Create a GPU Droplet - Log into your DigitalOcean account, create a new GPU Droplet, and choose AI/ML Ready as the OS. This OS image installs all the necessary NVIDIA GPU drivers. You can refer to our official documentation on how to create a GPU Droplet.
3. Add an SSH Key for authentication - An SSH key is required to authenticate with the GPU Droplet. Once the key is added, you can log in to the Droplet from your terminal.
4. Finalize and Create the GPU Droplet - Once all of the above steps are completed, finalize and create a new GPU Droplet. If you prefer the command line, see the optional doctl sketch after this list.
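If you prefer the command line over the control panel, the same steps can be scripted with DigitalOcean's doctl CLI. This is a minimal sketch only; the --region, --size, and --image slugs below are assumptions, so confirm the exact values for H100 GPU Droplets with doctl compute size list and doctl compute image list before running it.

# Generate an SSH key if you don't already have one
ssh-keygen -t ed25519 -f ~/.ssh/gpu-droplet

# Upload the public key to your DigitalOcean account
doctl compute ssh-key create gpu-droplet-key --public-key "$(cat ~/.ssh/gpu-droplet.pub)"

# Create the GPU Droplet (slugs are assumptions -- verify them first)
doctl compute droplet create audio-translation-gpu \
  --region nyc2 \
  --size gpu-h100x1-80gb \
  --image gpu-h100x1-base \
  --ssh-keys <your-ssh-key-id-or-fingerprint> \
  --wait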
Open WebUI is a web interface that allows users to interact with large language models (LLMs). It’s designed to be user-friendly, extensible, and self-hosted, and can run offline. Open WebUI is similar to ChatGPT in its interface, and it can be used with a variety of LLM runners, including Ollama and OpenAI-compatible APIs.
There are several ways to deploy Open WebUI. In this tutorial, you will deploy Open WebUI as a Docker container on the GPU Droplet with NVIDIA GPU support. You can learn how to deploy Open WebUI using the other methods in this Open WebUI quick start guide.
Once the GPU Droplet is ready and deployed, SSH into it from your terminal:
ssh root@<your-droplet-ip>
This Ubuntu AI/ML Ready H100x1 GPU Droplet comes pre-installed with Docker.
You can verify the Docker version using the following command:
docker --version
Output
Docker version 24.0.7, build 24.0.7-0ubuntu2~22.04.1
Next, run the following command to verify that Docker has access to your GPU:
docker run --rm --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04 nvidia-smi
This command pulls the nvidia/cuda:12.2.0-runtime-ubuntu22.04 image (if it has not already been downloaded, or updates an existing image) and starts a container. Inside the container, it runs nvidia-smi to confirm that the container has GPU access and can interact with the underlying GPU hardware. Once nvidia-smi has executed, the --rm flag ensures the container is automatically removed, as it’s no longer needed.
You should observe the following output:
Output
Unable to find image 'nvidia/cuda:12.2.0-runtime-ubuntu22.04' locally
12.2.0-runtime-ubuntu22.04: Pulling from nvidia/cuda
aece8493d397: Pull complete
9fe5ccccae45: Pull complete
8054e9d6e8d6: Pull complete
bdddd5cb92f6: Pull complete
5324914b4472: Pull complete
9a9dd462fc4c: Pull complete
95eef45e00fa: Pull complete
e2554c2d377e: Pull complete
4640d022dbb8: Pull complete
Digest: sha256:739e0bde7bafdb2ed9057865f53085539f51cbf8bd6bf719f2e114bab321e70e
Status: Downloaded newer image for nvidia/cuda:12.2.0-runtime-ubuntu22.04
==========
== CUDA ==
==========
CUDA Version 12.2.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Thu Nov 7 19:32:18 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:00:09.0 Off | 0 |
| N/A 28C P0 70W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
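If the nvidia-smi table above does not appear and Docker instead reports an error about not being able to select a device driver with GPU capabilities, the NVIDIA Container Toolkit is usually not registered with Docker. This should not happen on the AI/ML Ready image, but as a hedged troubleshooting sketch (it assumes the toolkit package itself is already installed, which it is on this image):

# Re-register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Re-test GPU access from a container
docker run --rm --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04 nvidia-smi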
Next, run the following Docker command to start the Open WebUI container:
docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --gpus all ghcr.io/open-webui/open-webui:main
The above command runs a Docker container using the open-webui image and sets up specific configurations for network ports, volumes, and GPU access. Here is a breakdown of each part (a quick container-lifecycle sketch follows this list):

- docker run -d: docker run starts a new Docker container, and -d runs it in detached mode, meaning it runs in the background.
- -p 3000:8080: Maps port 8080 inside the container to port 3000 on the host, so the application is reachable at http://localhost:3000 on the host.
- -v open-webui:/app/backend/data: Mounts a Docker volume named open-webui to the /app/backend/data directory inside the container, so chat history and settings persist across container restarts.
- --name open-webui: Names the container open-webui, which makes it easier to reference (e.g., docker stop open-webui to stop the container).
- ghcr.io/open-webui/open-webui:main: ghcr.io/open-webui/open-webui is the name of the image, hosted on GitHub’s container registry (ghcr.io), and main is the image tag, often representing the latest stable version or main branch.
- --gpus all: Gives the container access to all GPUs on the host through the NVIDIA container runtime.
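Because the chat history and settings live in the open-webui named volume rather than in the container itself, you can stop, remove, and recreate the container without losing data. A minimal sketch of the usual lifecycle commands, matching the run command above:

# Stop and remove the running container (the open-webui volume keeps your data)
docker stop open-webui
docker rm open-webui

# Pull a newer image and recreate the container with the same settings
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --gpus all ghcr.io/open-webui/open-webui:main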
Verify that the Open WebUI Docker container is up and running:
docker ps
Output
CONTAINER ID   IMAGE                                COMMAND           CREATED         STATUS                            PORTS                                       NAMES
4fbe72466797 ghcr.io/open-webui/open-webui:main "bash start.sh" 5 seconds ago Up 4 seconds (health: starting) 0.0.0.0:3000->8080/tcp, :::3000->8080/tcp open-webui
Once the Open WebUI container is up and running, access it at http://<your_gpu_droplet_ip>:3000 in your browser.
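The STATUS column may show (health: starting) for the first minute while Open WebUI initializes. A quick way to watch it come up from the Droplet (the container name matches the run command above):

# Follow the container logs until the web server reports it is listening
docker logs -f open-webui

# Check the container's health status and probe the published port directly
docker inspect --format '{{.State.Health.Status}}' open-webui
curl -I http://localhost:3000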
In this step, you will add your OpenAI API key to Open WebUI.
Once logged in to the Open WebUI dashboard, you will notice that no models are available yet, as seen in the image below:
To connect Open WebUI with OpenAI and use all the available OpenAI models, follow the below steps:
- Open Settings: Click your profile icon and open the settings menu.
- Go to Admin: Open the Admin Panel and go to Settings -> Connections.
- Add the OpenAI API Key: Enter your OpenAI API key in the OpenAI API section and save.
- Verify Connection: Use the verify button next to the key to confirm Open WebUI can reach the OpenAI API (a quick curl check of the key itself is sketched after this list).
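Before pasting the key into Open WebUI, it can help to confirm that the key itself is valid. A minimal check against OpenAI's models endpoint, run from any machine with network access (the placeholder is your real API key):

# List the models your key can access; a JSON model list means the key works
export OPENAI_API_KEY="<your_openai_api_key>"
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | head -n 20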
Open WebUI will then auto-detect all available OpenAI models. Select GPT-4o from the list.
Next, set the speech-to-text (STT) and text-to-speech (TTS) models in the audio settings to use OpenAI's models (whisper-1 for STT and tts-1 for TTS).
Navigate to Settings -> Audio, then configure and save the audio STT and TTS settings, as seen in the above screenshot.
You can read more about the OpenAI text-to-speech and speech-to-text here.
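When configured this way, Open WebUI calls OpenAI's audio endpoints for transcription and speech synthesis. As an optional sanity check of your key and the audio models, here is a minimal curl sketch (sample.mp3 is a placeholder for any short audio file you have locally):

# Speech-to-text: transcribe a local audio file with whisper-1
curl -s https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F model=whisper-1 \
  -F file=@sample.mp3

# Text-to-speech: synthesize speech with tts-1 and save it as an MP3
curl -s https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "alloy", "input": "Hello from the GPU Droplet"}' \
  --output speech.mp3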
If you’re streaming audio from your local machine to the Droplet, route the audio input through an SSH tunnel.
Since the GPU Droplet has the Open WebUI container running on http://localhost:3000
, you can access it on your local machine by navigating to http://localhost:3000
after setting up this SSH tunnel.
This is required to let Open WebUI access the microphone on your local machine for real-time audio translation and real-time language processing. Browsers only allow microphone access in a secure context (HTTPS or localhost), so if you open the UI directly via the Droplet's IP over plain HTTP, Open WebUI will throw the error shown below when you click the headphone or microphone icon to use GPT-4o for natural language processing tasks.
Open a new terminal on your local machine and use the following command to set up a local SSH tunnel from your local machine to the GPU Droplet:
ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=5 root@<gpu_droplet_ip> -L 3000:localhost:3000
This command establishes an SSH connection to your GPU Droplet as the root user and sets up a local port-forwarding tunnel. It also includes options to keep the SSH session alive. Here’s a detailed breakdown:
- -o ServerAliveInterval=60: Sets ServerAliveInterval to 60 seconds, meaning that every 60 seconds an SSH keep-alive message is sent to the remote server.
- -o ServerAliveCountMax=5: Sets ServerAliveCountMax to 5, which allows up to 5 missed keep-alive messages before the SSH connection is terminated. Combined with ServerAliveInterval=60, this means the SSH session will stay open through 5 minutes (5 × 60 seconds) of no response from the server before closing.
- -L 3000:localhost:3000: Sets up local port forwarding. 3000 (before the colon) is the local port on your machine, where you will access the forwarded connection, and localhost:3000 (after the colon) refers to the destination on the GPU Droplet.

Now, this command will allow you to access Open WebUI by visiting http://localhost:3000 on your local machine and also use the microphone for real-time audio translation.
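With the tunnel running, you can quickly confirm the forwarding works from your local machine before switching to the browser. A minimal check (the port matches the -L option above):

# Should return an HTTP response from the Open WebUI container on the Droplet
curl -I http://localhost:3000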
Click the headphone or microphone icon to use the whisper and GPT-4o models for natural language processing tasks.
Clicking the Headphone/Call button will open a voice assistant that uses the OpenAI GPT-4o and whisper models for real-time audio processing and translation.
You can use it to translate and transcribe the audio in real time by talking with the GPT-4o voice assistant.
Deploying real-time audio translation using OpenAI APIs on Open WebUI with DigitalOcean’s GPU Droplets allows developers to create high-performance translation systems. With easy setup and monitoring, DigitalOcean’s platform provides the resources for scalable, efficient AI applications.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.