With the increasing demand for multilingual communication, real-time audio translation is rapidly gaining attention. In this tutorial, you will learn to deploy a real-time audio translation application using OpenAI APIs on Open WebUI, all hosted on a powerful GPU Droplet from DigitalOcean.
DigitalOcean’s GPU Droplets, powered by NVIDIA H100 GPUs, offer significant performance for AI workloads, making them ideal for fast and efficient real-time audio translation. Let’s get started.
1. Create a New Project - You will need to create a new project from the cloud control panel and tie it to a GPU Droplet.
2. Create a GPU Droplet - Log into your DigitalOcean account, create a new GPU Droplet, and choose AI/ML Ready as the OS. This OS image installs all the necessary NVIDIA GPU drivers. You can refer to our official documentation on how to create a GPU Droplet.
3. Add an SSH Key for authentication - An SSH key is required to authenticate with the GPU Droplet. Once the key is added, you can log in to the Droplet from your terminal.
4. Finalize and Create the GPU Droplet - Once all of the above steps are completed, finalize and create a new GPU Droplet. If you prefer the command line, see the optional doctl sketch after this list.
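If you prefer the command line over the control panel, the same steps can be scripted with DigitalOcean's doctl CLI. This is a minimal sketch only; the --region, --size, and --image slugs below are assumptions, so confirm the exact values for H100 GPU Droplets with doctl compute size list and doctl compute image list before running it.

# Generate an SSH key if you don't already have one
ssh-keygen -t ed25519 -f ~/.ssh/gpu-droplet

# Upload the public key to your DigitalOcean account
doctl compute ssh-key create gpu-droplet-key --public-key "$(cat ~/.ssh/gpu-droplet.pub)"

# Create the GPU Droplet (slugs are assumptions -- verify them first)
doctl compute droplet create audio-translation-gpu \
  --region nyc2 \
  --size gpu-h100x1-80gb \
  --image gpu-h100x1-base \
  --ssh-keys <your-ssh-key-id-or-fingerprint> \
  --wait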
Open WebUI is a web interface that allows users to interact with large language models (LLMs). It’s designed to be user-friendly, extensible, and self-hosted, and can run offline. Open WebUI is similar to ChatGPT in its interface, and it can be used with a variety of LLM runners, including Ollama and OpenAI-compatible APIs.
There are several ways to deploy Open WebUI. In this tutorial, you will deploy Open WebUI as a Docker container on the GPU Droplet with NVIDIA GPU support. You can learn how to deploy Open WebUI using the other methods in this Open WebUI quick start guide.
Once the GPU Droplet is ready and deployed, SSH into it from your terminal:
ssh root@<your-droplet-ip>
This Ubuntu AI/ML Ready H100x1 GPU Droplet comes pre-installed with Docker.
You can verify the Docker version using the following command:
docker --version
Output
Docker version 24.0.7, build 24.0.7-0ubuntu2~22.04.1
Next, run the following command to verify that Docker has access to your GPU:
docker run --rm --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04 nvidia-smi
This command pulls the nvidia/cuda:12.2.0-runtime-ubuntu22.04 image (if it has not already been downloaded, or updates an existing image) and starts a container. Inside the container, it runs nvidia-smi to confirm that the container has GPU access and can interact with the underlying GPU hardware. Once nvidia-smi has executed, the --rm flag ensures the container is automatically removed, as it’s no longer needed.
You should observe the following output:
Output
Unable to find image 'nvidia/cuda:12.2.0-runtime-ubuntu22.04' locally
12.2.0-runtime-ubuntu22.04: Pulling from nvidia/cuda
aece8493d397: Pull complete
9fe5ccccae45: Pull complete
8054e9d6e8d6: Pull complete
bdddd5cb92f6: Pull complete
5324914b4472: Pull complete
9a9dd462fc4c: Pull complete
95eef45e00fa: Pull complete
e2554c2d377e: Pull complete
4640d022dbb8: Pull complete
Digest: sha256:739e0bde7bafdb2ed9057865f53085539f51cbf8bd6bf719f2e114bab321e70e
Status: Downloaded newer image for nvidia/cuda:12.2.0-runtime-ubuntu22.04
==========
== CUDA ==
==========
CUDA Version 12.2.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Thu Nov 7 19:32:18 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:00:09.0 Off | 0 |
| N/A 28C P0 70W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
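If the nvidia-smi table above does not appear and Docker instead reports an error about not being able to select a device driver with GPU capabilities, the NVIDIA Container Toolkit is usually not registered with Docker. This should not happen on the AI/ML Ready image, but as a hedged troubleshooting sketch (it assumes the toolkit package itself is already installed, which it is on this image):

# Re-register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Re-test GPU access from a container
docker run --rm --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04 nvidia-smi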
Next, run the following Docker command to start the Open WebUI container:
docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --gpus all ghcr.io/open-webui/open-webui:main
The above command runs a Docker container using the open-webui image and sets up specific configurations for network ports, volumes, and GPU access. Here is a breakdown of each part (a quick container-lifecycle sketch follows this list):

- docker run -d: docker run starts a new Docker container, and -d runs it in detached mode, meaning it runs in the background.
- -p 3000:8080: Maps port 8080 inside the container to port 3000 on the host, so the application is reachable at http://localhost:3000 on the host.
- -v open-webui:/app/backend/data: Mounts a Docker volume named open-webui to the /app/backend/data directory inside the container, so chat history and settings persist across container restarts.
- --name open-webui: Names the container open-webui, which makes it easier to reference (e.g., docker stop open-webui to stop the container).
- ghcr.io/open-webui/open-webui:main: ghcr.io/open-webui/open-webui is the name of the image, hosted on GitHub’s container registry (ghcr.io), and main is the image tag, often representing the latest stable version or main branch.
- --gpus all: Gives the container access to all GPUs on the host through the NVIDIA container runtime.
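Because the chat history and settings live in the open-webui named volume rather than in the container itself, you can stop, remove, and recreate the container without losing data. A minimal sketch of the usual lifecycle commands, matching the run command above:

# Stop and remove the running container (the open-webui volume keeps your data)
docker stop open-webui
docker rm open-webui

# Pull a newer image and recreate the container with the same settings
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --gpus all ghcr.io/open-webui/open-webui:main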
Verify that the Open WebUI Docker container is up and running:
docker ps
Output
CONTAINER ID   IMAGE                                COMMAND           CREATED         STATUS                            PORTS                                       NAMES
4fbe72466797 ghcr.io/open-webui/open-webui:main "bash start.sh" 5 seconds ago Up 4 seconds (health: starting) 0.0.0.0:3000->8080/tcp, :::3000->8080/tcp open-webui
Once the Open WebUI container is up and running, access it at http://<your_gpu_droplet_ip>:3000 in your browser.
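The STATUS column may show (health: starting) for the first minute while Open WebUI initializes. A quick way to watch it come up from the Droplet (the container name matches the run command above):

# Follow the container logs until the web server reports it is listening
docker logs -f open-webui

# Check the container's health status and probe the published port directly
docker inspect --format '{{.State.Health.Status}}' open-webui
curl -I http://localhost:3000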
In this step, you will add your OpenAI API key to Open WebUI.
Once logged in to the Open WebUI dashboard, you will notice that no models are available yet, as seen in the image below:
To connect Open WebUI with OpenAI and use all the available OpenAI models, follow the below steps:
- Open Settings: Click your profile icon and open the settings menu.
- Go to Admin: Open the Admin Panel and go to Settings -> Connections.
- Add the OpenAI API Key: Enter your OpenAI API key in the OpenAI API section and save.
- Verify Connection: Use the verify button next to the key to confirm Open WebUI can reach the OpenAI API (a quick curl check of the key itself is sketched after this list).
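Before pasting the key into Open WebUI, it can help to confirm that the key itself is valid. A minimal check against OpenAI's models endpoint, run from any machine with network access (the placeholder is your real API key):

# List the models your key can access; a JSON model list means the key works
export OPENAI_API_KEY="<your_openai_api_key>"
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | head -n 20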
Open WebUI will then auto-detect all available OpenAI models. Select GPT-4o from the list.
Next, set the speech-to-text (STT) and text-to-speech (TTS) models in the audio settings to use OpenAI's models (whisper-1 for STT and tts-1 for TTS).
Navigate to Settings -> Audio, then configure and save the audio STT and TTS settings, as seen in the above screenshot.
You can read more about the OpenAI text-to-speech and speech-to-text here.
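When configured this way, Open WebUI calls OpenAI's audio endpoints for transcription and speech synthesis. As an optional sanity check of your key and the audio models, here is a minimal curl sketch (sample.mp3 is a placeholder for any short audio file you have locally):

# Speech-to-text: transcribe a local audio file with whisper-1
curl -s https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F model=whisper-1 \
  -F file=@sample.mp3

# Text-to-speech: synthesize speech with tts-1 and save it as an MP3
curl -s https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "alloy", "input": "Hello from the GPU Droplet"}' \
  --output speech.mp3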
If you’re streaming audio from your local machine to the Droplet, route the audio input through an SSH tunnel.
Since the GPU Droplet has the Open WebUI container running on http://localhost:3000
, you can access it on your local machine by navigating to http://localhost:3000
after setting up this SSH tunnel.
This is required to let Open WebUI access the microphone on your local machine for real-time audio translation and real-time language processing. Browsers only allow microphone access in a secure context (HTTPS or localhost), so if you open the UI directly via the Droplet's IP over plain HTTP, Open WebUI will throw the error shown below when you click the headphone or microphone icon to use GPT-4o for natural language processing tasks.
Open a new terminal on your local machine and use the following command to set up a local SSH tunnel from your local machine to the GPU Droplet:
ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=5 root@<gpu_droplet_ip> -L 3000:localhost:3000
This command establishes an SSH connection to your GPU Droplet as the root user and sets up a local port-forwarding tunnel. It also includes options to keep the SSH session alive. Here’s a detailed breakdown:
- -o ServerAliveInterval=60: Sets ServerAliveInterval to 60 seconds, meaning that every 60 seconds an SSH keep-alive message is sent to the remote server.
- -o ServerAliveCountMax=5: Sets ServerAliveCountMax to 5, which allows up to 5 missed keep-alive messages before the SSH connection is terminated. Combined with ServerAliveInterval=60, this means the SSH session will stay open through 5 minutes (5 × 60 seconds) of no response from the server before closing.
- -L 3000:localhost:3000: Sets up local port forwarding. 3000 (before the colon) is the local port on your machine, where you will access the forwarded connection, and localhost:3000 (after the colon) refers to the destination on the GPU Droplet.

Now, this command will allow you to access Open WebUI by visiting http://localhost:3000 on your local machine and also use the microphone for real-time audio translation.
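With the tunnel running, you can quickly confirm the forwarding works from your local machine before switching to the browser. A minimal check (the port matches the -L option above):

# Should return an HTTP response from the Open WebUI container on the Droplet
curl -I http://localhost:3000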
Click the headphone or microphone icon to use the whisper and GPT-4o models for natural language processing tasks.
Clicking the Headphone/Call button will open a voice assistant that uses the OpenAI GPT-4o and whisper models for real-time audio processing and translation.
You can use it to translate and transcribe the audio in real time by talking with the GPT-4o voice assistant.
Deploying real-time audio translation using OpenAI APIs on Open WebUI with DigitalOcean’s GPU Droplets allows developers to create high-performance translation systems. With easy setup and monitoring, DigitalOcean’s platform provides the resources for scalable, efficient AI applications.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.