Tutorial

Real-Time Audio Translation with OpenAI APIs on DigitalOcean GPU Droplets Using Open WebUI

Published on November 8, 2024


Introduction

With the increasing demand for multilingual communication, real-time audio translation is rapidly gaining attention. In this tutorial, you will learn to deploy a real-time audio translation application using OpenAI APIs on Open WebUI, all hosted on a powerful GPU Droplet from DigitalOcean.

DigitalOcean’s GPU Droplets, powered by NVIDIA H100 GPUs, offer significant performance for AI workloads, making them ideal for fast and efficient real-time audio translation. Let’s get started.

Prerequisites

Before you begin, you will need:

  • A DigitalOcean account with access to GPU Droplets.
  • An OpenAI API key with access to the GPT-4o, Whisper, and text-to-speech models.
  • An SSH key pair and an SSH client installed on your local machine.

Step 1 - Setting Up the DigitalOcean GPU Droplet

1. Create a New Project - You will need to create a new project from the cloud control panel and tie it to a GPU Droplet.

2. Create a GPU Droplet - Log into your DigitalOcean account, create a new GPU Droplet, and choose AI/ML Ready as the OS. This OS image installs all the necessary NVIDIA GPU drivers. You can refer to our official documentation on how to create a GPU Droplet, or use the doctl sketch after this list.

Create a GPU Droplet which is AI/ML Ready

3. Add an SSH Key for authentication - An SSH key is required to authenticate with the GPU Droplet. By adding an SSH key, you can log in to the GPU Droplet from your terminal.

Add an SSH key for authentication

4. Finalize and Create the GPU Droplet - Once all of the above steps are completed, finalize and create a new GPU Droplet.

Create a GPU Droplet
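If you prefer the command line, you can also create the Droplet with doctl. The following is a minimal sketch, not a definitive recipe: the region, image slug, and size slug shown here are assumptions, so run doctl compute droplet create --help and check the GPU Droplet documentation for the values available to your account.

doctl compute droplet create open-webui-gpu \
  --region tor1 \
  --image gpu-h100x1-base \
  --size gpu-h100x1-80gb \
  --ssh-keys <your_ssh_key_id>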

Step 2 - Installing and Configuring Open WebUI

Open WebUI is a web interface that allows users to interact with large language models (LLMs). It’s designed to be user-friendly, extensible, and self-hosted, and it can run offline. Open WebUI is similar to ChatGPT in its interface, and it can be used with a variety of LLM runners, including Ollama and OpenAI-compatible APIs.

There are three ways you can deploy Open WebUI:

  • Docker: Officially supported and recommended for most users.
  • Python: Suitable for low-resource environments or those wanting a manual setup.
  • Kubernetes: Ideal for enterprise deployments that require scaling and orchestration.

In this tutorial, you will deploy Open WebUI as a Docker container on the GPU Droplet with NVIDIA GPU support. You can learn how to deploy Open WebUI using the other techniques in this Open WebUI quick start guide.
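For reference, the Python route from that guide is a two-command install. This is a minimal sketch, assuming a Python 3.11 environment as the Open WebUI documentation recommends:

pip install open-webui
open-webui serve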

Docker Setup

Once the GPU Droplet is ready and deployed, SSH into it from your terminal:

ssh root@<your-droplet-ip>

This Ubuntu AI/ML Ready H100x1 GPU Droplet comes pre-installed with Docker.

You can verify the Docker version using the following command:

docker --version
Output
Docker version 24.0.7, build 24.0.7-0ubuntu2~22.04.1

Next, run the following command to verify that Docker has access to your GPU:

docker run --rm --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04 nvidia-smi

This command pulls the nvidia/cuda:12.2.0-runtime-ubuntu22.04 image (if it is not already present locally) and starts a container.

Inside the container, it runs nvidia-smi to confirm that the container has GPU access and can interact with the underlying GPU hardware. Once nvidia-smi has executed, the --rm flag ensures the container is automatically removed, as it’s no longer needed.

You should observe the following output:

Output
Unable to find image 'nvidia/cuda:12.2.0-runtime-ubuntu22.04' locally
12.2.0-runtime-ubuntu22.04: Pulling from nvidia/cuda
aece8493d397: Pull complete
9fe5ccccae45: Pull complete
8054e9d6e8d6: Pull complete
bdddd5cb92f6: Pull complete
5324914b4472: Pull complete
9a9dd462fc4c: Pull complete
95eef45e00fa: Pull complete
e2554c2d377e: Pull complete
4640d022dbb8: Pull complete
Digest: sha256:739e0bde7bafdb2ed9057865f53085539f51cbf8bd6bf719f2e114bab321e70e
Status: Downloaded newer image for nvidia/cuda:12.2.0-runtime-ubuntu22.04

==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Thu Nov  7 19:32:18 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          On  | 00000000:00:09.0 Off |                    0 |
| N/A   28C    P0              70W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Deploy Open WebUI using Docker with GPU Support

Use the following command to run the Open WebUI Docker container:

docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --gpus all ghcr.io/open-webui/open-webui:main 

The above command runs a Docker container using the open-webui image and sets up specific configurations for network ports, volumes, and GPU access.

  1. docker run -d:

    • docker run starts a new Docker container.
    • -d runs the container in detached mode, meaning it runs in the background.
  2. -p 3000:8080:

    • This maps port 8080 inside the container to port 3000 on the host machine.
    • It allows you to access the application in the container by navigating to http://localhost:3000 on the host.
  3. -v open-webui:/app/backend/data:

    • This mounts a Docker volume named open-webui to the /app/backend/data directory inside the container.
    • Volumes are used to persist data generated or used by the container, ensuring it remains available even if the container is stopped or deleted.
  4. --name open-webui:

    • Assigns the container a specific name, open-webui, which makes it easier to reference (e.g., docker stop open-webui to stop the container).
  5. ghcr.io/open-webui/open-webui:main:

    • Specifies the Docker image to use for the container.
    • ghcr.io/open-webui/open-webui is the name of the image, hosted on GitHub’s container registry (ghcr.io).
    • main is the image tag, often representing the latest stable version or main branch.
  6. --gpus all:

    • This option enables GPU support for the container, allowing it to use all available GPUs on the host machine.
    • It’s essential for applications that leverage GPU acceleration, such as machine learning models.
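After the container starts, you can follow its startup logs with the standard Docker CLI to confirm the backend comes up cleanly (press Ctrl+C to stop following):

docker logs -f open-webui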

Verify that the Open WebUI Docker container is up and running:

docker ps 
Output
CONTAINER ID   IMAGE                                COMMAND           CREATED         STATUS                            PORTS                                       NAMES
4fbe72466797   ghcr.io/open-webui/open-webui:main   "bash start.sh"   5 seconds ago   Up 4 seconds (health: starting)   0.0.0.0:3000->8080/tcp, :::3000->8080/tcp   open-webui

Once the Open WebUI container is up and running, access it at http://<your_gpu_droplet_ip>:3000 in your browser.

Open WebUI dashboard
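You can also confirm from the Droplet itself that the port answers before opening the browser. This check assumes curl is installed on the Droplet; an HTTP 200 response means the Open WebUI frontend is being served:

curl -I http://localhost:3000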

Step 3 - Add OpenAI API Key to use GPT-4o with Open WebUI

In this step, you will add your OpenAI API key to Open WebUI.

Once logged in to the Open WebUI dashboard, you should notice that no models are available, as seen in the below image:

Open WebUI Dashboard

To connect Open WebUI with OpenAI and use all the available OpenAI models, follow these steps:

  1. Open Settings:

    • In Open WebUI, click your user icon at the bottom left, then click Settings.
  2. Go to Admin:

    • Navigate to the Admin tab, then select Connections.
  3. Add the OpenAI API Key:

    • Enter your OpenAI API key in the text box on the right, under the OpenAI API tab.
  4. Verify Connection:

    • Click Verify Connection. A green light confirms a successful connection.

Adding OpenAI API Key

Open WebUI will then auto-detect all available OpenAI models. Select GPT-4o from the list.

GPT-4o models

Next, configure the audio settings so that speech-to-text and text-to-speech use the OpenAI Whisper and TTS models:

Setup audio settings

To do this, navigate to Settings -> Audio, then configure and save the audio STT and TTS settings, as seen in the above screenshot.

You can read more about OpenAI text-to-speech and speech-to-text in the OpenAI API documentation.
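Under the hood, these settings map to OpenAI's audio endpoints. As a rough illustration of what Open WebUI calls on your behalf (the file names here are placeholders for this example), speech-to-text goes through the transcriptions endpoint:

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@sample.mp3 \
  -F model=whisper-1

Text-to-speech goes through the speech endpoint:

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "alloy", "input": "Bonjour, comment allez-vous?"}' \
  --output speech.mp3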

Step 4 - Set up Audio Tunneling

If you’re streaming audio from your local machine to the Droplet, route the audio input through an SSH tunnel.

Since the GPU Droplet has the Open WebUI container running on http://localhost:3000, you can access it on your local machine by navigating to http://localhost:3000 after setting up this SSH tunnel.

This is required so that Open WebUI can access the microphone on your local machine for real-time audio translation and real-time language processing. Browsers only expose the microphone on secure origins such as HTTPS or localhost, so without the tunnel, Open WebUI will throw the below error when you click the headphone or microphone icon to use GPT-4o for natural language processing tasks.

Error when recording audio

Open a new terminal on your local machine and use the following command to set up a local SSH tunnel from your local machine to the GPU Droplet:

ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=5 root@<gpu_droplet_ip> -L 3000:localhost:3000

This command establishes an SSH connection to your GPU Droplet as the root user and sets up a local port forwarding tunnel. It also includes options to keep the SSH session alive. Here’s a detailed breakdown:

  1. -o ServerAliveInterval=60:

    • This option sets the ServerAliveInterval to 60 seconds, meaning that every 60 seconds, an SSH keep-alive message is sent to the remote server.
    • This helps prevent the SSH connection from timing out due to inactivity.
  2. -o ServerAliveCountMax=5:

    • This option sets the ServerAliveCountMax to 5, which allows up to 5 missed keep-alive messages before the SSH connection is terminated.
    • Together with ServerAliveInterval=60, this setting means the SSH session tolerates up to 5 minutes (5 × 60 seconds) without a response from the server before closing.
  3. -L 3000:localhost:3000:

    • This part sets up local port forwarding.
    • 3000 (before the colon) is the local port on your machine, where you will access the forwarded connection.
    • localhost:3000 (after the colon) refers to the destination on the GPU Droplet.
    • In this case, it forwards traffic from port 3000 on your local machine to port 3000 on the GPU Droplet.

With the tunnel in place, you can access Open WebUI by visiting http://localhost:3000 on your local machine and use the microphone for real-time audio translation.
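If you would rather not keep a terminal open for the tunnel, you can run it in the background instead. The -f and -N flags are standard OpenSSH options: -f backgrounds the session after authentication, and -N skips running a remote command, leaving only the port forward:

ssh -f -N -o ServerAliveInterval=60 -o ServerAliveCountMax=5 -L 3000:localhost:3000 root@<gpu_droplet_ip>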

Step 5 - Implementing Real-time Translation with GPT-4o

Click the headphone or microphone icon to use the Whisper and GPT-4o models for natural language processing tasks.

Use the microphone to chat

Clicking the Headphone/Call button opens a voice assistant that uses the OpenAI GPT-4o and Whisper models for real-time audio processing and translation.

You can use it to translate and transcribe audio in real time by talking with the GPT-4o voice assistant, for example by asking it to translate everything you say into another language.

Voice chat and transcription in real time
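The voice pipeline chains the pieces you configured above: Whisper transcribes your speech, GPT-4o translates the text, and the TTS model speaks the result. As a hypothetical, text-only illustration of the translation step (the system prompt and sample sentence below are invented for this example), the equivalent direct API call is:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "Translate everything the user says into French."},
      {"role": "user", "content": "Good morning, how are you today?"}
    ]
  }'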

Conclusion

Deploying real-time audio translation using OpenAI APIs on Open WebUI with DigitalOcean’s GPU Droplets allows developers to create high-performance translation systems. With easy setup and monitoring, DigitalOcean’s platform provides the resources for scalable, efficient AI applications.
