DigitalOcean’s 1-Click Models, powered by Hugging Face, make it easy to deploy and interact with popular large language models such as Mistral, Llama, Gemma, and Qwen on some of the most powerful GPUs available in the cloud. Running on NVIDIA H100 GPU Droplets, this solution delivers accelerated computing performance for deep learning tasks. It removes the burden of infrastructure complexity, allowing developers of all skill levels to concentrate on building applications rather than wrestling with complicated software configurations.
In this article, we will demonstrate batch processing using a 1-Click Model. The tutorial uses the Llama 3.1 8B Instruct model on a single GPU. Although we work with a small batch in this example, the same approach scales easily to larger batches, depending on your workload and the computational resources available. This flexibility of DigitalOcean’s 1-Click Model deployment lets you handle varying data sizes, making it suitable for scenarios ranging from small-scale tasks to large-scale enterprise applications.
Before diving into batch inferencing with DigitalOcean’s 1-Click Models, ensure the following: you have a 1-Click Model (such as Llama 3.1 8B Instruct) deployed on a GPU Droplet, and you have the bearer token for its inference endpoint.
Batch inference is a process in which multiple data inputs are grouped together and processed in a single operation rather than individually. Instead of sending each request to the model one at a time, a whole batch of requests is sent at once. This approach is especially useful when working with large datasets or handling high volumes of tasks.
This approach is beneficial for several reasons: it processes large volumes of data faster, keeps the GPU busy instead of idling between requests, and is typically more cost-effective than handling inputs one by one.
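As a quick illustration of the pattern (a minimal, hypothetical sketch rather than part of the tutorial, with run_model standing in for any model call), inputs are gathered into a batch, processed in one pass, and the results are collected together:

# Minimal sketch of the batching pattern; run_model is a hypothetical placeholder
def run_model(text: str) -> str:
    return f"processed: {text}"  # stand-in for a real inference request

batch = [
    "I love using this product.",
    "The service was terrible.",
    "It's okay, not great.",
]

# Process the whole batch in one pass and keep the results together
results = [run_model(text) for text in batch]
for text, result in zip(batch, results):
    print(f"{text} -> {result}")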
We have created a detailed article on how to get started with the 1-Click Model and DigitalOcean’s platform. Feel free to check out the link to learn more.
Analyzing customer comments has become a critical way for businesses to monitor brand perception, understand customer satisfaction with a product, and predict trends. Using DigitalOcean’s 1-Click Models, you can efficiently perform sentiment analysis at scale. In the example below, we will analyze a batch of five comments.
Let’s walk through a batch inferencing example using a sentiment analysis use case.
pip install --upgrade --quiet huggingface_hub
import os
from huggingface_hub import InferenceClient
# Initialize the client with your deployed endpoint and bearer token
client = InferenceClient(base_url="http://localhost:8080", api_key=os.getenv("BEARER_TOKEN"))
# Create a list of inputs
batch_inputs = [
    {"role": "user", "content": "I love using this product. It's amazing!"},
    {"role": "user", "content": "The service was terrible and I'm very disappointed."},
    {"role": "user", "content": "It's okay, not great but not bad either."},
    {"role": "user", "content": "Absolutely fantastic experience, I highly recommend it!"},
    {"role": "user", "content": "I'm not sure if I like it or not."},
]

# Iterate through the batch and send requests
batch_responses = []
for input_message in batch_inputs:
    # Send each input to the deployed model
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        messages=[input_message],
        temperature=0.7,
        top_p=0.95,
        max_tokens=128,
    )
    # Extract the generated text and append it to the batch_responses list
    batch_responses.append(response.choices[0].message.content)

# Print the results for each input
for idx, (input_text, sentiment) in enumerate(zip(batch_inputs, batch_responses), start=1):
    print(f"Input {idx}: {input_text['content']}")
    print(f"Sentiment: {sentiment}")
    print("-" * 50)
How It Works:
The client is initialized with the URL of your deployed endpoint and a bearer token, each input in the batch is sent to the model as its own chat completion request, and the responses are collected into a list and printed alongside their inputs. Make sure the BEARER_TOKEN environment variable contains the actual token obtained from your DigitalOcean Droplet, rather than a "YOUR_BEARER_TOKEN" placeholder.

To conduct batch inferencing with DigitalOcean’s 1-Click Models, you can also submit a batch of multiple questions in one run. Here’s another example:
# Define a batch of inputs
batch_inputs = [
    {"role": "user", "content": "What is Deep Learning?"},
    {"role": "user", "content": "Explain the difference between AI and Machine Learning."},
    {"role": "user", "content": "What are neural networks used for?"},
]

# Send each question to the model and collect the answers
batch_responses = []
for input_message in batch_inputs:
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        messages=[input_message],
        temperature=0.7,
        top_p=0.95,
        max_tokens=128,
    )
    batch_responses.append(response.choices[0].message.content)

# Print each response in order
for idx, output in enumerate(batch_responses, start=1):
    print(f"Response {idx}: {output}")
Explanation:
The same client and loop pattern is reused here: a batch of questions is defined, each one is sent to the deployed model, and the generated answers are collected and printed in order.

DigitalOcean’s infrastructure is designed for scalability: the same workflow that runs on a single NVIDIA H100 GPU Droplet can be extended to larger batches, or to additional Droplets, as your workload grows.
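For larger batches, one simple way to take advantage of that scalability is to keep several requests in flight at once. The snippet below is a sketch under the assumption that your deployed endpoint can serve concurrent requests (it is not prescribed by the tutorial) and reuses the client and batch_inputs defined above:

# Hedged sketch: send the batch concurrently instead of one request after another
from concurrent.futures import ThreadPoolExecutor

def ask(message):
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        messages=[message],
        temperature=0.7,
        top_p=0.95,
        max_tokens=128,
    )
    return response.choices[0].message.content

# A few worker threads keep multiple requests in flight at the same time
with ThreadPoolExecutor(max_workers=4) as pool:
    batch_responses = list(pool.map(ask, batch_inputs))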
Apart from sentiment analysis and recommendation systems, batch inference is a crucial capability for any business application that handles high data volumes, making processing faster, more efficient, and cost-effective.
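For workloads like that, the loop from the earlier examples can be applied to data loaded from a file and processed in fixed-size chunks. The snippet below is a rough sketch: the file name and chunk size are illustrative assumptions, not part of the tutorial.

# Rough sketch: process a larger, file-based workload in fixed-size chunks
# "comments.txt" and chunk_size are illustrative assumptions
with open("comments.txt") as f:
    all_comments = [line.strip() for line in f if line.strip()]

chunk_size = 10
all_results = []
for start in range(0, len(all_comments), chunk_size):
    chunk = all_comments[start:start + chunk_size]
    for text in chunk:
        response = client.chat.completions.create(
            model="meta-llama/Meta-Llama-3.1-8B-Instruct",
            messages=[{"role": "user", "content": text}],
            max_tokens=128,
        )
        all_results.append(response.choices[0].message.content)
    print(f"Processed {start + len(chunk)} of {len(all_comments)} comments")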
Batch inferencing with DigitalOcean’s 1-Click Models is a powerful way to process multiple inputs efficiently. With just a few lines of code, you can implement batch inferencing for tasks such as sentiment analysis and gain timely insights into social media trends. This solution not only simplifies deployment but also ensures optimized performance and scalability.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.