With billions of accounts active across platforms like Instagram, X, and LinkedIn, social media has become an integral part of modern society. From this widespread adoption comes a vast, dynamic landscape of online conversation, making social media an essential tool for businesses seeking to understand and grow their customer base.
The process of tracking, collecting, and analyzing data from social media platforms to improve an organization’s strategic business decisions is referred to as social media analytics. By understanding the nuances of online conversation, businesses can refine their messaging, optimize their campaigns, identify emerging trends, tailor their product roadmap to customer needs, track their competitors, and build stronger relationships with their audience. Social media analytics is a powerful driver of business strategy, ensuring time and energy are spent on fruitful labour.
While there are a number of services and platforms that offer subscription services for analyzing social media data, some companies opt for internal tooling to tailor solutions to specific needs, enhance data security, and protect intellectual property.
Large Language Models (LLMs) have attracted significant investment and research efforts from industry and academia alike, driving the increased popularity and adoption of AI. These models are trained to generate natural language responses to perform a wide range of tasks. As a result of their versatility and ease of use, it is worthwhile to consider the incorporation of LLMs into your social media analytics workflow.
DigitalOcean is committed to providing developers and innovators with the best resources and tools to bring their ideas to life. DigitalOcean has partnered with HuggingFace to offer 1-click models. This allows for the integration of GPU Droplets with state-of-the-art open-source LLMs in Text Generation Inference (TGI)-optimized container applications. As opposed to closed-source models, open-source models allow you to have greater control over the model and your data during inference.
In this tutorial, we hope to give you a starting point for incorporating LLMs into your social media analytics workflow.
There are 3 parts to this tutorial.
Part 1 and 2 of the tutorial do not require extensive coding experience. However, Python experience is critical for the third part of this tutorial.
This part of the tutorial can be skipped if you are already familiar with setting up 1-click models from our documentation or previous tutorials.
To access these 1-click models, sign up for an account or log in to an existing one.
Navigate to the “Create GPU Droplet” page, by either clicking on GPU Droplets on the left panel or in the drop-down menu from the green “Create” button on the top right.
Where it says Choose an Image, navigate to the “1-click Models” tab and select the 1-click model you would like to use.
Choose a GPU plan. Currently, the only options are one or eight H100 GPUs.
This step can be skipped if not required for your application. Select “Add Volume block storage” if additional Volumes (data storage) are desired for your Droplet.
If daily or weekly automated server backups are desired, those can be selected as well.
Select an existing SSH Key or click “Add a SSH Key” for instructions.
Advanced options are available for you to customize your GPU Droplet experience.
After filling in the final details (name, project, tags), click the “Create a GPU Droplet” button on the right-hand side of the page to proceed.
It typically takes 10-15 minutes for the Droplet to be deployed. Once deployed, you will be charged for the time it is on. Remember to destroy your Droplet when it is not being used.
Once the GPU Droplet has been successfully deployed, click the “Web Console” button to access it in a new window.
cURL is a command-line tool for transferring data; it’s great for one-off or quick testing scenarios without setting up a full Python environment. The following cURL command can be modified and pasted into the Web Console.
curl http://localhost:8080/v1/chat/completions \
-X POST \
-d '{"messages":[{"role":"user","content":"What is Deep Learning?"}],"temperature":0.7,"top_p":0.95,"max_tokens":128}' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $BEARER_TOKEN"
Here’s a breakdown of the cURL command to give you more context should you want to modify the request body (the line that begins with -d).
Base URL: http://localhost:8080/v1/chat/completions
This is the endpoint URL for the chat completion
-X POST: The POST HTTP method sends data to the server
Request Body
Headers
-H 'Content-Type: application/json': HTTP header specifying to the server that the request body is in JSON format
-H "Authorization: Bearer $BEARER_TOKEN": HTTP header that includes the authorization token required to access the API
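If you prefer Python for quick checks, the same request can be built with the standard library. This is a minimal sketch assuming the model is served at localhost:8080 and that BEARER_TOKEN is exported in your environment; the `ask` helper is our own illustrative wrapper, not part of TGI:

```python
import json
import os
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"

def build_request(question):
    # Same payload as the cURL command above
    payload = {
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.7,
        "top_p": 0.95,
        "max_tokens": 128,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.getenv('BEARER_TOKEN', '')}",
        },
        method="POST",
    )

def ask(question):
    # Network call happens only when invoked, and requires a live Droplet
    with urllib.request.urlopen(build_request(question)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Calling `ask("What is Deep Learning?")` against a running Droplet returns the model's reply as a string.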
We will be using TGI with Python in this tutorial for more programmatic control over requests. The Python code can be run in an IDE like VS Code.
The Bearer Token allows for requests to be sent to the public IP of the deployed GPU Droplet. To store this token as an environment variable, copy the Bearer Token from the Web Console. In the code snippet below, replace “PASTE BEARER TOKEN” with the copied token. Paste the updated code snippet into your terminal.
In Terminal:
export BEARER_TOKEN="PASTE BEARER TOKEN"
A common mistake when exporting the bearer token in your terminal is forgetting to include the quotes.
Now that you know how to set up a 1-click model, let’s discuss prompt engineering from a social media analytics perspective.
LLMs generate drastically different responses based on how they’re prompted. A well-crafted prompt often includes clear instructions, contextual information, desired output format, requirements, constraints, and/or examples.
Before proceeding, it is critical that you understand what you’re hoping to achieve and what kind of data you need. Social media analytics with LLMs isn’t just about throwing data at an AI and expecting magic. The key to success lies in two areas: clearly defined goals and high-quality data.
Ensure you have an understanding of what you’re looking for. This involves having a clear objective and knowing what a good result looks like. Unsurprisingly, having subject matter expertise in the area you’re prompting the LLM about is advantageous for getting the best outputs. At the end of the day, LLMs are merely tools to augment workflows.
Examples of potential objectives for social media analytics include gauging sentiment around your brand, optimizing content strategy, benchmarking against competitors, and identifying emerging trends.
Knowing what information is required to achieve your objectives is crucial. Ensure the data is relevant to your objectives, accurate, and of sufficient volume to perform analysis.
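As a quick sanity check before prompting, you can screen a batch of posts for relevance and volume in plain Python. This is an illustrative sketch, not part of the tutorial's app; the @TechNature mention filter and the minimum-volume threshold are assumptions for this example:

```python
def screen_posts(posts, keyword="@TechNature", min_volume=10):
    """Keep only posts relevant to the brand and flag insufficient volume."""
    relevant = [p for p in posts if keyword.lower() in p.lower()]
    return {
        "relevant": relevant,
        "dropped": len(posts) - len(relevant),
        "sufficient_volume": len(relevant) >= min_volume,
    }

# Example: only the first post mentions the brand
report = screen_posts(["Love my @TechNature case!", "Nice weather today"], min_volume=1)
```

Running checks like this first keeps irrelevant data out of your prompts and tells you early whether you have enough material to analyze.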
The HuggingFace Hub has a collection of datasets that can be used for experimenting.
For example, here’s a collection of two-million-bluesky-posts.
For this implementation, we will be using Gradio, an open-source Python library for building web interfaces to demo machine learning models.
We will be creating a social media analyzer for the fictional brand, TechNature.
“TechNature is an eco-friendly technology accessories company that designs and manufactures sustainable phone cases, laptop sleeves, and tech gadget accessories made from recycled and biodegradable materials.”
The analyzer we’re building will have buttons for the different tasks TechNature often needs: Sentiment Analysis, Content Strategy, Competitor Analysis, and Trend Analysis. There will also be a place for users to upload their data.
In terminal:
pip3 install huggingface-hub gradio
import os
import gradio as gr
from huggingface_hub import InferenceClient
InferenceClient is a class from the huggingface_hub library that allows you to make API calls to a deployed model.
For this step, you will need to include the address of your GPU Droplet in the base_url. If you haven’t already exported your bearer token in the terminal (see step 12 of Part 1), do it now.
Copy the address of your GPU Droplet from the Web Console and paste it into the base_url below.
client = InferenceClient(base_url="http://REPLACE WITH GPU DROPLET ADDRESS:8080", api_key=os.getenv("BEARER_TOKEN"))
System prompts are instructions or context-setting messages given to the model prior to processing user interactions. They allow one to have greater control over model outputs by defining the system’s persona, expertise, communication style, ethical constraints, and operational guidelines. Note how we include context about the company in the system prompt. The task chosen further specializes the model by giving it a persona outlining its expertise and specific instructions.
def get_system_prompt(task):
    company_context = "TechNature is an eco-friendly technology accessories company that designs and manufactures sustainable phone cases, laptop sleeves, and tech gadget accessories made from recycled and biodegradable materials."
    prompts = {
        "Sentiment Analysis": f"{company_context}\n\nYou are a sentiment analysis expert. Analyze the sentiment of the social media post, categorizing it as positive, negative, or neutral, and explain why.",
        "Content Strategy": f"{company_context}\n\nYou are a content strategy expert. Suggest improvements and optimizations for this social media content to increase engagement while maintaining alignment with our eco-friendly brand.",
        "Competitor Analysis": f"{company_context}\n\nYou are a competitor analysis expert. Analyze this social media content in comparison to other sustainable tech accessory brands and suggest positioning strategies.",
        "Trend Analysis": f"{company_context}\n\nYou are a trend analysis expert. Identify current trends related to this content, particularly in the sustainable tech accessories space, and suggest how to leverage them.",
    }
    return prompts.get(task, prompts["Sentiment Analysis"])
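Because the prompts live in a plain dictionary, adding a task is just adding an entry, and unknown tasks fall back to Sentiment Analysis via `dict.get`. Here is a sketch with a hypothetical Crisis Detection task (not one of the tutorial's four tasks, purely for illustration):

```python
company_context = "TechNature is an eco-friendly technology accessories company."

prompts = {
    "Sentiment Analysis": f"{company_context}\n\nYou are a sentiment analysis expert.",
    # Hypothetical extra task, shown only to illustrate extending the dictionary
    "Crisis Detection": f"{company_context}\n\nYou are a brand-crisis expert. Flag posts that may require an urgent response.",
}

def get_system_prompt(task):
    # Any unrecognized task name falls back to the Sentiment Analysis prompt
    return prompts.get(task, prompts["Sentiment Analysis"])
```

If you add a task here, remember to add the matching choice to the `gr.Radio` component later so users can select it.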
The inference function is responsible for generating a response from the AI model based on the user’s input message, the chat history, and the selected task.
def inference(message, history, task):
    partial_message = ""
    output = client.chat.completions.create(
        messages=[
            {"role": "system", "content": get_system_prompt(task)},
            {"role": "user", "content": message},
        ],
        stream=True,
        max_tokens=1024,
    )
    for chunk in output:
        # The final streamed chunk may carry no content, so guard against None
        if chunk.choices[0].delta.content is not None:
            partial_message += chunk.choices[0].delta.content
            yield partial_message
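The streaming pattern above accumulates each delta and re-yields the running text so the UI can update incrementally. The same logic can be seen with a stand-in generator in place of the API call; the fake chunks below are purely illustrative:

```python
def accumulate(chunks):
    """Mimics the inference loop: build up partial text, yielding after each chunk."""
    partial = ""
    for piece in chunks:
        if piece is not None:  # streamed deltas can be None, e.g. on the final chunk
            partial += piece
            yield partial

# list(accumulate(["Hel", "lo", None, "!"])) -> ["Hel", "Hello", "Hello!"]
```

Yielding the whole running string (rather than each fragment) is what lets Gradio redraw the chat bubble with the full text so far on every update.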
To customize the code to your liking, we suggest consulting the Gradio documentation.
Here, we’re incorporating two different input methods. Users can either upload their file or paste their data in the textbox to undergo one of four different types of analysis.
with gr.Blocks() as demo:
    chatbot = gr.Chatbot(height=300)
    task = gr.Radio(
        choices=["Sentiment Analysis", "Content Strategy", "Competitor Analysis", "Trend Analysis"],
        label="Select Analysis Type",
        value="Sentiment Analysis"
    )
    with gr.Row():
        file_input = gr.File(label="Upload social media data file (optional)")
        msg = gr.Textbox(
            placeholder="Enter your social media content here...",
            container=False,
            scale=7
        )

    def process_file(file):
        if file is None:
            return ""
        with open(file.name, 'r') as f:
            return f.read()

    def respond(message, chat_history, task_selected, file):
        if file:
            file_content = process_file(file)
            message = f"{message}\n\nFile content:\n{file_content}"
        bot_message = inference(message, chat_history, task_selected)
        chat_history.append((message, ""))
        for partial_response in bot_message:
            chat_history[-1] = (message, partial_response)
            yield chat_history

    msg.submit(respond, [msg, chatbot, task, file_input], [chatbot])

demo.queue().launch()
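Note that process_file sends the entire file to the model. For larger CSV uploads you may want to cap what reaches the prompt; this is one possible approach using only the standard library (the 50-row default cap is an arbitrary assumption for this sketch):

```python
import csv
import io

def read_csv_capped(path, max_rows=50):
    """Read a CSV, keeping the header plus at most max_rows data rows."""
    out = io.StringIO()
    with open(path, newline="") as f:
        reader = csv.reader(f)
        writer = csv.writer(out)
        for i, row in enumerate(reader):
            if i > max_rows:  # row 0 is the header
                break
            writer.writerow(row)
    return out.getvalue()
```

Swapping this in for the raw `f.read()` keeps prompts within the model's context window when users upload large datasets.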
import os
import gradio as gr
from huggingface_hub import InferenceClient

client = InferenceClient(base_url="http://REPLACE WITH GPU DROPLET ADDRESS:8080", api_key=os.getenv("BEARER_TOKEN"))

def get_system_prompt(task):
    company_context = "TechNature is an eco-friendly technology accessories company that designs and manufactures sustainable phone cases, laptop sleeves, and tech gadget accessories made from recycled and biodegradable materials."
    prompts = {
        "Sentiment Analysis": f"{company_context}\n\nYou are a sentiment analysis expert. Analyze the sentiment of the social media post, categorizing it as positive, negative, or neutral, and explain why.",
        "Content Strategy": f"{company_context}\n\nYou are a content strategy expert. Suggest improvements and optimizations for this social media content to increase engagement while maintaining alignment with our eco-friendly brand.",
        "Competitor Analysis": f"{company_context}\n\nYou are a competitor analysis expert. Analyze this social media content in comparison to other sustainable tech accessory brands and suggest positioning strategies.",
        "Trend Analysis": f"{company_context}\n\nYou are a trend analysis expert. Identify current trends related to this content, particularly in the sustainable tech accessories space, and suggest how to leverage them.",
    }
    return prompts.get(task, prompts["Sentiment Analysis"])

def inference(message, history, task):
    partial_message = ""
    output = client.chat.completions.create(
        messages=[
            {"role": "system", "content": get_system_prompt(task)},
            {"role": "user", "content": message},
        ],
        stream=True,
        max_tokens=1024,
    )
    for chunk in output:
        # The final streamed chunk may carry no content, so guard against None
        if chunk.choices[0].delta.content is not None:
            partial_message += chunk.choices[0].delta.content
            yield partial_message

# Create the interface
with gr.Blocks() as demo:
    chatbot = gr.Chatbot(height=300)
    task = gr.Radio(
        choices=["Sentiment Analysis", "Content Strategy", "Competitor Analysis", "Trend Analysis"],
        label="Select Analysis Type",
        value="Sentiment Analysis"
    )
    with gr.Row():
        file_input = gr.File(label="Upload social media data file (optional)")
        msg = gr.Textbox(
            placeholder="Enter your social media content here...",
            container=False,
            scale=7
        )

    def process_file(file):
        if file is None:
            return ""
        with open(file.name, 'r') as f:
            return f.read()

    def respond(message, chat_history, task_selected, file):
        if file:
            file_content = process_file(file)
            message = f"{message}\n\nFile content:\n{file_content}"
        bot_message = inference(message, chat_history, task_selected)
        chat_history.append((message, ""))
        for partial_response in bot_message:
            chat_history[-1] = (message, partial_response)
            yield chat_history

    msg.submit(respond, [msg, chatbot, task, file_input], [chatbot])

demo.queue().launch()
TechNature is hoping to understand user sentiment regarding their brand. Here are the contents of an example CSV file containing 10 posts about TechNature.
post_id,username,text,timestamp,reposts,likes
1234567890,tech_lover23,Just dropped my phone case and it survived! @TechNature cases are legit 📱,2024-03-15 14:22:33,12,45
2345678901,eco_warrior44,Feeling good about buying a @TechNature laptop sleeve - finally a brand that cares about the planet 🌿,2024-03-10 09:15:22,8,32
3456789012,gadget_guru55,Is it just me or are @TechNature products a bit overpriced? 🤔,2024-03-05 11:40:11,5,22
4567890123,sustainability_fan67,The solar charging on my @TechNature power bank is awesome 🔋,2024-02-28 16:55:44,15,62
5678901234,tech_critic78,Disappointed that my @TechNature phone case got a scratch after one week 😤,2024-02-20 10:30:01,3,17
6789012345,green_tech_fan89,Wow, charging cable from @TechNature actually looks better than I expected 👌,2024-02-15 13:45:22,7,38
7890123456,mobile_pro90,Way better than my old Samsung case. Thanks @TechNature! 🚀,2024-02-10 08:22:11,11,49
8901234567,budget_buyer01,Bit expensive, but quality seems worth it @TechNature 💯,2024-02-05 12:10:33,6,27
9012345678,customer_care_watcher12,Customer service at @TechNature is super helpful 🙌,2024-01-30 15:33:44,9,41
0123456789,sustainability_skeptic23,Another eco-friendly marketing gimmick or legit sustainable tech? @TechNature 🤷♀️,2024-01-25 17:20:55,4,19
While not of sufficient volume to get an accurate representation of user sentiment, this dataset is particularly well suited to sentiment analysis. Not only do all of these posts indicate relevance by mentioning @TechNature directly, but they include emojis to give the model additional context and emotional cues. Further contextual information is given by the “username” and “timestamp” columns. With more data, this information can be very useful for identifying trends, patterns, or biases among users. Additionally, the “reposts” and “likes” columns provide quantifiable ways of measuring engagement and the impact of particular sentiments.
However, this fake dataset isn’t perfect. A real-life dataset would be unlikely to have this many “on-theme” usernames (e.g., tech_lover23, sustainability_fan67).
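Since the reposts and likes columns quantify engagement, you can also rank posts before analysis so that high-impact sentiment stands out. A minimal sketch using the standard library; the scoring formula (reposts counted double) is an assumption for this example, not an established metric:

```python
import csv
import io

def rank_by_engagement(csv_text):
    """Sort posts by a simple engagement score: likes + 2 * reposts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["engagement"] = int(row["likes"]) + 2 * int(row["reposts"])
    return sorted(rows, key=lambda r: r["engagement"], reverse=True)
```

Feeding only the top-ranked posts to the model is one way to prioritize the conversations most people actually saw.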
After uploading the file and hitting enter, we get the following output. Keep in mind that outputs can vary.
Based on the sentiment analysis of each social media post, the results are:
Post 1234567890 by @tech_lover23 - Positive
The sentiment of the post is positive because it expresses satisfaction with the product, using the phrase “it survived” and adding a positive emoji (📱).
Post 2345678901 by @eco_warrior44 - Positive
The sentiment of this post is positive as it expresses satisfaction with buying from a brand that cares about the planet and the use of positive emojis (🌿).
Post 3456789012 by @gadget_guru55 - Neutral
The sentiment of this post is neutral because it raises a question about the price, but doesn’t explicitly express a negative or positive sentiment, using a questioning emoji (🤔).
Post 4567890123 by @sustainability_fan67 - Positive
The sentiment of this post is positive as it expresses satisfaction with the product’s solar charging feature and uses positive emojis (🔋).
Post 5678901234 by @tech_critic78 - Negative
The sentiment of this post is negative as it expresses disappointment with the product’s scratch after one week and uses a disappointed emoji (😤).
Post 6789012345 by @green_tech_fan89 - Positive
The sentiment of this post is positive because it expresses surprise and satisfaction with the product’s charging cable and uses a positive emoji (👌).
Post 7890123456 by @mobile_pro90 - Positive
The sentiment of this post is positive as it compares the product favorably to the user’s old phone case and thanks the brand, using a celebratory emoji (🚀).
Post 8901234567 by @budget_buyer01 - Neutral/Positive
The sentiment of this post is neutral with a slightly positive tone, as it acknowledges the product’s quality and considers its value to be worth the price, using a positive emoji (💯).
Post 9012345678 by @customer_care_watcher12 - Positive
The sentiment of this post is positive because it praises the brand’s customer service and uses a positive emoji (🙌).
Post 0123456789 by @sustainability_skeptic23 - Neutral
The sentiment of this post is neutral because it raises a question about the brand’s authenticity, using a questioning emoji (🤷♀️).
Overall results:
Positive posts: 6
Negative posts: 2
Neutral posts: 3
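Model-produced tallies are worth double-checking: in the sample output above, only one post is labeled Negative even though the summary says two. A quick programmatic recount of the per-post labels (treating a combined label like Neutral/Positive by its first component, which is an arbitrary choice) might look like:

```python
from collections import Counter

# The ten per-post labels from the sample output above, in order
labels = ["Positive", "Positive", "Neutral", "Positive", "Negative",
          "Positive", "Positive", "Neutral/Positive", "Positive", "Neutral"]

# Combined labels such as "Neutral/Positive" are reduced to their first component
counts = Counter(label.split("/")[0] for label in labels)
# counts -> Counter({'Positive': 6, 'Neutral': 3, 'Negative': 1})
```

Letting the LLM label individual posts while computing aggregates yourself in code is a simple way to keep summary statistics trustworthy.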
And there you have it. Feel free to play around with the code to develop an understanding of how it works and how it can be adapted to your liking. Remember to turn off your GPU Droplet when you’ve finished using it.
Congratulations on making it to this point; we covered a lot.
In this tutorial, we discussed the importance of social media analytics for businesses and how these workflows can be augmented with Large Language Models (LLMs). We looked at how DigitalOcean’s GPU Droplets can be integrated with open-source LLMs optimized by HuggingFace. We then saw how Gradio can be used to create an interactive user interface with minimal code.
Way to go, you!
Some of our other articles on 1-click models
Some excellent HuggingFace Resources
HuggingFace documentation
Open-source LLM Ecosystem at Hugging Face
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.