Prompt-based NLP is one of the most actively discussed topics in natural language processing today, and for good reason. Prompt-based learning uses the knowledge that pre-trained language models acquire from large amounts of text to solve various downstream tasks such as text classification, machine translation, named-entity detection, and text summarization, even when no task-specific data is available in the first place. Unlike the traditional supervised learning paradigm, where we train a model to learn a function that maps an input x to an output y, the idea here is to rely on language models that model the probability of text directly.
Some interesting questions you can ask here are: Can I use GPT to do machine translation? Can I use BERT to do sentiment classification? And can I do all of it without training them specifically for these tasks? That is exactly where prompt-based NLP comes to the rescue. In this blog, we summarise some initial segments of the exhaustive and beautifully written paper Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. We discuss the learning paradigms present in NLP, the notation commonly used in the prompt-based paradigm, demo applications of prompt-based learning, and some of the design considerations to keep in mind while designing a prompting environment. This blog is part 1 of a 3-part series; the follow-up parts will discuss other details from the paper, such as the challenges of such a system and learning to design prompts automatically.
Paradigms in NLP Learning Space | Source: https://arxiv.org/pdf/2107.13586v1.pdf
Until the third paradigm (Pre-train, Fine-tune), language models were not used as the base model for almost every task. That is why the “Task Relation” column in the figure above shows no arrows between the boxes for the earlier paradigms. With prompt-based learning, the idea is instead to design the input to fit the model. The figure depicts this with incoming arrows to the LM (Language Model), where the tasks are CLS (Classification), TAG (Tagging), and GEN (Generation).
As can be seen in the figure below, we start with an input x (say, a movie review) and an expected output y. The first step is to re-format this input using a prompting function (labeled Fprompt in the image), whose output is denoted x'. The language model's task is then to predict values z for the slot Z. A prompt in which the slot Z is filled with any answer is called a filled prompt, and if that answer is the true one, it is called an answered prompt.
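The terminology above can be sketched in a few lines of plain Python. This is only an illustration: the template text, the candidate answers, and the helper names `f_prompt` and `fill` are assumptions made for the example, and no language model is involved.

```python
# Illustrative template with an input slot [X] and an answer slot [Z].
# The wording of the template is an assumption chosen for this example.
TEMPLATE = "[X] Overall, it was a [Z] movie."


def f_prompt(x: str, template: str = TEMPLATE) -> str:
    """Prompting function: insert the input x, leaving the answer slot [Z] open."""
    return template.replace("[X]", x)


def fill(x_prime: str, z: str) -> str:
    """Fill the answer slot [Z] with a candidate answer z."""
    return x_prime.replace("[Z]", z)


x = "I love this movie."
x_prime = f_prompt(x)  # the prompt x'

# Filled prompts: the slot holds *any* candidate answer.
filled_prompts = {z: fill(x_prime, z) for z in ["good", "bad", "boring"]}

# Answered prompt: the slot holds the true answer (assumed to be "good" here).
answered_prompt = filled_prompts["good"]

print(x_prime)
print(answered_prompt)
```

In a real system, the language model would score each filled prompt, and the highest-scoring answer would then be mapped to the output y.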
Terminology in Prompting | Source: https://arxiv.org/pdf/2107.13586v1.pdf
Some of the popular applications of this paradigm are Text Generation, Question Answering, Reasoning, Named Entity Recognition, Relation Extraction, Text Classification, etc.
Text Generation - Text generation involves generating text, usually conditioned on some other piece of information. With the use of models trained in an auto-regressive setting, the task of text generation becomes natural. Often, the prompts designed are prefixed in nature with a trigger token as a hint for the model to start the generation process.
Question Answering - Question answering (QA) aims to answer a given input question, often based on a context document. For example, given an input passage, if we want to get all the names mentioned in it, we can formulate our prompt as “Generate all the person names mentioned in the above passage.” Our model now behaves similarly to text generation, where the question becomes the prefix.
Named Entity Recognition - Named entity recognition (NER) is the task of identifying named entities (e.g., person name, location) in a given sentence. For example, if the input is “Prakhar likes playing cricket”, to determine what type of entity “Prakhar” is, we can formulate the prompt as “Prakhar is a Z entity”. The answer space Z for the pre-trained language model would then contain labels such as person, organization, etc., with “person” having the highest probability.
Relation Extraction - Relation extraction is the task of predicting the relation between two entities in a given sentence. This video explanation talks about modeling the relation extraction task as a natural language inference task using a pre-trained language model in a zero-shot setting.
Text Classification - Text classification is the task of assigning a pre-defined label to a given piece of text. A possible prompt for this task is “The topic of this document is Z.”, which is then fed into a masked pre-trained language model for slot filling.
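To make the prompt shapes for these applications concrete, here is a minimal sketch that builds each prompt as a plain string. The `TEMPLATES` dictionary, the helper names, and the "TL;DR:" trigger token are assumptions for illustration; the QA, NER, and classification wordings follow the examples in the text.

```python
ANSWER_SLOT = "[Z]"

# Illustrative templates, one per application described above.
TEMPLATES = {
    # Prefix prompt: the model continues after a trigger token (assumed "TL;DR:").
    "text_generation": "{text} TL;DR:",
    # Prefix prompt: the question is appended to the passage.
    "question_answering": "{text}\nGenerate all the person names mentioned in the above passage.",
    # Cloze prompts: the model fills the answer slot [Z].
    "named_entity_recognition": "{text} {span} is a [Z] entity.",
    "text_classification": "{text} The topic of this document is [Z].",
}


def build_prompt(task: str, **fields: str) -> str:
    """Insert the input fields into the template chosen for the task."""
    return TEMPLATES[task].format(**fields)


def prompt_shape(prompt: str) -> str:
    """'cloze' if the prompt has a slot to fill, 'prefix' if the model continues it."""
    return "cloze" if ANSWER_SLOT in prompt else "prefix"


ner_prompt = build_prompt(
    "named_entity_recognition",
    text="Prakhar likes playing cricket.",
    span="Prakhar",
)
qa_prompt = build_prompt("question_answering", text="Prakhar likes playing cricket.")

print(ner_prompt, "->", prompt_shape(ner_prompt))
print(qa_prompt, "->", prompt_shape(qa_prompt))
```

Cloze-style prompts pair naturally with masked language models, while prefix-style prompts suit auto-regressive models that generate a continuation.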
We will be using OpenPrompt - An Open-Source Framework for Prompt-learning for coding a prompt-based text classification use-case. It supports pre-trained language models and tokenizers from huggingface transformers.
You can install the library with a simple pip command as shown below -
>> pip install openprompt
We simulate a 2-class problem with classes being sports and health. We also define three input examples for which we are interested in getting the classification labels.
from openprompt.data_utils import InputExample

classes = [
    "Sports",
    "Health"
]
dataset = [
    InputExample(
        guid = 0,
        text_a = "Cricket is a really popular sport in India.",
    ),
    InputExample(
        guid = 1,
        text_a = "Coronavirus is an infectious disease.",
    ),
    InputExample(
        guid = 2,
        text_a = "It's common to get hurt while doing stunts.",
    )
]
Defining Input Examples
Next, we load our language model. We choose RoBERTa for our purposes.
from openprompt.plms import load_plm

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-base")
Loading Pre-trained Language Models
Next, we define our template, which lets us insert the input example stored in the “text_a” field dynamically. The {"mask"} token is what the model fills in. Feel free to check out How to Write a Template? for more detailed steps in designing yours.
from openprompt.prompts import ManualTemplate

promptTemplate = ManualTemplate(
    text = '{"placeholder":"text_a"} It was {"mask"}',
    tokenizer = tokenizer,
)
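To see roughly what text the model receives after the template is applied, the sketch below renders the same template string by hand. The `render_template` helper is a hypothetical stand-in written for this example and is not how OpenPrompt implements templates; the only library fact it relies on is that RoBERTa's mask token is `<mask>`.

```python
import re

MASK_TOKEN = "<mask>"  # RoBERTa's mask token


def render_template(template: str, example: dict, mask_token: str = MASK_TOKEN) -> str:
    """Hypothetical helper: fill placeholders and the mask slot in a template string."""
    # Replace each {"placeholder":"field"} with the value of that field.
    text = re.sub(
        r'\{"placeholder"\s*:\s*"(\w+)"\}',
        lambda match: example[match.group(1)],
        template,
    )
    # Replace {"mask"} with the model's mask token.
    return text.replace('{"mask"}', mask_token)


template = '{"placeholder":"text_a"} It was {"mask"}'
example = {"text_a": "Cricket is a really popular sport in India."}

rendered = render_template(template, example)
print(rendered)
```

The masked position is where the language model produces a distribution over its vocabulary, which the verbalizer then maps to class labels.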
Next, we define a verbalizer, which lets us project our model's prediction onto our pre-defined class labels. Feel free to check out How to Write a Verbalizer? for more detailed steps in designing yours.
from openprompt.prompts import ManualVerbalizer

promptVerbalizer = ManualVerbalizer(
    classes = classes,
    label_words = {
        "Health": ["Medicine"],
        "Sports": ["Game", "Play"],
    },
    tokenizer = tokenizer,
)
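Conceptually, a verbalizer turns scores over vocabulary words into scores over classes. The sketch below shows one simple way to do that, by averaging the scores of each class's label words. The word scores are made-up numbers for illustration, and averaging is an assumption of this sketch rather than a statement about OpenPrompt's internals.

```python
label_words = {
    "Health": ["Medicine"],
    "Sports": ["Game", "Play"],
}


def project_to_classes(word_scores: dict, label_words: dict) -> dict:
    """Average the scores of each class's label words to get one score per class."""
    class_scores = {}
    for label, words in label_words.items():
        total = sum(word_scores.get(word, 0.0) for word in words)
        class_scores[label] = total / len(words)
    return class_scores


# Made-up probabilities a model might assign to words at the masked position.
word_scores = {"Medicine": 0.10, "Game": 0.30, "Play": 0.20}

class_scores = project_to_classes(word_scores, label_words)
predicted = max(class_scores, key=class_scores.get)
print(class_scores, predicted)
```

Listing several label words per class, as done for "Sports" above, makes the prediction less dependent on the model preferring one particular word.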
Next, we create our prompt model for classification by passing in the necessary components: the template, the language model, and the verbalizer.
from openprompt import PromptForClassification

promptModel = PromptForClassification(
    template = promptTemplate,
    plm = plm,
    verbalizer = promptVerbalizer,
)
Next, we create our data loader for sampling mini-batches from a dataset.
from openprompt import PromptDataLoader

data_loader = PromptDataLoader(
    dataset = dataset,
    tokenizer = tokenizer,
    template = promptTemplate,
    tokenizer_wrapper_class = WrapperClass,
)
Next, we set our model to evaluation mode and make a prediction for each input example in a masked language model (MLM) fashion.
import torch

promptModel.eval()  # evaluation mode: disables dropout
with torch.no_grad():  # inference only, so no gradients are needed
    for batch in data_loader:
        logits = promptModel(batch)
        preds = torch.argmax(logits, dim = -1)
        # Assumes one example per batch, so preds holds a single class index.
        print(tokenizer.decode(batch['input_ids'][0], skip_special_tokens=True), classes[preds.item()])
The snippet below shows the output for each input example.
>> Cricket is a really popular sport in India. The topic is about Sports
>> Coronavirus is an infectious disease. The topic is about Health
>> It's common to get hurt while doing stunts. The topic is about Health
Here we discuss a few of the basic design considerations to keep in mind while designing the prompt environment.
It's interesting to see a new stream of research emerging in NLP that deals with minimal training data and makes use of large pre-trained language models. We will expand on each of the above-mentioned design considerations in the follow-up parts of this blog.