Building better service bots with GPT-3

Can breakthrough NLP models like GPT-3 soon help us build better chatbots?

Thomas Schijf
8 min read · Jan 3, 2022

In 2020, the release of a massive new language AI model called GPT-3 garnered worldwide attention. The model, developed by OpenAI, could understand and generate natural language at a level never seen before. Two years later, a lot has happened in the AI space, but we are still nowhere near operational human-level service bots, and GPT-3 is still in beta. As with many AI advancements, these models promise enormous things when first unveiled, only to fizzle out just as quickly once put to use. How far must a technological breakthrough such as GPT-3 leap before it can be used in real-world applications such as full customer service automation?

The rise of Transformers in NLP

Transformers have become the standard in Natural Language Processing (NLP). They are massive pre-trained deep neural networks built for a variety of natural language tasks: translation, rephrasing sentences, question answering, identifying the sentiment of a sentence, carrying conversations, summarization, even picking up meaning from emojis. What separates GPT-3 from its transformer predecessors is that it learned from over 45 TB of text drawn from sources such as Wikipedia, Google Books, and common search queries on the Internet. To capture the semantics and meaning of all this text, GPT-3 was trained with an overwhelming 175 billion parameters, ten times more than previously benchmarked NLP models.

GPT-3 and its number of parameters compared to various other benchmarked NLP models. Each parameter can be viewed as a synaptic link between neurons in the brain; the brain alone has around 100 trillion of them, hundreds of times more than GPT-3 has parameters.

For the first time, OpenAI set out to demonstrate that simply scaling up model parameters results in a greater level of language understanding. And unlike predecessors such as BERT, which could merely classify a sentence under a specific subject or detect similarities between sentences, GPT-3 was able to generate its own material in response to a user question, demonstrating that it could interpret language at a far higher level.

Example conversation with GPT-3; obtained from https://www.nabla.com/blog/gpt-3/

Upgrading Customer Service with transformers

Understanding questions and situations is essential in customer service. Beyond that, customer support representatives need to understand the business process that must be followed to resolve different issues and answer questions.

The present-day technologies behind most chatbots are a great time saver when it comes to briefly answering frequently asked questions or following preset conversational dialogues. But when it comes to answering specific and complex queries, these chatbots fail.

Here we run into the long tail of customer service intents. Broadly speaking, only around 20% of customer queries concern the top 20% of intents, i.e., only a small share of questions is asked frequently. To cover the remaining 80% of intents, where the questions become more specific and less frequent, you have to move ever further into the long tail, and the marginal benefit of automating each additional intent drops.

That is because with standard NLP techniques, such as those used in today’s chatbots, each answer to an intent must be carefully defined upfront, which requires a significant amount of engineering and content-maintenance effort.

So it appears that we are in a dilemma here.

Organizations would love to automate their customer service departments further if the responses were qualitatively similar to those of their human counterparts, but they often hit a stumbling block once they try to move beyond this top 20% of intents.

To overcome this barrier, you ideally want a technology smart enough to absorb written business process logic without having to predefine every answer and build every dialogue upfront.

Such technology will then work its magic by learning in the same way that human agents learn about different business processes.

Could GPT-3 help us build better customer service chatbots?

The golden question is whether the GPT-3 model is ready to do this.

Well, let's test it!

In this experiment, I used GPT-3’s Q&A settings in the OpenAI beta Playground. As stated earlier, the model has to learn the business process for various types of intents simply from a written description, and in this configuration that business logic is labeled ‘Intro.’

We then ask the model a specific, lengthy (‘Human’) service question. When we hit the generate option, GPT-3 produces an answer, and we check whether it makes sense as a correct response to the query.

OpenAI’s GPT-3 Playground using the base version of DaVinci.
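For anyone who wants to reproduce this setup outside the Playground, a minimal sketch using the openai Python library of that era (the pre-1.0 Completion API) might look like the following. The prompt text, engine choice, and sampling settings here are illustrative assumptions, not the exact Playground configuration used in the article.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; a beta access key was required at the time

# The 'Intro' holds the written business logic; a single Human question follows,
# mirroring the Q&A-style prompt used in the Playground experiment.
prompt = (
    "Intro: There is no cancellation fee if the customer cancels during the "
    "14-day cancellation period. If the contract was signed more than two weeks "
    "ago, the cancellation fee is €150.\n\n"
    "Human: I would like to cancel the contract I signed with you last week. "
    "How can I solve this?\n"
    "AI:"
)

# Engine name and sampling settings are illustrative, not the article's exact setup.
response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=100,
    temperature=0.7,
    stop=["\nHuman:", "\nAI:"],
)

print(response.choices[0].text.strip())
```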

The following scenarios demonstrate how GPT-3 performs on various complex customer service use cases in the context of a utility company:

Scenario 1: Cancel a contract

Intro: There is no cancellation fee if the customer cancels during its 14-day cancellation period. The agent should inquire as to when the customer signed the contract. If the customer signed the contract more than two weeks ago, the cancellation fee should be €150. If the customer agrees, the agent can either support by canceling the deal entirely or canceling the contract and signing up for a new contract with our company.

Now, let us look at how the conversation pans out between the AI and the customer.

Human: Dear, I would like to cancel the contract I signed with you last week. How can I solve this? Cancel first, or can I close the other one? I would like to hear from you.

AI: I understand. I will check your contract and get back to you as soon as possible.

As you can see, the customer signed the contract within the 14-day cancellation period. The AI does not bring up the cancellation fee, which is correct, since no fee applies. Now watch what happens when we change the timing in the query from last week to last year:

Human: Dear, I would like to cancel the contract I signed with you last year. How can I solve this? Cancel first, or can I close the other one? I would like to hear from you.

AI: I understand. I can cancel the contract for you, but there is a fee of €150.

You see, the AI correctly follows the process and mentions the cancellation fee.

Scenario 2: New payment plan

In another scenario, a customer requests a new payment plan:

Intro: The process for issuing payment plans with the customer is:

1) Check if the client already has an active payment plan? If the client already has an active payment plan, one cannot opt for a new one. One should pay for everything first.

2) If the client doesn’t have an active payment plan, one can pay in several terms or postpone a payment once.

3) Check to see which invoices the client wants a payment plan for.

The conversation goes like this:

Human: Good afternoon. Recently, I contacted you regarding a high settlement with the cause of a ground fault. You told me that I should recover the amount from my home building, but I can’t manage that, so I would like to request a payment plan for the high amount even though I don’t agree that I should pay this amount because it is already clear what caused it.

AI: I understand your concern. I will check if you already have an active payment plan and an invoice for this, and then we can discuss the possibilities to postpone a payment or pay in terms.

The AI is uncertain whether the customer has a payment plan, so it correctly says it will check this internally and then follow up accordingly.

Now, if the human indicates that they do have an active payment plan, the AI answers as follows:

Human: Hi, I already have a payment plan with you guys, but I have trouble paying next month’s invoice. Can I have a payment plan on my last invoice as well?

AI: I see you already have a payment plan with us. I cannot offer you another one as it would be against our policy.

Although these are two randomly selected examples, I believe you can see that the level of intelligence is pretty astounding, especially given that the results are based on a model that is not fine-tuned in any way.
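For readers curious how these multi-turn exchanges are assembled: the Playground simply keeps appending Human and AI turns to the same prompt and resubmits the whole thing on every call. A rough sketch of that loop, again using the pre-1.0 openai library; the intro text, engine, and sampling settings are illustrative assumptions:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Illustrative business logic for the payment-plan scenario.
INTRO = (
    "Intro: If the client already has an active payment plan, a new one cannot "
    "be issued. If not, the client can pay in several terms or postpone a "
    "payment once.\n\n"
)

def chat_turn(history: str, user_message: str):
    """Append the user's message, query GPT-3, and return the updated history and reply."""
    history += f"Human: {user_message}\nAI:"
    completion = openai.Completion.create(
        engine="davinci",        # illustrative; the article used base DaVinci
        prompt=INTRO + history,  # the intro plus the full conversation so far
        max_tokens=100,
        temperature=0.7,
        stop=["\nHuman:", "\nAI:"],
    )
    reply = completion.choices[0].text.strip()
    return history + f" {reply}\n", reply

# Hypothetical usage: each turn re-sends everything that came before.
history = ""
history, reply = chat_turn(history, "Can I get a payment plan for my last invoice?")
print(reply)
```

Because the entire conversation rides along in the prompt on every call, behaviour can also degrade as the exchange gets longer, which brings us to the limitations.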

Limitations

Though these scenarios appear very promising, the generative nature of these language models means they can occasionally return text that makes no sense at all, particularly when the conversation gets longer. Right now, this is by far the biggest constraint of the model.

To understand this better, let us consider the following example:

Scenario 3: Temporary energy in both houses

Intro: To provide the customer with temporary electricity in both houses while they are moving, the following information is required:

1) ask when he expects to have the key?
2) ask when he expects to move?
3) ask for the new home’s address
4) ask whether the customer wants the same energy products in his new home as he currently has for his current home?

Human: My wife and I are relocating to a new home on Amsterdam’s Main Street. We will be receiving keys to a new home on February 26th, 2021; however, we will be leaving our existing apartment on March 3rd, 2021. I wonder how we may shift our energy supply to our new home while still paying and having electricity in our previous apartment by 03–03–2021. To be clear, we require electricity for both households for the whole five-day period. Is it even possible to do something like that?

AI: I understand that you want to move your electricity supply from one address to another. I can help with this, but first, we need some information about the new home.
Human: sure, go-ahead
AI: I have the following information about your new home:
Human: yes?
AI:
The address is Keizersgracht 393, Amsterdam.
Human: yes
AI: The postcode is 1016 AB.
Human: correct
AI: The zip code is 1016 AB.
Human: again, correct
AI: The city is Amsterdam.
Human: also correct
AI: The country is the Netherlands.
Human: correct
AI: The phone number is +31 20 555 5555.
Human: no
AI: The phone number is +31 20 555 5555.
Human: no
etc.

As you can see, the AI model starts off promising but loses sight of the main objective as the conversation continues, getting stuck in a loop asking for client details that are not relevant.
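One practical lever worth noting: the completions API exposes frequency and presence penalties that discourage the model from repeating itself. They do not restore the lost objective, but they can dampen loops like the one above. A hedged sketch, with a shortened illustrative prompt and illustrative penalty values:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# A shortened, illustrative version of the scenario-3 prompt that triggered the loop.
prompt = (
    "Intro: To provide the customer with temporary electricity in both houses "
    "while they are moving, ask for the key handover date, the moving date, the "
    "new address, and whether they want the same energy products as before.\n\n"
    "Human: We get the keys on February 26th and leave the old apartment on "
    "March 3rd. Can we have electricity in both homes for those five days?\n"
    "AI:"
)

# frequency_penalty lowers the probability of tokens the model has already used often;
# presence_penalty lowers it for any token that has appeared at all. The values here
# are illustrative, not settings tested in the article.
response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=120,
    temperature=0.7,
    frequency_penalty=0.6,
    presence_penalty=0.4,
    stop=["\nHuman:", "\nAI:"],
)

print(response.choices[0].text.strip())
```

Penalties like these curb verbatim repetition, but they do not give the model a memory of its goal, so longer conversations still need guardrails outside the model.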

Conclusion

OpenAI’s GPT-3 model has propelled NLP to new heights by scaling up its deep neural networks with enormous amounts of data, resulting in an unparalleled increase in the intelligence of these models. To apply such models to better customer service automation, the AI must be able to learn in a way that minimizes the cost of adding new intelligence for long-tail intents.

The GPT-3 model is one of the first that holds the potential to do this. Experiments in the GPT-3 Playground demonstrate that it can extract intelligence from written business logic, and the model appears to recognize the differences between the types of user questions in the various scenarios evaluated.

However, in the last scenario, you also see that a lack of self-reflection results in repetitive and ambiguous answers.

With advances like GPT-3, we may be optimistic that service chatbots will soon deliver more accurate and better conversations with clients without the need to pre-define all answers and dialogues that occur in the long tail.

The main question now is whether improved AI models can be obtained simply by increasing the number of neurons in a deep neural network or whether more is required to capture the challenging aspects of providing human-level customer assistance. We will learn soon enough.
