Explaining the ‘T’ in ChatGPT

Professor Jiaming Xu says the ‘Transformer’ emulates how humans process language, by focusing on the most relevant information

August 2, 2023

One letter is key in explaining how Artificial Intelligence models work. ChatGPT stands for ‘Chat Generative Pretrained Transformer’, and the word ‘Transformer’ is at the core of how these systems work, said Jiaming Xu, an associate professor in the area of Decision Sciences at Duke University’s Fuqua School of Business.

In a live presentation on Fuqua’s LinkedIn page, Xu described the transformer as the fundamental building block of the large language models (LLMs) behind the rise of AI.

At its core, the transformer is a mechanism that tries to emulate the way humans process language, Xu said.

“As you read through this article, you may skim some parts and concentrate on others,” Xu said. “It’s a simple yet powerful idea: not all the input information holds equal importance to your interest.”

Similarly, models trying to emulate natural language processing are built to pay more attention to more relevant information, he said.

Xu said the transformer was introduced by the Google Brain team in 2017 in the seminal paper ‘Attention Is All You Need’. As the title suggests, the core idea of the paper was the importance of ‘self-attention’, a term describing how humans effectively isolate resonant elements in processing information, Xu said.

To illustrate the ‘self-attention’ mechanism, Xu used the sentence “The train left the station on time” (an example first used in the book ‘Deep Learning with Python’). If we see or hear the word ‘station’, our neural system may associate it with the word ‘radio’, or ‘international space’, or, as in this case, the word ‘train’.

“What kind of station do we mean? Context is important,” Xu said.

Similarly, the transformer mechanism algorithmically trains LLMs to assign context-aware values (‘vectors’) to words, he explained. Each word is transformed into multiple vectors representing the different dimensions of the word (for example: its meaning, the position of the word in the sentence, and so on). “The dimensions can be thousands,” Xu said.

This part of the process is called ‘word embedding’, and it is necessary to transform the language input into a format (the vector) that computers understand, Xu said.
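The embedding step described above can be sketched in a few lines of Python. This is a minimal illustration, not how ChatGPT is implemented: the 4-dimensional table `EMBEDDINGS` and its values are made up for the example (real models learn these values during training and use far more dimensions), and the names `position_vector` and `embed` are hypothetical. The position vector follows the sinusoidal formula from the ‘Attention Is All You Need’ paper.

```python
import math

# Hypothetical 4-dimensional word vectors. The numbers are invented for
# illustration; a trained model learns them from data.
EMBEDDINGS = {
    "the":     [0.1, 0.0, 0.1, 0.0],
    "train":   [0.9, 0.1, 0.8, 0.0],
    "left":    [0.2, 0.7, 0.1, 0.3],
    "station": [0.8, 0.2, 0.9, 0.1],
    "on":      [0.0, 0.1, 0.0, 0.2],
    "time":    [0.1, 0.3, 0.2, 0.9],
}


def position_vector(position, dim):
    """Sinusoidal position encoding: sine on even indices, cosine on odd."""
    vec = []
    for i in range(dim):
        angle = position / (10000 ** ((2 * (i // 2)) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def embed(sentence):
    """Turn a sentence into one vector per word: meaning plus position."""
    tokens = sentence.lower().split()
    dim = len(next(iter(EMBEDDINGS.values())))
    vectors = []
    for pos, tok in enumerate(tokens):
        word_vec = EMBEDDINGS[tok]
        pos_vec = position_vector(pos, dim)
        vectors.append([w + p for w, p in zip(word_vec, pos_vec)])
    return tokens, vectors


tokens, vectors = embed("The train left the station on time")
for tok, vec in zip(tokens, vectors):
    print(tok, [round(x, 3) for x in vec])
```

The two occurrences of ‘the’ share the same word vector but end up with different final vectors, because their positions in the sentence differ.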

“A smart embedding would provide a different value for a word depending on the surrounding words. That’s where self-attention plays a role,” Xu said.

In the case of the train leaving the station, the words ‘station’ and ‘train’ have a high “relevancy score”, because they have a relatively high chance of leading to one another, Xu explained.
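One way to picture the relevancy score is the scaled dot-product attention from the transformer paper. The sketch below is heavily simplified and rests on stated assumptions: the word vectors in `VECTORS` are invented for illustration, and the same vectors serve as queries, keys and values, whereas a real transformer applies learned projections to produce each of the three. The function names are hypothetical.

```python
import math

TOKENS = ["the", "train", "left", "the", "station", "on", "time"]

# Hypothetical word vectors, made up so that 'train' and 'station'
# point in similar directions.
VECTORS = {
    "the":     [0.1, 0.0, 0.1, 0.0],
    "train":   [0.9, 0.1, 0.8, 0.0],
    "left":    [0.2, 0.7, 0.1, 0.3],
    "station": [0.8, 0.2, 0.9, 0.1],
    "on":      [0.0, 0.1, 0.0, 0.2],
    "time":    [0.1, 0.3, 0.2, 0.9],
}


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Simplified self-attention without learned projections."""
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        # Relevancy score of every word with respect to the current word.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Context-aware vector: weighted average of all word vectors.
        outputs.append([
            sum(row[j] * vectors[j][d] for j in range(len(vectors)))
            for d in range(dim)
        ])
    return outputs, weights


inputs = [VECTORS[t] for t in TOKENS]
outputs, weights = self_attention(inputs)

station = TOKENS.index("station")
for tok, w in zip(TOKENS, weights[station]):
    print(f"{tok:>8}: {w:.3f}")
```

With these made-up vectors, ‘train’ receives the largest weight among the other words in the sentence when the model looks at ‘station’, so the output vector for ‘station’ is pulled toward the railway sense of the word.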

Xu said LLMs are language models that compute the probability of the next word based on the architecture of the transformer.

“The LLM predicts what comes next by scouring the large word database it has been trained on,” Xu said. “This prediction is not based on facts but instead on the statistical relationships among the words and can sometimes be unreliable. That’s why sometimes we see ChatGPT gives nonsensical answers.”
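The idea of predicting the next word from statistical relationships can be shown with a toy model that simply counts which word follows which. This is a deliberately swapped-in technique (a bigram count model), not a transformer: a real LLM computes its scores with a trained neural network, not a lookup table. The three-sentence `CORPUS` and the function names are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for the training data.
CORPUS = [
    "the train left the station on time",
    "the train arrived at the station",
    "the radio station played music",
]


def build_bigram_counts(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Estimate P(next word | word) from the raw counts."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in followers.items()}


counts = build_bigram_counts(CORPUS)
probs = next_word_probabilities(counts, "the")
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.2f}")
```

The probabilities reflect only how often words co-occur in the corpus. Nothing in the model checks whether the resulting sentence is true, which is the source of the unreliability Xu describes.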

The potential for business

The potentially historic magnitude of the AI revolution has prompted many Big Tech companies to join “an arms race for LLMs,” Xu said.

Among the many applications for business, third-party plugins have been developed that can operate with ChatGPT to better customize the user experience, Xu explained.

“You can already prompt GPT to create a food list for a party and queue it up in your Instacart account,” Xu said.

Microsoft has also integrated ChatGPT into its Bing search engine, opening the possibility for queries like asking the chatbot to plan a Disneyland trip for you and your children, pick the appropriate rides, and provide tips for avoiding lines.

“Interesting concepts” have also emerged for the use of avatars in business, Xu said, like Microsoft Xiaoice’s virtual employee and Satoshi, “the world’s first AI anchor.”

ChatGPT and LLMs can also be used for personalized recommendations, Xu said. One example comes from the insurance industry, where price is traditionally a function of pre-determined factors, like previous accidents. But Xu pointed to AI-powered Root, an application for insurance companies that records customers’ driving time, distance, and driving style every day. “And after two to three weeks, its AI algorithm decides what your price is,” he said.

Recently, a new generative AI model for insurance, InsurGPT, was invented to read and accurately extract data from various documents, such as quotes and claim forms.

Risks and open questions about AI

“LLMs also face many issues and controversies,” Xu added.

For example, Xu mentioned artists whose voices have been sampled and reproduced – with staggering results – for viral songs, without consent.

He also mentioned an even deeper quandary—in some cases, AI has become so powerful that the developers themselves don’t know what patterns the machine is capturing in the huge mass of data it has been trained on.

“This is the peculiar part of deep learning,” Xu said. “When you train a neural network, it will try to capture the important features, dimensions of the input somehow magically. You don’t need to hand-design each dimension on your own. You are just letting the network figure it out, by training it through a lot of data.”

This story may not be republished without permission from Duke University's Fuqua School of Business. Please contact media-relations@fuqua.duke.edu for additional information.
