Large Language Models Use Cases and Applications

If you have not yet heard about Large Language Models (LLMs), you will soon. They are the beating hearts that power many of the artificial intelligence tools that have cropped up recently, and they have even appeared in South Park.

Whether it’s ChatGPT, DALL-E, or our very own Vectara platform, LLMs are revolutionizing how humans interact with computers.

In this article we’ll discuss the most common use cases of large language models and problems they solve, but also challenges they face and thoughts on their future.

What are large language models and how do they work

An LLM is a piece of software that understands language very well, and uses that understanding to take a certain action. The most common actions that LLMs provide are generating content, finding information, conversing, or helping to organize your data. This post focuses on human language. But it is important to note that other domains also have languages and therefore also benefit from LLMs. A common example is genomics, whose language is DNA.

To make LLMs useful for some specific task, an application will accept one or more prompts from a user, then provide that as input to the LLM. These prompts are usually a question, an instruction, a description, or some other sequence of text.

The LLM then decides (i.e. predicts) what information should be returned to the user, and the application uses that information to craft a response, such as an answer or some novel generated content.
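The flow described above can be sketched roughly as follows. This is a minimal illustration rather than any vendor's API: `build_prompt`, `fake_llm`, and `answer` are hypothetical names, and the stand-in model returns a canned string where a real application would call a hosted or local LLM.

```python
from typing import Callable


def build_prompt(instruction: str, user_text: str) -> str:
    """Combine the application's instruction with the user's input."""
    return f"{instruction.strip()}\n\nUser: {user_text.strip()}\nAssistant:"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a canned completion."""
    return " Paris is the capital of France. "


def answer(user_text: str, llm: Callable[[str], str] = fake_llm) -> str:
    """Accept a prompt, pass it to the LLM, and craft the response for the user."""
    prompt = build_prompt("Answer the question concisely.", user_text)
    completion = llm(prompt)
    # The application, not the model, decides how the raw output is presented.
    return completion.strip()


if __name__ == "__main__":
    print(answer("What is the capital of France?"))
```

Passing the model in as a plain callable keeps the application logic independent of which LLM sits behind it.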

Technically, LLMs are specialized deep neural networks that have been trained primarily on text, although some use images, video, or audio as well. They are very robust and broad in terms of how they can be used, which helps them to achieve widespread adoption.

To better understand what these are, let’s deconstruct the LLM name itself:

Large – means they are trained on huge data sets with many parameters. For instance, Generative Pre-trained Transformer version 3 (GPT-3) has 175 billion parameters and was trained on data filtered from about 45 TB of raw text. This is how they can have such broad applicability.
Language – means that they operate primarily on human language.
Model – means they are used to find patterns or make predictions within data.
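To give a feel for what "large" means in practice, the back-of-the-envelope calculation below estimates the memory needed just to hold a model's weights. The bytes-per-parameter values (2 for 16-bit weights, 4 for 32-bit) are assumptions about numeric precision, not details from this article, and the function name is hypothetical.

```python
def weight_memory_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Memory to hold the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


GPT3_PARAMETERS = 175_000_000_000  # parameter count quoted in the text

print(weight_memory_gb(GPT3_PARAMETERS, 2))  # assumed 16-bit weights -> 350.0
print(weight_memory_gb(GPT3_PARAMETERS, 4))  # assumed 32-bit weights -> 700.0
```

Numbers like these exceed the memory of a single typical accelerator, which is one reason such models are expensive to serve.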

LLMs understand language so well because they figure out the relationships between words. To do this they leverage one or more powerful techniques recently developed by the machine learning research community. A few of the most noteworthy ones are:

Transformers – a type of deep learning model that tracks relationships within sequences of data so that it can understand the meaning inherent to that data and transform it into some other form. These use self-attention, a technique that lets the model weigh the other words in a sequence when interpreting any given word. This allows it to factor in the broader context for a given piece of text.
Bidirectional Encoding – an ML architecture, first described in the seminal BERT paper, that uses transformers and looks in both directions (i.e. the text before a specific word and after a specific word) to understand the meaning of ambiguous language as it encodes one sequence of text into another sequence.
Autoregressive Models – a type of model that uses the previous words to predict the next word in a sequence. These are very common for models that produce novel text.
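The difference between bidirectional and autoregressive processing comes down to which positions each word is allowed to attend to. The sketch below is a heavily simplified, pure-Python version of scaled dot-product self-attention. It assumes the queries, keys, and values are all the raw input vectors; real transformers apply learned projection matrices first, so treat this as an illustration of the masking idea only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1; -inf scores get weight 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors, causal=False):
    """Scaled dot-product self-attention with Q = K = V = vectors.

    causal=False: each position attends to the whole sequence (bidirectional).
    causal=True:  position i attends only to positions 0..i (autoregressive).
    """
    dim = len(vectors[0])
    outputs = []
    for i, query in enumerate(vectors):
        scores = []
        for j, key in enumerate(vectors):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out future positions
            else:
                dot = sum(q * k for q, k in zip(query, key))
                scores.append(dot / math.sqrt(dim))
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(dim)
        ])
    return outputs


toy_sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-d word vectors
bidirectional = self_attention(toy_sequence)
autoregressive = self_attention(toy_sequence, causal=True)
```

With the causal mask, the first position can only see itself, so its output is its own vector. Without the mask, it blends in information from the words that follow.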

Large language model examples

At this point many people have heard of a few of the most noteworthy LLMs and LLM-powered applications, such as ChatGPT and DALL-E (which generated the header image for this post via the prompt ‘oil painting of a computer with a chat dialog on the screen’). But there are many more. Some of these are open source while others are closed source, and some are software artifacts you must download and bundle into your application while others are services consumed via APIs.

Large language model use cases

Generative

One of the most common use cases of LLMs is to generate content based on one or more prompts from a user. The primary goal is to improve the efficiency of knowledge workers, or in some cases obviate the need to have a human in the loop if the task is rudimentary enough. Generative applications are numerous – conversational AI and chatbots, creation of marketing copy, code assistants, and even artistic inspiration.

Figure 1. ChatGPT is a well-known example of an application powered by Generative LLMs. [Image courtesy of Emiliano Vittoriosi]

Real-world applications

GPT-3 (and ChatGPT), LaMDA, Character.ai, Megatron-Turing NLG – Text generation useful especially for dialogue with humans, as well as copywriting, translation, and other tasks
PaLM – LLM from Google Research that performs a range of natural language tasks
Anthropic – AI safety company whose Claude assistant handles dialogue, writing, and other text-based tasks
BLOOM – General purpose language model used for generation and other text-based tasks, and focused specifically on multi-language support
Codex (and Copilot), CodeGen – Code generation tools that provide auto-complete suggestions as well as creation of entire code blocks
DALL-E, Stable Diffusion, MidJourney – Generation of images based on text descriptions
Imagen Video – Generation of videos based on text descriptions
Whisper – Transcription of audio files into text

Summarization

With data volumes continuing to explode, especially as computer systems themselves generate more and more content, it becomes increasingly important to have good summaries so we humans can make sense of all those articles, podcasts, videos, and earnings calls. Thankfully, LLMs can do that too.

One flavor of this is abstractive summarization, where novel text is generated to represent the information contained in longer content. The other is extractive summarization, where the passages most relevant to a prompt are pulled out of the source content and assembled into a concise response/answer.
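To make the extractive flavor concrete, here is a deliberately simple sketch that picks the existing sentences most related to a prompt. Word overlap is used as a hypothetical stand-in for the relevance judgment an LLM would make, and the function names are illustrative only.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def words(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extractive_summary(text, prompt, max_sentences=2):
    """Return the sentences sharing the most words with the prompt, in source order."""
    sentences = split_sentences(text)
    prompt_words = words(prompt)
    scored = [(len(words(s) & prompt_words), i) for i, s in enumerate(sentences)]
    # Highest overlap first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    keep = sorted(i for score, i in top if score > 0)
    return " ".join(sentences[i] for i in keep)


report = (
    "Revenue grew 12 percent this quarter. "
    "The office moved to a new building. "
    "Revenue growth was driven by cloud sales."
)
print(extractive_summary(report, "revenue growth"))
```

Nothing in the output is newly written; every sentence is copied from the source. An abstractive summarizer would instead generate fresh wording.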

Figure 2. Text summarization is an increasingly common technique to help humans grapple with information overload. [Image courtesy of Romain Vignes]

Real world applications

Assembly AI – provides transcription and summarization of audio and video
Davinci – a GPT-3 based model that can summarize text, among several other tasks
Cohere Generate – LLM-based product that can paraphrase text and distill long passages down to condensed points
Megatron-Turing NLG – LLM that can perform a broad set of natural language tasks, including summarization
Viable – summarizes data spread out across multiple sources to improve business operations and efficiency

Rewrite

It is very common to use LLMs to convert text from one form to another – these are based on transformers after all. This could be done to correct spelling/grammar errors or to redact content. Translation can also be considered as a form of rewriting.

Real world applications

Grammarly – Grammatical error correction tool
Cohere Generate – LLM-based product that can rewrite text, for example to clean it up or change the voice
Google Translate – translates over 100 languages
Meta AI’s NLLB-200 – translates over 200 languages

Search

Traditional search offerings usually use keyword-based algorithms, sometimes employing knowledge graphs or PageRank-style approaches as well, to look up information that is (hopefully) relevant to what the user is asking for.

These are fast giving way to LLM-based techniques, such as “neural search”, which understand language much more deeply and are able to find more relevant results. This is especially important now, with people more commonly searching for information using long form queries, explicit questions, or conversational prompts.

So the ubiquitous search box in websites and applications will get much smarter. But so will all the implicit usages of search which can enable capabilities such as recommendations, conversational AI, classification, and more.
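One common way to implement this kind of search is to compare embeddings (numeric vectors that represent meaning) rather than keywords. The sketch below ranks documents by cosine similarity to a query. The three-dimensional vectors are hand-made placeholders, not output from any real model; in practice both documents and queries would be embedded by an LLM.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical embeddings; the dimensions loosely mean (billing, travel, cooking).
DOC_EMBEDDINGS = {
    "How to update your payment method": [0.9, 0.1, 0.0],
    "Best hiking trails near Denver": [0.0, 0.9, 0.1],
    "Weeknight pasta recipes": [0.0, 0.1, 0.9],
}


def search(query_embedding, doc_embeddings, top_k=2):
    """Return the titles of the top_k documents closest in meaning to the query."""
    ranked = sorted(
        doc_embeddings.items(),
        key=lambda item: cosine(query_embedding, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:top_k]]


# Assumed embedding for "my card was declined, what do I do?"
query = [0.8, 0.2, 0.0]
print(search(query, DOC_EMBEDDINGS, top_k=1))
```

The query shares no keywords with the top result, yet the two land close together in the vector space. That is the behavior keyword matching cannot provide.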

Figure 3. The standard search box is being revolutionized by LLM-based information retrieval techniques.

Real world applications

Vectara – LLM-powered search platform which matches data based on intent and meaning, regardless of how the concepts are worded.
Glean – workplace search that helps you find information across your company’s applications
Neeva – search engine providing ad-free results over data crawled from the Internet, with the option to also make data in your personal accounts searchable
Azure Embeddings Models, OpenAI Embeddings Models – these generate text embeddings that can be used as the basis for a custom-built search system
Jina – neural search platform that provides prompt optimization and decision support capabilities
You.com – search engine that leverages LLMs to help make users’ search activities more efficient

Question Answering

This is essentially a combination of “Search” and “Summarize”. The application first uses LLMs to understand what the user is looking for and return a relevant set of information. Then it uses another LLM to summarize that information into a single answer.

This daisy-chaining of LLMs, where one model’s output is used as another model’s input, is a common design, as these models are usually built with composability in mind.
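The daisy-chaining described here can be sketched as two interchangeable stages. Both stages below are simple stand-ins, not real models: the retriever uses word overlap and the "summarizer" just tags passages with citation numbers. All names are hypothetical; the point is only that the first stage's output feeds the second.

```python
from typing import Callable, List

Retriever = Callable[[str], List[str]]
Summarizer = Callable[[str, List[str]], str]


def keyword_retriever(corpus: List[str]) -> Retriever:
    """Build a stand-in for the first model: keep passages sharing a word with the question."""
    def retrieve(question: str) -> List[str]:
        q_words = set(question.lower().replace("?", "").split())
        return [p for p in corpus if q_words & set(p.lower().rstrip(".").split())]
    return retrieve


def citing_summarizer(question: str, passages: List[str]) -> str:
    """Stand-in for the second model: join passages and number them as citations."""
    if not passages:
        return "No relevant information found."
    return " ".join(f"{p} [{i}]" for i, p in enumerate(passages, start=1))


def answer_question(question: str, retrieve: Retriever, summarize: Summarizer) -> str:
    passages = retrieve(question)          # stage 1: search
    return summarize(question, passages)   # stage 2: summarize stage 1's output


corpus = ["Refunds are issued within 5 days.", "Our office is in Austin."]
print(answer_question("When are refunds issued?", keyword_retriever(corpus), citing_summarizer))
```

Because each stage is just a function with a fixed signature, either one can be swapped for a stronger model without touching the other.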

Question answering capabilities can improve customer service and customer support outcomes, help analysts find insights more effectively, make sales teams more efficient, and make conversational AI systems more effective.

Real world applications

Google Search, Bing Search – both of these regularly attempt to provide a summarized answer at the top of a list of search results
LLaMA – focused especially on question answering and document summarization
Vectara – retrieval of relevant information based on the user’s query/prompt, which is then summarized to provide an answer with citations
Neeva – in addition to search results (as mentioned above), summarized answers are provided to the user
Contriever – LLM from Facebook Research that has been trained for information retrieval and question answering

Clustering

It is frequently useful to group documents together based on the content they contain. This helps users organize or understand the data available to them, and it can help content providers increase engagement by surfacing content in an easy-to-consume manner. As with Search, this relies on understanding of the meaning inherent to the data. But instead of using that understanding as part of a retrieval operation, it is used to group the data together into similar buckets.
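As an illustration of grouping by meaning, the sketch below runs a plain k-means loop over document embeddings. K-means is one common choice of clustering algorithm, not something this article prescribes, and the two-dimensional vectors are made-up placeholders for real model embeddings.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points so results are repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels


# Hypothetical embeddings for four documents (two about sports, two about finance).
embeddings = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
print(kmeans(embeddings, k=2))
```

Note that the groups are discovered from the data; nobody told the algorithm what the two topics were.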

Figure 4. Clustering is a very common way to help humans make sense of large sets of data. [Image courtesy of: hellisp, Public domain, via Wikimedia Commons]

Real world applications

Cohere Embed, Azure Embeddings Models, OpenAI Embeddings Models – these generate text embeddings that can be used as the basis for a custom-built clustering application

Classification

This is similar to Clustering, but instead of placing data into previously unknown groupings, with Classification the groupings are known in advance. Examples include intent classification, sentiment detection, and prohibited behavior identification. This can be done via a traditional supervised learning approach, where the classifier is trained on the embeddings, or via a few-shot approach, where prompt engineering is used to provide examples to an LLM, which then infers how to do the classification.
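Both approaches can be sketched in a few lines. The first function assembles a few-shot prompt that would be sent to an LLM (the wording of the prompt is an assumption, not a required format). The second is a nearest-centroid classifier, one simple example of a supervised method over embeddings; the vectors used with it are hypothetical placeholders.

```python
def build_few_shot_prompt(examples, text):
    """Format labeled examples followed by the unlabeled text for an LLM to complete."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for review, label in examples:
        lines += [f"Review: {review}", f"Sentiment: {label}", ""]
    lines += [f"Review: {text}", "Sentiment:"]
    return "\n".join(lines)


def nearest_centroid(embedding, labeled_embeddings):
    """Supervised alternative: pick the label whose mean embedding is closest."""
    best_label, best_distance = None, float("inf")
    for label, vectors in labeled_embeddings.items():
        centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
        distance = sum((x - c) ** 2 for x, c in zip(embedding, centroid))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


prompt = build_few_shot_prompt(
    [("Loved it, works great.", "positive"), ("Broke after a day.", "negative")],
    "Exactly what I needed.",
)
print(prompt)

training = {
    "positive": [[1.0, 0.0], [0.8, 0.2]],  # made-up embeddings of positive reviews
    "negative": [[0.0, 1.0], [0.1, 0.9]],  # made-up embeddings of negative reviews
}
print(nearest_centroid([0.9, 0.2], training))
```

The few-shot route needs no training step but pays for an LLM call per item, while the embedding route needs labeled data up front and is cheap to run afterwards.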

Real world applications

Azure Embeddings Models, OpenAI Embeddings Models – these generate text embeddings that can be used as the basis for a custom-built classification system
Cohere Classify – libraries that let you build classifiers into your application
Vectara – use the semantic similarity capability to identify which class(es) a new piece of text belongs to

Challenges that large language models face

As impressive as LLMs are, it’s still early days and there are serious challenges still to be overcome before we will see widespread adoption and acceptance. Some of these are intrinsic to LLMs themselves, and others have to do with the applications that use them. The Future of large language models section below offers perspectives on how some of these challenges can be mitigated or overcome.

Hallucination – for LLMs that generate text, it is common for them to include plausible-sounding falsehoods within their responses. This normally happens because the model doesn’t have all the relevant information it needs, but a malevolent actor could (and likely has already) bias a model in a specific direction, using false or misleading content to achieve that outcome. In some cases this is merely comical, but there is real risk if people – or other applications – assume these responses are true and act on them.
Cost of LLM Creation – due to the number of parameters used in LLMs and the huge training sets, it is normally very expensive to train these models and to execute them at inference time – especially at scale, with low latency, and high concurrency. This cost is ultimately passed on to the user.
Cost of LLM Based Solutions – it’s now alluringly easy to wire together a POC that uses one or more LLMs to implement a given use case. But it’s expensive to build and maintain a production-grade solution that is low latency, scalable, and cost-effective. This is due largely to the inference time costs mentioned above, and the expensive and rare machine learning talent that can keep you at the forefront of the fast-moving LLM space. It will frequently make more economic sense to buy a solution as opposed to building one, and there will be many vendors who offer this option.
Interpretability – it’s often hard, or impossible, to know why the LLM did what it did. In some cases this is benign, in other cases this engenders mistrust and will be an inhibitor to adoption, and in still other cases it will be a complete showstopper (e.g. in regulated industries).
Risk of Spectacular Fails – as the quote goes, “Trust takes years to build, seconds to break, and forever to repair.” One bad failure in an application where an LLM is to blame can erase the gains made by all the other times where it worked properly. This is causing practitioners to ease their way into the LLM space, which is a prudent approach but which will also reduce the pace of adoption.
Impersonal – for LLM applications that perform Generative or Search tasks, the results can often be too generic, and based more on the training data (which is usually publicly available data from the Internet) than on an organization’s specific data, which was not used during training. This can lead to impersonal user experiences, or even worse, incomplete or incorrect results.

Many of these challenges will undoubtedly be addressed in the coming years, while others will persist and be thorns in our sides for quite some time. In both cases the community of LLM Engineers, Software Developers, and Product Owners must be cognizant of these challenges, and build appropriate guardrails and transparency into the applications they create.

Future of large language models

This is a very fast moving space, so it is difficult to predict the future. But even at this nascent stage there are strong indications that the following trends and developments will transpire.

LLM Overload

With the rising popularity of LLMs – not to mention all the venture capital pouring into this space – we will see an explosion of new and derivative models. Advanced researchers will continue to push the envelope on the core LLMs while access to them will become democratized. And it will be much more common for them to be consumed within applications as opposed to in raw form.

But much like the recent NoSQL and big data booms, there will be too many LLM options, flavors, and vendors vying for our attention. They will all be very powerful, but most will not be very differentiated from each other, so variations in non-functional aspects such as size, license terms, cost, and ease of use will matter a lot.

Hence organizations will come to rely on a relatively small number of leading vendors and communities, who will help the average developer cut through all the noise and pick the right models and tools.

Trust, but Verify

The capabilities provided by LLMs are frequently amazing. But there are equally amazing fails, as seen with ChatGPT and Bing in early 2023. This will cause people to have a healthy level of suspicion, to ensure that they bring the benefits of these capabilities to their organizations in a responsible, ethical, and legal manner.

For instance, these applications will be required to explain how they ended up with the answer or the content they provided. Table stakes will be something as simple as citations in generated answers, such as what Bing and Vectara (see image) can provide.

Additionally, end users – or potentially regulators – will require applications to be transparent about when artificial intelligence has generated a piece of information.

Figure 5. Providing citations alongside generated content engenders trust between users and the AI system

LLM Maturity Curve

Shocker here… we will see a version of the familiar hype cycle, such as this 2022 report on AI, play out as the LLM space evolves. We can characterize this by the perspective of the people/teams that bring these into an organization. The market at large will progress along this maturity curve, and vendors will likely focus on one of the following steps.

Technical novelty of LLMs – “I’m the one who brought LLMs into my organization, and I found the best models and APIs and stitched them together myself.” There will be great successes and great failures amongst this set of early adopters.
Using LLMs to safely achieve real business goals – “I did the above and managed to get meaningful business outcomes without breaking the bank.” These folks will (correctly) focus on outcomes vs technology, but still walk a rocky path as they keep up with the fast pace of evolution in this space.
Focus on business logic instead of LLM underpinnings – “I got those business outcomes without having to worry about keeping up with which models are best and how to run them cost effectively while still getting good performance, HA, and fault tolerance. I just focused on my business.” This is where widespread adoption takes off and access to the power of LLMs becomes truly commoditized.

Cheaper to Build

LLM Engineers will develop new architectures that produce better results with fewer parameters, resulting in faster and cheaper training. There will also be hardware and software improvements that let LLM Engineers get more mileage out of the silicon available to them, akin to what TPUs did for deep learning in the mid-2010s.

These lower costs will help expedite the evolution of the LLM ecosystem, and most importantly will result in lower costs to the end users. A recent example of this trend is seen with the model family used by ChatGPT being 10x cheaper than the previous version.

All the Enterprise-ey Things

Many novel user experiences will be accessible using LLMs, but these will need to be packaged such that enterprises can safely use them. This means we’ll soon see investments made to provide all the boring-yet-hugely-important features like:

audit trails
access controls
financial governance
data privacy
data lineage
and many more

Conclusion

It truly is astonishing how quickly LLMs have become so powerful. We’re at the start of a revolution in how people interact with computers, where the amazing will become normal. In just a few years almost every application we use will in some way be powered by LLMs. That said, there is certainly a lot of work to be done to get there.

If you are anxious to get your hands dirty with an LLM, sign up with Vectara and then test out our APIs or web console. You can also learn about what you can build in our free tier. Happy discovering!
