Since the 1950s, artificial intelligence (AI) — the idea that machines or software can replicate human intelligence to answer questions and solve problems — has been an area of significant promise and focus.
Given rapid advancements in computing power and the ability to store and process massive amounts of data, AI has now become commonplace in most daily experiences, as seen with smartphones, connected home devices, intelligent driving features (including self-driving cars), chatbots, and even real estate listings.
Large language models (LLMs) complement and enhance AI applications, and they have become accessible to everyone through tools such as OpenAI’s ChatGPT and other generative applications.
What are large language models (LLMs)?
An LLM is a type of AI model trained on massive amounts of text and data from sources across the internet, including books, articles, video transcripts, and other content. LLMs use deep learning to understand content and then perform tasks such as content summarization and generation, and they make predictions based on their input and training.
LLMs can be trained on more than one petabyte of data. For reference, one gigabyte of plain text can contain approximately 180 million words, and a petabyte contains one million gigabytes.
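The arithmetic behind those figures can be sketched in a few lines, assuming roughly six bytes per English word (about five letters plus a space) in plain ASCII text; the exact words-per-gigabyte count varies with the assumed average word length.

```python
# Rough arithmetic behind the data sizes above, assuming ~6 bytes per
# English word in plain ASCII text (an assumption, not a standard).
BYTES_PER_WORD = 6
GIGABYTE = 10**9   # bytes
PETABYTE = 10**15  # bytes

words_per_gb = GIGABYTE // BYTES_PER_WORD
print(f"{words_per_gb:,} words per GB")        # on the order of 170 million
print(f"{PETABYTE // GIGABYTE:,} GB per PB")   # 1,000,000
```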
The rigorous LLM training process enables applications and platforms to understand and generate content including text, audio, images, and synthetic data. Most popular LLMs are general-purpose models that are pre-trained and then fine-tuned to meet specific needs.
How do large language models work?
LLMs require an extensive training and fine-tuning process before they can deliver reliable and useful results (although they have several limitations). In many cases, professionals across industries, including marketing and sales teams, use pre-trained LLMs that are provided by organizations that dedicate a wealth of resources to create and maintain the LLM (see the section below for examples of popular LLMs).
The following high-level steps are required to train and fine-tune an LLM:
- Identify the goal/purpose - There should be a specific use case for the LLM, and the goal will affect which data sources to pull from. The goal and LLM use case can evolve to include new elements as the LLM is trained and fine-tuned.
- Pre-training - An LLM requires a large and diverse dataset to train on. Gather and clean the data so it is standardized for consumption.
- Tokenization - Break the text within the dataset into smaller units, called tokens, so the LLM can work with words or subwords. Tokenization helps the LLM understand sentences, paragraphs, and documents by first learning words and subwords. This process prepares the data for the transformer architecture, a class of neural network that learns the context of sequential data.
- Infrastructure selection - An LLM needs computational resources like a powerful computer or cloud-based server to handle the training. These resource requirements often limit many organizations from developing their own LLM.
- Training - Set parameters for the training process, such as batch size or learning rate.
- Fine-tuning - Training is an iterative process, meaning an individual will present data to the model, assess its output, and then adjust the parameters to improve its results and fine-tune the model.
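The tokenization step above can be illustrated with a toy greedy subword splitter. This is a minimal sketch with a hand-picked vocabulary; production LLMs learn their subword vocabularies with algorithms such as byte-pair encoding (BPE).

```python
# Minimal sketch of subword tokenization with a toy, hand-picked vocabulary.
# Real LLMs learn their vocabularies (e.g., via byte-pair encoding).
def tokenize(text, vocab):
    """Greedily split each word into the longest known subwords."""
    tokens = []
    for word in text.lower().split():
        while word:
            # Find the longest vocabulary entry that prefixes the word.
            for end in range(len(word), 0, -1):
                if word[:end] in vocab:
                    tokens.append(word[:end])
                    word = word[end:]
                    break
            else:
                tokens.append("<unk>")  # no known subword: emit unknown marker
                word = word[1:]
    return tokens

vocab = {"token", "ization", "help", "s", "the", "model", "learn"}
print(tokenize("Tokenization helps the model learn", vocab))
# → ['token', 'ization', 'help', 's', 'the', 'model', 'learn']
```

Note how "Tokenization" and "helps" are split into known subwords; this is what lets a model with a fixed vocabulary represent words it has never seen whole.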
What are large language models used for?
Large language models can be used to accomplish many tasks that would commonly take humans a lot of time, such as text generation, translation, content summary, rewriting, classification, and sentiment analysis. LLMs can also power chatbots, which enable customers to ask questions and seek help or resources without entering a support queue.
People can engage with LLMs through a conversational AI platform that allows them to ask questions or provide commands — a process known as prompt engineering — for the LLM to fulfill.
It’s important to note that LLMs should not replace humans (see the section below on LLM limitations). Instead, LLMs augment and accelerate human productivity, can help people overcome writer’s block, and handle mundane or repetitive tasks to free humans up to pursue other important or creative endeavors.
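The prompt engineering mentioned above often amounts to assembling structured text for the model. The sketch below shows one way a prompt might be built; the `build_prompt` helper and its fields are illustrative, not part of any particular LLM's API.

```python
# Sketch of assembling a structured prompt for an LLM. The helper and its
# field names are hypothetical; any conversational AI API would accept the
# resulting string as user input.
def build_prompt(task, details, audience):
    return (
        f"Task: {task}\n"
        f"Audience: {audience}\n"
        f"Details: {details}\n"
        "Response:"
    )

prompt = build_prompt(
    task="Write a two-sentence product announcement",
    details="New AI-assisted email scheduling feature",
    audience="Small-business marketers",
)
print(prompt)
```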
Marketing use cases for LLMs
Marketing teams can leverage LLMs and AI-powered tools to accelerate their content creation workflows and support various elements of the customer journey. LLMs are powerful for streamlining marketing processes, managing brand reputation, and improving customer support response times. This value is especially helpful if an organization lacks internal resources or manages a large volume of customer interactions.
- Audio transcription - Create a transcript from audio and video content, like a webinar or customer service call, to extract insights, perform sentiment analysis, and create derivative content.
- Chatbots - Answer common customer questions and point them to resources through a chat-based interface. Chatbots are especially effective for reducing customer support wait times and resolving customer issues quickly.
- Content editing - Review and improve existing content using the LLM’s suggestions to refine tone or sentence structure.
- Content generation - Generate new content by describing the content goal and providing the LLM with key details like the intended audience and topics to address in the content.
- Content summarization - Distill key themes and important points from a piece of content. This can greatly accelerate research tasks and help with understanding large volumes of text.
- Sentiment analysis - Understand whether articles, social media posts, or customer reviews are positive, neutral, or negative in sentiment. This information can help teams prioritize specific customer comments or expressed needs and identify areas they should address.
- Style guide enforcement - Feed a content style guide into the LLM to ensure that future content aligns with this style guide.
If you’re interested in learning more, this webinar replay further explains how generative AI can be used to enable and improve the above use cases.
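To make the sentiment analysis use case concrete, here is a deliberately simple lexicon-based sketch with made-up word lists. An LLM-based approach would handle negation, sarcasm, and context far better; this only shows the input/output shape of the task.

```python
# A minimal lexicon-based sentiment sketch. The word lists are hypothetical;
# production sentiment analysis would use an LLM or a trained classifier.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "poor", "terrible", "confusing"}

def sentiment(review: str) -> str:
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Great product and helpful support"))  # → positive
print(sentiment("Setup was slow and confusing"))       # → negative
```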
What are the different types of large language models?
There are several different types of large language models based on how they are trained. Common LLM types include:
- Zero-shot models - A zero-shot model can perform tasks without being trained on examples. These models learn patterns and contextual information from the data on which they are trained and can perform other tasks without explicit training. For example, a zero-shot translation model could translate text from English to Spanish, even if it has not been trained on specific translation examples.
- Fine-tuned or domain-specific models - Fine-tuned or domain-specific models receive additional training on specific datasets to improve their output or performance for a distinct task or application. For example, a model could be fine-tuned using customer support calls and interactions to improve its effectiveness as a customer chatbot.
- Language representation models - Language representation models are designed to understand and generate language, making them useful for natural language processing (NLP). These models are fine-tuned to understand nuances of language, such as context and syntax.
- Multimodal models - Multimodal models can process and understand information from different modalities, such as audio, images, text, or video. Multimodal models can process these modalities as either inputs (what the user provides the model to generate its response) or outputs (what the model provides in response to a user’s prompt).
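Zero-shot behavior, as described above, is typically exercised through prompting rather than retraining. The sketch below assumes a hypothetical `complete(prompt)` function standing in for any LLM API; the stub at the end lets the example run without a real model.

```python
# Sketch of zero-shot classification via prompting. `complete` is a
# hypothetical callable standing in for any LLM API's text-completion call.
def zero_shot_classify(text, labels, complete):
    prompt = (
        f"Classify the following text as one of: {', '.join(labels)}.\n"
        f"Text: {text}\n"
        "Label:"
    )
    return complete(prompt).strip().lower()

# Usage with a stubbed model so the sketch runs without an API key:
fake_llm = lambda prompt: " Positive"
print(zero_shot_classify("I love this!", ["positive", "negative"], fake_llm))
# → positive
```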
Key components of large language models
There are several key components of large language models that orchestrate requests and generate responses to a prompt. The following are the key LLM components:
- Embedding layer - The LLM embedding layer maps input tokens (words or subwords) to numerical vectors and captures semantic relationships between words, helping the model capture contextual information and improve its generalization.
- Feedforward layer - The feedforward layer processes the tokens from the embedding layer to capture patterns and relationships within the data. This enhances the LLM’s ability to learn from and understand the input data.
- Recurrent layer - A recurrent layer captures sequential dependencies so the model can consider previous tokens in a sequence. The recurrent layer is especially helpful for modeling sequential data and performing tasks where context and order matter (such as language understanding).
- Attention mechanism - The attention mechanism helps the LLM focus on specific parts of the input with different weights. The attention mechanism improves the model’s ability to understand the relationships or connections between separated elements and better capture the context of an input, especially if it is long.
- Neural network layers - Neural network layers — including input, hidden, and output layers — process information and pass it to the next layer. The layers explained in the above bullet points are organized into stacked layers that form a deep neural network. The neural network architecture enables the LLM to understand and generate human-like text.
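The attention mechanism described above can be sketched as scaled dot-product attention. This pure-Python version works on tiny 2-D vectors for readability; real LLMs apply the same computation across thousands of dimensions and many attention heads in parallel.

```python
import math

# Scaled dot-product attention on toy 2-D vectors, in pure Python for
# clarity. Real LLMs vectorize this over many dimensions and heads.
def attention(query, keys, values):
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d) for stability.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax turns scores into weights that sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Output is the weighted mix of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query is most similar to the first key, so the output leans toward
# the first value vector.
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(out)
```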
Deciphering LLM, AI, NLP, GPT, and AGI
There are many acronyms and terms related to artificial intelligence and large language models that are commonly misunderstood or confused with one another.
Large language models vs. generative AI
Large language models are a specific class of AI models designed to understand and generate human-like text. LLMs refer specifically to AI models that are trained on text and generate textual content. All LLMs are a type of generative AI.
Generative AI is a broad category of AI encompassing a range of multimodal models that can create new content, including text, images, videos, and more.
Both LLMs and generative AI can be built with a transformer architecture (represented by the ‘T’ in ChatGPT). Transformers effectively capture contextual information and long-range dependencies, making them especially helpful for various language tasks. Transformers can also be used to generate images and other types of content.
LLM vs. NLP
Natural language processing (NLP) is a broad field focused on the interaction between computers and language. NLP refers to the ability of computers to interpret, understand, and generate human language. NLP enables text understanding, language translation, speech recognition, and text generation.
LLMs are a subset of the broader NLP field: they are specific classes of models that include NLP capabilities and enable similar functions. LLMs are also used to improve NLP outputs.
GPT vs. LLM
Generative Pre-trained Transformer (GPT) refers to a family of LLMs created by OpenAI that are built on a transformer architecture. GPT is a specific example of an LLM, but there are other LLMs available (see below for a section on examples of popular large language models).
Artificial general intelligence (AGI)
Artificial general intelligence (AGI) is a type of AI that can understand, learn, and apply knowledge across a range of tasks with performance that is comparable to human intelligence. AGI is also called “human-level AI,” and experts frequently debate whether AGI is achievable and if it’s helpful or harmful to society. LLMs are more focused on specific tasks than AGI.
What are the advantages of large language models?
LLMs present several advantages for data engineers and everyday practitioners, including:
- Accuracy - LLMs, in general, can provide highly accurate outputs for a range of questions and requests. They do present several challenges and limitations, however, as described in the below section.
- Broad range of applications - LLMs can enable innovations across fields, including advertising and marketing, e-commerce, education, finance, healthcare, human resources, and legal.
- Continuous improvement - By design, LLMs become more accurate and can expand in use cases as they are trained and used more frequently.
- Ease of training - It is relatively easy to train and fine-tune an LLM, assuming an organization has the available resources.
- Extensibility - Extensible systems empower organizations to adapt and evolve their applications based on current needs. LLMs make it easier for developers to update applications with new features and functionality.
- Fast learning - LLMs can quickly learn from input data and gradually improve their results with use.
- Flexibility - A single LLM can be applied for different tasks or use cases across an organization.
- Performance - LLMs can typically respond to prompts very quickly.
What are the challenges and limitations of large language models?
Despite the clear advantages of LLMs, users should consider several challenges and limitations.
- Bias - LLMs are only as good as the data they are trained on. LLMs can mirror the biases of the content on which they are trained.
- Consent - There is an ongoing debate about the ethicality of how LLMs are trained and, specifically, how systems are trained on data without a user’s consent and can replicate art, designs, or concepts that are copyrighted.
- Development and operational cost - It costs millions of dollars to build and maintain a private LLM, which is why most teams rely on LLMs offered by companies like Google and OpenAI.
- Glitch tokens - Since 2022, researchers have documented anomalous tokens in LLM vocabularies, known as glitch tokens, that cause models to malfunction or produce erratic output when they appear in a prompt.
- Hallucination - Hallucination occurs when an LLM generates content that is not factually correct. This happens when LLMs are trained on imperfect data or lack the fine-tuning to correctly understand the context of the information they draw from.
- Greenhouse gas emissions - LLMs consume a significant amount of power to train and maintain (including data storage), which has a large environmental impact.
- Security - Organizations should not provide free LLMs with sensitive or confidential data or information, as everything the LLM receives will train its future outputs.
Examples of popular large language models
There are many popular large language models. Some LLMs are open source, meaning users can access the full source code, training data, and architecture. Other LLMs are proprietary: they are owned by a company or entity that can limit how the LLM is used and restrict access to its customers.
Each model offers different benefits or advantages, such as being trained on larger datasets, enhanced capabilities for common sense reasoning and mathematics, and differences in coding. While earlier LLMs focused primarily on NLP capabilities, new LLM advancements have introduced multimodal capabilities for both inputs and outputs.
A few popular LLMs include:
- Google BERT (Bidirectional Encoder Representations from Transformers) - Google’s BERT is an open source model that is widely used for NLP. It is one of the earliest LLMs and has been adopted by both research and industry users.
- Google Gemini - Gemini is Google DeepMind’s family of proprietary multimodal LLMs, released in December 2023. It was created to outperform OpenAI’s GPT models.
- Google PaLM (Pathway Language Model) - PaLM is a proprietary model created by Google. PaLM provides code generation, NLP, natural language generation, translation, and question-answering capabilities.
- Meta LLaMA (Large Language Model Meta AI) - Meta’s LLaMA is a family of autoregressive LLMs. LLaMA 2, released in partnership with Microsoft, is open-source and free for research and commercial use.
- OpenAI GPT (Generative Pre-Trained Transformer) - OpenAI’s GPT family of models was among the first to apply the transformer architecture to large-scale generative pre-training. GPT is a generative language model used for a wide range of NLP applications. Newer GPT models are proprietary; however, earlier versions such as GPT-2 have been open sourced and made available to users for free.
- XLNet - XLNet is a pre-training method for NLP built by Carnegie Mellon University and Google to improve NLP tasks.