The LLM Application Stack

There are four main stacks in building a Large Language Model (LLM) based application:

  • LLM Stack
  • Search / Memory / Data Stack
  • Reasoning and Action Stack
  • Personalization Stack

LLM Stack

In this stack we focus mainly on the large language models themselves: are we using Reinforcement Learning from Human Feedback (RLHF), or any fine-tuning? Another key consideration, if you are using an open-source language model, is how you would

  • serve it?
  • run it, in the cloud or locally?
    • will you pay per token?
    • will you pay per unit of compute?
    • will you go for a 4-bit model or another low-precision (quantized) model?
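
The questions above come down to simple arithmetic. The sketch below estimates the memory needed for model weights at different bit widths, and the token volume at which renting compute becomes cheaper than paying per token. The prices and the 7-billion-parameter model size are hypothetical placeholders, not real quotes, and the function names are made up for this example.

```python
# All prices and sizes used here are hypothetical placeholders.

def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights only (no KV cache or activations)."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1e9


def break_even_tokens_per_hour(price_per_1k_tokens: float,
                               price_per_gpu_hour: float) -> float:
    """Tokens per hour above which paying per compute beats paying per token.

    Assumes the rented hardware can actually sustain that throughput.
    """
    return price_per_gpu_hour / price_per_1k_tokens * 1000


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_memory_gb(7e9, bits)
        print(f"7B parameters at {bits}-bit: ~{size:.1f} GB of weights")
    tokens = break_even_tokens_per_hour(price_per_1k_tokens=0.002,
                                        price_per_gpu_hour=1.0)
    print(f"Break-even: ~{tokens:,.0f} tokens per hour")
```

A 4-bit model needs roughly a quarter of the weight memory of a 16-bit one, which is often what decides whether a model fits on local hardware at all.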

Search / Memory / Data Stack

This stack covers anything related to data and search, such as

  • Semantic Search
  • Vector Stores

The key questions here are: what data will the application need, and how will you inject it into the language model? Another key part is memory for agents: some form of memory that we can save to and retrieve from, and whose contents we pass on to the agents.
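
To make the idea concrete, here is a minimal sketch of a vector store with semantic-style search and prompt injection. It assumes a toy bag-of-words `embed` function in place of a real embedding model, and `TinyVectorStore` and `build_prompt` are hypothetical names for this example, not any library's API.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory store: keeps each text next to its vector."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self._items,
                        key=lambda item: cosine(q, item[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, store: TinyVectorStore, k: int = 2) -> str:
    """Inject the top-k retrieved texts into the prompt as context."""
    context = "\n".join(f"- {doc}" for doc in store.search(question, k))
    return (f"Use the context to answer.\n\nContext:\n{context}"
            f"\n\nQuestion: {question}")


if __name__ == "__main__":
    store = TinyVectorStore()
    store.add("The refund policy allows returns within 30 days")
    store.add("Our office is open Monday to Friday")
    store.add("Shipping takes five business days")
    print(build_prompt("What is the refund policy?", store, k=1))
```

The same store can serve as agent memory: save past turns or observations with `add`, then retrieve the relevant ones with `search` before each model call.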

Reasoning and Action Stack

This is where we use decision making to perform certain tasks, using ReAct or a similar approach in which we have the model decide what to do next. This stack probably has the most room for growth.
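
A ReAct-style loop alternates between the model's reasoning, a tool call, and the resulting observation. The sketch below shows that control flow only. It assumes a simple `Action: tool[input]` / `Final Answer: ...` text format, a hypothetical `word_count` tool, and a scripted stand-in (`scripted_llm`) where a real model call would go.

```python
from __future__ import annotations

import re
from typing import Callable


def word_count(text: str) -> str:
    """Hypothetical example tool: counts whitespace-separated words."""
    return str(len(text.split()))


# Tool registry: tool name -> function from string to string.
TOOLS: dict[str, Callable[[str], str]] = {"word_count": word_count}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)")


def react_loop(llm: Callable[[str], str], question: str,
               max_steps: int = 5) -> str:
    """Ask the model, run any requested tool, feed back the observation."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action:
            name, arg = action.group(1), action.group(2)
            tool = TOOLS.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            transcript += f"Observation: {observation}\n"
    return "No answer within the step limit."


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: first acts, then answers."""
    if "Observation:" not in transcript:
        return ("Thought: I should count the words.\n"
                "Action: word_count[the quick brown fox]")
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the count.\nFinal Answer: {observation}"


if __name__ == "__main__":
    print(react_loop(scripted_llm, "How many words are in the phrase?"))
```

The step limit matters in practice: without it, a model that never emits a final answer would loop forever.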

It is important to remember that all the stacks interact with each other. When we use ReAct, we may want a model that is fine-tuned for this use case, or a dedicated model for decision making alongside another for conversation.

Personalization Stack

This is where you would use prompt engineering and large language models to create a personality, a brand, or a relationship with the user. It could also mean tuning the conversation style itself, or making use of information about the user.
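
One simple way to apply this is to assemble a system prompt from a persona and a user profile. The sketch below is only an illustration: `Persona`, `UserProfile`, and `build_system_prompt` are hypothetical names, and the prompt wording is a placeholder rather than a recommended template.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Persona:
    """Brand or personality settings for the assistant."""
    name: str
    tone: str
    brand_rules: list[str] = field(default_factory=list)


@dataclass
class UserProfile:
    """Information about the user that may shape the conversation."""
    display_name: str
    preferences: dict[str, str] = field(default_factory=dict)


def build_system_prompt(persona: Persona, user: UserProfile) -> str:
    """Combine persona and user information into one system prompt."""
    lines = [f"You are {persona.name}. Speak in a {persona.tone} tone."]
    lines += [f"Rule: {rule}" for rule in persona.brand_rules]
    lines.append(f"You are talking to {user.display_name}.")
    lines += [f"User preference - {key}: {value}"
              for key, value in sorted(user.preferences.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    persona = Persona("Robin", "friendly", ["Never promise delivery dates."])
    user = UserProfile("Sam", {"language": "English", "detail": "brief"})
    print(build_system_prompt(persona, user))
```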

When we are architecting an application or agents, we need to think about what tools they are going to use and what kind of reasoning they are going to use, such as Automatic Reasoning and Tool Use (ART). Are there any heuristic systems the LLM can use?
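
As one example of a heuristic system sitting alongside the LLM, the sketch below routes plain arithmetic to a small calculator tool and sends everything else to the model. This is a hand-written rule, not ART itself. The names `route`, `handle`, and `safe_eval` are hypothetical, and `llm` is any callable standing in for a real model.

```python
from __future__ import annotations

import ast
import operator
import re
from typing import Callable

# Only these binary operators are allowed in the calculator tool.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

ARITHMETIC_RE = re.compile(r"[\d\s.+\-*/()]+")


def safe_eval(expr: str) -> float:
    """Evaluate +, -, *, / on numeric literals without using eval()."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))


def route(user_input: str) -> str:
    """Heuristic: input made only of digits and operators goes to the calculator."""
    text = user_input.strip()
    if text and ARITHMETIC_RE.fullmatch(text):
        return "calculator"
    return "llm"


def handle(user_input: str, llm: Callable[[str], str]) -> str:
    """Try the cheap tool first; fall back to the model if it cannot help."""
    if route(user_input) == "calculator":
        try:
            return str(safe_eval(user_input.strip()))
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # not something the calculator supports; ask the model
    return llm(user_input)


if __name__ == "__main__":
    fake_llm = lambda text: f"(model answer to: {text})"
    print(handle("2 + 3 * 4", fake_llm))
    print(handle("Summarize this article", fake_llm))
```

Heuristics like this are cheap and predictable, so they are worth trying before spending a model call, with the LLM as the fallback.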

References