The LLM Application Stack
There are 4 main stack in building an Large Language Model (LLM) based Application
- LLM Stack
- Search / Memory / Data Stack
- Reasoning and Action Stack
- Personalization Stack
LLM Stack
In this stack we focus mainly on the Large Language Models if we are using any Reinforcement Learning (RL) with Human Feedback or are we using any fine-tuning. A another key component would be if you are using any Open Source language model, how would you
- serve it?
- run in the cloud / local
- will you pay per token?
- will you pay per compute
- will you go for a 4 bit model or a low resolution model?
Search / Memory / Data Stack
This is the stack that relates to anything data and search like
- Semantic Search
- Vector Stores
What data would you be needing for the application?
How would you inject into the Language Model
Another key part is memory for agents, where we can have some sort of memory that we can save and retrieve to pass on to the Agents
Reasoning and Action Stack
this is where we use decision making to perform certain tasks using ReAct or something like that where we make the model to take a certain decision, this stack can have the most growth.
it is important to remember that all the stacks interact with each other, when we use react, sometimes we would like to have a model that is finetuned for this use case, or have a dedicated model for decision making while having another for conversation
Personalization Stack
This is where you would use Prompt Engineering and large language models to create a personality, a brand or a relationship with the user, it could also be tuning the conversation style itself or the use the information of the user
When we are architecting a application or agents, we need to think about what tools they are going to use what reasoning are they going to use Automatic Reasoning and Tool Use (ART) are there any heuristic system the LLM can use?