Evaluation

The most important step in making sure our large language model (LLM) application is performing well is evaluation: for example, does our application meet our accuracy standards?

This is also extremely helpful when we plan to change the LLM, the embeddings model that generates all of our embeddings, or even the vector store that stores our embeddings and retrieves chunks. Using evaluation, we can decide whether our application is performing better or worse after whatever changes we have made to our LangChain application, or to an application built on any other LLM framework.
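As a rough illustration of comparing two versions of an application, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the tiny evaluation set, the substring-based notion of "correct", the 0.9 threshold, and the two stand-in functions `baseline_app` and `candidate_app`, which would be replaced by real chains in practice.

```python
from typing import Callable, List, Tuple

# Hypothetical evaluation set: (question, expected answer) pairs.
EVAL_SET: List[Tuple[str, str]] = [
    ("What is the capital of France?", "Paris"),
    ("How many days are in a week?", "7"),
    ("What colour is a clear daytime sky?", "blue"),
]


def accuracy(app: Callable[[str], str], eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of examples whose expected answer appears in the app's output."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(
        1 for question, expected in eval_set
        if expected.lower() in app(question).lower()
    )
    return hits / len(eval_set)


def baseline_app(question: str) -> str:
    # Stand-in for the current application (assumed, not a real chain).
    answers = {
        "What is the capital of France?": "The capital of France is Paris.",
        "How many days are in a week?": "There are seven days in a week.",
        "What colour is a clear daytime sky?": "The sky is blue.",
    }
    return answers.get(question, "I don't know")


def candidate_app(question: str) -> str:
    # Stand-in for the application after swapping a component (assumed).
    answers = {
        "What is the capital of France?": "Paris",
        "How many days are in a week?": "A week has 7 days.",
        "What colour is a clear daytime sky?": "Blue",
    }
    return answers.get(question, "I don't know")


def meets_standard(score: float, threshold: float = 0.9) -> bool:
    """Check a score against an accuracy standard (threshold is an assumption)."""
    return score >= threshold


if __name__ == "__main__":
    for name, app in [("baseline", baseline_app), ("candidate", candidate_app)]:
        score = accuracy(app, EVAL_SET)
        print(f"{name}: accuracy={score:.2f} meets_standard={meets_standard(score)}")
```

Substring matching is a deliberately crude metric; it marks the baseline's "seven" as a miss against the expected "7", which is exactly the kind of scoring decision an evaluation setup has to make explicit.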

One should know what the application is doing at each step of the way, since it is just a bunch of chains and sequences. Once we know what goes in and out of each step, these evaluation tools become a way to visualise and debug the application, and they also give us a holistic picture across a lot of data points.
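To make "knowing what goes in and out" concrete, here is a small sketch that runs a sequence of steps and records the input and output of each one. It does not use LangChain's own tracing; `run_with_trace` and the three stand-in steps (`retrieve`, `build_prompt`, `fake_llm`) are hypothetical names invented for this example.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step is a (name, function) pair; each function takes the previous output.
Step = Tuple[str, Callable[[Any], Any]]


def run_with_trace(steps: List[Step], initial_input: Any) -> Tuple[Any, List[Dict[str, Any]]]:
    """Run steps in order and record what went into and came out of each one."""
    trace: List[Dict[str, Any]] = []
    value = initial_input
    for name, fn in steps:
        output = fn(value)
        trace.append({"step": name, "input": value, "output": output})
        value = output
    return value, trace


def retrieve(question: str) -> Dict[str, Any]:
    # Stand-in for a vector store lookup (assumed, returns a fixed chunk).
    return {"question": question, "chunks": ["Paris is the capital of France."]}


def build_prompt(state: Dict[str, Any]) -> str:
    # Combine the retrieved chunks and the question into one prompt string.
    context = "\n".join(state["chunks"])
    return f"Context:\n{context}\n\nQuestion: {state['question']}"


def fake_llm(prompt: str) -> str:
    # Stand-in for a model call (assumed, always answers "Paris").
    return "Paris"


STEPS: List[Step] = [
    ("retrieve", retrieve),
    ("build_prompt", build_prompt),
    ("llm", fake_llm),
]

if __name__ == "__main__":
    answer, trace = run_with_trace(STEPS, "What is the capital of France?")
    for record in trace:
        print(f"[{record['step']}] in={record['input']!r} out={record['output']!r}")
    print("final answer:", answer)
```

With a trace like this, a wrong final answer can be followed back to the step that produced it, for example a retrieval step that returned irrelevant chunks.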

Another approach is to use LLMs themselves as evaluators: an LLM can grade the outputs of chains, of other models, and even of itself.
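A minimal sketch of this idea, assuming the judge is any function that takes a prompt string and returns the model's reply. The prompt wording, the one-word CORRECT/INCORRECT protocol, and `stub_judge` are all assumptions made for illustration; `stub_judge` is not an LLM at all, it only stands in for one so the example runs without network access.

```python
from typing import Callable

# Hypothetical grading prompt; real wording would need tuning.
GRADER_PROMPT = (
    "You are grading an answer.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Submitted answer: {answer}\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)


def grade(judge: Callable[[str], str], question: str, reference: str, answer: str) -> bool:
    """Ask the judge whether the submitted answer matches the reference."""
    prompt = GRADER_PROMPT.format(question=question, reference=reference, answer=answer)
    reply = judge(prompt).strip().upper()
    # Compare whole words: "INCORRECT" contains "CORRECT", so substring checks would be wrong.
    if reply not in {"CORRECT", "INCORRECT"}:
        raise ValueError(f"Unexpected judge reply: {reply!r}")
    return reply == "CORRECT"


def stub_judge(prompt: str) -> str:
    # Stand-in for an LLM call (assumed): reads the fields back out of the
    # prompt and checks whether the reference appears in the submitted answer.
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    reference = fields["Reference answer"].lower()
    submitted = fields["Submitted answer"].lower()
    return "CORRECT" if reference in submitted else "INCORRECT"


if __name__ == "__main__":
    question = "What is the capital of France?"
    for submitted in ["It is Paris.", "Lyon"]:
        verdict = grade(stub_judge, question, "Paris", submitted)
        print(f"{submitted!r} -> {'CORRECT' if verdict else 'INCORRECT'}")
```

Rejecting any reply other than the two expected words keeps a rambling or malformed judge response from being silently counted as a pass or a fail.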