Automatic Reasoning and Tool Use (ART)

Interleaving Chain Of Thought (COT) prompting with tool use has been shown to be a strong and robust approach to many tasks with Large Language Models (LLMs). In Prompt Engineering, one such implementation is Agents, which are provided with a set of tools to complete a job.

This approach requires handcrafted, task-specific demonstrations and carefully scripted interleaving of model generations with tool use. This Research Paper proposes a new framework that uses a frozen LLM to automatically generate intermediate reasoning steps as a program.

ART works as follows:

  • given a new task, it selects demonstrations of multi-step reasoning and tool use from a task library
  • at test time, it pauses generation whenever an external tool is called, and integrates the tool's output before resuming generation
    Source: promptingguide.ai
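
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's implementation: the names TASK_LIBRARY, TOOL_LIBRARY, select_demonstrations, run_with_tools and scripted_llm are all hypothetical, the "[tool] argument" and "#out:" line formats are invented for this example, demonstrations are picked by simple word overlap, and the frozen LLM is replaced by a scripted stand-in so the loop runs without any model.

```python
import operator
import re
from typing import Callable, Dict, List

# Hypothetical task library: task description -> worked demonstration.
TASK_LIBRARY: Dict[str, str] = {
    "arithmetic word problem": "Q: What is 2 + 3?\n[calc] 2 + 3\n#out: 5\nAnswer: 5",
    "string reversal": "Q: Reverse 'abc'.\n[reverse] abc\n#out: cba\nAnswer: cba",
}


def _calc(expr: str) -> str:
    """Evaluate 'a op b' for integer operands (toy calculator tool)."""
    left, op, right = expr.split()
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    return str(ops[op](int(left), int(right)))


# Hypothetical tool library: tool name -> function from text to text.
TOOL_LIBRARY: Dict[str, Callable[[str], str]] = {
    "calc": _calc,
    "reverse": lambda text: text[::-1],
}

# Assumed tool-call syntax for this sketch: "[tool_name] argument".
TOOL_CALL = re.compile(r"^\[(\w+)\]\s*(.*)$")


def select_demonstrations(task: str, k: int = 1) -> List[str]:
    """Rank library entries by word overlap with the new task (a stand-in for real selection)."""
    words = set(task.lower().split())
    ranked = sorted(
        TASK_LIBRARY,
        key=lambda name: len(words & set(name.split())),
        reverse=True,
    )
    return [TASK_LIBRARY[name] for name in ranked[:k]]


def run_with_tools(task: str, llm: Callable[[str], str], max_steps: int = 10) -> str:
    """Generate step by step, pausing at each tool call to splice in the tool's output."""
    prompt = "\n\n".join(select_demonstrations(task)) + f"\n\nQ: {task}\n"
    for _ in range(max_steps):
        line = llm(prompt)                 # the model produces one step
        prompt += line + "\n"
        match = TOOL_CALL.match(line)
        if match:                          # pause: an external tool was called
            name, arg = match.groups()
            prompt += f"#out: {TOOL_LIBRARY[name](arg)}\n"  # integrate output, then resume
        elif line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    raise RuntimeError("no answer produced within max_steps")


def scripted_llm(prompt: str) -> str:
    """Stand-in for a frozen LLM: call the calculator once, then report the tool output."""
    current = prompt.rsplit("Q: ", 1)[-1]  # only the text of the task being solved
    outputs = re.findall(r"^#out: (.*)$", current, flags=re.MULTILINE)
    if outputs:
        return f"Answer: {outputs[-1]}"
    return "[calc] 17 * 3"


result = run_with_tools("Solve this arithmetic word problem about 17 * 3", scripted_llm)
print(result)
```

Swapping scripted_llm for a call to a real model would keep the loop unchanged; the only contract assumed here is that the model emits one step per call and marks tool calls in the bracketed format.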

ART encourages the LLM to generalize from demonstrations to decompose a new task and use tools in appropriate places, in a zero-shot fashion. ART is also extensible, as it enables humans to fix mistakes in reasoning steps or add new tools by simply updating the task and tool libraries.
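
To make the extensibility point concrete, here is a small hedged sketch of what "updating the task and tool libraries" could look like. It is a deliberate simplification: the stored program is executed directly rather than being used as a demonstration for an LLM, and tool_library, task_library and run_program are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]

# Hypothetical tool library: tool name -> function from text to text.
tool_library: Dict[str, Tool] = {
    "strip": lambda text: text.strip(),
    "char_count": lambda text: str(len(text)),
}

# Hypothetical task library: each stored program is an ordered list of tool names.
task_library: Dict[str, List[str]] = {
    "count words": ["strip", "char_count"],  # mistake: counts characters, not words
}


def run_program(task: str, text: str) -> str:
    """Run the stored program for a task, feeding each tool's output to the next step."""
    for tool_name in task_library[task]:
        text = tool_library[tool_name](text)
    return text


before = run_program("count words", "  pause and resume  ")  # "16" (characters)

# Human feedback, part 1: add a new tool by registering it in the tool library.
tool_library["word_count"] = lambda text: str(len(text.split()))

# Human feedback, part 2: fix the faulty reasoning step in the task library.
task_library["count words"][1] = "word_count"

after = run_program("count words", "  pause and resume  ")  # "3" (words)
print(before, after)
```

Neither fix touches the model or the executor; both are plain edits to the two libraries, which is the property the paragraph above describes.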

ART substantially improves over Few Shot Prompting and Auto Chain Of Thought (COT) on unseen tasks in the BigBench and MMLU benchmarks, and exceeds the performance of hand-crafted Chain Of Thought (COT) prompts when human feedback is incorporated. This kind of automatic reasoning with tool use is one of the most sought-after capabilities of an LLM-based application these days.

Below are some benchmarks

This technique feels like it could go well with SAiC (Smart AI Chat Bot), where the end goal would be a purely conversational LLM-based application that can independently create a new service request and also answer users' questions about the city using Retrieval Augmented Generation (RAG) with Agents in LangChain.

References