Chain of Thought Paradigms in LLMs

Mar 9, 2023

Chain of thought (CoT) prompting, which breaks a problem down into a series of intermediate reasoning steps, has significantly improved the ability of LLMs to perform complex reasoning. More importantly, it is the current state of the art for teaching LLMs how to take actions (API calls, RPA, or anything else).

An overview of different strategies.

Few-shot CoT. Provide examples of Question-Answer pairs where the answer is explained "step by step."

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?

A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

From Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
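As a minimal sketch, here is how a few-shot CoT prompt might be assembled in Python. The `complete` function is a hypothetical stand-in for whatever LLM completion API you use:

```python
# Hypothetical stand-in for any LLM completion API (not a real library call).
def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

# Worked examples whose answers are explained step by step.
FEW_SHOT_EXAMPLES = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
"""

def few_shot_cot(question: str) -> str:
    # The examples teach the model to explain its reasoning before answering.
    prompt = f"{FEW_SHOT_EXAMPLES}\nQ: {question}\nA:"
    return complete(prompt)
```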

Zero-shot CoT. Begin the answer block with "Let's think step by step." to prompt the LLM to continue the completion in that format. No worked examples are needed.

Large Language Models are Zero-Shot Reasoners.
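In code this is a one-liner, reusing the hypothetical `complete` stub from the sketch above:

```python
def complete(prompt: str) -> str: ...  # hypothetical LLM call, as above

def zero_shot_cot(question: str) -> str:
    # No worked examples; the trigger phrase alone elicits step-by-step reasoning.
    return complete(f"Q: {question}\nA: Let's think step by step.")
```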

Self-consistency CoT. First, prompt the model with CoT, sample multiple completions at nonzero temperature, and choose the most consistent final answer by majority vote. You can think of this as a self-ensemble method.

Self-consistency Improves Chain of Thought Reasoning in Language Models.
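A sketch of the voting loop, assuming the hypothetical `complete` also accepts a sampling temperature. The final-answer extraction here (last number in the completion) is a naive placeholder:

```python
import re
from collections import Counter

def complete(prompt: str, temperature: float = 0.0) -> str: ...  # hypothetical LLM call

def self_consistency(question: str, n_samples: int = 10) -> str:
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(n_samples):
        # Temperature > 0 makes each sampled reasoning path different.
        completion = complete(prompt, temperature=0.7)
        # Naively take the last number in the output as the final answer.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        if numbers:
            answers.append(numbers[-1])
    # Majority vote over final answers; the reasoning paths themselves are discarded.
    return Counter(answers).most_common(1)[0][0] if answers else ""
```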

Least-to-Most. Borrowed from an idea in educational psychology: generate a list of subquestions, then solve them sequentially, with each answer feeding into the next. Problem reduction followed by problem solving.

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.
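A sketch of the two stages, with the same hypothetical `complete` stub; the decomposition prompt wording is mine, not the paper's:

```python
def complete(prompt: str) -> str: ...  # hypothetical LLM call, as above

def least_to_most(question: str) -> str:
    # Stage 1: problem reduction. Ask the model for a list of subquestions.
    decomposition = complete(
        f'To solve "{question}", list the subquestions that must be answered first, one per line.'
    )
    subquestions = [line.strip() for line in decomposition.splitlines() if line.strip()]

    # Stage 2: sequential solving. Each answer is appended to the context,
    # so later subquestions can build on earlier results.
    context = ""
    for sub in subquestions:
        answer = complete(f"{context}Q: {sub}\nA:")
        context += f"Q: {sub}\nA: {answer}\n"

    # Finally, answer the original question given the solved subquestions.
    return complete(f"{context}Q: {question}\nA:")
```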

ReAct. Given a claim or question, generate a completion identifying an action to take, execute and record the action, and record an observation of the result. Repeat until the task is finished, signaled by a special Finish action.

Each step follows the Thought / Action / Observation template:

Claim: Princess Mononoke is a film.
Thought 1: I need to search Princess Mononoke and find if it is a film.
Action 1: Search[Princess Mononoke]
Observation 1: Princess Mononoke ...
Thought 2: From the observation, it says that Princess Mononoke is a film.
Action 2: Finish[SUPPORTS]
Observation 2: Episode finished

ReAct: Synergizing Reasoning and Acting in Language Models.
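A sketch of the control loop. `complete` and `search` are hypothetical stubs; the stop sequence keeps the model from hallucinating its own observations, and the tool result is fed back in as the real observation:

```python
import re

def complete(prompt: str, stop: list[str] | None = None) -> str: ...  # hypothetical LLM call
def search(query: str) -> str: ...  # hypothetical tool, e.g. a Wikipedia lookup

TOOLS = {"Search": search}

def react(claim: str, max_steps: int = 8) -> str:
    prompt = f"Claim: {claim}\n"
    for step in range(1, max_steps + 1):
        # Generate one Thought/Action pair, stopping before the model
        # invents its own Observation line.
        output = complete(prompt + f"Thought {step}:", stop=[f"Observation {step}:"])
        prompt += f"Thought {step}:{output}\n"

        # Parse e.g. "Action 1: Search[Princess Mononoke]".
        match = re.search(rf"Action {step}: (\w+)\[(.*)\]", output)
        if match is None:
            break
        action, argument = match.groups()
        if action == "Finish":
            return argument  # e.g. SUPPORTS
        tool = TOOLS.get(action)
        if tool is None:
            break
        # Execute the tool and append the result as the observation.
        prompt += f"Observation {step}: {tool(argument)}\n"
    return "UNRESOLVED"
```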

Chain of thought prompting techniques have been shown to increase LLM performance significantly, but they still feel incredibly unoptimized.