Large Language Models as Optimizers by Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., & Chen, X. (2023)

tags
Optimization, Machine learning, Natural language processing
source
yanglargelanguagemodels2023

Summary

This paper proposes Optimization by PROmpting (OPRO), a framework that uses large language models as general-purpose optimizers. Rather than relying on formal problem specifications or gradient-based methods, OPRO describes optimization tasks in natural language and leverages the LLM’s ability to recognize patterns from previously evaluated solutions to iteratively propose better ones.

The core mechanism is a meta-prompt that contains two components: (1) the optimization problem description with task exemplars, and (2) an optimization trajectory of previously generated solutions paired with their objective values, sorted in ascending order. At each step, the LLM generates new candidate solutions, which are evaluated and added to the trajectory for the next iteration. The process terminates when no improvement is found or a step budget is reached.

The authors first demonstrate OPRO on classical optimization problems — linear regression and the traveling salesman problem — showing that LLMs can perform black-box optimization through prompting alone. They then apply OPRO to prompt optimization, where the goal is to find natural language instructions that maximize task accuracy. Using various LLMs (PaLM 2, GPT-3.5-turbo, GPT-4) as both optimizer and scorer, the best OPRO-optimized prompts outperform human-designed prompts by up to 8% on GSM8K and up to 50% on Big-Bench Hard tasks.
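In the prompt-optimization setting, the objective value attached to each candidate instruction is simply its task accuracy on a set of training exemplars. A sketch of that scorer, assuming a `scorer_llm` callable and an exact-match check (the function name and prompt template are ours, not the paper's):

```python
def instruction_accuracy(instruction, scorer_llm, exemplars):
    """Score a candidate instruction by the fraction of exemplars
    the scorer LLM answers correctly when prompted with it."""
    correct = 0
    for question, answer in exemplars:
        # Hypothetical template: instruction prepended to each question.
        prompt = f"{instruction}\nQ: {question}\nA:"
        correct += scorer_llm(prompt).strip() == answer
    return correct / len(exemplars)
```

This accuracy becomes the score paired with the instruction in the optimization trajectory.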

Key Ideas

  • LLMs can serve as black-box optimizers when the optimization task is described in natural language, without requiring formal mathematical specifications or gradients.
  • The meta-prompt design includes an optimization trajectory (solution-score pairs sorted ascending) that enables the LLM to identify patterns and propose improved solutions.
  • Recency bias in LLMs is exploited by placing higher-scoring solutions at the end of the trajectory, encouraging generation of similar or improved solutions.
  • Semantically similar instructions can yield drastically different task accuracies, highlighting the sensitivity of LLM performance to prompt phrasing.
  • OPRO-optimized prompts transfer across datasets within the same domain (e.g., GSM8K prompts work well on MultiArith and AQuA).
  • Generating multiple candidate solutions per step (eight in the paper) improves optimization stability, analogous to mini-batch gradient descent.
  • The approach scales poorly to large combinatorial problems (e.g., TSP with n>20), as LLMs struggle with complex discrete search spaces.

Comments

OPRO is a compelling demonstration that LLMs can function as meta-optimizers — optimizing not through formal computation but through pattern recognition over optimization trajectories. The most impactful application is prompt optimization, where OPRO discovers instructions that outperform carefully hand-crafted prompts, suggesting that the optimal prompt space is non-trivial and difficult for humans to navigate manually.

A key limitation is scalability: while OPRO works well on small problems, it degrades on larger instances of classical optimization problems. The method is also fundamentally constrained by the LLM’s context window, which limits the length of the optimization trajectory that can be maintained.

The finding that semantically similar prompts yield very different accuracies (Section 5.2.3) is particularly interesting, reinforcing results from prompt sensitivity literature. This motivates automated prompt search as a practical necessity rather than a convenience.

Connections

  • Related to Optimization because the paper reframes optimization as a natural language task for LLMs
  • Related to Gradient descent because OPRO is positioned as a gradient-free alternative that uses LLM pattern recognition instead of derivatives
  • Related to Natural language processing as a novel application of LLMs beyond text generation
  • Related to Few-shot learning because the meta-prompt leverages in-context exemplars for task understanding
  • Related to Zero-shot learning because OPRO discovers zero-shot instructions that match or exceed few-shot chain-of-thought prompting
  • Related to PaLM as one of the primary LLM families evaluated in the paper
  • Related to GPT as another LLM family (GPT-3.5-turbo, GPT-4) used for optimization
