Large Language Models as General Pattern Machines

Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng

2982 words

Read time: 01:11:54

References and Support: https://general-pattern-machines.github.io https://arxiv.org/pdf/2307.04721.pdf

This paper investigates whether large language models (LLMs) can act as general pattern machines: models that perform sequence transformation, completion, and improvement with no additional training, guided only by examples placed in the prompt. The authors demonstrate that LLMs like GPT-3 can solve a subset of problems in the Abstract Reasoning Corpus (ARC), a benchmark for spatial reasoning, and can complete patterns generated by probabilistic context-free grammars (PCFGs). They also show that LLMs can extrapolate periodic functions such as sinusoids, which in turn allows them to complete periodic motions on a robot. When prompted with trajectories ordered by increasing reward, LLMs can generate improved trajectories and even discover a stabilizing controller for CartPole. Overall, the results suggest that LLMs have an inherent capacity for general pattern manipulation that could be applied to robotics problems, despite never being explicitly trained for such tasks. The authors present this as an alternative to task-specific finetuning and as a window into which abilities transfer from pretraining on text. Deploying LLMs on real robotic systems, however, still faces challenges such as latency, limited context length, and compute cost.
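
To make the sequence-completion idea concrete, the sketch below shows one way a sinusoid could be turned into a numeric prompt and a model's continuation turned back into numbers. It is a minimal illustration, not the authors' implementation: the function names, the 100-bin discretization, and the comma-separated prompt format are all assumptions, and no LLM is actually called.

```python
import math


def discretize(values, lo=-1.0, hi=1.0, bins=100):
    """Map floats in [lo, hi] to integer tokens in [0, bins - 1].

    The bin count and value range are illustrative assumptions.
    """
    tokens = []
    for v in values:
        clipped = min(max(v, lo), hi)  # keep out-of-range values inside [lo, hi]
        tokens.append(round((clipped - lo) / (hi - lo) * (bins - 1)))
    return tokens


def build_completion_prompt(tokens):
    """Serialize tokens as comma-separated integers.

    The trailing comma invites a model to continue the sequence.
    """
    return ", ".join(str(t) for t in tokens) + ","


def parse_completion(text, bins=100):
    """Parse a (hypothetical) model continuation back into integer tokens.

    Parsing stops at the first piece that is not a valid token.
    """
    parsed = []
    for piece in text.split(","):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces from leading or trailing commas
        if not piece.isdigit() or int(piece) >= bins:
            break
        parsed.append(int(piece))
    return parsed


# Two periods of a sine wave sampled at 20 points per period.
samples = [math.sin(2 * math.pi * t / 20) for t in range(40)]
prompt = build_completion_prompt(discretize(samples))
```

The resulting `prompt` string would be sent to a language model, and whatever text comes back would be passed through `parse_completion` and mapped back to joint positions or function values. Working with a small integer vocabulary keeps each value short in the prompt; how the paper itself maps values to tokens may differ.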

This paper provides compelling evidence that large language models have built-in capabilities for pattern recognition and manipulation that can be elicited through prompting alone, without task-specific fine-tuning. The experiments on solving spatial reasoning problems, completing periodic functions and robotic motions, and iteratively improving trajectories are directly relevant to potential robotics applications. As the authors note, significant barriers to real-world deployment on physical systems remain, including latency, limited context length, and compute cost. Nonetheless, the work offers useful insight into the generalization abilities of large language models and suggests promising directions for building more general and adaptable agents by pretraining on diverse data. The proposed framework of prompting for pattern transformation, completion, and improvement could be beneficial for sample-efficient learning in simulated environments. Overall, the work is technically strong, with rigorously designed experiments, and is highly applicable to the development of systems built on large language models.
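
The sequence-improvement idea mentioned above (prompting with trajectories ordered by increasing reward) can be sketched as follows. This is an illustrative assumption about the prompt layout, not the format used in the paper: the function name, the "reward: tokens" line format, and the choice of target reward are all hypothetical.

```python
def build_improvement_prompt(trajectories, target_reward):
    """Build a reward-conditioned prompt from past trajectories.

    trajectories: list of (reward, tokens) pairs, where tokens is a list
    of integers describing one rollout (hypothetical encoding).
    target_reward: the reward written on the final, open-ended line.
    """
    # Sort from worst to best so the prompt shows an improving trend.
    ordered = sorted(trajectories, key=lambda item: item[0])
    lines = []
    for reward, tokens in ordered:
        lines.append(f"{reward}: " + ", ".join(str(t) for t in tokens))
    # The last line states a higher reward and leaves the trajectory blank
    # for a model to fill in.
    lines.append(f"{target_reward}:")
    return "\n".join(lines)


history = [(30, [1, 2]), (10, [3, 4]), (20, [5, 6])]
prompt = build_improvement_prompt(history, target_reward=40)
print(prompt)
```

In a simulated environment, the model's completion of the final line would be executed, scored, and appended to `history` before the next prompt is built, so each query reuses all earlier rollouts.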