paper

Authors: Chris Cundy, Stefano Ermon

Word Count: 6700

Estimated Read Time: 21-25 minutes

Source Code: Not available (research paper)

Supporting Links: None provided

Summary: The paper proposes a method called SequenceMatch for training autoregressive sequence models like language models. The key ideas are:

Formulate sequence generation as an imitation learning problem, minimizing divergences between the sequence distributions induced by the model and by the data. This framing makes it possible to explicitly penalize out-of-distribution sequences generated by the model.
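To make the contrast concrete, the sketch below (an illustrative reading, not the paper's exact objective) compares standard teacher-forced maximum likelihood, which only evaluates the model on data prefixes, with an imitation-style loss that also scores sequences sampled from the model itself. The names `model`, `data_tokens`, `model_tokens`, and `log_ratio_fn` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def mle_loss(model, data_tokens):
    """Standard maximum-likelihood (teacher forcing): the model is only
    ever evaluated on prefixes drawn from the data distribution."""
    logits = model(data_tokens[:, :-1])                   # (B, T, V)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        data_tokens[:, 1:].reshape(-1),
    )

def imitation_style_loss(model, data_tokens, model_tokens, log_ratio_fn):
    """Hypothetical imitation-learning-style objective: in addition to the
    data term, sequences sampled from the model itself are scored, so
    out-of-distribution continuations can be explicitly penalized.
    `log_ratio_fn` stands in for any estimate of
    log p_data(x) - log p_model(x) on a sampled sequence."""
    data_term = mle_loss(model, data_tokens)
    penalty = -log_ratio_fn(model_tokens).mean()          # large when the model
    return data_term + penalty                            # drifts off-distribution
```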

Introduce a <backspace> action that lets the model delete previously generated tokens, allowing it to correct erroneous generations and reducing compounding error.
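As a minimal illustration, assuming <backspace> is added to the vocabulary as an extra token with a hypothetical id, resolving backspace actions in a generated sequence is a simple post-processing step:

```python
BACKSPACE_ID = 50257  # hypothetical id for the added <backspace> token

def apply_backspaces(token_ids, backspace_id=BACKSPACE_ID):
    """Resolve <backspace> actions in a generated sequence: each occurrence
    deletes the token generated immediately before it."""
    out = []
    for tok in token_ids:
        if tok == backspace_id:
            if out:            # nothing to delete at the start of the sequence
                out.pop()
        else:
            out.append(tok)
    return out

# e.g. apply_backspaces([12, 7, BACKSPACE_ID, 9]) -> [12, 9]
```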

Minimize alternative divergences, such as the χ2-divergence, rather than the KL-divergence implied by maximum-likelihood training, which leads to improved generations.
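For intuition, a Monte-Carlo estimate of the χ2-divergence from model samples can be written in a few lines. This is a simplified sketch that assumes per-sample log-probabilities under both the data distribution and the model are available; the paper itself works with divergences between occupancy measures rather than this direct form.

```python
import torch

def chi_squared_divergence(logp_data, logp_model):
    """Monte-Carlo estimate of chi^2(p_data || p_model) from samples drawn
    from the model: chi^2 = E_model[(p_data / p_model)^2] - 1.
    Both inputs are log-probabilities evaluated on model samples."""
    log_ratio = logp_data - logp_model
    return torch.exp(2.0 * log_ratio).mean() - 1.0
```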

Implement SequenceMatch without architectural changes to the base model, by masking the <backspace> action where it is invalid and recomputing logits efficiently.
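A minimal sketch of the masking idea, assuming logits of shape (batch, time, vocab) and a hypothetical `backspace_id`: the <backspace> logit is set to -inf wherever deleting a previous token would be invalid (e.g. at the very first generation step), so the augmented vocabulary requires no change to the model architecture.

```python
import torch

def mask_invalid_backspace(logits, positions, backspace_id):
    """Disallow <backspace> wherever a deletion would be invalid by setting
    its logit to -inf; `positions` gives the generation step of each token."""
    logits = logits.clone()
    invalid = positions == 0                      # (B, T) boolean mask
    logits[..., backspace_id] = logits[..., backspace_id].masked_fill(
        invalid, float("-inf")
    )
    return logits
```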

They show empirically that SequenceMatch leads to better text generation compared to maximum likelihood training, as measured by MAUVE score, diversity, and fluency.
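For reference, the MAUVE score can be computed with the open-source mauve-text package; the snippet below is a usage sketch with placeholder text lists rather than the paper's evaluation setup.

```python
# pip install mauve-text
import mauve

human_texts = ["reference continuation 1", "reference continuation 2"]  # placeholder data
model_texts = ["sampled continuation 1", "sampled continuation 2"]      # placeholder samples

# compute_mauve embeds both text sets with a GPT-2 featurizer and compares them
out = mauve.compute_mauve(p_text=human_texts, q_text=model_texts)
print(out.mauve)  # values closer to 1.0 indicate the two distributions are closer
```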

Evaluation: The SequenceMatch approach seems applicable to improving the quality of generations from large language models. The key ingredients - alternative loss functions, backtracking via <backspace>, and logit-masking techniques - are general enough to carry over to other autoregressive sequence models. The main limitation is the increased training cost from sampling sequences from the model during training, although the authors discuss ways to mitigate this. In summary, the techniques proposed in the paper are promising for developing better generative models for text and other sequential data.