WoFT: A New Paradigm for Code Generation That Combines Formal Grammar Verification with Structure Learning

Why it was selected

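Want code with fewer bugs? WoFT has the model check syntax while it generates, reaches 14.3% lower per-token cross-entropy than a plain text fine-tuning baseline, and learns to use the syntax tree as a scratchpad.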

AI summary

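WoFT (Weave of Formal Thought) introduces a formal engine and a constrained decoder built on complete Tree-sitter grammar specifications, which makes its syntax verification complete. By combining GLR parsing with speculative lexing, the decoder keeps only the subword tokens for which the output remains a prefix of some valid program. The method also uses the reweighted wake-sleep (RWS) algorithm to optimize an importance-weighted evidence lower bound (IW-ELBO), training the model to interleave nonterminal symbols into its generations. After fine-tuning StarCoder2-3B on Python, per-token cross-entropy is 14.3% lower than that of a text-only SFT baseline.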
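The token-filtering step described above can be illustrated with a minimal sketch. This is not WoFT's decoder: a simple bracket-matching check stands in for the GLR parser and speculative lexer, and the names `is_valid_prefix`, `allowed_tokens`, and `VOCAB` are hypothetical, chosen only for this example.

```python
# Toy illustration only: a bracket-matching check stands in for the
# GLR parser + speculative lexer mentioned in the summary.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_valid_prefix(text: str) -> bool:
    """Return True if `text` can still be extended to a balanced bracket string."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False  # character outside the toy alphabet
    return True  # unmatched openers are fine: they can be closed later


def allowed_tokens(prefix: str, vocab: list[str]) -> list[str]:
    """Keep only the subword tokens that leave `prefix + token` a valid prefix."""
    return [tok for tok in vocab if is_valid_prefix(prefix + tok)]


# Hypothetical subword vocabulary; tokens may span several characters.
VOCAB = ["(", ")", "[", "]", "()", "(]", "])", "))"]

if __name__ == "__main__":
    print(allowed_tokens("([", VOCAB))
```

In a real constrained decoder, the surviving set would typically be used to mask the model's logits before sampling, so that rejected tokens receive zero probability. Note that multi-character tokens such as `(]` are rejected even though each character is individually legal, which is why the check has to run on the whole candidate token rather than on single characters.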

AI translation · Chinese

arXiv cs.LG

Large language models (LLMs) attain remarkable surface fluency on code, yet they neither formally guarantee the syntactic validity of their output nor leverage the hierarchical structure defining the target language. Whi…
