MarkTechPost@AI · April 3, 15:40
Snowflake Proposes ExCoT: A Novel AI Framework that Iteratively Optimizes Open-Source LLMs by Combining CoT Reasoning with off-Policy and on-Policy DPO, Relying Solely on Execution Accuracy as Feedback

 

Snowflake's proposed ExCoT framework combines Chain-of-Thought (CoT) reasoning with iterative preference optimization to substantially improve open-source large language models (LLMs) on text-to-SQL tasks. ExCoT's key innovation is relying solely on execution-accuracy feedback, dispensing with external reward models and human annotations; through off-policy and on-policy DPO it progressively improves the accuracy of the SQL the model generates. Experiments show that ExCoT achieves significant gains on both the BIRD and Spider benchmarks, making it a leading approach in single-model evaluations.

💡 The core of ExCoT is the combination of CoT reasoning and iterative preference optimization. It uses CoT reasoning to decompose complex queries into simpler sub-queries and validates them against execution results, improving the accuracy of SQL generation.

⚙️ ExCoT uses a two-stage optimization pipeline. First, candidate CoT data is generated and validated via off-policy DPO, providing the basis for supervised fine-tuning. The model then iteratively generates and refines CoT data through on-policy DPO, improving accuracy step by step based on execution-correctness feedback.

✅ ExCoT relies on execution-accuracy feedback alone, requiring no external reward model or human annotation. By comparing the results of generated queries against ground-truth results, it obtains an explicit signal for preference learning and thus enables continuous model improvement.

📈 Experiments show that ExCoT performs strongly across multiple benchmarks. For example, with the LLaMA-3.1 70B model, ExCoT lifts execution accuracy on the BIRD development set from 57.37% to 68.51% and on the Spider test set from 78.81% to 86.59%.

Text-to-SQL translation, the task of transforming natural language queries into structured SQL statements, is essential for facilitating user-friendly database interactions. However, the task involves significant complexities, notably schema linking, handling compositional SQL syntax, and resolving ambiguities in user queries. While Large Language Models (LLMs) have shown robust capabilities across various domains, the efficacy of structured reasoning techniques such as Chain-of-Thought (CoT) within text-to-SQL contexts remains limited. Prior attempts employing zero-shot CoT or Direct Preference Optimization (DPO) without structured reasoning yielded marginal improvements, indicating the necessity for more rigorous methodologies.

Snowflake introduces ExCoT, a structured framework designed to optimize open-source LLMs through the combination of CoT reasoning and iterative preference optimization, specifically utilizing off-policy and on-policy DPO guided exclusively by execution accuracy feedback. ExCoT dispenses with external reward models and human annotations, relying instead on internally generated reasoning steps and execution results. The method operates in two principal phases: initially, it generates candidate CoT data validated through off-policy DPO, forming the basis for supervised fine-tuning. Subsequently, the model iteratively generates and refines CoT data via on-policy DPO, incrementally improving accuracy through feedback derived from execution correctness.
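As a rough sketch of this recipe, the pairing step can be pictured as sampling several CoT + SQL candidates per question and splitting them by execution correctness; every (correct, incorrect) combination then becomes a DPO preference pair. The helpers `generate_candidates` and `execution_match` below are hypothetical (the latter is sketched after the next paragraph); this illustrates the data-construction idea, not the authors' implementation.

```python
def build_dpo_pairs(model, examples, n_samples=8):
    """Construct (chosen, rejected) preference pairs from execution feedback only.

    A minimal sketch of ExCoT's pair-construction idea; `generate_candidates`
    (sampling CoT + SQL completions from the model) and `execution_match`
    are hypothetical helpers, not the paper's API.
    """
    pairs = []
    for question, db_path, gold_sql in examples:
        candidates = generate_candidates(model, question, n=n_samples)  # CoT + SQL strings
        verdicts = [(cand, execution_match(db_path, cand.sql, gold_sql)) for cand in candidates]
        correct = [cand for cand, ok in verdicts if ok]
        incorrect = [cand for cand, ok in verdicts if not ok]
        # Every correct/incorrect combination is one preference signal:
        # no reward model, no human labels, only execution accuracy.
        pairs.extend((good, bad) for good in correct for bad in incorrect)
    return pairs
```

In the off-policy phase the candidates would come from the initial model (or a stronger generator), while in the on-policy phase the same construction is re-run with the model currently being trained, so each DPO round learns from its own mistakes.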

ExCoT employs detailed CoT reasoning, particularly adopting a divide-and-conquer strategy wherein complex queries are decomposed into simpler sub-queries. Each sub-query is analyzed and independently resolved before being integrated into a coherent final query. This structured decomposition enables the model to manage the complexity and nested structures common in SQL operations more effectively. Execution-based verification serves as the core mechanism for correctness evaluation, where generated queries are validated by comparing their execution outputs against ground-truth results. Incorrect and correct queries are systematically paired, providing explicit signals for preference-based learning. The iterative refinement in the on-policy DPO phase progressively enhances the model’s reasoning accuracy.
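Execution-based verification itself reduces to running both queries against the database and comparing result sets. Below is a minimal sketch against SQLite, assuming an order-insensitive comparison of rows; the paper's exact matching rules may differ.

```python
import sqlite3
from collections import Counter

def execution_match(db_path: str, pred_sql: str, gold_sql: str) -> bool:
    """Return True if the predicted query produces the same rows as the gold query."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
        gold_rows = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return False  # queries that fail to execute count as incorrect
    finally:
        conn.close()
    # Compare as multisets so row ordering does not affect the verdict.
    return Counter(pred_rows) == Counter(gold_rows)
```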

Experimental evaluation of ExCoT demonstrated significant improvements in execution accuracy. Specifically, with the LLaMA-3.1 70B model, ExCoT elevated execution accuracy on the BIRD development set from 57.37% to 68.51%, and increased Spider test set performance from 78.81% to 86.59%. Comparable performance enhancements were recorded with the Qwen-2.5-Coder 32B model. These results position ExCoT as a leading approach in single-model evaluations for these benchmarks, surpassing established methods such as XiYanSQL and proprietary models including OpenAI variants. Notably, the improvements consistently maintained high query validity rates (exceeding 98%), confirming enhancements in semantic correctness alongside syntactic precision.

In conclusion, ExCoT represents a methodical advancement in structured reasoning optimization for open-source LLMs applied to text-to-SQL tasks. By integrating structured CoT reasoning with preference optimization, guided solely by execution-based feedback, ExCoT effectively addresses limitations identified in previous methods. Its iterative refinement capability ensures continuous improvement without dependence on external reward structures or manual annotations. Further research might explore extending this framework to more intricate schema environments and additional structured reasoning tasks, thus broadening the applicability and reliability of LLMs in structured query generation contexts.


Check out the Paper, GitHub Page and Details. All credit for this research goes to the researchers of this project.


