MarkTechPost@AI, October 13, 2024
Google Cloud and Stanford Researchers Propose CHASE-SQL: An AI Framework for Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

 

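CHASE-SQL is a framework proposed by researchers at Google Cloud and Stanford to improve the performance of LLMs on text-to-SQL tasks. It combines several techniques to generate a large pool of candidate SQL queries and select the most accurate one, which improves data accessibility and the efficiency of analysis.

🧐 CHASE-SQL draws on the inherent knowledge of LLMs to generate a large pool of candidate SQL queries using three different methods. The first is a divide-and-conquer strategy that breaks a complex query into more manageable sub-queries, so that a single LLM can handle multiple subtasks effectively.

🤔 The second method uses chain-of-thought reasoning that imitates the query execution logic of a database engine, so that the generated SQL is more accurate and matches how the underlying database processes data.

📝 The third is instance-aware synthetic example generation, which gives the model few-shot examples tailored to each test question. This strengthens the LLM's understanding of the database structure and context, so it generates more precise SQL.

🎯 CHASE-SQL uses a selection agent to identify the best candidate among the generated SQL queries. The agent compares candidate queries in pairs, using a fine-tuned LLM to determine which query is most likely correct.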

Text-to-SQL is an essential bridge between human language and Structured Query Language (SQL). It lets users convert questions written in natural language into SQL commands that a database can parse and execute. This makes complex databases easier to work with, especially for users who are not proficient in SQL. It also improves data accessibility: users can extract features for machine learning applications, generate reports, gain insights, and analyze data effectively.

In the broader context of code generation, LLMs are often used to generate a large number of candidate outputs, from which the best one is chosen. Producing several candidates is frequently beneficial, but choosing the best one is difficult, and the selection criterion largely determines the quality of the result. Research has shown a notable gap between the answers that are generated most consistently and the answers that are actually correct, which points to the need for better selection techniques.
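The gap between the most consistent answer and the correct one can be illustrated with a small sketch. It is not taken from the paper: the `orders` table, its rows, and the three candidate queries are invented for illustration, and `self_consistency_pick` is a hypothetical helper that implements plain majority voting over execution results.

```python
import sqlite3
from collections import Counter


def self_consistency_pick(conn, candidates):
    """Return the first candidate whose execution result is the most common one."""
    results = {}
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # candidates that fail to execute get no vote
        # Sort by repr so that row order (and NULLs) do not affect the comparison.
        results[sql] = tuple(sorted(rows, key=repr))
    if not results:
        return None
    top_result, _ = Counter(results.values()).most_common(1)[0]
    return next(sql for sql, res in results.items() if res == top_result)


# Toy data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, status TEXT);
INSERT INTO orders VALUES (1, 10, 'paid'), (2, 20, 'paid'), (3, 5, 'refunded');
""")

# Question: "What is the total amount of paid orders?"
correct = "SELECT SUM(amount) FROM orders WHERE status = 'paid'"
candidates = [
    "SELECT SUM(amount) FROM orders",         # forgets the status filter
    "SELECT SUM(o.amount) FROM orders AS o",  # same mistake, different wording
    correct,
]

picked = self_consistency_pick(conn, candidates)
print(picked)  # the majority result wins, although it ignores the filter
```

Two of the three candidates share the same mistake, so majority voting returns a query without the `status` filter even though the correct query is in the pool.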

To improve the performance of LLMs on text-to-SQL tasks, a team of researchers from Google Cloud and Stanford has created CHASE-SQL, a framework that combines several techniques to improve both the generation and the selection of SQL queries. It uses a multi-agent approach that takes advantage of LLM compute at test time to produce a diverse set of high-quality SQL candidates and then choose the most accurate one.

CHASE-SQL draws on the intrinsic knowledge of LLMs to generate a large pool of SQL candidates using three distinct approaches. The first is a divide-and-conquer strategy that breaks a complicated question into smaller, more manageable sub-queries. This lets a single LLM handle multiple subtasks in a single call, which simplifies questions that would otherwise be too complex to answer directly.
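As a rough illustration of the divide-and-conquer idea, the sketch below decomposes one question by hand and assembles the partial SQL into a final query. In CHASE-SQL the decomposition is produced by the LLM; here the schema, the data, the sub-question, and the SQL are all written manually and are assumptions made for the example.

```python
import sqlite3

# Toy schema and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
INSERT INTO employees VALUES
  (1, 'Ana', 'eng', 120.0),
  (2, 'Ben', 'eng', 100.0),
  (3, 'Cy',  'ops',  90.0),
  (4, 'Di',  'ops',  70.0);
""")

question = "Which employees earn more than the average salary of their own department?"

# Divide: each sub-question gets its own, simpler piece of SQL.
sub_queries = {
    "What is the average salary per department?":
        "SELECT dept, AVG(salary) AS avg_salary FROM employees GROUP BY dept",
}

# Conquer: assemble the partial SQL into the final query.
avg_by_dept = sub_queries["What is the average salary per department?"]
final_sql = f"""
SELECT e.name
FROM employees AS e
JOIN ({avg_by_dept}) AS d ON e.dept = d.dept
WHERE e.salary > d.avg_salary
ORDER BY e.name
"""

rows = [row[0] for row in conn.execute(final_sql)]
print(rows)  # ['Ana', 'Cy']
```

The sub-query can be checked on its own before it is embedded, which is what makes the decomposed form easier to get right than the single nested query.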

The second approach uses chain-of-thought reasoning that imitates the query execution logic of a database engine. By aligning the LLM's reasoning with the steps a database engine takes during execution, this method helps the model produce SQL that is more accurate and that reflects how the underlying database processes data. As a result, the generated queries align more closely with the intended logic of the user's request.
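A minimal sketch of what execution-style reasoning could look like follows. The prompt wording in `PLAN_STYLE_PROMPT` and the helper `engine_plan` are assumptions made for illustration, not the prompt used by CHASE-SQL. SQLite's `EXPLAIN QUERY PLAN` is used only to show the kind of step-by-step plan a real engine reports; its exact text varies between SQLite versions.

```python
import sqlite3

# Hypothetical prompt skeleton: the model is asked to reason in the order
# in which an engine would execute the query, then write the SQL.
PLAN_STYLE_PROMPT = """Database schema:
{schema}

Question: {question}

Reason like a database engine before writing SQL:
1. Which tables must be scanned?
2. Which rows are filtered out, and by which condition?
3. Which tables are joined, and on which keys?
4. Which grouping or aggregation is applied?
5. Which columns are returned, and in what order?

Final SQL:"""


def engine_plan(conn, sql):
    """Return the plan steps SQLite reports for a query (detail column only)."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Toy schema, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

schema = "employees(id, name, salary)"
question = "List the names of employees who earn more than 100."
prompt = PLAN_STYLE_PROMPT.format(schema=schema, question=question)

steps = engine_plan(conn, "SELECT name FROM employees WHERE salary > 100")
for step in steps:
    print(step)
```

The numbered questions in the prompt mirror the scan, filter, join, and aggregate stages of execution, so each reasoning step maps onto a clause of the final query.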

The third approach is instance-aware synthetic example generation. Here the model receives few-shot examples that are tailored to each test question. These examples improve the LLM's understanding of the structure and context of the database it is querying, which helps it navigate the schema and generate more precise SQL.
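One way to picture instance-aware example generation is a prompt builder that narrows the schema to the tables a question touches and then asks an LLM for question/SQL pairs over just those tables. The sketch below is an assumption-laden illustration: the keyword-overlap filter in `relevant_tables`, the prompt wording, and the toy schema are all hypothetical and are not the selection method described in the paper.

```python
import re


def relevant_tables(question, schema):
    """Keep tables whose name or column names appear as words in the question.

    A crude keyword filter, used here only as a stand-in for real schema selection.
    """
    words = set(re.findall(r"[a-z_]+", question.lower()))
    picked = {}
    for table, columns in schema.items():
        names = {table.lower(), *(c.lower() for c in columns)}
        if words & names:
            picked[table] = columns
    return picked or dict(schema)  # fall back to the full schema


def build_example_request(question, schema, n_examples=3):
    """Build a prompt asking for synthetic examples tailored to one question."""
    tables = relevant_tables(question, schema)
    schema_text = "\n".join(f"{t}({', '.join(cols)})" for t, cols in tables.items())
    return (
        f"Schema:\n{schema_text}\n\n"
        f"Write {n_examples} new question/SQL pairs that use only these tables "
        f"and exercise SQL features likely to be needed for:\n{question}\n"
    )


# Toy schema, invented for this example.
schema = {
    "customer": ["id", "name"],
    "orders": ["id", "customer_id", "total"],
    "warehouse": ["id", "city"],
}
question = "List the name of every customer with an order total above 100"

prompt = build_example_request(question, schema)
print(prompt)
```

The generated pairs would then be placed in the few-shot section of the final text-to-SQL prompt for this question, so the examples share its tables and column names rather than coming from an unrelated database.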

After these techniques have generated the candidate queries, CHASE-SQL uses a selection agent to identify the best one. The agent uses a fine-tuned LLM to compare candidates in pairs: for each pair it decides which query is better, which turns selection into a binary classification task. This approach is reported to be more reliable than other selection strategies, which makes it more likely that the correct SQL command is picked from the generated candidates.
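The pairwise selection can be sketched as a round-robin tournament in which every ordered pair of candidates is judged once and the candidate with the most wins is returned. This is a simplified illustration under stated assumptions: `compare` stands in for the fine-tuned LLM judge, and the `toy_quality` scores and `toy_compare` function are invented so that the example runs without a model.

```python
from itertools import permutations


def select_by_pairwise_comparison(question, candidates, compare):
    """Pick the candidate that wins the most pairwise comparisons.

    `compare(question, a, b)` must return 0 if `a` is better and 1 if `b` is better.
    Each pair is judged in both orders to reduce sensitivity to position.
    """
    scores = [0] * len(candidates)
    for i, j in permutations(range(len(candidates)), 2):
        winner = compare(question, candidates[i], candidates[j])
        scores[i if winner == 0 else j] += 1
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


# Stand-in judge: hand-assigned quality scores replace the fine-tuned LLM.
candidates = [
    "SELECT name FROM employees",
    "SELECT name FROM employees WHERE salary > 100",
    "SELECT * FROM employees",
]
toy_quality = {candidates[0]: 1, candidates[1]: 3, candidates[2]: 0}


def toy_compare(question, a, b):
    return 0 if toy_quality[a] >= toy_quality[b] else 1


question = "List the names of employees who earn more than 100."
best, scores = select_by_pairwise_comparison(question, candidates, toy_compare)
print(best, scores)
```

Because the winner is decided by direct comparisons rather than by how often a result repeats, a correct query can be selected even when it is in the minority among the candidates.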

In conclusion, CHASE-SQL sets a new bar for text-to-SQL performance by producing more accurate SQL queries than previous approaches. In particular, it achieves execution accuracy of 73.0% on the test set and 73.01% on the development set of the BIRD Text-to-SQL dataset. These results made CHASE-SQL the top-ranked method on the dataset's leaderboard, showing how well it can connect natural language with SQL for intricate database interactions.
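Execution accuracy compares the results of running a predicted query with the results of running the reference query, rather than comparing the SQL text. The sketch below shows one common way to compute it, treating results as sets of rows; the `products` table and the query pairs are invented for illustration, and the official BIRD evaluation script may differ in details.

```python
import sqlite3


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result sets match."""
    hits = 0
    for predicted_sql, gold_sql in pairs:
        gold = set(conn.execute(gold_sql).fetchall())
        try:
            predicted = set(conn.execute(predicted_sql).fetchall())
        except sqlite3.Error:
            continue  # a query that fails to run counts as a miss
        hits += predicted == gold
    return hits / len(pairs)


# Toy data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, price INTEGER);
INSERT INTO products VALUES (1, 5), (2, 15), (3, 25);
""")

pairs = [
    # Different SQL text, same result: counts as correct.
    ("SELECT id FROM products WHERE price > 10",
     "SELECT id FROM products WHERE price >= 15"),
    ("SELECT COUNT(*) FROM products", "SELECT COUNT(id) FROM products"),
    # Runs, but returns the wrong rows.
    ("SELECT id FROM products", "SELECT id FROM products WHERE price < 10"),
    # Does not run at all.
    ("SELEC id FROM products", "SELECT id FROM products"),
]

accuracy = execution_accuracy(conn, pairs)
print(accuracy)  # 0.5
```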


Check out the Paper. All credit for this research goes to the researchers of this project.


The post Google Cloud and Stanford Researchers Propose CHASE-SQL: An AI Framework for Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL appeared first on MarkTechPost.
