MarkTechPost@AI, December 14, 2024
Yale Researchers Propose AsyncLM: An Artificial Intelligence System for Asynchronous LLM Function Calling

Researchers at Yale University have proposed AsyncLM, a system for asynchronous LLM function calling designed to improve efficiency. Unlike synchronous approaches, AsyncLM lets the LLM keep generating tokens while function calls execute in the background, using an interrupt mechanism to notify the LLM when a call completes and thereby avoiding idle resources. AsyncLM relies on a domain-specific language, CML, and a fine-tuning strategy to integrate interrupts seamlessly and handle dependencies correctly. Benchmarks on the Berkeley Function Calling Leaderboard show AsyncLM completing tasks up to 5.4× faster than synchronous methods while also enabling new kinds of AI applications.

🚀 The AsyncLM system uses asynchronous function calling to let an LLM generate tokens and execute function calls in parallel, significantly improving resource utilization and reducing latency. It introduces an interrupt mechanism so the LLM is notified when a function call returns, avoiding idle resources.

🧰 AsyncLM adopts a domain-specific language, CML, which uses tokens such as [CALL], [INTR], [TRAP], and [END] to structure function calls, interrupts, and traps, enabling asynchronous interaction between the LLM and the executor. The LLM can thus run tasks in parallel without waiting for a previous task to finish.

⏱️ Through a fine-tuning strategy, AsyncLM optimizes function scheduling, minimizes task completion time, and handles interrupts effectively. Tests on the Berkeley Function Calling Leaderboard show task completion times 1.6× to 5.4× shorter than synchronous methods, demonstrating its efficiency at parallelizing tasks and optimizing token-generation cycles.

LLMs interact with external tools and data sources, such as weather APIs or calculators, through function calls, unlocking diverse applications like autonomous AI agents and neurosymbolic reasoning systems. However, the current synchronous approach to function calling, in which the LLM pauses token generation until each call finishes executing, is resource-intensive and inefficient. It blocks LLM inference, one of the most computationally demanding steps, and limits concurrency, since function calls must complete sequentially. These inefficiencies grow with task complexity, making synchronous function calling impractical for multiple or complex operations.
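To make the cost concrete, here is a runnable toy sketch in Python (purely illustrative, not the paper's code; the helper names slow_tool and generate_tokens and all timings are invented) in which decoding must stall for the full duration of each call:

    import time

    def slow_tool(name, seconds):
        time.sleep(seconds)          # stands in for a weather API, calculator, etc.
        return f"{name}-result"

    def generate_tokens(n, per_token=0.01):
        time.sleep(n * per_token)    # stands in for LLM decoding time

    start = time.time()
    generate_tokens(50)              # decode until the first function call
    r1 = slow_tool("weather", 1.0)   # inference is blocked for the full 1.0 s
    generate_tokens(50)              # decode until the second call
    r2 = slow_tool("calendar", 1.0)  # blocked again; calls run strictly in sequence
    generate_tokens(100)             # generate the final answer
    print(f"synchronous total: {time.time() - start:.2f}s")  # ~4.0 s

Because inference and execution never overlap, the total is simply the sum of every decoding phase and every call.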

Recent efforts to improve the efficiency of LLM function calling include parallelizing function executions, combining sequential calls, and optimizing function syntax. While these strategies reduce overhead, the fundamental challenge of synchronous interaction persists. Asynchronous function calling has been proposed, enabling LLMs to continue token generation while function calls execute in the background. This approach allows overlapping execution and inference, improving resource utilization and reducing latency. Studies like ReWOO have further explored consolidating function calls into single sessions, offering more efficient alternatives to traditional synchronous methods without relying on specific reasoning strategies, thus enhancing scalability across applications.

Researchers from Yale University propose AsyncLM, a system for asynchronous LLM function calling that improves efficiency by letting LLMs generate tokens and execute function calls concurrently. AsyncLM introduces an interrupt mechanism that delivers in-flight notifications to the LLM when a function call returns, avoiding resource idling. Using a domain-specific language (CML) and fine-tuning strategies, AsyncLM ensures seamless integration of interrupts and accurate handling of dependencies. Benchmark tests on the Berkeley Function Calling Leaderboard show that AsyncLM achieves up to 5.4× faster task completion than synchronous methods while maintaining accuracy. It also enables novel AI applications, including new forms of human-LLM interaction.
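The latency benefit is easy to reproduce in miniature. Below is an illustrative asynchronous counterpart to the synchronous sketch above (again invented names and timings, not AsyncLM's implementation): calls are dispatched to background threads the moment the model "emits" them, decoding continues, and completions are delivered between decoding steps, loosely mimicking the interrupt mechanism:

    import queue, time
    from concurrent.futures import ThreadPoolExecutor

    interrupts = queue.Queue()       # completed calls, surfaced as notifications

    def slow_tool(name, seconds):
        time.sleep(seconds)
        return f"{name}-result"

    def launch(pool, name, seconds):
        fut = pool.submit(slow_tool, name, seconds)   # returns immediately
        fut.add_done_callback(lambda f: interrupts.put((name, f.result())))

    def generate_tokens(n, per_token=0.01):
        for _ in range(n):
            time.sleep(per_token)                     # one decoding step
            while not interrupts.empty():             # deliver pending interrupts
                name, result = interrupts.get()
                print(f"interrupt: {name} -> {result}")

    start = time.time()
    with ThreadPoolExecutor() as pool:
        generate_tokens(50)          # decode until the first call
        launch(pool, "weather", 1.0) # call runs in the background
        generate_tokens(50)          # decoding overlaps the call
        launch(pool, "calendar", 1.0)
        generate_tokens(100)         # decoding overlaps both calls
    print(f"asynchronous total: {time.time() - start:.2f}s")  # ~2.0 s vs ~4.0 s

In this toy run the total drops from roughly 4.0 s to 2.0 s because execution hides behind decoding; the paper's 1.6×-5.4× range reflects the same overlap at realistic scales.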

CML is a domain-specific interface that enables asynchronous interaction between an LLM and an executor. It uses tokens such as [CALL], [INTR], [TRAP], [END], and [HEAD] to structure function calls, interrupts, and traps. The LLM initiates tasks in CML, allowing parallel execution without blocking token generation. Interrupts notify the LLM of completed tasks, while traps temporarily pause generation when a dependency is unmet. AsyncLM employs fine-tuning on simulated datasets to optimize function scheduling, minimize task completion time, and handle interrupts effectively. The system integrates components such as token monitors, an executor, and an interrupt manager to manage asynchronous workflows efficiently.
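One reading of the trap semantics, sketched as hypothetical Python (the helpers on_call and need and the dictionary of pending futures are mine, not the paper's API): the token monitor dispatches a [CALL] without blocking, and generation pauses only if a result is consumed before its interrupt has arrived:

    import time
    from concurrent.futures import ThreadPoolExecutor

    pending = {}                     # call id -> Future (token monitor's state)

    def on_call(pool, call_id, fn, *args):
        # Token monitor saw "[CALL] fn(args) [END]": dispatch and keep decoding.
        pending[call_id] = pool.submit(fn, *args)

    def need(call_id):
        # The model is about to consume call_id's result.
        fut = pending[call_id]
        if not fut.done():
            print(f"[TRAP] pausing: {call_id} not finished")  # generation pauses
        return fut.result()                                   # resumes on completion

    def get_weather(city):
        time.sleep(0.5)              # stand-in for a real API call
        return f"{city}: 12 C, clear"

    with ThreadPoolExecutor() as pool:
        on_call(pool, "c1", get_weather, "New Haven")
        # ... in the real system, token generation would continue here ...
        print(need("c1"))            # traps only if the result is still pending

If the interrupt fires before the result is needed, the trap never occurs and generation proceeds uninterrupted.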

The evaluation focuses on two key aspects: latency and correctness. Latency measures how effectively asynchronous function calling reduces task completion time compared to synchronous methods, while correctness assesses its impact on the accuracy of generated function calls. The Berkeley Function Calling Leaderboard (BFCL) covers diverse real-world tasks such as travel booking and API interactions, with datasets for various scenarios, including a custom multi-step dataset for complex tasks. AsyncLM, tested in local (Llama models) and cloud (GPT-4o) setups, reduced latency by up to 5.4× over synchronous methods. The results demonstrate AsyncLM's efficiency in parallelizing tasks and optimizing token-generation cycles.

In conclusion, AsyncLM is designed to enable asynchronous function calling for LLMs, allowing the models and function executors to work independently. Unlike traditional synchronous methods, where LLM inference is blocked until a function call is completed, AsyncLM uses an interrupt mechanism to notify the LLM during execution. Key innovations include an in-context interface for asynchronous interactions, fine-tuning LLMs to handle interrupt semantics, and efficient implementation within the inference pipeline. Empirical results on the BFCL show that AsyncLM reduces task completion latency by 1.6×–5.4×, enabling more efficient LLM interactions with tools, data, and humans.


Check out the Paper. All credit for this research goes to the researchers of this project.

