Nanonets, November 26, 2024
How to automate Accounts Payable using LLM-Powered Multi Agent Systems

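This article explores the potential of multi-agent systems (MAS) powered by large language models (LLMs) for automating Accounts Payable (AP). Traditional automation methods often fall short on complex, dynamic tasks that require contextual understanding, whereas MAS combine AI with specialized task allocation to deliver scalable, adaptive, and human-like solutions. The article introduces the core advantages of MAS, including separation of concerns, modularity, diversity of perspectives, and reusability, and examines the architecture and components of MAS, such as agents, connections, orchestration, human interaction, and tools and resources. It also compares LangGraph, AutoGen, and CrewAI as frameworks for building MAS, and uses CrewAI to demonstrate how to build an AP automation system with agents for invoice data extraction, validation, and payment processing, resulting in a smarter and more efficient AP process.

🤔 **Agents**: At the core of a MAS are multiple agents, each with a specific role such as invoice data extraction, validation, or payment processing. The agents work independently, using LLMs to understand context, make decisions, and execute tasks, which enables division of labor and collaboration.

🤝 **Connections**: Agents need to communicate and share information with each other so that collaboration runs smoothly and delays are kept to a minimum. The connection mechanism defines how agents interact, for example sequentially, hierarchically, or bidirectionally.

⚙️ **Orchestration**: The orchestration component manages how agents interact, optimizes the workflow, and keeps tasks on schedule. It determines how agents work together toward the final goal, for example by defining their execution order according to business rules and processes.

🧑‍💼 **Human Interaction**: Humans play an important role in a MAS. They can supervise the system, validate results, and make decisions in complex situations, which helps keep the system safe and reliable.

🧰 **Tools and Resources**: Agents rely on tools and resources to improve their efficiency and capabilities, such as databases for data validation and APIs for accessing external data. The LLM, as the core of the system, gives agents advanced comprehension and outputs tailored to their roles.

📊 **Frameworks**: The article compares LangGraph, AutoGen, and CrewAI, which provide the tools and environment for building and deploying MAS. CrewAI is chosen for building the AP automation system because it is easy to use and easy to set up.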

Introduction

In today’s fast-paced business landscape, organizations are increasingly turning to AI-driven solutions to automate repetitive processes and enhance efficiency. Accounts Payable (AP) automation, a critical area in financial management, is no exception. Traditional automation methods often fall short when dealing with complex, dynamic tasks requiring contextual understanding.

This is where Large Language Model (LLM)-powered multi-agent systems step in, combining the power of AI with specialized task allocation to deliver scalable, adaptive, and human-like solutions.

By the end of this blog, you'll understand how to code your own AP agent for your own invoice use case. But before we jump ahead, let's understand what LLM-based AI agents are and cover a few basics of multi-agent systems.

AI Agents

Agents are systems or entities that perform tasks autonomously or semi-autonomously, often by interacting with their environment or other systems. They are designed to sense, reason, and act in a way that achieves a specific goal or set of goals.

LLM-powered AI agents use large language models as their core to understand, reason, and generate text. They excel at understanding context, adapting to diverse data, and handling complex tasks. They're scalable and efficient, making them suitable for automating repetitive work such as AP. However, LLMs cannot handle everything. Because agents can be arbitrarily complex, the system also needs components such as input/output sanity checks, memory, and other specialized tools. This is where Multi-Agent Systems (MAS) come into the picture: they orchestrate and distribute tasks among specialized single-purpose agents and tools to improve developer experience, efficiency, and accuracy.
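To make the sense, reason, act loop and the extra components (I/O sanity checks, memory, tools) concrete, here is a minimal sketch in plain Python. It is not tied to any framework: `SimpleAgent`, its fields, and the two toy tools are hypothetical names invented for illustration, and the `decide` function stands in for what would normally be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimpleAgent:
    """Toy agent: observe an input, pick a tool, act, and remember the result."""
    name: str
    tools: Dict[str, Callable[[str], str]]
    # decide() stands in for the LLM; it maps an observation to a tool name.
    decide: Callable[[str], str]
    memory: List[str] = field(default_factory=list)

    def run(self, observation: str) -> str:
        if not observation.strip():                   # input sanity check
            raise ValueError("empty observation")
        tool_name = self.decide(observation)          # "reason" step
        if tool_name not in self.tools:               # sanity check on the decision
            raise KeyError(f"unknown tool: {tool_name}")
        result = self.tools[tool_name](observation)   # "act" step
        self.memory.append(f"{tool_name}: {result}")  # keep a record of what happened
        return result


agent = SimpleAgent(
    name="router",
    tools={"shout": str.upper, "count": lambda text: str(len(text.split()))},
    decide=lambda text: "count" if text.endswith("?") else "shout",
)
print(agent.run("invoice received"))      # INVOICE RECEIVED
print(agent.run("how many words here?"))  # 4
```

A real agent would replace `decide` with a model call and the toy tools with OCR, database lookups, or API clients, but the control flow stays the same.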

Multi-Agent Systems (MAS): Leveraging Collaboration for Complex Tasks

A Multi-Agent System (MAS) works like a team of specialists, each with a specific role, collaborating toward a common goal. Powered by LLMs, agents refine their outputs in real time; for instance, one writes code while another reviews it. This teamwork boosts accuracy and reduces biases by enabling cross-checks.

Benefits of Multi-Agent Designs

Here are some advantages of using MAS that cannot be easily replicated with other patterns:

    Separation of Concerns: Agents focus on specific tasks, enhancing effectiveness and delivering specialized results.
    Modularity: MAS simplifies complex problems into manageable tasks, allowing easy troubleshooting and optimization.
    Diversity of Perspectives: Various agents provide distinct insights, improving output quality and reducing bias.
    Reusability: Developed agents can be reconfigured for different applications, creating a flexible ecosystem.

Let's now look at the architecture and the components that are the building blocks of a multi-agent system.

Core Components of Multi-Agent Systems

The architecture of a MAS consists of several critical components that ensure agents work cohesively. Below are the key components that make up a MAS:

    Agents: Each agent has a specific role, goal, and set of instructions. They work independently, leveraging LLMs for understanding, decision-making, and task execution.
    Connections: These pathways let agents share information and stay aligned, ensuring smooth collaboration with minimal delays.
    Orchestration: This manages how agents interact, whether sequentially, hierarchically, or bidirectionally, to optimize workflows and keep tasks on track.
    Human Interaction: Humans often oversee MAS, stepping in to validate results or make decisions in tricky situations, adding an extra layer of safety and quality.
    Tools and Resources: Agents use tools like databases for validation or APIs to access external data, boosting their efficiency and capabilities.
    LLM: The LLM acts as the system's core, powering agents with advanced comprehension and tailored outputs based on their roles.

Below you can see how all the components are interconnected:

Core components of a Multi Agent System.
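As a rough illustration of how connections, orchestration, and human interaction fit together, the sketch below runs a list of steps sequentially over a shared context and stops early when a reviewer should take over. All names here (`run_sequential`, `extract`, `validate`, the `needs_review` hook, and the context keys) are hypothetical and framework-independent; the steps are stand-ins for real agents.

```python
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def run_sequential(steps: List[Step], context: Context,
                   needs_review: Optional[Callable[[Context], bool]] = None) -> Context:
    """Run each step in order, passing the shared context along (the 'connection')."""
    for step in steps:
        context = step(context)
        # Human-interaction hook: stop early when a person should take over.
        if needs_review is not None and needs_review(context):
            context["status"] = "needs_human_review"
            return context
    context.setdefault("status", "done")
    return context


def extract(ctx: Context) -> Context:
    # Stand-in for an extraction agent; a real one would call an LLM or OCR tool.
    return {**ctx, "total": 150.0}


def validate(ctx: Context) -> Context:
    # Stand-in for a validation agent applying a simple business rule.
    return {**ctx, "valid": 0 <= ctx["total"] <= 2000}


result = run_sequential(
    [extract, validate],
    {"invoice_id": "INV-1"},
    needs_review=lambda ctx: ctx.get("valid") is False,
)
print(result["status"])  # done
```

Hierarchical or bidirectional orchestration would replace the simple `for` loop with a manager that chooses the next step, but the shared context and the review hook carry over.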

Several frameworks make it easier to write code for and set up multi-agent systems. Let's discuss a few of them.


Frameworks for Building Multi-Agent Systems with LLMs

To effectively manage and deploy MAS, several frameworks have emerged, each with its own approach to orchestrating LLM-powered agents. The table below compares the three most popular frameworks.

| Criteria | LangGraph | AutoGen | CrewAI |
| --- | --- | --- | --- |
| Ease of Usage | Moderate complexity; requires understanding of graph theory | User-friendly; conversational approach simplifies interaction | Straightforward setup; designed for production use |
| Multi-Agent Support | Supports both single and multi-agent systems | Strong multi-agent capabilities with flexible interactions | Excels in structured role-based agent design |
| Tool Coverage | Integrates with a wide range of tools via LangChain | Supports various tools including code execution | Offers customizable tools and integration options |
| Memory Support | Advanced memory features for contextual awareness | Flexible memory management options | Supports multiple memory types (short-term, long-term) |
| Structured Output | Strong support for structured outputs | Good structured output capabilities | Robust support for structured outputs |
| Ideal Use Case | Best for complex task interdependencies | Great for dynamic, customizable agent interactions | Suitable for well-defined tasks with clear roles |

Now that we have a high-level view of the different multi-agent frameworks, we'll choose CrewAI to implement our own AP automation system because it is straightforward to use and easy to set up.

Accounts Payable (AP) Automation

We'll focus on building an AP system in this section. Before that, let's understand what AP automation is and why it is needed.

Overview of AP Automation

AP automation simplifies managing invoices, payments, and supplier relationships by using AI to handle repetitive tasks like data entry and validation. It speeds up processes, reduces errors, and ensures compliance with detailed records. By streamlining workflows, it saves time, cuts costs, and strengthens vendor relationships, turning Accounts Payable into a smarter, more efficient process.

Typical Steps in AP

    Invoice Capture: Use OCR or AI-based tools to digitize and capture invoice data.
    Invoice Validation: Automatically verify invoice details (e.g., amounts, vendor details) using set rules or matching against Purchase Orders (POs).
    Data Extraction & Categorization: Extract specific data fields (vendor name, invoice number, amount) and categorize expenses to relevant accounts.
    Approval Workflow: Route invoices to the correct approvers, with customizable approval rules based on vendor or amount.
    Matching & Reconciliation: Automate 2-way or 3-way matching (invoice, PO, and receipt) to check for discrepancies.
    Payment Scheduling: Schedule and process payments based on payment terms, early payment discounts, or other financial policies.
    Reporting & Analytics: Generate real-time reports for cash flow, outstanding payables, and vendor performance.
    Integration with ERP/Accounting System: Sync with ERP or accounting software for seamless financial records management.
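The 3-way matching step in the list above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Line` record, the `three_way_match` function, and the one-cent price tolerance are all hypothetical, and the same record type is reused for invoice, PO, and receipt lines for brevity (a receipt would not normally carry prices).

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Line:
    sku: str
    quantity: int
    unit_price: Decimal


def three_way_match(invoice: List[Line], po: List[Line], receipt: List[Line],
                    price_tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return discrepancy messages; an empty list means the three documents agree."""
    po_by_sku = {line.sku: line for line in po}
    received = {line.sku: line.quantity for line in receipt}
    problems = []
    for line in invoice:
        ordered = po_by_sku.get(line.sku)
        if ordered is None:
            problems.append(f"{line.sku}: not on purchase order")
            continue
        if abs(line.unit_price - ordered.unit_price) > price_tolerance:
            problems.append(f"{line.sku}: price differs from purchase order")
        if line.quantity > received.get(line.sku, 0):
            problems.append(f"{line.sku}: billed quantity exceeds received quantity")
    return problems


po = [Line("A", 10, Decimal("5.00"))]
receipt = [Line("A", 8, Decimal("5.00"))]
invoice = [Line("A", 10, Decimal("5.00"))]
print(three_way_match(invoice, po, receipt))
# ['A: billed quantity exceeds received quantity']
```

A 2-way match is the same check without the receipt comparison. `Decimal` is used instead of `float` so that currency amounts compare exactly.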
Here's a typical flow of AP automation along with technology that's used in each step.

Implementing AP Automation

Now that we've learned what a multi-agent system is and what AP automation involves, it's time to put that knowledge into practice.

Here are the agents that we'll create and orchestrate using CrewAI:

    Invoice Data Extraction Agent: Extracts key invoice details (vendor name, amount, due date) using the multimodal capability of GPT-4o for OCR and data parsing.
    Validation Agent: Ensures accuracy by verifying extracted data, checking for matching details, and flagging discrepancies.
    Payment Processing Agent: Prepares payment requests, validates them, and initiates payment execution.
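Before wiring up the agents, it can help to pin down what the extraction step is expected to hand over to the validation step. The sketch below is one possible shape, not part of the walkthrough that follows: the `ExtractedInvoice` record, the `parse_extracted_invoice` helper, and the JSON key names are hypothetical, chosen to mirror the fields named above (vendor name, amount, due date).

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class ExtractedInvoice:
    vendor_name: str
    amount: Decimal
    due_date: date


def parse_extracted_invoice(raw_json: str) -> ExtractedInvoice:
    """Turn a JSON string into a typed record, failing loudly on missing or bad fields."""
    data = json.loads(raw_json)
    return ExtractedInvoice(
        vendor_name=str(data["vendor_name"]).strip(),
        amount=Decimal(str(data["amount"])),             # go through str to avoid float artifacts
        due_date=date.fromisoformat(data["due_date"]),   # expects YYYY-MM-DD
    )


sample = '{"vendor_name": " Acme Corp ", "amount": "154.06", "due_date": "2024-01-31"}'
invoice = parse_extracted_invoice(sample)
print(invoice.vendor_name, invoice.amount, invoice.due_date)
# Acme Corp 154.06 2024-01-31
```

Parsing into a typed record early means a malformed extraction fails at the boundary between agents instead of surfacing later as a wrong payment.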

This setup delegates tasks efficiently, with each agent focusing on a specific step, enhancing reliability and overall workflow performance.

Here's a visualisation of what the flow will look like.

Code:

First, we'll install the 'crewai' and 'crewai_tools' packages using pip.

!pip install crewai crewai_tools

Next we’ll import necessary classes and modules from the 'crewai' and 'crewai_tools' packages.

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import VisionTool

Next, import the 'os' module for interacting with the operating system. Set the OpenAI API key and model name as environment variables. Define the URL of the image to be processed.

import os

os.environ["OPENAI_API_KEY"] = "YOUR OPEN AI API KEY"
os.environ["OPENAI_MODEL_NAME"] = 'gpt-4o-mini'
image_url = 'https://cdn.create.microsoft.com/catalog-assets/en-us/fc843d45-e3c4-49d5-8cc6-8ad50ef1c2cd/thumbnails/616/simple-sales-invoice-modern-simple-1-1-f54b9a4c7ad8.webp'
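Outside a notebook, hardcoding the key in source is easy to leak. One possible alternative, sketched here with a hypothetical helper named `require_env`, is to export the variables in the shell and have the script fail early with a clear message when one is missing.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise a clear error if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running the crew.")
    return value


# The model name can safely fall back to a default; the API key should not.
os.environ.setdefault("OPENAI_MODEL_NAME", "gpt-4o-mini")
model_name = require_env("OPENAI_MODEL_NAME")

# In a real run you would also call:
# api_key = require_env("OPENAI_API_KEY")
```

The API key check is left commented out so the snippet runs without credentials.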

Import the VisionTool class from crewai_tools and create an instance of it. This tool uses the multimodal functionality of GPT-4 to process the invoice image.

from crewai_tools import VisionTool

vision_tool = VisionTool()

Now we’ll be creating the agents that we need for our task.

image_text_extractor = Agent(
    role="Image Text Extraction Specialist",
    backstory="You are an expert in text extraction, specializing in using AI to process and analyze textual content from images, specifically from PDF files which are invoices that need to be paid. Make sure you use the tools provided.",
    goal="Extract and analyze text from images efficiently using AI-powered tools. You should get the text from {image_url}",
    allow_delegation=False,
    verbose=True,
    tools=[vision_tool],
    max_iter=1
)

invoice_data_analyst = Agent(
    role="Invoice Data Validation Analyst",
    goal="Validate the data extracted from the invoice. In case the conditions are not met, you should return the error message.",
    backstory="You're a meticulous analyst with a keen eye for detail. You're known for your ability to read through the invoice data and validate the data based on the conditions provided.",
    max_iter=1,
    allow_delegation=False,
    verbose=True,
)

payment_processor = Agent(
    role="Payment Processing Specialist",
    goal="Process the payment for the invoice if the payment is approved.",
    backstory="You're a payment processing specialist who is responsible for processing the payment for the invoice if the payment is approved.",
    max_iter=1,
    allow_delegation=False,
    verbose=True,
)

Defining Agents, which are the personas in the multi-agent system

Next, we'll define the three tasks that these agents will perform:

text_extraction_task = Task(
    agent=image_text_extractor,
    description=(
        "Extract text from the provided image file. Ensure that the extracted text is accurate and complete, "
        "and ready for any further analysis or processing tasks. The image file provided may contain various text elements, "
        "so it's crucial to capture all readable text. The image file is an invoice, and we need to extract the data from it to process the payment."
    ),
    expected_output="A string containing the full text extracted from the image."
)

# We can define the conditions which we want the agent to validate for payment processing.
# Currently I have created 2 conditions which should be met in the invoice before it's paid.
invoice_data_validation_task = Task(
    agent=invoice_data_analyst,
    description=(
        "Validate the data extracted from the invoice and ensure that these 2 conditions are met:\n"
        "1. Total due should be between 0 and 2000.00 dollars.\n"
        "2. The date of invoice should be after Dec 2022."
    ),
    expected_output=(
        "If both conditions are met, return 'Payment approved'.\n"
        "Else, return 'Payment not approved' followed by the error string according to the unmet condition, which can be either\n"
    )
)

payment_processing_task = Task(
    agent=payment_processor,
    description=(
        "Process the payment for the invoice if the payment is approved. In case there is an error, return 'Payment not approved'."
    ),
    expected_output="A confirmation message indicating that the payment has been processed successfully: 'Payment processed successfully'."
)

Tasks performed by each agent
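The two validation conditions are simple enough to check in ordinary code as well, which can serve as a deterministic cross-check on the LLM's verdict. The sketch below is not part of the crew; `check_invoice` and `decision` are hypothetical helpers. It assumes the range is inclusive and reads "after Dec 2022" as "on or after 1 January 2023", since the task description does not spell either out.

```python
from datetime import date
from decimal import Decimal
from typing import List

MAX_TOTAL = Decimal("2000.00")
EARLIEST_DATE = date(2023, 1, 1)  # assumption: "after Dec 2022" means 1 Jan 2023 or later


def check_invoice(total_due: Decimal, invoice_date: date) -> List[str]:
    """Return one error message per unmet condition; an empty list means both are met."""
    errors = []
    if not (Decimal("0") <= total_due <= MAX_TOTAL):  # assumption: bounds are inclusive
        errors.append("Total due is outside the 0 to 2000.00 range.")
    if invoice_date < EARLIEST_DATE:
        errors.append("Invoice date is not after Dec 2022.")
    return errors


def decision(total_due: Decimal, invoice_date: date) -> str:
    errors = check_invoice(total_due, invoice_date)
    if not errors:
        return "Payment approved"
    return "Payment not approved: " + " ".join(errors)


print(decision(Decimal("154.06"), date(2023, 3, 1)))
# Payment approved
print(decision(Decimal("2500.00"), date(2022, 11, 30)))
# Payment not approved: Total due is outside the 0 to 2000.00 range. Invoice date is not after Dec 2022.
```

If the agent's answer and this check disagree, that is a good signal to route the invoice to a human reviewer.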

Once we have created the agents and their tasks, we'll initialise our Crew, which consists of the agents and the tasks to complete. The process will be sequential, i.e., the tasks will be completed in the order in which they are listed.

# Note: If any changes are made in the agents and/or tasks, we need to re-run this cell for changes to take effect.
crew = Crew(
    agents=[image_text_extractor, invoice_data_analyst, payment_processor],
    tasks=[text_extraction_task, invoice_data_validation_task, payment_processing_task],
    process=Process.sequential,
    verbose=True
)

Finally, we'll run our crew and store the result in the "result" variable. We'll also pass in the URL of the invoice image that we want to process.

result = crew.kickoff(inputs={"image_url": image_url})
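Downstream code usually needs to branch on the outcome rather than just print it. The sketch below classifies the final message using the two phrases the tasks were told to return. `classify_outcome` is a hypothetical helper, and it assumes that `str(result)` yields the final task's text, which you should confirm against the CrewAI version you have installed.

```python
def classify_outcome(final_text: str) -> str:
    """Map the crew's final message to a coarse status for downstream handling."""
    text = final_text.lower()
    # Check the negative phrase first so a rejection is never mistaken for success.
    if "not approved" in text:
        return "rejected"
    if "processed successfully" in text:
        return "paid"
    # LLM output is free text, so anything unexpected goes to a person.
    return "needs_review"


print(classify_outcome("Payment processed successfully"))                # paid
print(classify_outcome("Payment not approved: total due exceeds 2000"))  # rejected
print(classify_outcome("I could not read the invoice."))                 # needs_review
```

With the crew from this walkthrough, the call would look like `classify_outcome(str(result))`.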

Here are some sample outputs for different scenarios/conditions for invoice validation:

Sample approved invoice
Case 1: All the validation conditions met and invoice processed successfully by the AI agent.
Case 2: Invoice total due greater than the total due limit. Payment not approved by the AI agent.
Case 3: Invoice date before the allowed date. Payment not approved by the AI agent.

If you want to try the above example, here's a Colab notebook for it. Just set your OpenAI API key and experiment with the flow yourself!


Sounds simple? There are a few challenges that we overlooked while building this small proof of concept.

Challenges of Implementing AI in AP Automation

    Integration with Existing Systems: Integrating AI with existing ERP systems can create data silos and disrupt workflows if not done properly.
    Employee Resistance: Adapting to automation may face pushback; training and clear communication are key to easing the transition.
    Data Quality: AI depends on clean, consistent data. Poor data quality leads to errors, making source accuracy essential.
    Initial Investment: While cost-effective long-term, the upfront investment in software, training, and integration can be significant.

Nanonets is an enterprise-grade tool designed to eliminate all the hassles for you and provide a seamless experience, effortlessly managing the complexities of accounts payable.

Conclusion

In summary, LLM-powered multi-agent systems provide a scalable and intelligent solution for automating tasks like Accounts Payable, combining specialized roles and advanced comprehension to streamline workflows.

We've covered the paradigms behind multi-agent systems and built a simple CrewAI application that processes invoices. Extending the system should be as easy as adding more agents and tasks and orchestrating them with the right process.
