How Planview built a scalable AI Assistant for portfolio and project management using Amazon Bedrock

This post is co-written with Lee Rehwinkel from Planview.

Businesses today face numerous challenges in managing intricate projects and programs, deriving valuable insights from massive data volumes, and making timely decisions. These hurdles frequently lead to productivity bottlenecks for program managers and executives, hindering their ability to drive organizational success efficiently.

Planview, a leading provider of connected work management solutions, embarked on an ambitious plan in 2023 to revolutionize how 3 million global users interact with their project management applications. To realize this vision, Planview developed an AI assistant called Planview Copilot, using a multi-agent system powered by Amazon Bedrock.

Developing this multi-agent system posed several challenges:

Reliably routing tasks to appropriate AI agents Accessing data from various sources and formats Interacting with multiple application APIs Enabling the self-serve creation of new AI skills by different product teams

To overcome these challenges, Planview developed a multi-agent architecture built using Amazon Bedrock. Amazon Bedrock is a fully managed service that provides API access to foundation models (FMs) from Amazon and other leading AI startups. This allows developers to choose the FM that is best suited for their use case. This approach is both architecturally and organizationally scalable, enabling Planview to rapidly develop and deploy new AI skills to meet the evolving needs of their customers.

This post focuses primarily on the first challenge: routing tasks and managing multiple agents in a generative AI architecture. We explore Planview’s approach to this challenge during the development of Planview Copilot, sharing insights into the design decisions that provide efficient and reliable task routing.

We describe customized home-grown agents in this post because this project was implemented before Amazon Bedrock Agents was generally available. However, Amazon Bedrock Agents is now the recommended solution for organizations looking to use AI-powered agents in their operations. Amazon Bedrock Agents can retain memory across interactions, offering more personalized and seamless user experiences. You can benefit from improved recommendations and recall of prior context where required, enjoying a more cohesive and efficient interaction with the agent. We share our learnings in our solution to help you understanding how to use AWS technology to build solutions to meet your goals.

Solution overview

Planview’s multi-agent architecture consists of multiple generative AI components collaborating as a single system. At its core, an orchestrator is responsible for routing questions to various agents, collecting the learned information, and providing users with a synthesized response. The orchestrator is managed by a central development team, and the agents are managed by each application team.

The orchestrator comprises two main components called the router and responder, which are powered by a large language model (LLM). The router uses AI to intelligently route user questions to various application agents with specialized capabilities. The agents can be categorized into three main types:

Help agent

Retrieval Augmented Generation

Data agent

Action agent

After the agents have processed the questions and provided their responses, the responder, also powered by an LLM, synthesizes the learned information and formulates a coherent response to the user. This architecture allows for a seamless collaboration between the centralized orchestrator and the specialized agents, which provides users an accurate and comprehensive answers to their questions. The following diagram illustrates the end-to-end workflow.

Technical overview

Planview used key AWS services to build its multi-agent architecture. The central Copilot service, powered by Amazon Elastic Kubernetes Service (Amazon EKS), is responsible for coordinating activities among the various services. Its responsibilities include:

Amazon Relational Database Service

The router and responder are AWS Lambda functions that interact with Amazon Bedrock. The router considers the user’s question and chat history from the central Copilot service, and the responder considers the user’s question, chat history, and responses from each agent.

Application teams manage their agents using Lambda functions that interact with Amazon Bedrock. For improved visibility, evaluation, and monitoring, Planview has adopted a centralized prompt repository service to store LLM prompts.

Agents can interact with applications using various methods depending on the use case and data availability:

Existing application APIs

Amazon Athena or traditional SQL data stores

Amazon Athena

Amazon Neptune for graph data

Amazon Neptune

Amazon OpenSearch Service for document RAG

Amazon OpenSearch Service

The following diagram illustrates the generative AI assistant architecture on AWS.

Router and responder sample prompts

The router and responder components work together to process user queries and generate appropriate responses. The following prompts provide illustrative router and responder prompt templates. Additional prompt engineering would be required to improve reliability for a production implementation.

First, the available tools are described, including their purpose and sample questions that can be asked of each tool. The example questions help guide the natural language interactions between the orchestrator and the available agents, as represented by tools.

tools = '''<tool><toolName>applicationHelp</toolName><toolDescription>Use this tool to answer application help related questions.Example questions:How do I reset my password?How do I add a new user?How do I create a task?</toolDescription></tool><tool><toolName>dataQuery</toolName><toolDescription>Use this tool to answer questions using application data.Example questions:Which tasks are assigned to me?How many tasks are due next week?Which task is most at risk?</toolDescription></tool>

Next, the router prompt outlines the guidelines for the agent to either respond directly to user queries or request information through specific tools before formulating a response:

system_prompt_router = f'''<role>Your job is to decide if you need additional information to fully answer the User's questions.You achieve your goal by choosing either 'respond' or 'callTool'.You have access to your chat history in <chatHistory></chatHistory> tags.You also have a list of available tools to assist you in <tools></tools> tags.</role><chatHistory>{chatHistory}</chatHistory><tools>{tools}</tools><rules>- If the chat history contains sufficient information to answer the User's questions, choose the 'respond' action.- To gather more information before responding, choose the 'callTool' action.- You many only choose from the tools in the <tools></tools> tags.- If no tool can assist with the question, choose the 'respond' action.- Place your chosen action within <action></action> tags.- When you chose the 'callTool' action, provide the <toolName> and the <toolQuestion> youwould like to ask.- Your <toolQuestion> should be verbose and avoid using pronouns.- Start by providing your step-by-step thinking in <thinking></thinking> tags.- Then you will give your answer in <answer></answer> tags.- Your answer should follow the format of one of these three examples:When choosing the 'respond' action, your answer should follow the below example EXACTLY:<example><answer><action>respond</action></answer></example>When choosing the 'callTool' action for a single Tool:<example><answer><action>callTool</action><tools><tool><toolName>applicationHelp</toolName><toolQuestion>How do I reset my password?</toolQuestion></tool></tools></answer></example>Executing the above, would produce the following result:You can also call multiple Tools using this format:<example><answer><action>callTool</action><tools><tool><toolName>dataQuery</toolName><toolQuestion>How many tasks are assigned to me?</toolQuestion></tool><tool><toolName>applicationHelp</toolName><toolQuestion>How do I add a new task?</toolQuestion></tool></tools></answer></example></rules>'''

The following is a sample response from the router component that initiates the dataQuery tool to retrieve and analyze task assignments for each user:

<thinking>To determine who has the most tasks assigned, I will need to query the application data. The "dataQuery" tool seems most appropriate for this question.</thinking><answer><action>callTool</action><tools>    <tool>        <toolName>dataQuery</toolName>        <toolQuestion>Which user has the most tasks currently assigned to them?   </toolQuestion>    </tool></tools></answer>

The following is a sample response from the responder component that uses the dataQuery tool to fetch information about the user’s assigned tasks. It reports that the user has five tasks assigned to them.

<thinking>Based on the chat history, I previously called the dataQuery tool to ask "How many tasks are currently assigned to the user?". The tool responded that the user has 5 tasks assigned to them.</thinking><answer>According to the data I queried previously, you have 5 tasks assigned to you.</answer>

Model evaluation and selection

Evaluating and monitoring generative AI model performance is crucial in any AI system. Planview’s multi-agent architecture enables assessment at various component levels, providing comprehensive quality control despite the system’s complexity. Planview evaluates components at three levels:

Prompts

AI agents

AI system

The following figure illustrates the evaluation framework for prompts and scoring.

To conduct these evaluations, Planview uses a set of carefully crafted test questions that cover typical user queries and edge cases. These evaluations are performed during the development phase and continue in production to track the quality of responses over time. Currently, human evaluators play a crucial role in scoring responses. To aid in the evaluation, Planview has developed an internal evaluation tool to store the library of questions and track the responses over time.

To assess each component and determine the most suitable Amazon Bedrock model for a given task, Planview established the following prioritized evaluation criteria:

Quality of response

Time of response

Scale

Cost of response

Based on these criteria and the current use case, Planview selected Anthropic’s Claude 3 Sonnet on Amazon Bedrock for the router and responder components.

Results and impact

Over the past year, Planview Copilot’s performance has significantly improved through the implementation of a multi-agent architecture, development of a robust evaluation framework, and adoption of the latest FMs available through Amazon Bedrock. Planview saw the following results between the first generation of Planview Copilot developed mid-2023 and the latest version:

Accuracy

Response time

Load testing

Cost-efficiency

Time-to-market

Conclusion

In this post, we explored how Planview was able to develop a generative AI assistant to address complex work management process by adopting the following strategies:

Modular development

Evaluation framework

Amazon Bedrock integration

Planview is migrating to Amazon Bedrock Agents, which enables the integration of intelligent autonomous agents within their application ecosystem. Amazon Bedrock Agents automate processes by orchestrating interactions between foundation models, data sources, applications, and user conversations.

As next steps, you can explore Planview’s AI assistant feature built on Amazon Bedrock and stay updated with new Amazon Bedrock features and releases to advance your AI journey on AWS.

About Authors

Sunil Ramachandra is a Senior Solutions Architect enabling hyper-growth Independent Software Vendors (ISVs) to innovate and accelerate on AWS. He partners with customers to build highly scalable and resilient cloud architectures. When not collaborating with customers, Sunil enjoys spending time with family, running, meditating, and watching movies on Prime Video.

Benedict Augustine is a thought leader in Generative AI and Machine Learning, serving as a Senior Specialist at AWS. He advises customer CxOs on AI strategy, to build long-term visions while delivering immediate ROI.As VP of Machine Learning, Benedict spent the last decade building seven AI-first SaaS products, now used by Fortune 100 companies, driving significant business impact. His work has earned him 5 patents.

Lee Rehwinkel is a Principal Data Scientist at Planview with 20 years of experience in incorporating AI & ML into Enterprise software. He holds advanced degrees from both Carnegie Mellon University and Columbia University. Lee spearheads Planview’s R&D efforts on AI capabilities within Planview Copilot. Outside of work, he enjoys rowing on Austin’s Lady Bird Lake.

Solution overview

Technical overview

Router and responder sample prompts

Model evaluation and selection

Results and impact

Conclusion

About Authors

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签