Patronus AI Created a Groundbreaking Automated Evaluation Platform

What Is the News About?

Patronus AI, founded by Anand Kannappan and Rebecca Qian, two former Meta machine learning (ML) experts, has launched an automated evaluation platform that it says detects hallucinations, copyright infringement, and safety issues in LLM outputs. The system uses proprietary AI to score model performance, stress-test models with adversarial cases, and run granular benchmarks, without the manual effort most businesses rely on today.

Silicon Valley has been racing to exploit the generative capabilities of powerful new LLMs such as OpenAI’s GPT-4o and Meta’s Llama 3. High-profile model failures have risen in step, however: news site CNET published AI-generated articles riddled with errors, and drug development companies retracted research papers based on LLM-hallucinated compounds.

According to Patronus AI, these public blunders are only symptoms of deeper problems with the current generation of LLMs. The firm’s earlier work, including the “CopyrightCatcher” API released three months ago and the “FinanceBench” benchmark released six months ago, exposes striking shortcomings in the ability of top models to answer fact-based questions correctly.

Why Is It Important?

To create its “FinanceBench” benchmark, Patronus used publicly available SEC filings to ask models like GPT-4 financial questions. Even when given the full annual report, the top-performing model answered only 19% of questions correctly. A separate analysis using Patronus’s new “CopyrightCatcher” API found that open-source LLMs reproduced copyrighted text verbatim in 44% of outputs.

According to the company, Patronus AI is helping numerous Fortune 500 corporations in sectors such as education, software, automotive, and finance deploy LLMs “safely within their organizations.” It declined, however, to name its customers. With the new funding, Patronus intends to expand its research, engineering, and sales teams and to develop new industry standards.

The post Patronus AI Created a Groundbreaking Automated Evaluation Platform appeared first on AiThority.
