AiThority, May 30, 2024
Patronus AI Created a Groundbreaking Automated Evaluation Platform

 

What Is the News About?

Patronus AI, founded by Anand Kannappan and Rebecca Qian, two seasoned machine learning (ML) professionals formerly at Meta, has created an automated evaluation platform that it claims can detect hallucinations, copyright infringement, and safety issues in LLM outputs. The system uses proprietary AI to score model performance, stress-test models with adversarial cases, and run granular benchmarks, without the manual human review that most businesses rely on today.

Silicon Valley has been racing to exploit the generative capabilities of powerful new LLMs such as OpenAI’s GPT-4o and Meta’s Llama 3. However, high-profile model failures have also been on the rise, with news site CNET publishing AI-generated articles riddled with errors and drug development businesses retracting research papers based on LLM-hallucinated compounds.


According to Patronus AI, these public blunders merely reveal deeper problems with the present generation of LLMs. The firm's earlier work, including the “CopyrightCatcher” API released three months ago and the “FinanceBench” benchmark released six months ago, exposes striking shortcomings in top models' ability to answer fact-based questions accurately.

Why Is It Important?

To create its “FinanceBench” benchmark, Patronus used publicly available SEC filings to ask models such as GPT-4 financial questions. Even when given the entire annual report, the top-performing model answered only 19% of questions correctly. A separate investigation using Patronus’s “CopyrightCatcher” API found that open-source LLMs reproduced copyrighted text word for word in 44% of outputs.


According to the company, Patronus AI is helping numerous Fortune 500 corporations in sectors such as education, software, automotive, and finance deploy LLMs “safely within their organizations.” However, the company declined to name its customers. Patronus intends to use the new funding to expand its research, engineering, and sales teams and to develop new industry standards.



The post Patronus AI Created a Groundbreaking Automated Evaluation Platform appeared first on AiThority.
