TechCrunch News, January 20
AI benchmarking organization criticized for waiting to disclose funding from OpenAI

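Epoch AI, a nonprofit that develops math benchmarks for AI, was recently revealed to have accepted funding from OpenAI without disclosing it promptly, prompting questions in the AI community about its objectivity. OpenAI used Epoch AI's FrontierMath benchmark to demo its AI model o3, yet many contributors did not know of OpenAI's involvement until it was made public. Epoch AI has acknowledged a communication failure and says it has an agreement with OpenAI not to use the test set to train AI, but its independent verification of the results is still incomplete, which has deepened concerns about the objectivity and transparency of AI benchmarks. The episode highlights the challenge of securing resources for AI evaluation while avoiding conflicts of interest.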

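💰 Epoch AI accepted funding from OpenAI without disclosing it promptly, prompting questions in the AI community about its objectivity. The incident has damaged FrontierMath's reputation as an objective benchmark.

📝 FrontierMath is an expert-level test that Epoch AI developed to measure AI mathematical ability, and OpenAI used it to demo its upcoming flagship AI model, o3. However, many contributors did not know of OpenAI's involvement until it was made public.

🤝 Epoch AI has acknowledged a communication failure and says it has a verbal agreement with OpenAI not to use the FrontierMath test set to train AI. Even so, Epoch AI has not yet completed its independent verification of OpenAI's FrontierMath o3 results, which has further deepened concerns about the objectivity of AI benchmarks.

🧐 Although Epoch AI says it maintains a separate “holdout set” as an additional safeguard for independently verifying FrontierMath benchmark results, the absence of a completed independent verification still leaves doubts about the authenticity of OpenAI's results.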

An organization developing math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI’s mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI going by the username “Meemi” says that many contributors to the FrontierMath benchmark weren’t informed of OpenAI’s involvement until it was made public.

“The communication about this has been non-transparent,” Meemi wrote. “In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark.”

On social media, some users raised concerns that the secrecy could erode FrontierMath’s reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark — a fact Epoch AI didn’t divulge prior to December 20, when o3 was announced.

In a reply to Meemi’s post, Tamay Besiroglu, associate director of Epoch AI and one of the organization’s co-founders, asserted that the integrity of FrontierMath hadn’t been compromised, but admitted that Epoch AI “made a mistake” in not being more transparent.

“We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible,” Besiroglu wrote. “Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.”

Besiroglu added that while OpenAI has access to FrontierMath, it has a “verbal agreement” with Epoch AI not to use FrontierMath’s problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also has a “separate holdout set” that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.

“OpenAI has … been fully supportive of our decision to maintain a separate, unseen holdout set,” Besiroglu wrote.

However, muddying the waters, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn’t been able to independently verify OpenAI’s FrontierMath o3 results.

“My personal opinion is that [OpenAI’s] score is legit (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal benchmarking performances,” Glazer said. “However, we can’t vouch for them until our independent evaluation is complete.”

The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI — and securing the necessary resources for benchmark development without creating the perception of conflicts of interest.
