Fix simple mistakes in ARC-AGI, etc.

 

ARC-AGI is an artificial dataset for testing general intelligence. @ryan_greenblatt proposed an approach for it, and on top of that approach the author proposes an improvement, Program Dithering, which aims to automatically fix common mistakes and improve accuracy.

🎯 ARC-AGI is a diverse artificial dataset for testing general intelligence, something like an IQ test played out on rectangular grids.

💻 @ryan_greenblatt used GPT-4o to generate a large number of Python programs per task, selected the programs that worked on the training examples, and ran them to solve the test query; the approach reached 72% accuracy on the part of the benchmark where humans score 85%.

💡 The author proposes an improvement called Program Dithering: go through the generated Python programs and perturb their integer constants and array-indexing positions, one at a time or several at a time, to automatically fix common mistakes such as the extremely common off-by-one error and so improve overall accuracy.

🌐 If GPT-4o makes other common simple mistakes, such as swapping array indexes, the approach can be extended to try to fix those as well, and other tasks, like what AlphaCode does, might also find this method useful.

Published on July 9, 2024 5:46 PM GMT

ARC-AGI is a diverse artificial dataset that aims to test general intelligence. It's sort of like an IQ test that's played out on rectangular grids.

Last month, @ryan_greenblatt proposed an approach that used GPT-4o to generate about 8000 Python programs per task. It then selected the programs that worked on the "training" examples given, and ran them to actually solve the "test" query.

His approach achieved 72% accuracy on the part of the benchmark that humans have been measured to get 85% accuracy on.
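To make that pipeline concrete, here is a minimal sketch of the generate/filter/run loop described above. The helper generate_candidate_programs, the transform(grid) convention, and the task dictionary layout are illustrative assumptions, not Ryan's actual code.

# Hypothetical sketch of the generate -> filter -> run pipeline. Assumes each
# candidate program defines transform(grid) and each task looks like
# {"train": [(input_grid, output_grid), ...], "test_input": grid}.

def passes_training_examples(program_source, train_pairs):
    """True if the candidate program maps every training input to its expected output."""
    for grid_in, grid_out in train_pairs:
        try:
            namespace = {}
            exec(program_source, namespace)
            if namespace["transform"](grid_in) != grid_out:
                return False
        except Exception:
            return False
    return True

def solve_task(task, generate_candidate_programs, n_samples=8000):
    """Sample many programs, keep those consistent with the training pairs, run one on the test input."""
    candidates = generate_candidate_programs(task, n_samples)  # GPT-4o sampling, not shown
    survivors = [p for p in candidates if passes_training_examples(p, task["train"])]
    if not survivors:
        return None
    namespace = {}
    exec(survivors[0], namespace)
    return namespace["transform"](task["test_input"])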

I have an idea for an improvement, on top of this approach. It should be relatively cheap. I don't have time to work on this myself, but I hope someone else runs with it, hence this post.

The motivation for this idea is Ryan's note:

[GPT-4o] makes simple mistakes like off-by-one errors extremely often

My idea is to try to fix them automatically. I call it Program Dithering. You go through the generated Python programs and try to perturb all the integer constants in each, one at a time, and maybe several at a time.

Thus, if you try two perturbations at a time, a program that looks like this

x = 7
...
y = x + 3

can become

x = 8
...
y = x + 2

etc., generating a potentially large number of candidate programs without any extra GPT-4o calls. One could also consider perturbing array indexing locations in a similar way.
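Here is a minimal sketch of what Program Dithering could look like, using Python's ast module. The names (_IntPerturber, dither) and the choice of +/-1 perturbations are illustrative assumptions, not a reference implementation.

import ast
import itertools

class _IntPerturber(ast.NodeTransformer):
    """Replace the k-th integer constant encountered with value + delta (hypothetical helper)."""
    def __init__(self, target_position, delta):
        self.target_position = target_position
        self.delta = delta
        self._seen = 0

    def visit_Constant(self, node):
        # bool is a subclass of int; skip True/False so they are not "perturbed"
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            if self._seen == self.target_position:
                node = ast.copy_location(ast.Constant(node.value + self.delta), node)
            self._seen += 1
        return node

def dither(source, deltas=(-1, 1), max_simultaneous=2):
    """Yield variants of `source` with one or two integer constants nudged by +/-1."""
    n_ints = sum(
        isinstance(n, ast.Constant) and isinstance(n.value, int) and not isinstance(n.value, bool)
        for n in ast.walk(ast.parse(source))
    )
    for k in range(1, max_simultaneous + 1):
        for positions in itertools.combinations(range(n_ints), k):
            for ds in itertools.product(deltas, repeat=k):
                tree = ast.parse(source)
                for pos, d in zip(positions, ds):
                    tree = _IntPerturber(pos, d).visit(tree)
                yield ast.unparse(ast.fix_missing_locations(tree))

Running dither on the two-line example above would yield variants such as x = 8 / y = x + 2, with no extra GPT-4o calls; each variant would then be re-checked against the training examples exactly like the original candidates.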

If off-by-one errors are extremely common, Program Dithering could fix some or many of them, and improve the overall accuracy. 

Off-by-one errors seem like a general flaw, so fixing them should not be "overfitting" the benchmark.

Generalizations:

If there are other simple mistakes that GPT-4o tends to make, e.g. swapping array indexes, one can extend the approach to try to fix them also (a sketch of this follows below).

Other tasks, like what AlphaCode does, might find this useful too.
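As an illustration of the index-swapping repair mentioned above, a single hypothetical AST transform could rewrite two-level subscripts like grid[i][j] into grid[j][i]; a full version would enumerate one swap at a time, like the dithering sketch.

import ast

class SwapIndexPairs(ast.NodeTransformer):
    """Rewrite two-level subscripts a[i][j] as a[j][i] (hypothetical 'swapped indexes' repair)."""
    def visit_Subscript(self, node):
        self.generic_visit(node)
        if isinstance(node.value, ast.Subscript):
            # Exchange the outer and inner index expressions.
            node.slice, node.value.slice = node.value.slice, node.slice
        return node

def swap_indexes(source):
    tree = SwapIndexPairs().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))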


