The Rundown AI -每日精选 前天 15:33
Google 'officially' bags IMO gold
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

谷歌DeepMind团队的Gemini大模型在国际数学奥林匹克竞赛(IMO)中取得了官方金牌水平的成绩,紧随OpenAI的类似声明。此次竞赛中,Gemini在4.5小时的时限内,成功解答了六道涵盖代数、组合学、几何和数论的题目中的五道,总分达到35/42,展现了其在复杂数学推理方面的强大能力。与去年不同,今年的Gemini完全使用自然语言进行端到端处理。虽然OpenAI也声称取得了同等分数,但其模型并未与IMO官方合作进行评分。谷歌的这一成就标志着AI在数学领域正在快速进步,预示着未来AI可能在解决未解数学问题上扮演更重要角色。与此同时,文章还提到了阿里巴巴Qwen3开源模型的强大表现以及如何构建AI邮件助手等AI前沿动态。

🥇 **谷歌Gemini在IMO竞赛中获官方金牌认证**:谷歌DeepMind团队宣布其Gemini模型在国际数学奥林匹克竞赛(IMO)2025中达到了金牌水平。该模型在规定时间内成功解答了大部分难题,并由IMO官方进行评分认证,其表现证明了AI在高级数学推理上的飞速发展。

🚀 **AI在数学领域竞争激烈,谷歌与OpenAI齐头并进**:继OpenAI宣布其模型在IMO取得佳绩后,谷歌Gemini也获得官方认可。尽管两家公司采用了不同的测试和评分方法,但都清晰表明AI在数学超智能领域的竞赛已正式打响,预示着AI解决复杂数学问题的能力将持续提升。

💡 **阿里巴巴Qwen3引领开源模型新高度**:阿里巴巴发布的Qwen3模型在开源领域表现抢眼,其更新后的非思考版本在各项基准测试中超越了Kimi K2,并能与Claude Opus 4等闭源模型相媲美。这一成就不仅体现了中国AI的创新实力,也强调了开源策略在推动AI发展中的重要作用。

🛠️ **AI邮件助手构建教程**:文章提供了一个关于如何使用xAI的Grok 4模型和n8n自动化平台来创建AI邮件草稿助手的分步教程,用户可以借此了解如何集成AI模型以提高工作效率,并提供了使用“草稿模式”进行预览的实用建议。

🧠 **脑启发的层次化推理模型展现高效智能**:Sapient Intelligence推出的Hierarchical Reasoning Model(HRM)是一种仅用2700万参数就能在复杂任务(如ARC-AGI和数独)上展现出强大推理能力的AI模型。其借鉴大脑皮层计算的架构,实现了高效的智能,预示着AI在低数据环境下的部署潜力。

Read Online | Sign Up | Advertise

Good morning, {{ first_name | AI enthusiasts }}. Fresh off yesterday’s news of OpenAI claiming gold-level performance at the International Math Olympiad, Google entered the spotlight—earning an official gold-medal standard… and stirring up some serious drama.

The two AI giants took different approaches to generating and grading math proofs, but one thing is clear: the race to build mathematical superintelligence is officially on.


In today’s AI rundown:

    Google’s ‘official’ gold win at IMO

    Alibaba’s Qwen3 takes open-source crown

    Create an AI agent that drafts emails for you

    Brain-inspired Hierarchical Reasoning Model

    4 new AI tools & 4 job opportunities

LATEST DEVELOPMENTS

GOOGLE

🥇 Google’s ‘official’ gold win at IMO

Image source: Google DeepMind

The Rundown: Google DeepMind has announced that its advanced version of Gemini with Deep Think has officially achieved gold-medal level performance at the International Mathematical Olympiad 2025, following OpenAI’s similar claim.

The details:

    DeepMind said it worked with IMO to test Gemini's mathematical reasoning on the same problem statements and time limits, 4.5 hours, as human competitors.

    Out of six problems covering algebra, combinatorics, geometry, and number theory, the AI solved five and scored 35/42—marking the gold-medal standard.

    Last year, DeepMind won silver by using domain-specific translations, but this year, its model tackled the problems entirely in natural language end-to-end.

    OpenAI also claimed the same score with an unnamed model, but it did not work with IMO and had the answers graded by former medalists.

    Google’s answers, on the other hand, were officially graded and certified by IMO coordinators using the same internal criteria as for student solutions.

Why it matters: Despite taking different paths, both models’ performance shows that AI is rapidly closing in on advanced mathematical reasoning. At this rate, the next frontier isn’t if they’ll solve all 6 out of 6 IMO problems—but rather when they’ll have the creativity to solve problems no human ever has.

TOGETHER WITH VANTA

🛡️ Your shortcut to startup compliance

The Rundown: Early compliance doesn’t just protect data — it also unlocks new customers and speeds up growth. Vanta’s Compliance for Startups Bundle makes it easy, with automation across 35+ frameworks and practical resources to guide you at every step.

The Compliance for Startups Bundle includes:

    Step-by-step compliance checklists

    Case studies from fast-growing startups

    On-demand videos with industry leaders

Get it here.

ALIBABA

⚙️ Alibaba’s Qwen3 takes open-source crown

Image source: Qwen

The Rundown: Alibaba’s Qwen team just took the open-source crown with the release of an updated, non-thinking Qwen3 model that beats Kimi K2 across the board and challenges top closed-source models like Anthropic’s Claude Opus 4.

The details:

    Following community feedback, Alibaba separated its hybrid thinking approach, training instruct and reasoning models independently.

    The new non-thinking version activates 22B of 235B parameters with a 256K-context window, delivering significant performance gains.

    In benchmarks, it surpassed Moonshot AI’s recently released Kimi K2 and challenged closed frontier models like Claude Opus 4 and GPT-4o-0327.

    The updated model is 100% open-source and is also available as the free default model on Qwen Chat, Alibaba’s ChatGPT competitor.

Why it matters: Another Chinese team has outshined frontier labs through bold open-source innovation, despite chip constraints from the West. The achievement spotlights China’s growing dominance in AI innovation—driven not just by technical prowess, but by a strategic push for openness and global influence.

AI TRAINING

🤖 Create an AI agent that drafts emails for you

The Rundown: In this tutorial, you’ll learn how to build an intelligent AI agent that draft’s emails for you using xAI’s Grok 4 model through n8n's workflow automation platform.

Step-by-step:

    Add a n8n chat message trigger and connect an AI Agent node to create your workflow foundation

    Configure xAI Grok Chat Model (Grok-4-0709) with your API credentials

    Add a Simple Memory node and set a Gmail integration to Create Draft (see image above as reference)

    Test with: “Draft an email to john@company.com asking for meeting availability” and customize with system messages

Pro tip: Use draft mode first to review AI-generated emails are on par with your writing before switching to automatic sending.

PRESENTED BY ASAPP

🤝 Elevate customer trust with GenerativeAgent

The Rundown: ASAPP’s GenerativeAgent is an enterprise-grade AI agent that resolves real customer issues (on its own) across voice and chat. With new features focused on precision and trust, it delivers reliable, high-quality service at every step, while enabling fast deployment and instant value.

With GenerativeAgent, you can:

    Drive better, smarter outcomes with AI that learns from human expertise over time

    Maintain full visibility with tools to flag anomalies, track trends, and enforce compliance

    Confidently launch with AI behavior testing in simulated environments

Take a self-guided tour of GenerativeAgent today.

SAPIENT INTELLIGENCE

🧠 Brain-inspired Hierarchical Reasoning Model

Image source: Sapient Intelligence

The Rundown: Sapient Intelligence introduced Hierarchical Reasoning Model, a brain-inspired open-source AI that delivers unprecedented reasoning power on complex tasks like ARC-AGI and Sudoku, with just 27M parameters.

The details:

    HRM’s architecture uses three principles seen in cortical computation: hierarchical processing, temporal separation, and recurrent connectivity.

    A high-level module handles abstract planning while a low-level one executes fast, detailed tasks, switching between automatic and deliberate reasoning.

    The approach enabled the model to beat larger ones like Claude 3.7, DeepSeek R1, and o3-mini-high on ARC-AGI 2 and complex Sudoku and maze puzzles.

    With no pretraining or CoT, it points to a new kind of efficient intelligence that doesn’t need immense training data or suffer from brittle task decomposition.

Why it matters: As AI moves to real-world decision-making—efficient, brain-inspired models like HRM signal a shift toward intelligence that’s not just powerful, but also deployable in low-data environments. Sapient is already putting this into practice, helping teams with rare-disease diagnostics and pushing climate forecasting accuracy.

QUICK HITS

🛠️ Trending AI Tools

    📝 Gemini Code Assist - Google’s AI coding assistant, now with agent mode

    🤖 SOLO - Trae’s all-in-one Context Engineer for full software development

    🧠 Composite - Turn your existing browser into an AI agent

    ⚙️ GEN - Create AI characters that build social media audiences end-to-end

💼 AI Job Opportunities

📰 Everything else in AI today

Cohere Labs introduced Catalyst Grants Program, providing free access to its models to teams tackling challenges in areas like education, healthcare, and climate.

AI video company Pika announced a new AI-only social video app, built on a highly expressive human video model, with early access waitlist now open for iOS users.

OpenAI’s ChatGPT now gets over 2.5B daily requests (meaning 912.5B annually), with 330 million coming from users based in the U.S alone.

Netflix said it used generative AI in an Argentine TV series and completed its VFX sequence “10 times faster” than it could have been completed with traditional tools.

Elon Musk’s xAI poached Ethan He, one of Nvidia’s top AI researchers who led the work on Cosmos, the company’s SOTA world model.

Runway announced its Act-Two motion capture model is now available via the API, allowing users to integrate it directly into their apps, platforms, and websites.

COMMUNITY

🎥 Join our next live workshop

Check out our last live workshop with Dr. Alvaro Cintas, The Rundown’s AI professor, and learn how to use Perplexity Comet (and other alternatives) to automate your browsing experience.

Watch it here. Not a member? Join The Rundown University on a 14-day free trial.

See you soon,

Rowan, Joey, Zach, Alvaro, and Shubham—The Rundown’s editorial team

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

AI 数学竞赛 谷歌 OpenAI 开源模型
相关文章