Society's Backend, December 13, 2024
Digitizing Smell, Automatic Prompt Optimization, Targeted AI Regulation, an Intro to AI Agents, and More

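This week brought many developments in AI, including a roundup of AI agent updates, new models and machine learning techniques, AI news, and industry developments. The list also covers the use of supervised machine learning in scientific research and recommends a podcast on recent papers. OpenAI released the SimpleQA benchmark to evaluate the accuracy of language models, and automatic prompt optimization techniques continue to improve. Multimodal LLM technology and models such as Meta's Llama 3.2 have also made progress. The article discusses AI regulation and describes how LMSys uses user votes to improve LLM benchmarking. In addition, Google's Gemini API has added a search capability that can give developers more accurate information. Finally, it covers building AI agents, technology for digitizing smell, and an AI model with 10 trillion parameters.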

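🤖 AI agent updates: a weekly roundup of the latest developments in AI agents, including new models and machine learning techniques, as well as news from related companies.

🔬 Scientific applications: the use of supervised machine learning in scientific research, including key topics such as interpretability, causality, and uncertainty.

✅ Benchmark evaluation: OpenAI released the SimpleQA benchmark, which aims to evaluate the accuracy of language models and push AI models to be more reliable.

🖼️ Multimodal LLMs: multimodal large language models can process several data types, such as text, images, and audio; models such as Meta AI's Llama 3.2 stand out.

⚖️ Regulation and optimization: regulatory strategies for AI, and improving LLM quality through automatic prompt optimization.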

Here's a comprehensive AI reading list from this past week. Thanks to all the incredible authors for creating these helpful articles and learning resources.

I put one of these together each week. If reading about AI updates and topics is something you enjoy, make sure to subscribe.

Society's Backend is reader-supported. You can support my work (these reading lists and standalone articles) for 80% off for the first year (just $1/mo). You'll also get the extended reading list each week.

A huge thanks to all supporters.

What Happened Last Week

Here are some resources to learn more about what happened in AI last week and why those happenings are important:

What Else You Should Know

Supervised Machine Learning for Science is published

Christoph Molnar and Timo Freiesleben have published a new book, "Supervised Machine Learning for Science," which explores how machine learning can be used effectively in scientific research. The book covers important topics like interpretability, causality, and uncertainty, aiming to improve the understanding and application of ML in science.

Papers Podcast

ML papers are difficult to keep up with. Here’s this week’s NotebookLM-generated podcast going over important papers you should know:

Last Week’s Reading List

Reading List

Introducing SimpleQA

OpenAI has created SimpleQA, a new benchmark to evaluate the factual accuracy of language models. It focuses on short, clear questions with definitive answers to reduce errors and improve trustworthiness. SimpleQA aims to drive research towards making AI models more reliable and accurate.

Source
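
A benchmark built on short questions with a single definitive answer can be scored with very little machinery. The following is a minimal sketch, assuming a three-way grade (correct, incorrect, not attempted) and a plain normalized string match. How SimpleQA actually grades answers is not described in this summary, and the function names and sample data are hypothetical.

```python
import string
from collections import Counter
from typing import Dict, List, Optional


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def grade(prediction: Optional[str], reference: str) -> str:
    """Grade one answer as 'not_attempted', 'correct', or 'incorrect'."""
    if prediction is None or not normalize(prediction):
        return "not_attempted"
    if normalize(prediction) == normalize(reference):
        return "correct"
    return "incorrect"


def summarize(grades: List[str]) -> Dict[str, float]:
    """Report accuracy over all questions and over attempted questions only."""
    counts = Counter(grades)
    attempted = counts["correct"] + counts["incorrect"]
    return {
        "overall_accuracy": counts["correct"] / len(grades) if grades else 0.0,
        "accuracy_when_attempted": counts["correct"] / attempted if attempted else 0.0,
    }


# Hypothetical (model answer, reference answer) pairs, for illustration only.
examples = [
    ("Paris", "Paris."),
    ("Lyon", "Paris"),
    (None, "Canberra"),        # the model declined to answer
    ("  canberra ", "Canberra"),
]
grades = [grade(pred, ref) for pred, ref in examples]
report = summarize(grades)
print(grades)
print(report)
```

Separating "not attempted" from "incorrect" lets a report distinguish a model that declines to answer from one that answers wrongly, which matters when the goal is trustworthiness rather than raw coverage.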

Automatic Prompt Optimization

By Cameron R. Wolfe, Ph.D.

Automatic prompt optimization uses algorithms to improve prompts without human intervention. This process involves generating and evaluating various prompt variants to find the most effective one. Recent research shows that large language models (LLMs) can significantly enhance prompt quality through these automated techniques.

Source
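
The generate-and-evaluate loop described in this item can be sketched as a small greedy search. This is an illustration under stated assumptions, not the method from the article: the candidate edits come from a fixed, hypothetical list, and `toy_score` stands in for what would normally be an evaluation of an LLM on held-out examples.

```python
from typing import Callable, List, Tuple

# Hypothetical instruction phrases that a mutation step may append to a prompt.
EDITS = ["Think step by step.", "Answer concisely.", "Cite your sources."]


def propose_variants(prompt: str) -> List[str]:
    """Generate candidate prompts by appending each edit not already present."""
    return [f"{prompt} {edit}" for edit in EDITS if edit not in prompt]


def toy_score(prompt: str) -> float:
    """Stand-in for a real evaluation (e.g. task accuracy on a validation set).

    Rewards two specific phrases and lightly penalizes prompt length.
    """
    score = 0.0
    if "step by step" in prompt:
        score += 1.0
    if "concisely" in prompt:
        score += 0.5
    return score - 0.001 * len(prompt)


def optimize(seed: str, score: Callable[[str], float], rounds: int = 3) -> Tuple[str, float]:
    """Greedy search: keep the best-scoring variant each round, stop when none improves."""
    best, best_score = seed, score(seed)
    for _ in range(rounds):
        candidates = [(score(p), p) for p in propose_variants(best)]
        if not candidates:
            break
        top_score, top = max(candidates)
        if top_score <= best_score:
            break  # no variant beats the current prompt
        best, best_score = top, top_score
    return best, best_score


best_prompt, best_score = optimize("Solve the math problem.", toy_score)
print(best_prompt)
print(round(best_score, 3))
```

In practice the variants would typically be proposed by an LLM and the score would come from running each candidate on real examples; the loop structure stays the same.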

Understanding Multimodal LLMs

By Sebastian Raschka, PhD

The article explains multimodal large language models (LLMs), which can process different types of data like text, images, and audio. It highlights two main approaches for building these models: the Unified Embedding Decoder Architecture and the Cross-Modality Attention Architecture. The author also reviews recent multimodal models, including Meta AI's Llama 3.2, showcasing their capabilities in tasks like image captioning.

Source
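
As a rough illustration of the first approach, the sketch below assumes that a unified embedding design projects image-patch features to the same size as the text token embeddings and concatenates the two into one sequence for the decoder. The sizes, weights, and function names are hypothetical toy values, and the cross-attention alternative is not shown.

```python
from typing import List

Vector = List[float]


def project(patch: Vector, weights: List[Vector]) -> Vector:
    """Linear projection of one image-patch vector into the text embedding size."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]


def unified_sequence(
    patches: List[Vector], text_embeddings: List[Vector], weights: List[Vector]
) -> List[Vector]:
    """Project image patches, then prepend them to the text token embeddings."""
    image_tokens = [project(p, weights) for p in patches]
    return image_tokens + text_embeddings


# Hypothetical toy sizes: 3-dimensional patch features, 2-dimensional text embeddings.
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]  # projection matrix with shape (2, 3)
patches = [[0.5, 1.0, 2.0], [1.0, 0.0, 0.0]]
text_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

sequence = unified_sequence(patches, text_embeddings, weights)
print(len(sequence))   # 2 image tokens + 3 text tokens
print(sequence[0])     # first projected image patch
```

The point of the design is that, after projection, the decoder sees image tokens and text tokens as entries of a single sequence with one embedding size.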

The case for targeted regulation

Governments need to create targeted regulations for AI to manage risks while allowing innovation. Anthropic suggests using Responsible Scaling Policies (RSPs) to identify and mitigate these risks effectively. Collaboration among policymakers, the AI industry, and other stakeholders is crucial to establish a solid regulatory framework soon.

Source

In the Arena: How LMSys changed LLM Benchmarking Forever

By Latent Space

LMSys has transformed how language models are benchmarked by using a voting system that reflects user preferences instead of traditional metrics. The Arena Elo scores, which are based on more than a million votes, provide a more practical view of model performance than academic benchmarks. This approach captures diverse user experiences and is expanding to include multimodal evaluations and specialized tasks.

Source
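
To show how pairwise votes can turn into a ranking, here is a minimal sketch of the standard Elo update. The K-factor of 32 and the starting rating of 1000 are illustrative assumptions, the model names and votes are made up, and the Arena leaderboard itself may estimate ratings with a different procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings: Dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Shift rating points from loser to winner, scaled by how surprising the result was."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rate(votes: Iterable[Tuple[str, str]], initial: float = 1000.0) -> Dict[str, float]:
    """Process (winner, loser) votes in order and return the final ratings."""
    ratings: Dict[str, float] = defaultdict(lambda: initial)
    for winner, loser in votes:
        update(ratings, winner, loser)
    return dict(ratings)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ratings = rate(votes)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Because every update moves the same number of points from the loser to the winner, the total rating mass stays constant, and an upset against a higher-rated model moves more points than an expected win.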

Gemini API and Google AI Studio now offer Grounding with Google Search

Google AI Studio and the Gemini API now include a feature called Grounding with Google Search, which helps developers get more accurate and up-to-date responses. This feature provides supporting links and search suggestions, making AI applications more trustworthy and informative. Developers can enable this feature to enhance their applications and deliver richer content to users.

Source

Why Executives Seem Out of Touch, and How to Reach Them

By Ethan Evans

Ethan Evans, a former Amazon VP, discusses why employees often feel disconnected from their leaders in large organizations. As companies grow, decisions made by executives can seem surprising or out of touch. He offers insights and resources for career growth and understanding executive perspectives.

Source

Introduction to AI Agents

This course teaches how to build effective AI agents and complex workflows using LLMs. Students will learn key concepts, including multi-agent systems and the no-code tool Flowise AI. By completing the course, participants will earn a certificate and gain skills applicable to various domains.

Source

Digitizing smell to give everyone a chance at a better life.

Osmo is building technology to generate smells the way we already generate images and sounds. The team, which has expertise in various fields, was previously at Google Research and is now focused on building a startup dedicated to digitizing smell. They have attracted notable investors, including the Bill & Melinda Gates Foundation and other prominent individuals.

Source

The 10 Trillion Parameter AI Model With 300 IQ

A new AI model has been developed with 10 trillion parameters and an IQ of 300. This powerful model aims to improve various tasks in artificial intelligence. Its advanced capabilities could change how we interact with technology.

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
