MarkTechPost@AI · May 2, 01:55
Salesforce AI Research Introduces New Benchmarks, Guardrails, and Model Architectures to Advance Trustworthy and Capable AI Agents

Salesforce AI Research has released a comprehensive roadmap for building smarter, more reliable, and more versatile AI agents. The initiative targets the limitations of current AI systems in task consistency, robustness, and adaptation to complex enterprise workflows. By introducing new benchmarks, model architectures, and safety mechanisms, Salesforce is building a multi-layered framework for scaling agentic systems responsibly. The work spans the SIMPLE benchmark, the ContextualJudgeBench evaluation, the SFR-Guard model family, the CRMArena simulation-based evaluation suite, the xLAM and TACO model families, and the Agentforce platform, all aimed at improving AI reliability, adaptability, and fit with enterprise software.

🎯 Salesforce coined the term "jagged intelligence" and introduced the SIMPLE benchmark to address AI's inconsistent performance across tasks of similar complexity. SIMPLE contains 225 straightforward reasoning questions that humans answer almost perfectly but that remain challenging for language models. The benchmark is designed to expose gaps in models' ability to generalize across seemingly uniform problems.

🛡️ To strengthen AI safety and robustness in enterprise settings, Salesforce has expanded its Trust Layer with new safeguards. The SFR-Guard model family is trained to detect prompt injection, harmful outputs, and hallucinated content. CRMArena is a simulation-based evaluation suite that tests agent performance under conditions mimicking real CRM workflows, ensuring AI agents can generalize and operate predictably across varied enterprise tasks.

🛠️ Salesforce has introduced two new model families, xLAM and TACO, to support more structured, goal-directed behavior in agents. The xLAM series is optimized for tool use, multi-turn interaction, and function calling, and supports enterprise-grade deployment. TACO models aim to improve agent planning: by explicitly modeling intermediate reasoning steps and the corresponding actions, they strengthen an agent's ability to decompose complex goals into sequences of operations.

Salesforce AI Research has outlined a comprehensive roadmap for building more intelligent, reliable, and versatile AI agents. The recent initiative focuses on addressing foundational limitations in current AI systems—particularly their inconsistent task performance, lack of robustness, and challenges in adapting to complex enterprise workflows. By introducing new benchmarks, model architectures, and safety mechanisms, Salesforce is establishing a multi-layered framework to scale agentic systems responsibly.

Addressing “Jagged Intelligence” Through Targeted Benchmarks

One of the central challenges highlighted in this research is what Salesforce terms jagged intelligence: the erratic behavior of AI agents across tasks of similar complexity. To systematically diagnose and reduce this problem, the team introduced the SIMPLE benchmark. This dataset contains 225 straightforward, reasoning-oriented questions that humans answer with near-perfect consistency but remain non-trivial for language models. The goal is to reveal gaps in models’ ability to generalize across seemingly uniform problems, particularly in real-world reasoning scenarios.
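The article does not include the SIMPLE evaluation harness itself; the sketch below is a minimal illustration, assuming a hypothetical `ask_model` callable and a couple of made-up items, of how per-item pass rates over repeated trials could surface the kind of inconsistency the benchmark targets.

```python
# Minimal consistency-scoring sketch (illustrative; not the published SIMPLE harness).

def ask_model(question: str) -> str:
    """Hypothetical stand-in for a call to the language model under test."""
    raise NotImplementedError("plug in your model client here")

# Illustrative items in the spirit of SIMPLE: short reasoning questions that
# humans answer near-perfectly. The real benchmark contains 225 such items.
ITEMS = [
    {"question": "A train leaves at 3 pm and the trip takes 90 minutes. When does it arrive?",
     "answer": "4:30 pm"},
    {"question": "Which is heavier: one kilogram of feathers or one kilogram of iron?",
     "answer": "they weigh the same"},
]

def score_consistency(items, trials: int = 5) -> dict:
    """Repeat each item several times and record its pass rate.

    'Jagged intelligence' shows up as items whose pass rate sits well below 1.0
    even though the questions look equally easy."""
    pass_rates = {}
    for item in items:
        correct = sum(
            item["answer"].lower() in ask_model(item["question"]).strip().lower()
            for _ in range(trials)
        )
        pass_rates[item["question"]] = correct / trials
    return pass_rates
```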

Complementing SIMPLE is ContextualJudgeBench, which evaluates an agent’s ability to maintain accuracy and faithfulness in context-specific answers. This benchmark emphasizes not only factual correctness but also the agent’s ability to recognize when to abstain from answering—an important trait for trust-sensitive applications such as legal, financial, and healthcare domains.
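ContextualJudgeBench's exact prompt format and labels are not given in the article; the following is a rough sketch, under assumed label names and an assumed `call_judge` callable, of how a judge model could be asked to check both faithfulness to the supplied context and whether the model should have abstained.

```python
# Illustrative judge-based check for contextual faithfulness and abstention.
# The labels, prompt wording, and `call_judge` callable are assumptions.

JUDGE_PROMPT = """You are a strict judge. Given a CONTEXT, QUESTION, and ANSWER, reply with one label:
FAITHFUL        - the answer is fully supported by the context
UNFAITHFUL      - the answer adds to or contradicts the context
SHOULD_ABSTAIN  - the context does not contain the answer, so the model should have declined

CONTEXT: {context}
QUESTION: {question}
ANSWER: {answer}
Label:"""

def judge_response(call_judge, context: str, question: str, answer: str) -> str:
    """call_judge is a placeholder wrapping whichever judge model you use."""
    raw = call_judge(JUDGE_PROMPT.format(context=context, question=question, answer=answer))
    return raw.strip().upper()

def contextual_accuracy(call_judge, records) -> float:
    """records: dicts with keys context, question, answer, expected_label."""
    records = list(records)
    hits = sum(
        judge_response(call_judge, r["context"], r["question"], r["answer"]) == r["expected_label"]
        for r in records
    )
    return hits / len(records) if records else 0.0
```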

Strengthening Safety and Robustness with Trust Mechanisms

Recognizing the importance of AI reliability in enterprise settings, Salesforce is expanding its Trust Layer with new safeguards. The SFR-Guard model family has been trained on both open-domain and domain-specific (CRM) data to detect prompt injections, toxic outputs, and hallucinated content. These models serve as dynamic filters, supporting real-time inference with contextual moderation capabilities.
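The article does not describe SFR-Guard's API; as a rough sketch of the "dynamic filter" idea, the wrapper below screens both the incoming prompt and the outgoing completion with a guard classifier, where `classify`, `generate`, and the label names are assumptions rather than the actual interface.

```python
# Illustrative guardrail wrapper; `classify` stands in for a guard model such as
# SFR-Guard, whose real interface is not described in the article.
from dataclasses import dataclass
from typing import Callable

@dataclass
class GuardVerdict:
    label: str    # assumed labels, e.g. "safe", "prompt_injection", "toxic", "hallucination"
    score: float  # classifier confidence in [0, 1]

def guarded_generate(
    generate: Callable[[str], str],            # the underlying LLM call
    classify: Callable[[str], GuardVerdict],   # placeholder guard-model call
    prompt: str,
    threshold: float = 0.5,
) -> str:
    """Screen the prompt before generation and the completion after it."""
    pre = classify(prompt)
    if pre.label != "safe" and pre.score >= threshold:
        return f"[blocked: {pre.label} detected in input]"
    completion = generate(prompt)
    post = classify(completion)
    if post.label != "safe" and post.score >= threshold:
        return f"[blocked: {post.label} detected in output]"
    return completion
```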

Another component, CRMArena, is a simulation-based evaluation suite designed to test agent performance under conditions that mimic real CRM workflows. This ensures AI agents can generalize beyond training prompts and operate predictably across varied enterprise tasks.
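CRMArena's task format is not spelled out here; the episode loop below is a toy sketch of a simulation-style evaluation, with a mocked CRM state, an assumed `agent_step` interface, and a goal check at the end. All names are illustrative.

```python
# Toy simulation-style evaluation loop; class and function names are illustrative.

class MockCRM:
    """Tiny stand-in for CRM state, e.g. a support case the agent must escalate."""
    def __init__(self):
        self.cases = {"C-100": {"status": "open", "priority": "low"}}

    def update_case(self, case_id: str, **fields) -> None:
        self.cases[case_id].update(fields)

def run_episode(agent_step, goal: dict, max_turns: int = 5) -> bool:
    """agent_step(crm, goal) returns ("update_case", case_id, fields) or ("done",).

    The episode succeeds if the target case reaches the goal fields within budget."""
    crm = MockCRM()
    for _ in range(max_turns):
        action = agent_step(crm, goal)
        if action[0] == "done":
            break
        _, case_id, fields = action
        crm.update_case(case_id, **fields)
    target = crm.cases[goal["case_id"]]
    return all(target.get(key) == value for key, value in goal["fields"].items())

# Example goal: escalate case C-100 to high priority and mark it in progress.
GOAL = {"case_id": "C-100", "fields": {"priority": "high", "status": "in_progress"}}
```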

Specialized Model Families for Reasoning and Action

To support more structured, goal-directed behavior in agents, Salesforce introduced two new model families: xLAM and TACO.

The xLAM (eXtended Language and Action Models) series is optimized for tool use, multi-turn interaction, and function calling. These models vary in scale (from 1B to 200B+ parameters) and are built to support enterprise-grade deployments, where integration with APIs and internal knowledge sources is essential.
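xLAM's actual function-calling format is not reproduced in the article; the dispatcher below is a generic sketch of the pattern, assuming the model emits a JSON tool call and the runtime routes it to a registered function. The tool name and call schema are illustrative.

```python
import json

# Example tool the agent may call; the name and signature are illustrative.
def get_account_balance(account_id: str) -> dict:
    return {"account_id": account_id, "balance": 1250.0}

TOOLS = {"get_account_balance": get_account_balance}

def dispatch_tool_call(model_output: str) -> str:
    """Assumes the model emits JSON such as:
    {"tool": "get_account_balance", "arguments": {"account_id": "A-42"}}"""
    try:
        call = json.loads(model_output)
        result = TOOLS[call["tool"]](**call["arguments"])
        return json.dumps(result)
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Feed the error back to the model so it can retry with a corrected call.
        return json.dumps({"error": str(exc)})
```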

TACO (Thought-and-Action Chain Optimization) models aim to improve agent planning capabilities. By explicitly modeling intermediate reasoning steps and corresponding actions, TACO enhances the agent’s ability to decompose complex goals into sequences of operations. This structure is particularly relevant for use cases like document automation, analytics, and decision support systems.
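The TACO training recipe is not detailed in the article; the loop below is a minimal sketch of the thought-and-action pattern it optimizes for, with assumed `propose_step` and `execute` callables and an assumed step format.

```python
from dataclasses import dataclass

@dataclass
class Step:
    thought: str   # intermediate reasoning the model verbalizes
    action: str    # operation name, e.g. "search", "summarize", "finish"
    argument: str  # argument to the operation, or the final answer for "finish"

def plan_and_act(propose_step, execute, goal: str, max_steps: int = 8):
    """propose_step(goal, history) -> Step; execute(step) -> observation string.

    Both callables are placeholders for the planning model and the tool runtime."""
    history = []
    for _ in range(max_steps):
        step = propose_step(goal, history)
        if step.action == "finish":
            return step.argument, history
        observation = execute(step)
        history.append((step, observation))
    return None, history  # budget exhausted without finishing
```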

Operationalizing Agents via Agentforce

These capabilities are being unified under Agentforce, Salesforce’s platform for building and deploying autonomous agents. The platform includes a no-code Agent Builder, which allows developers and domain experts to specify agent behaviors and constraints using natural language. Integration with the broader Salesforce ecosystem ensures agents can access customer data, invoke workflows, and remain auditable.

A study by Valoir found that teams using Agentforce can build production-ready agents 16 times faster than with traditional software development approaches, while improving operational accuracy by up to 75%. Importantly, Agentforce agents are embedded within the Salesforce Trust Layer, inheriting the safety and compliance features required in enterprise contexts.

Conclusion

Salesforce’s research agenda reflects a shift toward more deliberate, architecture-aware AI development. By combining targeted evaluations, fine-grained safety models, and purpose-built architectures for reasoning and action, the company is laying the groundwork for next-generation agentic systems. These advances are not only technical but structural—emphasizing reliability, adaptability, and alignment with the nuanced needs of enterprise software.


