AI News, 27 September 2024
Ivo Everts, Databricks: Enhancing open-source AI and improving data governance

 

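Several Databricks developments, including the DBRX model, Unity Catalog, Databricks AI/BI, Mosaic AI, and the Data Intelligence Platform, aim to shape the future of open-source AI and data governance.

📌 The DBRX model sets a new standard for open large language models: it performs strongly on standard benchmarks, offers up to 2x faster inference than models like Llama2-70B, trains more efficiently, and is regarded as one of the best open-source models.

📌 Unity Catalog addresses data sprawl and inconsistent access controls with centralised data access management, role-based access control, data lineage and auditing, and cross-cloud and hybrid support.

📌 Databricks AI/BI is a new business intelligence product comprising AI-powered, low-code dashboards and the conversational interface Genie. It enhances data exploration and visualisation and provides a deep understanding of data semantics.

📌 Mosaic AI is a comprehensive platform for building, deploying, and managing machine learning and generative AI applications. Its components include unified tooling, support for generative AI patterns, centralised model management, monitoring and governance, and cost-effective custom LLMs.

📌 The Data Intelligence Platform is the core: it combines features of data lakes and data warehouses, uses Delta Lake technology for real-time data processing, exchanges data securely through Delta Sharing, and supports machine learning and AI model development.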

Ahead of AI & Big Data Expo Europe, AI News caught up with Ivo Everts, Senior Solutions Architect at Databricks, to discuss several key developments set to shape the future of open-source AI and data governance.

One of Databricks’ notable achievements is the DBRX model, which set a new standard for open large language models (LLMs).

“Upon release, DBRX outperformed all other leading open models on standard benchmarks and has up to 2x faster inference than models like Llama2-70B,” Everts explains. “It was trained more efficiently due to a variety of technological advances.

“From a quality standpoint, we believe that DBRX is one of the best open-source models out there and when we refer to ‘best’ this means a wide range of industry benchmarks, including language understanding (MMLU), Programming (HumanEval), and Math (GSM8K).”

The open-source AI model aims to “democratise the training of custom LLMs beyond a small handful of model providers and show organisations that they can train world-class LLMs on their data in a cost-effective way.”

In line with their commitment to open ecosystems, Databricks has also open-sourced Unity Catalog.

“Open-sourcing Unity Catalog enhances its adoption across cloud platforms (e.g., AWS, Azure) and on-premise infrastructures,” Everts notes. “This flexibility allows organisations to uniformly apply data governance policies regardless of where the data is stored or processed.”

Unity Catalog addresses the challenges of data sprawl and inconsistent access controls through various features:

    Centralised data access management: “Unity Catalog centralises the governance of data assets, allowing organisations to manage access controls in a unified manner,” Everts states.
    Role-Based Access Control (RBAC): According to Everts, Unity Catalog “implements Role-Based Access Control (RBAC), allowing organisations to assign roles and permissions based on user profiles.”
    Data lineage and auditing: This feature “helps organisations monitor data usage and dependencies, making it easier to identify and eliminate redundant or outdated data,” Everts explains. He adds that it also “logs all data access and changes, providing a detailed audit trail to ensure compliance with data security policies.”
    Cross-cloud and hybrid support: Everts points out that Unity Catalog “is designed to manage data governance in multi-cloud and hybrid environments” and “ensures that data is governed uniformly, regardless of where it resides.”
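To make the combination of role-based access control and an audit trail concrete, here is a minimal sketch in plain Python. It is not Unity Catalog's API: the `Catalog` class, its methods, the `sales.orders` table name, and the privilege strings are all hypothetical, chosen only to show how one central component can both decide access by role and log every decision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Catalog:
    """Toy central catalog (hypothetical): one place for grants, one audit trail."""
    role_grants: dict = field(default_factory=dict)  # role -> set of (table, privilege)
    user_roles: dict = field(default_factory=dict)   # user -> set of roles
    audit_log: list = field(default_factory=list)    # one entry per access check

    def grant(self, role, table, privilege):
        """Give a role a privilege on a table."""
        self.role_grants.setdefault(role, set()).add((table, privilege))

    def assign(self, user, role):
        """Attach a role to a user."""
        self.user_roles.setdefault(user, set()).add(role)

    def check(self, user, table, privilege):
        """Allow access if any of the user's roles holds the privilege; log the decision."""
        allowed = any(
            (table, privilege) in self.role_grants.get(role, set())
            for role in self.user_roles.get(user, set())
        )
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "table": table,
            "privilege": privilege,
            "allowed": allowed,
        })
        return allowed


catalog = Catalog()
catalog.grant("analyst", "sales.orders", "SELECT")
catalog.assign("alice", "analyst")

can_read = catalog.check("alice", "sales.orders", "SELECT")
can_write = catalog.check("alice", "sales.orders", "MODIFY")
```

Because permissions hang off roles rather than individual users, changing what analysts may do is a single `grant` call, and both allowed and denied requests end up in `audit_log`.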

The company has introduced Databricks AI/BI, a new business intelligence product that leverages generative AI to enhance data exploration and visualisation. Everts believes that “a truly intelligent BI solution needs to understand the unique semantics and nuances of a business to effectively answer questions for business users.”

The AI/BI system includes two key components:

    Dashboards: Everts describes this as “an AI-powered, low-code interface for creating and distributing fast, interactive dashboards.” These include “standard BI features like visualisations, cross-filtering, and periodic reports without needing additional management services.”
    Genie: Everts explains this as “a conversational interface for addressing ad-hoc and follow-up questions through natural language.” He adds that it “learns from underlying data to generate adaptive visualisations and suggestions in response to user queries, improving over time through feedback and offering tools for analysts to refine its outputs.”

Everts states that Databricks AI/BI is designed to provide “a deep understanding of your data’s semantics, enabling self-service data analysis for everyone in an organisation.” He notes it’s powered by “a compound AI system that continuously learns from usage across an organisation’s entire data stack, including ETL pipelines, lineage, and other queries.”

Databricks also unveiled Mosaic AI, which Everts describes as “a comprehensive platform for building, deploying, and managing machine learning and generative AI applications, integrating enterprise data for enhanced performance and governance.”

Mosaic AI offers several key components, which Everts outlines:

    Unified tooling: Provides “tools for building, deploying, evaluating, and governing AI and ML solutions, supporting predictive models and generative AI applications.”
    Generative AI patterns: “Supports prompt engineering, retrieval augmented generation (RAG), fine-tuning, and pre-training, offering flexibility as business needs evolve.”
    Centralised model management: “Model Serving allows for centralised deployment, governance, and querying of AI models, including custom ML models and foundation models.”
    Monitoring and governance: “Lakehouse Monitoring and Unity Catalog ensure comprehensive monitoring, governance, and lineage tracking across the AI lifecycle.”
    Cost-effective custom LLMs: “Enables training and serving custom large language models at significantly lower costs, tailored to specific organisational domains.”
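Of the generative AI patterns listed, retrieval augmented generation is the easiest to illustrate in a few lines. The sketch below is a toy, standard-library-only version and not Mosaic AI's implementation: it ranks documents by bag-of-words cosine similarity (real systems typically use learned embeddings and a vector index) and places the best match into a prompt. The function names and the sample documents are assumptions made for illustration, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Put the retrieved context in front of the question (the 'augmented' step)."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical enterprise documents, for illustration only.
DOCS = [
    "Refunds are processed within 14 days of a return request.",
    "Our support desk is open on weekdays from 9am to 5pm.",
    "Shipping to Europe takes three to five business days.",
]

prompt = build_prompt("How long do refunds take?", DOCS)
```

The resulting `prompt` would then be sent to a model. The point of the pattern is that the model answers from retrieved enterprise data rather than from its training data alone.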

Everts highlights that Mosaic AI’s approach to fine-tuning and customising foundation models includes unique features like “fast startup times” by “utilising in-cluster base model caching,” “live prompt evaluation” where users can “track how the model’s responses change throughout the training process,” and support for “custom pre-trained checkpoints.”

At the heart of these innovations lies the Data Intelligence Platform, which Everts says “transforms data management by using AI models to gain deep insights into the semantics of enterprise data.” The platform combines features of data lakes and data warehouses, utilises Delta Lake technology for real-time data processing, and incorporates Delta Sharing for secure data exchange across organisational boundaries.

Everts explains that the Data Intelligence Platform plays a crucial role in supporting new AI and data-sharing initiatives by providing:

    A unified data and AI platform that “combines the features of data lakes and data warehouses into a single architecture.”
    Delta Lake for real-time data processing, ensuring “reliable data governance, ACID transactions, and real-time data processing.”
    Collaboration and data sharing via Delta Sharing, enabling “secure and open data sharing across organisational boundaries.”
    Integrated support for machine learning and AI model development with popular libraries like MLflow, PyTorch, and TensorFlow.
    Scalability and performance through its cloud-native architecture and the Photon engine, “an optimised query execution engine.”

As a key sponsor of AI & Big Data Expo Europe, Databricks plans to showcase their open-source AI and data governance solutions during the event.

“At our stand, we will also showcase how to create and deploy – with Lakehouse apps – a custom GenAI app from scratch using open-source models from Hugging Face and data from Unity Catalog,” says Everts.

“With our GenAI app you can generate your own cartoon picture, all running on the Data Intelligence Platform.”

Databricks will be sharing more of their expertise at this year’s AI & Big Data Expo Europe. Swing by Databricks’ booth at stand #280 to hear more about open AI and improving data governance.


The post Ivo Everts, Databricks: Enhancing open-source AI and improving data governance appeared first on AI News.
