MarkTechPost@AI, February 15
Microsoft Research Introduces Data Formulator: An AI Application that Leverages LLMs to Transform Data and Create Rich Visualizations

 

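Microsoft Research has introduced Data Formulator, a visualization authoring tool built on a new paradigm called concept binding, which lets users express their visualization intent by binding data concepts to visual channels. The tool supports two ways to create new concepts: natural language prompts for data derivation and example-based input for data reshaping. Its AI backend infers the necessary data transformations, generates candidate visualizations, and provides explanatory feedback that helps users inspect, refine, and iterate on the results. User testing showed strong task completion and usability, indicating that the tool addresses the data transformation problem effectively.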

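💡 Data Formulator takes a concept-binding approach that treats data concepts as first-class objects. Users create new data concepts through natural language prompts or example input, which simplifies data transformation.

🤖 Data Formulator's AI backend uses large language models (LLMs) to understand user intent, automatically infers the required data transformations, and generates multiple candidate visualizations that users can select and adjust through an intuitive interface.

📊 User testing showed that Data Formulator performs well on task completion and usability: users completed complex visualization tasks in a relatively short time, supporting the tool's effectiveness at addressing data transformation challenges.

🧑‍💻 Data Formulator's architecture is organized around data concepts rather than the table-level operations that traditional approaches emphasize. This design makes it easier for users to communicate with the AI agent and verify results, and by combining natural language with programming by example, it gives users access to powerful transformations within familiar tools.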

Most modern visualization authoring tools, such as Charticulator, Data Illustrator, and Lyra, and libraries such as ggplot2 and Vega-Lite expect tidy data, where every variable to be visualized is a column and each observation is a row. When the input data is tidy, authors simply bind data columns to visual channels. Otherwise, they must first prepare the data, even if the original data is clean and contains all the necessary information. That preparation means transforming the data with specialized libraries like tidyverse or pandas, or with separate tools like Wrangler, before any visualization can be created. This requirement poses two major challenges: the need for programming expertise or specialized tool knowledge, and an inefficient workflow of constantly switching between data transformation and visualization steps.
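To make the tidy-data requirement concrete, here is a minimal sketch of the kind of reshaping an author would otherwise do by hand. It uses plain Python rather than pandas or tidyverse, and the table, the column names, and the `wide_to_long` helper are all hypothetical illustrations, not part of any tool mentioned in the article.

```python
def wide_to_long(rows, id_column, var_name, value_name):
    """Turn every non-id column of a wide table into its own row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on each output row
            long_rows.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return long_rows


# Hypothetical wide table: one row per date, one column per city.
wide = [
    {"date": "2020-01-01", "Seattle": 42, "Atlanta": 55},
    {"date": "2020-01-02", "Seattle": 40, "Atlanta": 57},
]

# Tidy form: one observation per row, with "city" and "temperature" as columns.
tidy = wide_to_long(wide, id_column="date", var_name="city", value_name="temperature")
for row in tidy:
    print(row)
```

The wide table is clean and complete, yet a chart that colors lines by city cannot be specified until `city` and `temperature` exist as columns, which is the gap the article describes.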

Various approaches have emerged to simplify visualization creation, starting with the grammar of graphics, which established the foundation for mapping data to visual elements. High-level grammar-based tools like ggplot2, Vega-Lite, and Altair have gained popularity for their concise syntax and their abstraction of complex implementation details. Visualization-by-demonstration tools like Lyra 2 and VbD let users specify visualizations through direct manipulation. Natural language interfaces, such as NCNet and VisQA, have also been developed to make visualization creation more intuitive. However, these solutions either require tidy data input or, as Falx does, introduce new complexity by focusing on low-level specifications.

A team from Microsoft Research has proposed Data Formulator, a visualization authoring tool built around a new paradigm called concept binding. Users express their visualization intent by binding data concepts to visual channels, where a data concept can either come from an existing column or be created on demand. The tool supports two methods for creating new concepts: natural language prompts for data derivation and example-based input for data reshaping. Once users select a chart type and map their desired concepts, Data Formulator’s AI backend infers the necessary data transformations and generates candidate visualizations. When there are multiple candidates, the system provides explanatory feedback for each, so users can inspect, refine, and iterate on their visualizations.
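The idea of binding concepts to channels can be sketched in a few lines. This is an illustration only, not Data Formulator's implementation: the `bind_concepts` function is hypothetical, and the output merely follows the `mark`/`encoding` layout of a Vega-Lite specification with field types omitted. The point it shows is that some bound concepts may not yet exist as table columns, and those are the ones a backend would need to derive or reshape.

```python
def bind_concepts(mark, bindings, table_columns):
    """Build a Vega-Lite-style spec and list concepts the table does not yet have."""
    encoding = {channel: {"field": concept} for channel, concept in bindings.items()}
    # Concepts that are bound to a channel but are not existing columns
    # would have to be created by a data transformation.
    missing = sorted(set(bindings.values()) - set(table_columns))
    return {"mark": mark, "encoding": encoding}, missing


# Hypothetical wide table with one column per city.
columns = ["date", "Seattle", "Atlanta"]

spec, missing = bind_concepts(
    "line",
    {"x": "date", "y": "temperature", "color": "city"},
    columns,
)
print(spec)
print(missing)  # concepts that need to be created before the chart can render
```

In this sketch `date` is satisfied by an existing column, while `temperature` and `city` are flagged as concepts to be created.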

Data Formulator’s architecture treats data concepts as first-class objects that serve as abstractions of existing and potential future table columns. This design differs fundamentally from traditional approaches by focusing on concept-level transformations rather than table-level operators, which makes it easier for users to communicate with the AI agent and verify results. The natural language component relies on LLMs’ ability to understand high-level intent and natural concepts, while the programming-by-example component offers precise, unambiguous reshaping operations through demonstration. This hybrid architecture lets users work with familiar shelf-configuration tools while accessing powerful transformation capabilities.
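One way to picture a concept-level transformation is as a named function over existing columns, as opposed to an operator applied to a whole table. The sketch below is a hypothetical model of that idea: the `DerivedConcept` class, the `apply_concept` helper, and the sample data are assumptions for illustration, and the function is written by hand here where the article describes it being inferred from a natural language prompt.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DerivedConcept:
    """A potential future column: a name, its source columns, and a row-level function."""
    name: str
    sources: Sequence[str]
    fn: Callable[..., object]


def apply_concept(rows, concept):
    """Return new rows with the derived concept materialized as a column."""
    return [
        {**row, concept.name: concept.fn(*(row[s] for s in concept.sources))}
        for row in rows
    ]


# Hypothetical table and concept ("daily temperature range").
rows = [
    {"city": "Seattle", "high": 50, "low": 38},
    {"city": "Atlanta", "high": 61, "low": 47},
]
temp_range = DerivedConcept("range", ["high", "low"], lambda high, low: high - low)

derived = apply_concept(rows, temp_range)
for row in derived:
    print(row)
```

Because the concept names its sources and yields one value per row, a user can check the result by reading a few rows, which is the verification benefit the paragraph above attributes to working at the concept level.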

A user study of Data Formulator showed promising results in task completion and usability. Participants completed all assigned visualization tasks within an average time of 20 minutes, with Task 6 taking the longest because it involved computing a 7-day moving average. The dual-interaction approach proved effective, though some participants needed occasional hints about concept type selection and data type management. For derived concepts, users averaged 1.62 prompt attempts with relatively concise descriptions (7.28 words on average), and the system generated approximately 1.94 candidates per prompt. Most of the challenges encountered were minor and related to getting familiar with the interface rather than to fundamental usability issues.
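For readers unfamiliar with the transformation behind Task 6, a 7-day moving average can be sketched as follows. This is a generic trailing-window version written for illustration; the study's exact definition is not given in the article, so the handling of the first six days (averaging over whatever values are available) and the sample data are assumptions.

```python
def moving_average(values, window=7):
    """Trailing mean over the most recent `window` values (fewer at the start)."""
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        averages.append(running / min(i + 1, window))
    return averages


# Hypothetical daily counts for eight consecutive days.
daily = [10, 12, 14, 16, 18, 20, 22, 24]
smoothed = moving_average(daily)
print(smoothed)
```

Each output value depends on several neighboring rows rather than on a single row, which helps explain why this task was harder to express than a simple column-wise derivation.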

In conclusion, Data Formulator addresses the persistent challenge of data transformation in visualization authoring through its concept-driven approach. By combining AI assistance with user interaction, the tool lets authors create complex visualizations without handling data transformations directly. The user study supports its effectiveness, showing that even users facing complex data transformation requirements can create their desired visualizations. Looking forward, this concept-driven approach shows promise for influencing the next generation of visual data exploration and authoring tools, potentially removing the long-standing barrier of data transformation in visualization creation.



The post Microsoft Research Introduces Data Formulator: An AI Application that Leverages LLMs to Transform Data and Create Rich Visualizations appeared first on MarkTechPost.
