MarkTechPost@AI, December 22, 2024
Meet LOTUS 1.0.0: An Advanced Open Source Query Engine with a DataFrame API and Semantic Operators

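LOTUS 1.0.0 is an open-source query engine from researchers at Stanford and Berkeley, designed to simplify large-scale data processing. It offers a Pandas-like interface and introduces semantic operators, which let users express complex queries in natural language while the backend optimizes the execution plan. LOTUS supports both structured and unstructured data and uses large language models together with lightweight proxy models to improve efficiency. It has achieved strong results in fact-checking, multi-label classification, search and ranking, and image processing, demonstrating its capabilities for AI-enhanced data processing.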

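💡**Semantic operators**: At the core of LOTUS are its semantic operators, including semantic filters, semantic joins, and semantic aggregations. These let users perform data transformations with natural-language conditions, which simplifies how complex queries are expressed.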

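🚀**Performance optimization**: LOTUS uses optimization techniques such as model cascades and semantic indexing to cut computational cost substantially while preserving result quality. For example, semantic filters achieve high precision and recall while remaining computationally efficient.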

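🛠️**Multiple applications**: LOTUS performs well across several real-world applications, including fact-checking on the FEVER dataset, multi-label classification on the BioDEX dataset, and search and ranking on the SciFact and CIFAR-bench datasets, outperforming conventional methods in each.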

Modern data programming involves working with large-scale datasets, both structured and unstructured, to derive actionable insights. Traditional data processing tools often struggle with the demands of advanced analytics, particularly when tasks extend beyond simple queries to include semantic understanding, ranking, and clustering. Systems like Pandas or SQL-based tools handle relational data well, but they have difficulty integrating AI-driven, context-aware processing. Tasks such as summarizing arXiv papers or fact-checking claims against extensive databases require sophisticated reasoning capabilities. These systems also often lack the abstractions needed to streamline such workflows, so developers must build complex pipelines by hand. The result is inefficiency, high computational cost, and a steep learning curve for users without a strong AI programming background.

Researchers from Stanford and Berkeley have released LOTUS 1.0.0, a new version of LOTUS (LLMs Over Tables of Unstructured and Structured Data), their open-source query engine designed to address these challenges. LOTUS simplifies programming with a Pandas-like interface, making it accessible to users familiar with standard data manipulation libraries. More importantly, it provides a set of semantic operators: declarative programming constructs such as filters, joins, and aggregations that use natural language expressions to define transformations. These operators let users express complex queries intuitively while the system's backend optimizes execution plans, improving performance and efficiency.
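
The following is a minimal, self-contained sketch of what a semantic filter does conceptually: it renders a natural-language expression for each row and keeps the rows a model judges to be true. It does not use LOTUS. `render`, `sem_filter_sketch`, and `toy_predicate` are hypothetical names, and a keyword check stands in for the language model a real engine would call. The `{title}` placeholder is similar in spirit to the way LOTUS expressions reference columns, but the LOTUS documentation is the authority on its actual API.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def render(langex: str, row: Row) -> str:
    """Substitute {column} placeholders in a natural-language expression with row values."""
    return langex.format_map(row)


def sem_filter_sketch(rows: List[Row], langex: str,
                      predicate: Callable[[str], bool]) -> List[Row]:
    """Keep the rows whose rendered expression the predicate judges to be true.

    In a real engine `predicate` would be a language-model call; here it is any
    function from the rendered prompt to a boolean.
    """
    return [row for row in rows if predicate(render(langex, row))]


def toy_predicate(prompt: str) -> bool:
    # Hypothetical stand-in for an LLM: a keyword check, not a real model.
    text = prompt.lower()
    return "query" in text or "sql" in text


papers = [
    {"title": "Query Optimization for LLM Pipelines"},
    {"title": "Protein Folding at Scale"},
    {"title": "A SQL Engine for Tensors"},
]
kept = sem_filter_sketch(papers, "The paper titled '{title}' is about databases", toy_predicate)
print([row["title"] for row in kept])
```

The filter only ever sees the rendered text. Replacing `toy_predicate` with a model call would therefore change nothing else in the pipeline, and that separation is what leaves an engine free to optimize how the predicate is evaluated.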

Technical Insights and Benefits

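LOTUS is built around semantic operators, which extend the relational model with AI-driven reasoning capabilities. Key examples include semantic filters, which keep rows that satisfy a natural-language condition; semantic joins, which match rows across tables according to a natural-language predicate; semantic aggregations, which summarize many rows into a single result; and semantic top-k, which ranks rows against a natural-language criterion.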

These operators leverage large language models (LLMs) and lightweight proxy models to ensure both accuracy and efficiency. LOTUS incorporates optimization techniques, such as model cascades and semantic indexing, to reduce computational costs while maintaining high-quality results. For instance, semantic filters achieve precision and recall targets with probabilistic guarantees, balancing computational efficiency with output reliability.
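
The sketch below illustrates the model-cascade idea. It assumes a cheap proxy model that returns a relevance score in [0, 1] and an expensive "oracle" (for example, a large LLM) that returns a yes/no answer. The function name, thresholds, and scores are all hypothetical. In LOTUS, thresholds are set so that the filter meets its precision and recall targets with probabilistic guarantees. That calibration step is not shown here; the thresholds below are simply fixed by hand.

```python
from typing import Callable, Dict, List, Tuple


def cascade_filter(items: List[str],
                   proxy_score: Callable[[str], float],
                   oracle: Callable[[str], bool],
                   low: float, high: float) -> Tuple[List[str], int]:
    """Two-threshold model cascade for a boolean filter.

    Items the cheap proxy scores at or above `high` are accepted, items at or
    below `low` are rejected, and only the uncertain band in between is sent
    to the expensive oracle. Returns the kept items and the oracle-call count.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    kept: List[str] = []
    oracle_calls = 0
    for item in items:
        score = proxy_score(item)
        if score >= high:
            kept.append(item)  # confident accept: proxy alone decides
        elif score > low:
            # Uncertain region: pay for the expensive model.
            oracle_calls += 1
            if oracle(item):
                kept.append(item)
        # score <= low: confident reject, oracle never called
    return kept, oracle_calls


# Hypothetical proxy scores and ground-truth labels for five documents.
proxy_scores: Dict[str, float] = {
    "doc-a": 0.95, "doc-b": 0.05, "doc-c": 0.5, "doc-d": 0.6, "doc-e": 0.9,
}
relevant = {"doc-a", "doc-c", "doc-e"}

kept, calls = cascade_filter(
    list(proxy_scores),
    lambda doc: proxy_scores[doc],
    lambda doc: doc in relevant,
    low=0.1, high=0.9,
)
print(kept, calls)
```

In this toy run only two of the five documents reach the expensive model. Widening the band between `low` and `high` sends more items to the oracle and leans less on the proxy, which is the accuracy-versus-cost trade-off the thresholds control.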

The system supports both structured and unstructured data, making it versatile for applications involving tabular datasets, free-form text, and even images. By abstracting the complexities of algorithmic choices and context limitations, LOTUS provides a user-friendly yet powerful framework for building AI-enhanced pipelines.
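
Semantic indexing, listed above among LOTUS's optimizations, is what makes similarity search over free-form text cheap. Documents are embedded once up front, so each query only has to embed itself and compare against the stored vectors. The sketch below uses word-count vectors and cosine similarity as a stand-in for a neural embedding model and a vector store. `SemanticIndex` and `embed` are hypothetical names, not LOTUS APIs.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts, standing in for a neural embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticIndex:
    """Embeds every document once so later queries only embed the query text."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = docs
        self.vectors = [embed(d) for d in docs]

    def search(self, query: str, k: int) -> List[Tuple[float, str]]:
        q = embed(query)
        scored = [(cosine(q, v), d) for v, d in zip(self.vectors, self.docs)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


docs = [
    "Vitamin D supplements and bone density",
    "Transformer models for protein structure",
    "Bone fractures in elderly patients",
]
index = SemanticIndex(docs)
hits = index.search("bone density in patients", k=2)
for score, doc in hits:
    print(f"{score:.3f}  {doc}")
```

A production index would swap in learned embeddings and an approximate nearest-neighbor structure. The division of work stays the same: pay the embedding cost once per document, then answer many queries cheaply.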

Results and Real-World Applications

LOTUS has proven its effectiveness across various use cases:

    Fact-Checking: On the FEVER dataset, a LOTUS pipeline written in under 50 lines of code achieved 91% accuracy, surpassing state-of-the-art baselines like FacTool by 10 percentage points. Additionally, LOTUS reduced execution time by up to 28 times.
    Extreme Multi-Label Classification: For biomedical text classification on the BioDEX dataset, LOTUS' semantic join operator reproduced state-of-the-art results with significantly lower execution time compared to naive approaches.
    Search and Ranking: LOTUS' semantic top-k operator demonstrated superior ranking capabilities on datasets like SciFact and CIFAR-bench, achieving higher quality while offering faster execution than traditional ranking methods.
    Image Processing: LOTUS has extended support to image datasets, enabling tasks like generating themed memes by processing semantic attributes of images.
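
The BioDEX result is easier to appreciate against the naive baseline it improves on. A semantic join evaluates a natural-language predicate over pairs of rows, and the straightforward plan checks every pair. The number of model calls therefore grows as |left| × |right|, which quickly dominates the cost when the label space is large, as in extreme multi-label classification. The sketch below implements only that naive nested-loop baseline, with hypothetical names and a string check standing in for the LLM. It is not LOTUS's optimized join, whose plans are designed to avoid evaluating every pair.

```python
import re
from itertools import product
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def sem_join_sketch(left: List[Row], right: List[Row], langex: str,
                    predicate: Callable[[str], bool]) -> Tuple[List[Row], int]:
    """Naive nested-loop semantic join: evaluate the predicate on every pair.

    Each (left, right) pair is merged into one dict, so column names are
    assumed to be distinct across the two tables. Returns the matching merged
    rows and the number of predicate (model) calls made.
    """
    matches: List[Row] = []
    calls = 0
    for left_row, right_row in product(left, right):
        merged = {**left_row, **right_row}
        calls += 1
        if predicate(langex.format_map(merged)):
            matches.append(merged)
    return matches, calls


def toy_predicate(prompt: str) -> bool:
    # Hypothetical stand-in for an LLM: extracts the two quoted fields and
    # checks whether the reaction word appears in the abstract.
    abstract, reaction = re.findall(r"'([^']*)'", prompt)
    return reaction.lower() in abstract.lower()


articles = [
    {"abstract": "Patients developed a rash after taking drug X"},
    {"abstract": "No adverse events were reported"},
]
reactions = [{"reaction": "rash"}, {"reaction": "nausea"}, {"reaction": "headache"}]
matches, calls = sem_join_sketch(
    articles, reactions,
    "The abstract '{abstract}' reports the reaction '{reaction}'",
    toy_predicate,
)
print(matches, calls)
```

Even this toy run makes six predicate calls to find a single match. With real label sets and a real LLM behind the predicate, the quadratic pair count is exactly the cost an optimized join plan has to avoid.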

These results highlight LOTUS’ ability to combine expressiveness with performance, simplifying development while delivering impactful results.

Conclusion

The latest version of LOTUS offers a fresh approach to data programming by combining natural language-based queries with AI-driven optimizations. By enabling developers to construct complex pipelines in just a few lines of code, LOTUS makes advanced analytics more accessible while enhancing productivity and efficiency. As an open-source project, LOTUS encourages community collaboration, ensuring ongoing enhancements and broader applicability. For users seeking to maximize the potential of their data, LOTUS provides a practical and efficient solution.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


The post Meet LOTUS 1.0.0: An Advanced Open Source Query Engine with a DataFrame API and Semantic Operators appeared first on MarkTechPost.
