MarkTechPost@AI, July 31, 2024
Researchers at Stanford Present RelBench: An Open Benchmark for Deep Learning on Relational Databases

Researchers from Stanford and partner institutions have introduced RelBench to advance deep learning on relational databases. The benchmark aims to standardize model evaluation, address the shortcomings of traditional approaches, and improve the efficiency and accuracy of predictive tasks.

🌐 Relational databases are essential across many domains, yet the rich information they contain is often underused. Traditional approaches suffer from problems such as information loss caused by flattening the data, and complex, error-prone extraction pipelines.

💡 The researchers introduce RelBench, which converts relational databases into graph representations and applies GNNs to predictive tasks. This includes building heterogeneous temporal graphs and extracting node features, providing the core infrastructure for relational deep learning (RDL) methods.

📊 The researchers compared RDL with traditional approaches. RDL models performed strongly across a range of predictive tasks, such as entity classification, regression, and recommendation, significantly improving accuracy while greatly reducing human effort and lines of code.

🎉 As a standardized benchmark with comprehensive infrastructure, RelBench improves prediction accuracy, reduces manual effort, and enables more efficient and scalable deep learning solutions for complex multi-table datasets.

Relational databases are integral to many digital systems, providing structured data storage across various sectors, such as e-commerce, healthcare, and social media. Their table-based structure simplifies maintenance and data access via powerful query languages like SQL, making them crucial for data management. These databases underpin significant portions of the digital economy, efficiently organizing and retrieving data necessary for operations in diverse fields. However, the richness of relational information in these databases is often underutilized due to the complexity of handling multiple interconnected tables.

A major challenge in utilizing relational databases is extracting predictive signals embedded in the intricate relationships between tables. Traditional methods often flatten relational data into simpler formats, typically a single table. While this simplifies the data structure, it leads to a substantial loss of predictive information and necessitates the creation of complex data extraction pipelines. These pipelines are prone to errors, increase software complexity, and require significant manual effort. Consequently, there is a pressing need for methods that fully exploit the relational nature of the data without oversimplification.

Existing methods for managing relational data largely rely on manual feature engineering. In this approach, data scientists painstakingly transform raw data into formats suitable for ML models. This process is labor-intensive and often results in inconsistencies and errors. Manual feature engineering also limits the scalability of predictive models, as each new task or dataset requires substantial rework. Despite being the current gold standard, this method is inefficient and cannot fully leverage the predictive power inherent in relational databases.

Researchers from Stanford University, Kumo.AI, and the Max Planck Institute for Informatics introduced RelBench, a groundbreaking benchmark to facilitate deep learning on relational databases. This initiative aims to standardize the evaluation of deep learning models across diverse domains and scales. RelBench provides a comprehensive infrastructure for developing and testing relational deep learning (RDL) methods, enabling researchers to compare their models against consistent benchmarks.
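
To give a sense of how this infrastructure might be used in practice, the sketch below shows one plausible way to load a RelBench dataset and predictive task with the open-source `relbench` Python package. The `get_dataset`/`get_task` helpers, the `"rel-amazon"` dataset name, and the `"user-churn"` task name are assumptions about the package's interface drawn from its public repository, not a verified recipe.

```python
# Hypothetical sketch of loading a RelBench dataset and task.
# Assumes the `relbench` package exposes get_dataset/get_task loaders;
# "rel-amazon" and "user-churn" are illustrative names.
from relbench.datasets import get_dataset
from relbench.tasks import get_task

# Download one of the packaged relational databases.
dataset = get_dataset("rel-amazon", download=True)
db = dataset.get_db()  # the underlying collection of linked tables

# Load a predictive task with standardized temporal splits.
task = get_task("rel-amazon", "user-churn", download=True)
train_table = task.get_table("train")  # entities, timestamps, labels
val_table = task.get_table("val")
```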

RelBench leverages a novel approach by converting relational databases into graph representations, enabling the use of Graph Neural Networks (GNNs) for predictive tasks. This conversion involves creating a heterogeneous temporal graph where nodes represent entities and edges denote relationships. Initial node features are extracted using deep tabular models designed to handle diverse column types such as numerical, categorical, and text data. The GNN then iteratively updates these node embeddings based on their neighbors, facilitating the extraction of complex relational patterns.
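
To make the graph construction concrete, here is a minimal sketch of the general idea using PyTorch Geometric: entity tables become node types, foreign-key links become edge types, and a heterogeneous GNN propagates information between them. This is not the paper's exact pipeline; the node counts, feature dimensions, and the "user reviews item" relation are invented for illustration, and temporal neighbor sampling is omitted.

```python
# Minimal illustration (not RelBench's actual pipeline): two entity tables
# become node types, a foreign-key link becomes an edge type, and a
# heterogeneous GNN computes node embeddings.
import torch
from torch_geometric.data import HeteroData
from torch_geometric.nn import SAGEConv, to_hetero

data = HeteroData()
data["user"].x = torch.randn(100, 16)   # e.g. embeddings of user rows
data["item"].x = torch.randn(500, 16)   # e.g. embeddings of item rows

# Foreign-key edges "user reviews item", plus the reverse direction so
# messages flow both ways.
src = torch.randint(0, 100, (1000,))
dst = torch.randint(0, 500, (1000,))
data["user", "reviews", "item"].edge_index = torch.stack([src, dst])
data["item", "rev_reviews", "user"].edge_index = torch.stack([dst, src])

class GNN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        # Lazy (-1, -1) input sizes let each node type keep its own width.
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), hidden_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# Replicate the homogeneous GNN across every node/edge type.
model = to_hetero(GNN(hidden_channels=32), data.metadata(), aggr="sum")
out = model(data.x_dict, data.edge_index_dict)  # per-node-type embeddings
```

In RelBench's setting, the initial node features would instead come from deep tabular encoders over each table's columns, and predictions are read off the embeddings of the target entities at a given timestamp.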

Researchers compared their RDL approach to traditional manual feature engineering methods across various predictive tasks. The results were compelling: RDL models consistently outperformed or matched the accuracy of manually engineered models while drastically reducing the required human effort and lines of code by over 90%. For instance, in entity classification tasks, RDL achieved AUROC scores of 70.45% and 82.39% for user churn and item churn, respectively, significantly surpassing the traditional LightGBM classifier.

In entity regression tasks, RDL models demonstrated superior performance. For example, the Mean Absolute Error (MAE) for user lifetime value predictions was reduced by over 14%, showcasing the precision and efficiency of RDL models. In recommendation tasks, RDL models achieved remarkable improvements, with Mean Average Precision (MAP) scores increasing by over 300% in some cases. These results underscore the potential to automate and enhance predictive tasks on relational databases, opening new avenues for research and application.

In conclusion, RelBench provides a standardized benchmark and comprehensive infrastructure that enables researchers to fully exploit the predictive power of relational databases. The benchmark improves prediction accuracy and significantly reduces the manual effort required, making it a transformative tool for the field. With RelBench, researchers can develop more efficient and scalable deep learning solutions for complex multi-table datasets.


Check out the Paper, GitHub, and Details. All credit for this research goes to the researchers of this project.

