MarkTechPost@AI, July 31, 2024
rLLM (relationLLM): A PyTorch Library Designed for Relational Table Learning (RTL) with Large Language Models (LLMs)

rLLM is a PyTorch-based library designed to leverage large language models (LLMs) for relational table learning (RTL). It addresses the challenges of RTL by decomposing state-of-the-art graph neural networks (GNNs), LLMs, and table neural networks (TNNs) into standardized modules, and by building robust models through a "combine, align, and co-train" methodology. rLLM also introduces the BRIDGE algorithm, which processes table data with TNNs and uses the "foreign keys" in relational tables to establish relationships between table samples, which are then analyzed with GNNs. To address the scarcity of datasets in the RTL field, rLLM further introduces a robust data collection named SJTUTables, comprising three relational table datasets: TML1M, TLF2K, and TACM12K.

🤔 **rLLM's architecture**: The rLLM framework consists of three main layers: the Data Engine Layer, the Module Layer, and the Model Layer. The Data Engine Layer focuses on fundamental data structures for graph and table data, decoupling data loading and storage through Dataset subclasses and BaseGraph/BaseTable subclasses, respectively. The Module Layer decomposes the operations of GNNs, LLMs, and TNNs into standard submodules. The Model Layer builds task-specific models by combining GNN, LLM, and TNN modules, and these models can serve a wide range of RTL tasks.

🚀 **The BRIDGE algorithm**: BRIDGE is a simple RTL method built with rLLM. It processes table data with TNNs and uses the "foreign keys" in relational tables to establish relationships between table samples, which are then analyzed with GNNs. By considering multiple tables and their interconnections, BRIDGE provides a comprehensive approach to analyzing relational data.

📊 **The SJTUTables datasets**: To address the scarcity of datasets in the RTL field, the rLLM project introduces a robust data collection named SJTUTables, which comprises three relational table datasets: TML1M, TLF2K, and TACM12K. These datasets cover a variety of relational data and give researchers a valuable resource for developing and evaluating RTL methods.

💪 **Strengths and limitations**: The rLLM framework offers a powerful approach to relational table learning with large language models, improving efficiency by integrating advanced methods and optimizing data structures. However, its scope of application remains limited, and further development and evaluation are needed.

🤝 **Collaboration and outlook**: The rLLM project encourages researchers and software engineers to join in and jointly extend its functionality and its applications in relational data analysis. The project aims to advance the RTL field and provide a robust platform for applying LLMs to relational data.

Large language models (LLMs) have emerged as powerful tools in artificial intelligence, demonstrating remarkable capabilities in understanding and generating text. These models utilize advanced technologies such as web-scale unsupervised pretraining, instruction fine-tuning, and value alignment, showcasing strong performance across various tasks. However, the application of LLMs to real-world big data presents significant challenges, primarily due to the enormous costs involved. By 2025, the total cost of LLMs is projected to reach nearly $5,000 trillion, far exceeding the GDP of major economies. This financial burden is particularly pronounced in processing text and structured data, which account for a substantial portion of the expenses despite being smaller in volume compared to multimedia data. As a result, there has been a growing focus on Relational Table Learning (RTL) in recent years, given that relational databases host approximately 73% of the world’s data.

Researchers from Shanghai Jiao Tong University and Tsinghua University present the rLLM (relationLLM) project, which addresses the challenges in RTL by providing a platform for rapid development of RTL-type methods using LLMs. This approach focuses on two key functions: decomposing state-of-the-art Graph Neural Networks (GNNs), LLMs, and Table Neural Networks (TNNs) into standardized modules, and enabling the construction of robust models through a “combine, align, and co-train” methodology. To demonstrate the application of rLLM, a simple RTL method called BRIDGE is introduced. BRIDGE processes table data using TNNs and utilizes “foreign keys” in relational tables to establish relationships between table samples, which are then analyzed using GNNs. This method considers multiple tables and their interconnections, providing a comprehensive approach to relational data analysis. Also, to address the scarcity of datasets in the emerging field of RTL, the project introduces a robust data collection named SJTUTables, comprising three relational table datasets: TML1M, TLF2K, and TACM12K.

The rLLM project introduces a comprehensive architecture consisting of three main layers: the Data Engine Layer, the Module Layer, and the Model Layer. This structure is designed to facilitate efficient processing and analysis of relational table data.

The Data Engine Layer forms the foundation, focusing on fundamental data structures for graph and table data. It decouples data loading and storage through Dataset subclasses and BaseGraph/BaseTable subclasses, respectively. This design allows for flexible handling of various graph and table data types, optimizing storage and processing for both homogeneous and heterogeneous graphs, as well as table data.
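To illustrate this decoupling, here is a minimal Python sketch of the idea, not rLLM's actual API: a Dataset-style class is responsible only for loading, while a BaseTable-style container owns the in-memory representation. The class names mirror the article's description, but the implementation and the `CsvTableDataset` helper are assumptions.

```python
import pandas as pd
import torch

class BaseTable:
    """Storage side: owns the in-memory tensor view of a table."""
    def __init__(self, df: pd.DataFrame):
        self.columns = list(df.columns)
        # Numeric columns only, for brevity; a real engine would also
        # encode categorical and text columns.
        self.x = torch.tensor(
            df.select_dtypes("number").values, dtype=torch.float
        )

class CsvTableDataset:
    """Loading side (hypothetical): knows where the data lives and how
    to read it, but delegates storage to BaseTable."""
    def __init__(self, path: str):
        self.table = BaseTable(pd.read_csv(path))

# usage: CsvTableDataset("users.csv").table.x -> float feature tensor
```

Keeping loading and storage in separate classes, as the layer description suggests, lets new file formats or graph types be added without touching the downstream model code.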

The Module Layer decomposes operations of GNNs, LLMs, and TNNs into standard submodules. For GNNs, it includes GraphTransform for preprocessing and GraphConv for implementing graph convolution layers. LLM modules comprise a Predictor for data annotation and an Enhancer for data augmentation. TNN modules feature TableTransform for mapping features to higher-dimensional spaces and TableConv for multi-layer interactive learning among feature columns.
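To make the TNN side of this decomposition concrete, the sketch below shows one plausible shape for the two submodules, assuming TableTransform lifts each column value into a higher-dimensional embedding and TableConv lets feature columns interact, here approximated with self-attention. The module names come from the article; the internals are illustrative guesses rather than rLLM's real code.

```python
import torch
import torch.nn as nn

class TableTransform(nn.Module):
    """Maps each scalar column value to a d-dimensional embedding,
    lifting the table into a higher-dimensional feature space."""
    def __init__(self, dim: int):
        super().__init__()
        self.embed = nn.Linear(1, dim)

    def forward(self, x):                   # x: [rows, n_cols]
        return self.embed(x.unsqueeze(-1))  # -> [rows, n_cols, dim]

class TableConv(nn.Module):
    """One layer of interactive learning among feature columns,
    approximated here with self-attention over column embeddings."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cols):                # cols: [rows, n_cols, dim]
        out, _ = self.attn(cols, cols, cols)
        return out

# usage: one TableTransform, then stacked TableConv layers
# x = torch.randn(32, 6)                    # 32 rows, 6 numeric columns
# h = TableConv(16)(TableTransform(16)(x))  # -> [32, 6, 16]
```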

BRIDGE demonstrates rLLM’s application in RTL-type methods. It addresses relational database complexity by processing both table and non-table features. A Table Encoder, using TableTransform and TableConv modules, handles heterogeneous table data to produce table embeddings. A Graph Encoder, employing GraphTransform and GraphConv modules, models foreign key relationships and generates graph embeddings. BRIDGE integrates outputs from both encoders, enabling simultaneous modeling of multi-table data and their interconnections. The framework supports both supervised and unsupervised training approaches, adapting to various data scenarios and learning objectives.
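The following simplified sketch captures the BRIDGE idea described above, not the paper's exact implementation: row embeddings from a table encoder are propagated over an adjacency matrix built from foreign-key links, so each row's representation also reflects its related rows. All class and function names here are hypothetical.

```python
import torch
import torch.nn as nn

def fk_adjacency(fk_pairs, n_rows):
    """Builds a symmetric, row-normalized adjacency matrix from
    (row_i, row_j) pairs linked through a shared foreign key."""
    adj = torch.eye(n_rows)                     # self-loops
    for i, j in fk_pairs:
        adj[i, j] = adj[j, i] = 1.0
    return adj / adj.sum(dim=1, keepdim=True)   # row-normalize

class BridgeSketch(nn.Module):
    """Table encoder plus one graph-propagation step over the
    foreign-key graph (a simplified stand-in for the graph encoder)."""
    def __init__(self, n_cols, dim, n_classes):
        super().__init__()
        self.table_enc = nn.Sequential(nn.Linear(n_cols, dim), nn.ReLU())
        self.classify = nn.Linear(dim, n_classes)

    def forward(self, x, adj):
        h = self.table_enc(x)     # [rows, dim] table embeddings
        h = adj @ h               # propagate along foreign-key links
        return self.classify(h)   # per-row predictions

# usage with toy shapes:
# model = BridgeSketch(n_cols=8, dim=32, n_classes=4)
# logits = model(torch.randn(100, 8), fk_adjacency([(0, 1), (2, 3)], 100))
```

Trained with a standard cross-entropy loss on labeled rows, a model of this shape can learn from both a table's own features and its foreign-key neighborhood, which is the core intuition behind combining the two encoders.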

Experimental results reveal the limitations of traditional single-tabular TNNs in processing relational table data. These TNNs, confined to learning from a single target table, fail to utilize the rich information available in multiple tables and their interconnections, resulting in suboptimal performance. In contrast, the BRIDGE algorithm demonstrates superior capabilities by effectively combining a table encoder with a graph encoder. This integrated approach enables BRIDGE to extract valuable insights from both individual tables and their relationships. Consequently, BRIDGE achieves a significant performance improvement over conventional methods, highlighting the importance of considering the relational structure of data in table learning tasks.

The rLLM framework introduces a robust approach to relational table learning using Large Language Models. It integrates advanced methods and optimizes data structures for improved efficiency. The project invites collaboration from researchers and software engineers to expand its capabilities and applications in the field of relational data analysis.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
