A Large-Scale Web Search Dataset for Federated Online Learning to Rank

cs.AI updates on arXiv.org, 13 hours ago

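This paper presents AOL4FOLTR, a large-scale web search dataset containing 2.6 million queries, built to address the limitations of existing FOLTR benchmarks. By including user identifiers, real click data, and query timestamps, it enables more realistic user partitioning, behavior modeling, and asynchronous federated learning.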

arXiv:2508.12353v1 Announce Type: cross

Abstract: The centralized collection of search interaction logs for training ranking models raises significant privacy concerns. Federated Online Learning to Rank (FOLTR) offers a privacy-preserving alternative by enabling collaborative model training without sharing raw user data. However, benchmarks in FOLTR are largely based on random partitioning of classical learning-to-rank datasets, simulated user clicks, and the assumption of synchronous client participation. This oversimplifies real-world dynamics and undermines the realism of experimental results. We present AOL4FOLTR, a large-scale web search dataset with 2.6 million queries from 10,000 users. Our dataset addresses key limitations of existing benchmarks by including user identifiers, real click data, and query timestamps, enabling realistic user partitioning, behavior modeling, and asynchronous federated learning scenarios.
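To make the idea of user-based partitioning concrete, the following is a minimal Python sketch, not the authors' code. It assumes a hypothetical log of (user ID, timestamp, query, clicked document) records; the field layout, the sample values, and the helper names `partition_by_user` and `arrival_order` are illustrative assumptions and do not reflect the actual AOL4FOLTR schema. It groups interactions by user so that each user can act as one federated client, then uses timestamps to recover the order in which clients would become active.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (user_id, timestamp, query, clicked_doc).
# Illustrative values only; this is NOT the AOL4FOLTR schema.
LOG = [
    ("u2", "2006-03-01 08:45:00", "weather boston", "doc7"),
    ("u1", "2006-03-01 09:00:00", "cheap flights", "doc3"),
    ("u1", "2006-03-01 08:30:00", "python tutorial", "doc1"),
    ("u2", "2006-03-02 14:20:00", "pizza near me", "doc9"),
]

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def partition_by_user(records):
    """Group records by user ID (one client per user) and sort each
    client's interactions by query timestamp."""
    clients = defaultdict(list)
    for user_id, ts, query, clicked in records:
        clients[user_id].append((datetime.strptime(ts, TS_FORMAT), query, clicked))
    for interactions in clients.values():
        interactions.sort(key=lambda item: item[0])
    return dict(clients)


def arrival_order(clients):
    """Return user IDs in the order their interactions occur on a single
    global timeline, as an asynchronous simulation might replay them."""
    events = [
        (ts, user_id)
        for user_id, interactions in clients.items()
        for ts, _, _ in interactions
    ]
    return [user_id for _, user_id in sorted(events)]


clients = partition_by_user(LOG)
for user_id, interactions in sorted(clients.items()):
    print(user_id, [query for _, query, _ in interactions])
print(arrival_order(clients))
```

Unlike random partitioning, each client here holds only one user's own queries and clicks, and the global timeline shows clients interleaving rather than participating in lockstep.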
