Articles related to "Pre-training datasets"
Training on Documents About Reward Hacking Induces Reward Hacking
LessWrong
2025-01-21T21:36:15.000000Z
OpenCSG Open-Sources Chinese Cosmopedia, the Largest Chinese Synthetic Dataset
ModelScope Community
2025-01-20T16:07:49.000000Z
Largest Top-Quality Dataset Open-Sourced: Ranked First on HuggingFace, Can Produce 15 Trillion Tokens
OneFlow
2024-10-28T00:10:10.000000Z
LLM360 Group Introduces TxT360: A Top-Quality LLM Pre-Training Dataset with 15T Tokens
MarkTechPost@AI
2024-10-09T06:06:03.000000Z
MINT-1T: An Open-Source Trillion Token Multimodal Interleaved Dataset and a Key Component for Training Large Multimodal Models (LMMs)
MarkTechPost@AI
2024-06-20T07:01:47.000000Z