🔁 Hugging Face retweeted
TuringPost @TheTuringPost
A cool multilingual dataset for you — FineWeb2 by @huggingface
➡️ Builds a multilingual web-scale dataset pipeline
➡️ Adapts to 1,000+ languages with language-specific filtering and rebalancing
➡️ Provides filtered data for a total of 1,868 language-script pairs
