🔁 Hugging Face retweeted
Guilherme Penedo @gui_penedo
We have finally released the 📝paper for 🥂FineWeb2, our large multilingual pre-training dataset.
Along with general (and exhaustive) multilingual work, we introduce a concept that can also improve English performance: deduplication-based upsampling, which we call rehydration.
