MarkTechPost@AI, August 9, 2024
POA: A Novel Self-Supervised Learning Paradigm for Efficient Multi-Scale Model Pre-Training

A team at Ant Group has proposed POA, a self-supervised learning method that produces multiple models of different sizes in a single pre-training run, addressing the deployment challenges of real-world applications.

🎯 POA builds on a teacher-student self-distillation framework and introduces a novel elastic student branch that covers a series of sub-networks through parameter sharing, based on the observation that smaller models are sub-networks of larger ones in modern architectures. During pre-training, the elastic student randomly samples its parameters from the complete student, and both students learn to mimic the teacher network's output, enabling effective pre-training over different parameter subsets.

📊 The POA framework is evaluated with three popular backbone architectures (ViT, Swin Transformer, and ResNet), pre-trained on the ImageNet-1K dataset, and tested via k-NN and linear-probing classification as well as downstream tasks such as object detection and semantic segmentation. The elastic student acts as a model ensemble that smooths training and strengthens the learned representations, allowing POA to reach state-of-the-art accuracy across model sizes in a single pre-training run.

💪 Compared with SEED, POA achieves k-NN accuracy gains of 2.8% on ViT-S/16 and 2.1% on ViT-B/16, outperforming SEED. Moreover, the ViT-S/16 and ViT-B/16 models derived directly from POA's pre-trained teacher beat their SEED-enhanced counterparts, even though SEED uses twice as many training epochs.

Visual representation learning using large models and self-supervised techniques has shown remarkable success in various visual tasks. However, deploying these models in real-world applications is challenging due to multiple resource constraints such as computation, storage, and power consumption. Adapting large pre-trained models to scenarios with varying resource limitations involves weight pruning, knowledge distillation, or retraining smaller networks from scratch. These methods require significant development effort, making it difficult to deploy AI products across various platforms. This poses a critical question: is it possible to develop a pre-training method that simultaneously produces multiple models of different sizes, each capable of delivering high-quality visual representations?

Existing works attempt to overcome these challenges. One approach, generative SSL, focuses on learning image representations in pixel space, while discriminative methods aim to bring representations of different views of the same image closer together while separating those of different images. Contrastive learning with the InfoNCE loss has become popular but struggles with dimensional collapse. Methods like AutoFormer and MaskTAS have explored neural architecture search (NAS) to train supernets that support the extraction of optimal sub-networks. However, these approaches often require additional search and re-training phases, which limits their efficiency in generating multiple models of varying sizes simultaneously.
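
For readers unfamiliar with the contrastive objective mentioned above, here is a minimal, generic InfoNCE sketch in PyTorch. It is not POA's objective; the function name, temperature value, and batch-diagonal positive pairing are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Minimal InfoNCE: positives are matching indices across the two views;
    every other sample in the batch serves as a negative."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                   # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Usage (hypothetical encoder producing (N, D) embeddings of two augmented views):
# loss = info_nce_loss(encoder(view_a), encoder(view_b))
```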

A team from Ant Group has introduced a new self-supervised learning method called POA (Pre-training Once for All) to tackle the challenge of producing multiple models of varying sizes simultaneously. POA is built upon the teacher-student self-distillation framework, introducing an innovative elastic student branch. This branch covers a series of sub-networks through parameter sharing, based on the idea that smaller models are sub-networks of larger ones in modern network structures. During pre-training, the elastic student randomly samples parameters from the complete student, and both students learn to mimic the teacher network's output. This enables effective pre-training on different parameter subsets.
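
The idea can be illustrated with a highly simplified sketch over a toy MLP backbone. `ElasticMLP`, `poa_style_step`, the width ratios, and the plain soft-target cross-entropy below are illustrative stand-ins, not the paper's ViT/Swin/ResNet implementation, EMA teacher update, or exact distillation loss.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticMLP(nn.Module):
    """Toy 'elastic' network: smaller variants reuse the leading slice of the
    full network's weights, so every sub-network shares parameters with it."""
    def __init__(self, d_in=256, d_hidden=1024, d_out=256):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x, width_ratio=1.0):
        h = int(self.fc1.out_features * width_ratio)               # sampled hidden width
        x = F.relu(F.linear(x, self.fc1.weight[:h], self.fc1.bias[:h]))
        return F.linear(x, self.fc2.weight[:, :h], self.fc2.bias)

def poa_style_step(teacher, student, x, widths=(0.25, 0.5, 0.75, 1.0), tau=0.1):
    """Both the full student and a randomly sampled sub-student are trained to
    match the teacher's soft output distribution (soft-target cross-entropy)."""
    with torch.no_grad():
        target = F.softmax(teacher(x) / tau, dim=-1)               # teacher soft targets
    loss_full = F.cross_entropy(student(x) / tau, target)          # complete student
    ratio = random.choice(widths)                                  # elastic sub-network
    loss_sub = F.cross_entropy(student(x, width_ratio=ratio) / tau, target)
    return loss_full + loss_sub

# Usage sketch: teacher, student = ElasticMLP(), ElasticMLP()
# loss = poa_style_step(teacher, student, torch.randn(8, 256)); loss.backward()
```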

The POA framework is evaluated using three popular backbone architectures: ViT, Swin Transformer, and ResNet. Pre-training is performed on the ImageNet-1K dataset, with performance tested through k-NN and linear-probing classification assessments as well as downstream tasks such as object detection and semantic segmentation. Moreover, the elastic student acts as a model ensemble, smoothing the training process and enhancing the learned representations. This design allows POA to achieve state-of-the-art accuracy across various model sizes in a single pre-training session, demonstrating its ability to produce multiple high-performance models simultaneously.
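
As a rough illustration of the k-NN probing protocol used in such evaluations, here is a plain majority-vote variant over frozen features. Self-supervised benchmarks typically use similarity-weighted voting with a temperature, so treat this as an approximation rather than the exact protocol of the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def knn_classify(train_feats, train_labels, test_feats, k=20):
    """Cosine-similarity k-NN probe over frozen backbone features.

    train_feats: (N_train, D), train_labels: (N_train,), test_feats: (N_test, D).
    Returns predicted labels for the test features by majority vote."""
    train_feats = F.normalize(train_feats, dim=1)
    test_feats = F.normalize(test_feats, dim=1)
    sims = test_feats @ train_feats.t()              # (N_test, N_train) similarities
    nearest = sims.topk(k, dim=1).indices            # indices of the k nearest samples
    votes = train_labels[nearest]                    # (N_test, k) neighbour labels
    return votes.mode(dim=1).values                  # majority-vote prediction
```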

The POA framework is compared with SEED, a self-supervised knowledge distillation method that uses a pre-trained DINOv2 network as the teacher. SEED significantly improves the performance of ViT-S/16 and ViT-B/16 when they are distilled from a pre-trained ViT-L/16 teacher, achieving k-NN accuracy gains of 1.8% and 1.4%, respectively, over training from scratch. POA outperforms SEED, achieving even higher k-NN accuracy gains of 2.8% for ViT-S/16 and 2.1% for ViT-B/16. Moreover, the ViT-S/16 and ViT-B/16 models derived directly from POA's pre-trained teacher perform better than those enhanced by SEED, despite SEED using twice the training epochs.

In summary, the Ant Group team has proposed POA (Pre-training Once for All), a new self-supervised learning method that overcomes the challenge of producing multiple models of varying sizes. The POA framework integrates self-distillation with once-for-all model generation, allowing simultaneous pre-training of various model sizes through an innovative elastic branch design. This approach significantly enhances deployment flexibility and enables the pre-trained models to achieve state-of-the-art results across various vision tasks. The team plans to extend POA to multimodal large language models to explore its potential for real-world AI product deployment.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.



