MarkTechPost@AI 2024-08-19
Google DeepMind Researchers Propose a Dynamic Visual Memory for Flexible Image Classification

This article introduces an innovative visual memory system that addresses the problem of deep learning models representing knowledge statically. The system can add and remove data seamlessly, improving image classification accuracy.

💡 The visual memory system combines the representational power of deep neural networks with the adaptability of a visual memory database, decomposing the image classification task to enable more flexible data handling. It builds a database of feature-label pairs extracted from a pretrained image encoder and classifies by rapidly retrieving the k nearest neighbors by cosine similarity, with no model retraining required.

🎯 Traditional image classification methods rely on static models that must be retrained to incorporate new classes or datasets. The proposed retrieval-based visual memory system and its RankVoting aggregation method overcome the limitations of existing aggregation techniques and improve classification accuracy.

🚀 The system delivers strong performance: RankVoting's accuracy improves as the number of neighbors grows, stabilizing at higher counts. By using Gemini's vision-language model to re-rank the retrieved neighbors, it reaches 88.5% ImageNet validation accuracy, surpassing the baselines.

🌐 The flexibility of visual memory allows it to scale to billion-scale datasets, and outdated data can be removed through unlearning and memory pruning, meeting the need for continual learning and updating in dynamic environments.

Deep learning models typically represent knowledge statically, making it challenging to adapt to evolving data and concepts. This rigidity necessitates frequent retraining or fine-tuning to incorporate new information, which can be impractical. The research paper “Towards Flexible Perception with Visual Memory” by Geirhos et al. presents an innovative solution that integrates the representational power of deep neural networks with the adaptability of a visual memory database. By decomposing image classification into image similarity and fast nearest neighbor retrieval, the authors introduce a flexible visual memory capable of adding and removing data seamlessly.

Current methods for image classification often rely on static models that require retraining to incorporate new classes or datasets. Traditional aggregation techniques, such as plurality and softmax voting, can lead to overconfidence in predictions, particularly when considering distant neighbors. The authors propose a retrieval-based visual memory system that builds a database of feature-label pairs extracted from a pre-trained image encoder, such as DINOv2 or CLIP. This system allows for rapid classification by retrieving the k nearest neighbors based on cosine similarity, enabling the model to adapt to new data without retraining.
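The core idea above can be sketched in a few lines: store L2-normalized features with their labels, then answer a query by dot product (which equals cosine similarity for unit vectors). This is a minimal illustration, not the authors' implementation; the class name, method names, and upstream feature extraction (e.g. with a pretrained encoder) are assumptions.

```python
import numpy as np

class VisualMemory:
    """Minimal sketch of a retrieval-based visual memory:
    a database of (feature, label) pairs queried by cosine similarity.
    Feature extraction with a pretrained encoder is assumed upstream."""

    def __init__(self, dim):
        self.features = np.empty((0, dim), dtype=np.float32)
        self.labels = []

    def add(self, feats, labels):
        # L2-normalize so the dot product equals cosine similarity.
        feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        self.features = np.vstack([self.features, feats])
        self.labels.extend(labels)

    def nearest(self, query, k=5):
        q = query / np.linalg.norm(query)
        sims = self.features @ q           # cosine similarity to all entries
        idx = np.argsort(-sims)[:k]        # indices of the top-k neighbors
        return [(self.labels[i], float(sims[i])) for i in idx]
```

Because adding data is just appending rows, the memory grows (or shrinks) without any gradient update; at billion scale an approximate nearest-neighbor index would replace the brute-force dot product.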

The methodology consists of two main steps: constructing the visual memory and performing nearest neighbor-based inference. Visual memory is created by extracting and storing features from a dataset in a database. When a query image is presented, its features are compared to those in the visual memory to retrieve the nearest neighbors. The authors introduce a novel aggregation method called RankVoting, which assigns weights to neighbors based on rank, outperforming traditional methods and enhancing classification accuracy.

The proposed visual memory system demonstrates impressive performance metrics. The RankVoting method addresses a limitation of existing aggregation techniques, whose performance often decays as the number of neighbors increases. In contrast, RankVoting improves accuracy with more neighbors, stabilizing at higher counts. The authors report 88.5% top-1 ImageNet validation accuracy when incorporating Gemini’s vision-language model to re-rank the retrieved neighbors, surpassing both the DINOv2 ViT-L/14 kNN baseline (83.5%) and linear probing (86.3%).

The flexibility of the visual memory allows it to scale to billion-scale datasets without additional training, and it can also remove outdated data through unlearning and memory pruning. This adaptability is crucial for applications requiring continuous learning and updating in dynamic environments. The results indicate that the proposed visual memory not only enhances classification accuracy but also offers a robust framework for integrating new information and maintaining model relevance over time, providing a reliable solution for dynamic learning environments.
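Unlearning in this setting reduces to deleting rows from the database — no gradient-based forgetting is needed. A minimal sketch, assuming the memory is a feature matrix with a parallel label list as above (the function name and signature are illustrative):

```python
import numpy as np

def unlearn(features, labels, remove_label):
    """Remove every entry with a given label from the visual memory.
    Deletion-based unlearning: no retraining, just dropping rows."""
    keep = np.array([lab != remove_label for lab in labels])
    kept_labels = [lab for lab, k in zip(labels, keep) if k]
    return features[keep], kept_labels
```

Memory pruning works the same way, except the rows to drop are chosen by a quality criterion (e.g. mislabeled or redundant entries) rather than by label.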

The research highlights the potential of a flexible visual memory system as a solution to the challenges posed by static deep learning models. By enabling the addition and removal of data without retraining, the proposed method addresses the need for adaptability in machine learning. The RankVoting technique and the integration of vision-language models deliver significant performance improvements, paving the way for wider adoption of visual memory systems in deep learning applications.






Related tags

Visual memory · Image classification · RankVoting · Deep learning