cs.AI updates on arXiv.org, 16 hours ago
MNIST-Gen: A Modular MNIST-Style Dataset Generation Using Hierarchical Semantics, Reinforcement Learning, and Category Theory

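This article introduces MNIST-Gen, an automated framework that uses semantic categorization to generate customized MNIST-style image datasets for domain-specific tasks such as tree and food classification, while achieving efficient automatic categorization and saving time.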

arXiv:2507.11821v1 Announce Type: cross Abstract: Neural networks are often benchmarked using standard datasets such as MNIST, FashionMNIST, or other variants of MNIST, which, while accessible, are limited to generic classes such as digits or clothing items. For researchers working on domain-specific tasks, such as classifying trees, food items, or other real-world objects, these datasets are insufficient and irrelevant. Additionally, creating and publishing a custom dataset can be time-consuming, legally constrained, or beyond the scope of individual projects. We present MNIST-Gen, an automated, modular, and adaptive framework for generating MNIST-style image datasets tailored to user-specified categories using hierarchical semantic categorization. The system combines CLIP-based semantic understanding with reinforcement learning and human feedback to achieve intelligent categorization with minimal manual intervention. Our hierarchical approach supports complex category structures with semantic characteristics, enabling fine-grained subcategorization and multiple processing modes: individual review for maximum control, smart batch processing for large datasets, and fast batch processing for rapid creation. Inspired by category theory, MNIST-Gen models each data transformation stage as a composable morphism, enhancing clarity, modularity, and extensibility. As proof of concept, we generate and benchmark two novel datasets, Tree-MNIST and Food-MNIST, demonstrating MNIST-Gen's utility for producing task-specific evaluation data while achieving 85% automatic categorization accuracy and 80% time savings compared to manual approaches.
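
The abstract's idea of treating each transformation stage as a composable morphism can be illustrated with a minimal Python sketch. This is not MNIST-Gen's actual code: the names `compose`, `to_grayscale`, `resize_nearest`, and `invert` are hypothetical, images are plain nested lists rather than real image objects, and the specific stages (grayscale conversion, 28x28 nearest-neighbor resize, inversion) are assumptions about what an MNIST-style conversion might involve.

```python
from functools import reduce
from typing import Callable, List, Tuple

# Hypothetical image types: nested lists instead of a real image library.
RGBImage = List[List[Tuple[int, int, int]]]
GrayImage = List[List[int]]


def compose(*stages: Callable) -> Callable:
    """Chain stages left to right: compose(f, g)(x) == g(f(x)).

    With no stages this returns the identity function, which acts as
    the neutral element of composition.
    """
    return reduce(lambda f, g: (lambda x: g(f(x))), stages, lambda x: x)


def to_grayscale(img: RGBImage) -> GrayImage:
    """Convert RGB pixels to 8-bit luminance using ITU-R BT.601 weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in img
    ]


def resize_nearest(size: int) -> Callable[[GrayImage], GrayImage]:
    """Build a stage that resizes to size x size by nearest-neighbor sampling."""
    def stage(img: GrayImage) -> GrayImage:
        h, w = len(img), len(img[0])
        return [
            [img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)
        ]
    return stage


def invert(img: GrayImage) -> GrayImage:
    """Invert intensities so a light background becomes dark."""
    return [[255 - p for p in row] for row in img]


# Assumed pipeline: RGB -> grayscale -> 28x28 -> inverted.
pipeline = compose(to_grayscale, resize_nearest(28), invert)

# A 56x56 all-white RGB image as a stand-in for a collected photo.
sample: RGBImage = [[(255, 255, 255)] * 56 for _ in range(56)]
result = pipeline(sample)
```

Because each stage is an ordinary function from one image type to another, stages can be regrouped, reordered where the types allow, or swapped out without touching the rest of the pipeline, which is the modularity benefit the abstract attributes to the morphism view.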

Related tags

MNIST-Gen, dataset generation, neural networks, semantic categorization