Articles related to "Image Captioning"
SEMT: Static-Expansion-Mesh Transformer Network Architecture for Remote Sensing Image Captioning
cs.AI updates on arXiv.org
2025-07-18T04:14:04.000000Z
TNNLS24 | Dynamic Networks! One Model Taking Different Paths Can Generate Different Image Captions
我爱计算机视觉
2024-11-14T12:11:04.000000Z
BLIP3-KALE: An Open-Source Dataset of 218 Million Image-Text Pairs Transforming Image Captioning with Knowledge-Augmented Dense Descriptions
MarkTechPost@AI
2024-11-14T07:50:16.000000Z
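Pattern Recognition | Attending to Both Local and Global Information, Using Attention to Capture Visual Information at Different Granularities to Describe Images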
我爱计算机视觉
2024-11-08T14:02:24.000000Z
Google DeepMind Unveils PaliGemma: A Versatile 3B Vision-Language Model VLM with Large-Scale Ambitions
MarkTechPost@AI
2024-07-12T11:16:31.000000Z