Articles related to "long video processing"
STORM (Spatiotemporal TOken Reduction for Multimodal LLMs): A Novel AI Architecture Incorporating a Dedicated Temporal Encoder between the Image Encoder and the LLM
MarkTechPost@AI 2025-03-11T07:35:16.000000Z
Researchers from China Develop Advanced Compression and Learning Techniques to Process Long-Context Videos at 100 Times Less Compute
MarkTechPost@AI 2025-01-20T01:30:36.000000Z
CoordTok: A Scalable Video Tokenizer that Learns a Mapping from Co-ordinate-based Representations to the Corresponding Patches of Input Videos
MarkTechPost@AI 2024-12-26T02:04:47.000000Z
Processing 2-Hour Videos Seamlessly: This AI Paper Unveils LONGVILA, Advancing Long-Context Visual Language Models for Long Videos
MarkTechPost@AI 2024-08-23T18:49:53.000000Z