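Trending
Articles related to "speculative decoding"
A Survey of Large Language Model Inference Optimization Techniques (The Art of LLM Inference)
Juejin AI 2025-05-28T04:13:04.000000Z
Apple is working with NVIDIA to make AI respond faster
Huxiu-AI 2024-12-23T11:22:15.000000Z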
Researchers from Snowflake and CMU Introduce SuffixDecoding: A Novel Model-Free Approach to Accelerating Large Language Model (LLM) Inference through Speculative Decoding
MarkTechPost@AI 2024-11-13T16:04:56.000000Z
Transformer inference tricks
Artificial Fintelligence 2024-10-22T06:07:41.000000Z
AMD releases its first AI small language model: 690 billion tokens, 3.88x speedup from speculative decoding
Kuai Keji News 2024-10-01T08:46:53.000000Z
This AI Paper from KAIST AI Introduces a Novel Approach to Improving LLM Inference Efficiency in Multilingual Settings
MarkTechPost@AI 2024-10-01T07:35:06.000000Z
AMD launches its first small language AI model, "Llama-135m", featuring a "speculative decoding" capability that can reduce RAM usage
IT Home 2024-09-29T09:23:30.000000Z
Together AI Optimizing High-Throughput Long-Context Inference with Speculative Decoding: Enhancing Model Performance through MagicDec and Adaptive Sequoia Trees
MarkTechPost@AI 2024-09-10T08:20:14.000000Z
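Training Llama into Mamba in 3 days: no loss in performance, faster inference!
BAAI Community 2024-09-06T05:07:41.000000Z
Training Llama into Mamba in 3 days: no loss in performance, faster inference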
36kr 2024-09-05T07:18:34.000000Z