Trending
Articles related to "parallel decoding"
Breaking the diffusion LLM bottleneck: NVIDIA and HKU propose the new Fast-dLLM, speeding up inference by 27.6x
智源社区 (BAAI Community) 2025-06-18T06:37:50.000000Z
NVIDIA AI Introduces Fast-dLLM: A Training-Free Framework That Brings KV Caching and Parallel Decoding to Diffusion LLMs
MarkTechPost@AI 2025-06-02T05:10:55.000000Z
How Rufus doubled their inference speed and handled Prime Day traffic with AWS AI chips and parallel decoding
AWS Machine Learning Blog 2025-05-28T13:41:03.000000Z
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
2024-10-02T06:00:21.000000Z