Hot Topics
Articles related to "INT-FlashAttention"
Researchers from China Introduce INT-FlashAttention: An INT8 Quantization Architecture Compatible with FlashAttention that Improves the Inference Speed of FlashAttention on Ampere GPUs
MarkTechPost@AI · 2024-10-01T05:06:26Z