
oneflow

Oct 12, 2025

https://zhuanlan.zhihu.com/p/339208452

https://www.nvidia.com/en-us/on-demand/session/gtc25-s72568/
https://arxiv.org/pdf/2402.17463
https://github.com/HKUNLP/ChunkLlama
https://github.com/vllm-project/vllm/pull/11844
https://arxiv.org/pdf/2407.02490
https://github.com/microsoft/MInference
https://github.com/vllm-project/flash-attention/pull/33
https://www.alibabacloud.com/help/en/pai/user-guide/what-is-bladellm/
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe/fused_moe.py
https://qwenlm.github.io/zh/blog/qwen2.5-1m/

