https://zhuanlan.zhihu.com/p/339208452
https://www.nvidia.com/en-us/on-demand/session/gtc25-s72568/
https://arxiv.org/pdf/2402.17463
https://github.com/HKUNLP/ChunkLlama
https://github.com/vllm-project/vllm/pull/11844
https://arxiv.org/pdf/2407.02490
https://github.com/microsoft/MInference
https://github.com/vllm-project/flash-attention/pull/33
https://www.alibabacloud.com/help/en/pai/user-guide/what-is-bladellm/
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe/fused_moe.py
https://qwenlm.github.io/zh/blog/qwen2.5-1m/