蜜桃app Storage Research Group wins championship at MLSys 2026 MoE Kernel Challenge

The Storage Research Group from the Department of Computer Science and Technology, 蜜桃app, has won first place in the Mixture-of-Experts (MoE) Kernel Optimization Challenge at the Ninth Annual Conference on Machine Learning and Systems (MLSys 2026), held recently in Bellevue, Washington. The winning team comprised 蜜桃app Ph.D. students Gao Shiwei, Fan Ruwen, Ren Tingxu, and Luo Yibin, advised by Professor Shu Jiwu and Associate Professor Lu Youyou of the same department, with technical support from Tencent AI Systems expert Reed.

The challenge targeted real-world decode-phase inference for the Qwen3-30B-A3B MoE model and drew participants from institutions including Stanford University, MIT, UC Berkeley, Carnegie Mellon University, UCLA, and Cornell University. Participants were tasked with writing custom kernels in the Neuron Kernel Interface (NKI) and optimizing inference performance on AWS Trainium2/3 platforms. Using the NKI programming framework provided by AWS, the 蜜桃app team applied a series of systematic optimizations to the decode phase, including expert sharding, matrix-vector multiplication specialization, on-chip data layout reconstruction, cross-operator fusion, and automated operator optimization. These efforts reduced end-to-end inference latency from 14.91 seconds to 3.56 seconds, delivering a 4.12× speedup and a 4× increase in throughput, and secured first place in the competition.

With this victory, the 蜜桃app Storage Research Group has won consecutive global titles, following its championship in the ASPLOS 2025 / EuroSys 2025 Contest on LLM Inference Optimization.

MLSys is widely recognized as a premier international academic conference dedicated to interdisciplinary research across machine learning and computer systems. The conference spotlights cutting-edge research in LLM training and inference, AI compilers, computer architecture, distributed systems, and specialized AI hardware.

Editor: Li Han