About the job
This 12-month contract role is based in Vancouver, British Columbia, with the Advanced Computing and Storage Lab at Huawei Canada. The team focuses on adaptive computing architectures to support modern AI workloads, aiming to boost the stability and quality of training clusters. Work centers on dynamic configuration strategies and precision control systems that help optimize compute clusters for AI applications.
Role overview
The AI Computing Systems Researcher will address challenges in large-model training and inference. The lab uses methods such as low-precision training, multi-modal learning, and reinforcement learning to identify bottlenecks and deliver practical optimization solutions for AI systems.
What you will do
- Improve performance, efficiency, and usability of AI systems on the Ascend platform, focusing on industry AI applications.
- Design and develop optimization solutions for AI training and inference, including FP8 optimization and training agents driven by reinforcement learning.
- Integrate AI algorithm requirements with system-level architectural improvements in computing, I/O, scheduling, and precision control.
- Build scalable AI training clusters using dynamic configurations and precision control to enhance reliability.
- Develop software frameworks, operator libraries, and acceleration libraries for NPU platforms to support large-model AI training.
- Lead optimization efforts for large-model training and inference using low-precision methods, parallel strategy tuning, and reinforcement learning.
This position offers the opportunity to contribute to next-generation AI system capabilities in a collaborative research environment at Huawei Canada.

