Join Huawei Canada as a Co-op Research Engineer specializing in AI Computing Systems.
About Our Team:
The Advanced Computing and Storage Lab, part of the Vancouver Research Centre, pioneers adaptive computing system architectures that respond to dynamic application demands. Our mission includes improving the stability and quality of training clusters, developing dynamic configuration strategies, and implementing precision control systems to deliver robust computing power. We focus on critical AI workloads such as large-model training and inference, using techniques including low-precision training, multimodal training, and reinforcement learning to optimize performance and usability.
Position Overview:
This role centers on improving the performance, efficiency, and usability of AI systems on the Ascend platform, specifically for large-model training and inference. Responsibilities span low-precision training, multimodal optimization, and resource management, with the goal of mitigating system bottlenecks and delivering cutting-edge AI capabilities.
You will design and develop optimization solutions for AI training and inference systems, focusing on FP8 optimization, RL-driven training agents, and next-generation multimodal understanding and generation.
You will integrate the requirements of AI algorithms with system-level architectural optimizations in compute, I/O, scheduling, and precision control to improve performance.
You will build stable, efficient AI training clusters, using dynamic configuration and precision control to ensure scalability and reliability.
You will develop software frameworks, operator libraries, and acceleration libraries for NPU platforms to speed up large-model training.
You will drive innovation in large-model training and inference optimization through strategies such as low-precision training and reinforcement learning.

