About the job
Join Huawei Canada as an AI Systems Engineer specializing in Serverless Distributed Computing.
About Our Team:
The Distributed Data Storage and Management Lab is at the forefront of research in distributed data systems. We are dedicated to creating next-generation cloud serverless products that integrate essential infrastructure and databases. Our research tackles complex data challenges such as cloud-native disaggregated databases, innovative pay-by-query user models, and enhanced low-level data transfers using RDMA. Our teams are instrumental in developing advanced cloud serverless data infrastructures and implementing state-of-the-art networking technologies for Huawei's global AI ecosystem.
Your Role:
Design and implement frameworks and engines for cutting-edge serverless computing optimized for AI workloads, including LLM training/inference and reinforcement learning.
Evaluate and enhance the performance of end-to-end AI systems by focusing on distributed scheduling, data flow, and memory utilization across extensive clusters.
Conduct research and assess breakthrough technologies in distributed computing, serverless infrastructure, reinforcement learning, and LLM-driven AI agents.
Collaborate with research, product, and platform teams to turn conceptual AI agent and RL research into scalable production systems.
Provide thought leadership through innovation, technical presentations, and patent contributions.
Stay informed on industry trends by assessing emerging tools and frameworks (e.g., Ray, SkyPilot, vLLM, DeepSpeed, Mojo) to guide the team.
The total target annual compensation for this position ranges from $127,000 to $225,000 based on education, experience, and demonstrated expertise.

