At Canva, our mission is to empower individuals to unleash their creativity through design. We are building AI technology that not only feels intuitive but also creates meaningful impact for millions, enabling everyone to design with confidence. We are seeking a Senior Research Scientist who is passionate about reinforcement learning, agentic systems, and mixture-of-experts (MoE) models to advance our capabilities in reasoning, tool use, latency, and reliability.

About the Team

Our team explores multimodal agentic architectures and builds robust training and evaluation frameworks. We collaborate closely with product and platform teams to turn groundbreaking research into engaging product features. As a pioneering post-training team, we are dedicated to developing advanced multimodal agentic systems, covering a wide array of topics including multimodal modeling, post-training strategies, and agent design.

About the Role

In this role, you will shape research directions and engage in hands-on work across the agent stack, from reward design and policy optimization to planning, memory management, tool orchestration, dataset construction, and new post-training methodologies.
You will design careful experiments, iterate rapidly, and draw reliable conclusions, all while ensuring that research translates into safe, high-quality product experiences.

Key Responsibilities

- Design and develop agent systems focused on planning, multimodal tool use, retrieval, novel training methods, and modeling experiments for real-world applications in design, vision, and language.
- Implement scalable post-training and reinforcement learning pipelines on distributed systems (using PyTorch), optimizing data loaders and telemetry and ensuring stable, reproducible training of MoE architectures.
- Contribute to a reinforcement learning and agentic systems research agenda aligned with Canva's product vision; quickly identify and prioritize high-impact projects.
- Build reward models and learning loops, including RLHF/RLAIF, preference modeling, DPO/IPO-style objectives, offline/online RL, and curriculum learning.
- Develop simulation tasks that expose failure modes (planning errors, tool-use weaknesses, hallucinations, unsafe actions) and set measurable targets for improvement.
- Lead rigorous agent evaluations covering task success, reliability, latency, safety, and regression testing. Build offline evaluation suites and run online A/B tests; favor simple experiments that yield generalizable results.
- Collaborate closely with product, design, safety, and platform teams to turn research findings into reliable product features.
Feb 25, 2026