At Canva, we are on a mission to empower individuals to unleash their creativity through design. We are innovating with artificial intelligence to deliver transformative experiences for millions, enabling everyone to design with confidence. We are seeking a passionate and experienced Senior Research Scientist specializing in reinforcement learning, agentic systems, and mixture-of-experts (MoE) models to push the limits of agent reasoning, tool use, latency, and reliability, and to turn those advances into impactful solutions for our users.

About the Team

Our team explores multimodal agentic architectures, builds scalable training and evaluation loops, and works closely with product and platform teams to turn our breakthroughs into user-friendly product features. We are a pioneering post-training team focused on developing state-of-the-art multimodal agentic systems, spanning multimodal modeling, post-training methods, and design agents. We are looking for a candidate with strong experience in post-training, reinforcement learning (RL), and mixture-of-experts models to join our group.

About the Role

You will lead research initiatives while staying hands-on across the agent stack, from reward design and policy optimization to planning, memory management, and tool orchestration, as well as dataset construction and the development of cutting-edge post-training approaches. You will run tightly controlled experiments, iterate rapidly, and draw trustworthy conclusions.
Most importantly, you will play a key role in turning research insights into reliable, safe, and high-quality product experiences.

Your Responsibilities

Design and develop agent systems (including planning, multimodal tool use, retrieval, novel training approaches, and modeling ablations) for practical applications in design, vision, and language.
Improve post-training and RL workflows on distributed systems in PyTorch, with efficient data loaders, telemetry, stable training of mixture-of-experts (MoE) architectures, and reproducible pipelines; profile, debug, and optimize performance.
Contribute to a research agenda for RL and agentic systems that aligns with Canva's product objectives; quickly identify high-leverage opportunities and cut dead ends.
Develop reward models and learning loops, including RLHF/RLAIF, preference modeling, DPO/IPO-style objectives, offline and online RL, curriculum learning, and credit assignment for multi-step reasoning.
Create simulation and sandbox tasks that expose failure modes (such as planning errors, brittle tool use, hallucination, and unsafe actions) and turn them into measurable targets.
Establish rigorous evaluation standards for agents (task success, reliability, latency, safety, regressions). Build offline evaluation suites and run online A/B tests, favoring simple, controlled experiments that yield generalizable results.
Collaborate with product, design, safety, and platform teams to integrate research findings into dependable features.
Feb 25, 2026