
Senior Research Scientist in Reinforcement Learning at Canva | London

Canva, London
On-site Full-time

Experience Level

Senior

Qualifications

We are looking for candidates with a strong background in reinforcement learning, agentic systems, and mixture of experts models. Proven experience in post-training methodologies and the ability to work in a collaborative environment are essential.

About the job

At Canva, our mission is to empower individuals to unleash their creativity through design. We are building AI technology that feels intuitive and makes a meaningful difference for millions of people, enabling everyone to design with confidence. We are seeking a Senior Research Scientist passionate about reinforcement learning, agentic systems, and mixture of experts (MoE) models to advance our capabilities in reasoning, tool use, latency, and reliability.

About the Team

Our team delves into multimodal agentic architectures, establishing robust training and evaluation frameworks. We collaborate closely with product and platform teams to transform groundbreaking research into engaging product features. As a pioneering post-training team, we are dedicated to developing advanced multimodal agentic systems. We cover a wide array of topics, including multimodal modeling, post-training strategies, and agent design.

About the Role

In this role, you will influence research directions and engage in hands-on initiatives across the agent stack—from reward design and policy optimization to planning, memory management, tool orchestration, dataset construction, and the innovation of post-training methodologies. You will create meticulously designed experiments, iterate rapidly, and derive reliable conclusions, all while ensuring that research translates into safe, high-quality product experiences.

Key Responsibilities

  • Design and develop agent systems focused on planning, multimodal tool usage, retrieval, innovative training methods, and modeling experiments for real-world applications in design, vision, and language.

  • Implement scalable post-training and reinforcement learning solutions across distributed systems (using PyTorch), optimizing data loaders, telemetry, and stable training of MoE architectures while ensuring reproducibility.

  • Contribute to the reinforcement learning and agentic systems research agenda that aligns with Canva’s product vision; quickly identify and prioritize high-impact projects.

  • Create reward models and learning loops, including RLHF/RLAIF, preference modeling, DPO/IPO-style objectives, offline/online RL, and curriculum learning.

  • Develop simulation tasks that expose failure modes (planning errors, tool-use weaknesses, hallucinations, unsafe actions) and establish measurable targets for improvement.

  • Lead rigorous evaluations for agents, focusing on task success, reliability, latency, safety, and regression testing. Set up offline suites and conduct online A/B testing; favor straightforward experiments that yield generalizable results.

  • Collaborate closely with product, design, safety, and platform teams to successfully integrate research findings into reliable product features.

About Canva

Join the team that is transforming the way the world experiences design

Welcome to Canva! We understand that job hunting can be challenging, so we aim to make your experience as seamless as possible.

Work Environment

Our vibrant Canva London office is located in the heart of Hoxton Square, Shoreditch, offering a lively atmosphere for creativity and collaboration. While our global headquarters is in Sydney, London serves as our European hub, housing diverse teams and event spaces to foster community engagement. At Canva, you will be greeted with warmth by our Vibe team, enjoy delicious meals prepared by our Head Chef, and have access to a variety of workspaces catering to both team collaboration and individual focus. We believe in empowering our Canvanauts to choose a work balance that supports their goals.
