Technical Research Engineer - GPU Performance

Runway ML
Remote Full-time

Qualifications

4+ years of experience in systems engineering, machine learning infrastructure, or performance optimization for deep learning. Familiarity with GPU kernel development (CUDA, Triton, CUTLASS) and distributed systems (NCCL, collective communication, model parallelism). Experience with ML framework internals (PyTorch, TensorFlow, etc.) and optimization techniques.

About the job

At Runway ML, we are at the intersection of art and science, harnessing AI to create simulations of the world. Our vision is that world models represent the cutting edge of artificial intelligence. We believe that traditional language models alone fall short in addressing the most pressing global challenges, such as robotics, healthcare, and scientific innovation. Genuine advancement necessitates models that can learn from their interactions with the world, just like humans do. This iterative learning process can be substantially expedited through simulation rather than real-world experimentation.

World models pave the way for groundbreaking general-purpose simulations, revolutionizing storytelling, scientific exploration, and the pursuit of new horizons for humanity.

Our team is composed of imaginative, open-minded, compassionate, and driven individuals committed to making a difference. We aim to continuously achieve the remarkable, and our success hinges on cultivating an exceptional team. If you share this ambition, we invite you to connect with us.

Role Overview

We are seeking Research Engineers dedicated to making our world models faster and more efficient without sacrificing their capabilities. Your responsibilities will include profiling, optimizing, and rearchitecting the systems that turn research concepts into scalable, real-time models, directly influencing what is computationally possible and which capabilities we can develop.

Key Responsibilities

  • Optimize training throughput on large GPU clusters, improving MFU (model FLOPs utilization) through custom kernels, mixed-precision strategies (FP8, BF16), memory-efficient attention mechanisms, and activation checkpointing.
  • Design and maintain distributed training infrastructure encompassing tensor parallelism, context parallelism, FSDP, and resilient multi-node configurations.
  • Profile and accelerate inference pipelines for real-time multimodal generation, including CUDA graph compilation, KV cache optimization, operator fusion, and latency reduction.
  • Optimize and scale our training infrastructure to improve efficiency and reliability.
  • Contribute across the entire stack, from low-level kernel optimizations to high-level model design.

Qualifications

  • 4+ years of experience in systems engineering, machine learning infrastructure, or performance optimization specifically for deep learning.
  • Proficiency in GPU kernel development (CUDA, Triton, CUTLASS) and familiarity with distributed systems (NCCL, collective communication, model parallelism).
  • Understanding of machine learning framework internals (PyTorch, TensorFlow, etc.) and hands-on experience with optimization techniques.

About Runway ML

Runway ML is a pioneering company that blends art and science through advanced AI technologies. We are dedicated to building sophisticated models that simulate the world, driving innovation in various fields such as robotics, healthcare, and scientific research.
