About the job
Waymo is a pioneering autonomous driving technology company dedicated to becoming the world’s most reliable driver. Originating from the Google Self-Driving Car Project in 2009, we have concentrated on developing the Waymo Driver—The World’s Most Experienced Driver™—which aims to enhance mobility access and prevent traffic-related fatalities. The Waymo Driver supports Waymo’s fully autonomous ride-hailing service and is adaptable across various vehicle platforms and product applications. Having completed over ten million rider-only trips, our technology has autonomously driven over 100 million miles on public roads across more than 15 U.S. states, plus tens of billions of miles in simulation.
Our Semantics team is committed to producing the highest-fidelity and most comprehensive offboard perception autolabels at scale, serving as the cornerstone for training and validating our autonomous vehicle stack. We are an advanced ML and engineering team utilizing cutting-edge computer vision, deep learning, and generative AI technologies to automatically analyze driving logs, generate detailed scene understanding, and fuel the data engine that allows Waymo to scale safely and efficiently.
In this hybrid role, you will report directly to a Technical Lead Manager.
Your Responsibilities Will Include:
- Developing and training state-of-the-art computer vision and multimodal models (e.g., Gemini) to extract rich semantic information such as object attributes and scene properties essential for the AI agent.
- Designing and implementing a scalable AI agent framework that integrates large foundation models (e.g., Gemini) with outputs from our perception models and internal knowledge bases.
- Utilizing fine-tuning and reinforcement learning (RL) techniques to establish a "data flywheel" that continually enhances the system’s captioning and reasoning capabilities through automated feedback.
- Creating and prototyping innovative prompting strategies for Vision-Language Models (VLMs) to elicit complex causal reasoning related to driving scenarios.
- Collaborating closely with the ML Infrastructure, Perception, Behavior, and AI Foundation teams to define data requirements and integrate the captioning system into the broader ML development lifecycle.
- Taking ownership of the entire system lifecycle, from advanced model development and prototyping to production deployment and scaling for massive data generation.