About the Role
Join our team as a Senior Software Engineer in ML Infrastructure: Development Enablement. Our goal is to build a seamless development environment that lets our researchers and engineers push the boundaries of deep learning models for autonomous driving.
We run a high-performance Cloud Development Environment (CDE) platform that provides standardized, efficient ML development workspaces. In this role, your contributions will be twofold:
- Platform Ownership: You will own our CDE platform, ensuring its scalability and reliability and its smooth integration into the ML workflow.
- Agentic Evolution: You will lead the shift to Agentic ML Workflows, architecting AI agents that act as force multipliers by automating debugging, optimizing resource usage, and shortening the path from code to trained models.
What You’ll Be Doing
- Scale & Evolve the Dev Platform: Lead engineering initiatives to enhance our existing CDE platform, ensuring it meets the stringent demands of large-scale ML experimentation.
- Architect AI Agents: Design and implement LLM-powered agents that can manage the ML lifecycle, from automated code suggestions and log analysis to autonomous debugging of distributed training jobs.
- Infrastructure Integration: Integrate AI agents with our core infrastructure (Kubernetes, Ray, and AWS), ensuring they interact with it safely and effectively.
- Collaborative Automation: Partner with ML Engineers to identify productivity bottlenecks and build solutions, such as agents that suggest fixes for common PyTorch distributed training errors.
- Champion Engineering Excellence: Bring software engineering best practices to the fast-moving field of LLM development, including evaluation frameworks that measure agent performance, reliability, and security.
- Mentor & Lead: Serve as the infrastructure team's subject matter expert on Agentic AI, mentoring junior engineers and shaping our long-term technical vision.

