About the job
Intrinsic Robotics, a pioneering venture under Alphabet, is on a mission to redefine the capabilities of industrial robotics. Our team applies advances in AI, perception, and simulation to place software and data at the core of the field.
Our goal is to make intelligent robotics accessible and usable for countless businesses, entrepreneurs, and developers. We are a vibrant group of engineers, roboticists, designers, and technologists dedicated to unlocking the creative and economic potential of industrial robotics.
Role
As the Senior Software Engineering Lead, you will spearhead an engineering team focused on ensuring the resilience and reliability of our robotics software platform. You will architect systems that proactively identify and mitigate vulnerabilities across our APIs, SDKs, web interfaces, and cloud-to-edge communication layers. By simulating diverse scenarios, including AI model inference timeouts, high network latencies, data pipeline congestion, and malformed inputs, you will guarantee that our platform remains robust and predictable, even under adverse conditions. You will lead a specialized team in developing automated frameworks that replicate real-world disruptions, thereby providing a solid infrastructure for developers crafting the next generation of AI-driven robotics. Your responsibilities will also encompass the creation of essential monitoring tools to gain in-depth insights into overall system health. As a pivotal technical leader, you will collaborate with elite engineering teams in Mountain View and Munich to align resilience strategies and establish global benchmarks for software reliability.
How Your Work Propels Our Mission
- Design and implement automated resilience tests focusing on service boundaries and hybrid environments (both on-prem and cloud).
- Enhance the robustness of AI integrations by applying failure injection within data pipelines.
- Use fuzzing and property-based testing techniques across the platform to ensure graceful degradation.
- Strengthen the stability of developer tools and frontend systems against latency and service interruptions.
- Foster a culture of reliability through mentorship in defensive programming and by leading global "Game Day" exercises.
- Develop observability tools to monitor and analyze overall system health.

