About the job
About Etched
Etched is pioneering the world's first AI inference system designed specifically for transformers, offering over 10x greater performance while significantly reducing cost and latency compared to GPUs such as the B200. With Etched ASICs, you can build products that were previously unattainable with GPUs, such as real-time video generation models and highly complex reasoning agents. Backed by substantial funding from leading investors and staffed by a team of top engineers, Etched is revolutionizing the infrastructure layer for one of the fastest-growing industries globally.
Job Summary
As an Infrastructure Software Engineer, you will play an essential role in developing state-of-the-art model-specific ASICs by building custom infrastructure and toolchains. This role focuses on ensuring ultra-fast, reliable, and scalable development from simulation to silicon. At Etched, we hold infrastructure development to the same best practices that we apply to our products, with rigorous design discipline and high standards for testing.
You will spearhead the creation and adoption of next-generation infrastructure tools, empowering our ASIC, Software, and Platform engineers to accelerate iterations, increase reliability, and expand the frontiers of AI performance. Responsibilities include building and optimizing our hybrid high-performance compute (HPC) cluster for extensive parallel CI, EDA workflows, emulation, and hardware-aware job execution.
Additionally, you will design and implement an advanced observability stack featuring LLM integration, focusing on health and performance telemetry, log aggregation, distributed tracing, insight generation, synthetic testing, and intelligent alerting across CI pipelines, simulation clusters, and service endpoints.
This role demands a robust software engineering mindset, quality orientation, and a comprehensive understanding of systems. It involves not just writing scripts, but creating infrastructure code with precision, repeatability, and purpose.
Key Responsibilities
Architect and Scale Distributed Compute Systems: Design and build the orchestration layers that drive our hybrid high-performance clusters, supporting simulation, synthesis, and continuous integration of AI ASICs at an unprecedented scale.
Build Infrastructure-as-Code Systems: Develop and maintain a fully programmable infrastructure control plane to guarantee reproducibility, auditability, and swift iteration throughout the entire stack.

