About the job
About Etched
At Etched, we are pioneering the world’s first AI inference system designed specifically for transformers, delivering more than 10x the performance of traditional systems at significantly lower cost and latency. Our custom ASICs enable the creation of groundbreaking products, such as real-time video generation models and highly complex reasoning agents. Backed by substantial investments from leading venture capitalists and built by a team of top-tier engineers, we are at the forefront of revolutionizing the infrastructure for the fastest-growing industry in history.
Job Overview
We are seeking a proactive Developer Experience Engineer to drive improvements in developer productivity, automation, and infrastructure within our hardware and software teams. This role sits at the crossroads of DevOps, software engineering, and high-performance computing (HPC), where you will design and implement systems that expedite chip design, simulation, and the deployment of AI models in both cloud and on-premises environments.
Key Responsibilities
Create and maintain automation tools to enhance development, testing, and deployment processes.
Optimize and manage job scheduling using Slurm for AI tasks, simulations, and chip design.
Develop monitoring solutions with Grafana, Prometheus, and OpenTelemetry to track pipelines, infrastructure, and compute clusters.
Oversee and enhance containerized environments with Docker and Kubernetes to improve scalability and reproducibility.
Refine build, test, and deployment pipelines using CI/CD and build tools such as GitHub Actions, Jenkins, Buildkite, or Bazel.
Establish caching and artifact management systems to minimize build times and optimize dependency management.
Integrate and manage cloud resources (AWS, GCP) for scalable compute, storage, and hybrid workloads.
Assist in security and compliance initiatives, including secrets management and access control.
Document and disseminate best practices for effective developer tooling and workflows.
Ideal Candidate Qualifications
Proficient in Python for automation, scripting, and infrastructure development.
Experience with Slurm job scheduling in HPC or hybrid settings.
Familiarity with cloud services (AWS, GCP) and container orchestration tools.
Strong understanding of CI/CD practices and tools.
Excellent problem-solving and communication skills.

