About Etched
At Etched, we are building the world's first AI inference system designed specifically for transformers, delivering over 10x greater performance at significantly lower cost and latency than conventional solutions like the B200. Our ASICs make possible products such as real-time video generation models and highly advanced deep reasoning agents. With substantial backing from premier investors and a team of top engineers, Etched is transforming the infrastructure layer for the fastest-growing industry in history.
Key Responsibilities
Develop detailed performance models and forecasts for Etched's transformer-centric architecture across various workloads and configurations.
Profile and assess deep learning workloads on Etched hardware to detect micro-architectural bottlenecks and potential optimization areas.
Create analytical and simulation-driven models to anticipate performance across different architectural setups and design trade-offs.
Collaborate with hardware architects to influence micro-architectural decisions based on workload characteristics and performance insights.
Facilitate hardware/software co-optimization by pinpointing opportunities where architectural features can substantially enhance performance.
Analyze and optimize memory hierarchy efficiency, interconnect utilization, and computational resource effectiveness.
Establish performance benchmarking frameworks and methodologies tailored specifically for transformer inference workloads.
Performance Characterization
Construct detailed roofline models and performance forecasts for Etched hardware across various transformer architectures (e.g., Llama, Mixtral).
Profile production inference workloads to identify and mitigate micro-architectural bottlenecks.
Evaluate memory bandwidth, compute utilization, and interconnect performance to inform next-gen architecture decisions.
Develop performance modeling tools that forecast chip behavior based on different batch sizes, sequence lengths, and model configurations.
Characterize the performance implications of architectural features such as specialized datapaths, memory hierarchies, and on-chip interconnects.
Benchmark Etched's architectural efficiency against competitive solutions to ensure industry-leading performance.

