About the job
Cerebras Systems is at the forefront of AI innovation, creating the world’s largest AI chip, a staggering 56 times larger than traditional GPUs. Our revolutionary wafer-scale architecture delivers the computational power of dozens of GPUs within a single chip, paired with the simplicity of a unified programming interface. This unique approach enables us to achieve unparalleled training and inference speeds, empowering machine learning practitioners to execute large-scale ML applications effortlessly, without the complexities associated with hundreds of GPUs or TPUs.
Our customers include leading model labs, global enterprises, and pioneering AI-native startups. Recently, OpenAI announced a multi-year collaboration with Cerebras to leverage 750 megawatts of compute capacity, accelerating key workloads through ultra-high-speed inference.
Thanks to our groundbreaking wafer-scale architecture, Cerebras Inference provides the fastest generative AI inference solution available today, with speeds more than ten times faster than GPU-based hyperscale cloud services. This leap in speed is reshaping the user experience of AI applications, enabling real-time iteration and more capable agentic computation.
About the Role
Join our inference model team, dedicated to advancing state-of-the-art models by numerically validating and accelerating innovative concepts on our wafer-scale hardware. In this role, you will prototype architectural enhancements, construct performance evaluation pipelines, and translate quantitative insights into actionable changes that drive production success.
Key Responsibilities
- Prototype and benchmark new ideas such as novel attention mechanisms, mixture-of-experts (MoE) architectures, speculative decoding, and other emerging techniques.
- Create agent-driven automation tools that design experiments, schedule runs, triage regressions, and prepare pull requests.
- Collaborate closely with compiler, runtime, and silicon teams, gaining a unique perspective on the complete software/hardware innovation stack.
- Stay current with the latest open- and closed-source models; run them on wafer-scale hardware first to identify new optimization opportunities.