About the job
Cerebras Systems is revolutionizing the AI landscape with the world's largest AI chip, 56 times larger than conventional GPUs. Our wafer-scale architecture delivers computational power equivalent to dozens of GPUs on a single chip, with the programming simplicity of a single device. This enables industry-leading training and inference speeds, letting machine learning practitioners run large-scale ML applications without the complexity of orchestrating numerous GPUs or TPUs.
Cerebras serves a diverse clientele, from premier model labs and multinational corporations to pioneering AI-native startups. Notably, OpenAI has entered a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, accelerating key workloads with ultra-high-speed inference.
Built on this wafer-scale architecture, Cerebras Inference is the world's fastest generative AI inference solution, more than 10 times faster than GPU-based hyperscale cloud inference services. This leap in speed is transforming the AI application user experience, enabling real-time iteration and greater intelligence through expanded agentic computation.
About the Role:
Are you eager to help build the world's fastest generative model inference? Join the Cerebras Inference Team to develop a unique combination of software and hardware that delivers the best inference performance on the market while supporting the largest available models.
The Cerebras wafer-scale inference platform runs generative models at unprecedented speeds, thanks to a hardware architecture that provides fast access to local memory, ultra-fast interconnects, and vast computational resources.
You will work hands-on with the latest open and closed generative AI models, optimizing them for the Cerebras inference platform. Your responsibilities span model representation, optimization, and the compilation stack, targeting the best possible results on current and future Cerebras platforms.

