About the job
At Cerebras Systems, we are advancing AI with the world's largest AI chip, 56 times larger than a conventional GPU. Our wafer-scale architecture delivers the compute of many GPUs on a single chip with a far simpler programming model. The result is record training and inference speed, letting machine learning practitioners run large ML applications without the complexity of managing numerous GPUs or TPUs.
Cerebras serves customers ranging from leading model labs to major global enterprises and AI startups. Most recently, OpenAI entered a multi-year partnership with Cerebras to deliver 750 megawatts of compute and accelerate critical workloads with ultra-fast inference.
Our wafer-scale architecture makes Cerebras Inference the fastest generative AI inference solution available, more than ten times faster than GPU-based hyperscale cloud inference services. This speedup is changing how users experience AI applications, enabling real-time iteration and deeper intelligence through additional agentic computation.
About the Role
We are looking for an experienced, hands-on engineer to join our Inference Core Model Bringup team. The team is responsible for rapidly bringing up state-of-the-art open-source models (LLaMA, Qwen, and others) and customer-provided proprietary models on our Cerebras CSX systems. Success in this role requires a systems-oriented generalist who thrives in a fast-paced environment and can work across the entire Cerebras software stack. Your work will directly drive the performance, efficiency, and scalability of AI applications on our platform.

