About the job
Cerebras Systems is revolutionizing the AI landscape with the world's largest AI chip, 56 times larger than a traditional GPU. Our cutting-edge wafer-scale architecture delivers compute equivalent to dozens of GPUs on a single chip while presenting a single, unified device to the programmer. This approach enables Cerebras to achieve unmatched training and inference speeds, so machine learning practitioners can deploy large-scale ML applications without the complexity of orchestrating fleets of GPUs or TPUs.
Cerebras customers include leading model labs, multinational enterprises, and pioneering AI startups. Notably, OpenAI recently entered a multi-year partnership with Cerebras for 750 megawatts of capacity, powering critical workloads with high-speed inference.
Our groundbreaking wafer-scale architecture makes Cerebras Inference the fastest generative AI inference solution in the world, more than 10 times faster than GPU-based hyperscale cloud inference services. This dramatic speedup transforms the experience of using AI applications, enabling real-time iteration and unlocking greater intelligence through additional agentic computation.
About the Role
As a Research Engineer on the Inference ML team at Cerebras Systems, you will adapt state-of-the-art language and vision models to run with maximum performance on the flagship Cerebras architecture. Working closely with ML researchers and engineers, you will design, prototype, validate, and optimize models, gaining broad exposure to cutting-edge inference research on the world's fastest AI accelerator.

