
LLM Inference Performance & Evaluations Engineer

Cerebras Systems · Toronto, Ontario, Canada
On-site · Full-time





Skills and Qualifications

  • 3+ years of experience developing high-performance machine learning or systems software.
  • Strong foundation in software architecture and design principles.
  • Expertise in performance benchmarking and optimization techniques.
  • Familiarity with modern programming languages and development tools.
  • Ability to work collaboratively in a dynamic team environment.

About the job

Cerebras Systems is at the forefront of AI innovation, creating the world’s largest AI chip, a staggering 56 times larger than traditional GPUs. Our revolutionary wafer-scale architecture delivers the computational power of dozens of GPUs within a single chip, paired with the simplicity of a unified programming interface. This unique approach enables us to achieve unparalleled training and inference speeds, empowering machine learning practitioners to execute large-scale ML applications effortlessly, without the complexities associated with hundreds of GPUs or TPUs.

Among our esteemed clientele are leading model laboratories, global enterprises, and pioneering AI-native startups. Recently, OpenAI announced a multi-year collaboration with Cerebras, aiming to leverage 750 megawatts of scale to revolutionize key workloads through ultra-high-speed inference.

Thanks to our groundbreaking wafer-scale architecture, Cerebras Inference provides the fastest Generative AI inference solution available today, boasting speeds over ten times faster than GPU-based hyperscale cloud services. This extraordinary increase in speed is reshaping the user experience of AI applications, enabling real-time iterations and enhancing intelligence through advanced agentic computation.

About The Role

Join our inference model team, dedicated to advancing state-of-the-art models by numerically validating and accelerating innovative concepts on our wafer-scale hardware. In this role, you will prototype architectural enhancements, construct performance evaluation pipelines, and translate quantitative insights into actionable changes that drive production success.

Key Responsibilities

  • Prototype and benchmark innovative concepts such as new attention mechanisms, mixture of experts (MoE), speculative decoding, and other emerging advancements.
  • Create agent-driven automation tools that design experiments, schedule runs, triage regressions, and prepare pull requests.
  • Collaborate closely with compiler, runtime, and silicon teams, gaining a unique perspective on the complete software/hardware innovation stack.
  • Stay current with the latest open- and closed-source models; run them on wafer-scale hardware first to identify new optimization opportunities.

About Cerebras Systems

Cerebras Systems is a pioneering technology company revolutionizing the field of artificial intelligence with the world’s largest AI chip. Our innovative approach to architecture and design allows for unmatched computational efficiency and efficacy, making us leaders in AI inference and training solutions. We are committed to empowering our clients with tools that enable groundbreaking advancements in machine learning.
