
Performance Engineer - Inference

Cerebras Systems · Toronto, Ontario, Canada
On-site · Full-time





Requirements

  • A Bachelor's, Master's, or PhD in Electrical Engineering or Computer Science.
  • Strong experience in performance modeling, debugging, and optimization.
  • Familiarity with kernel microcode and compiler algorithms.
  • Ability to analyze runtime performance metrics and system-level performance.
  • Experience developing tools for data visualization and performance diagnostics.

About the job

Cerebras Systems is at the forefront of AI technology, having developed the world's largest AI chip, which is 56 times larger than traditional GPUs. Our revolutionary wafer-scale architecture delivers unparalleled AI compute power equivalent to dozens of GPUs on a single chip, combined with the ease of programming as if it were a single device. This innovative approach enables us to achieve industry-leading training and inference speeds, allowing machine learning practitioners to run extensive ML applications effortlessly, without the complexities associated with managing numerous GPUs or TPUs.  

Cerebras is trusted by leading model labs, global enterprises, and pioneering AI-native startups. Notably, OpenAI recently announced a multi-year partnership with Cerebras aimed at deploying 750 megawatts of compute, bringing ultra high-speed inference to critical workloads.

Thanks to our groundbreaking wafer-scale architecture, Cerebras Inference is the fastest Generative AI inference solution globally, more than 10x faster than GPU-based hyperscale cloud inference services. This leap in speed is redefining the user experience of AI applications, enabling real-time iteration and amplifying intelligence through richer agentic computation.

About The Role

As a member of the inference performance team, you will work at the critical intersection of hardware and software, enhancing end-to-end model inference speed and throughput. Your focus will encompass low-level kernel performance debugging and optimization, system-level performance analysis, performance modeling, and the creation of tools for performance diagnostics and projections.

Responsibilities

  • Develop performance models (kernel-level, end-to-end) to forecast the performance of state-of-the-art and client ML models.
  • Optimize and troubleshoot our kernel microcode and compiler algorithms to improve ML model inference speed, throughput, and compute utilization on the Cerebras WSE.
  • Analyze and debug runtime performance at the system and cluster level.
  • Create tools and infrastructure to visualize performance data collected from the Wafer Scale Engine and our compute cluster.

About Cerebras Systems

Cerebras Systems is a pioneering technology company that builds the world's largest AI chip. Its cutting-edge wafer-scale architecture sets a new standard in AI computing, enabling users to run large-scale machine learning applications with unprecedented speed and efficiency.
