About the job
Cerebras Systems is at the forefront of AI technology, building the world's largest AI chip, which is 56 times larger than conventional GPUs. Our wafer-scale architecture provides the compute of dozens of GPUs on a single chip, which simplifies programming and delivers unparalleled training and inference speeds. Users can run large-scale machine learning applications without the complexity of managing multiple GPUs or TPUs.
Cerebras serves a diverse clientele, including leading model labs, global corporations, and pioneering AI startups. OpenAI recently entered a multi-year partnership with Cerebras, using our technology to deliver 750 megawatts of compute for ultra-fast inference across critical workloads.
Our wafer-scale architecture has made Cerebras the provider of the world's fastest generative AI inference, running more than 10 times faster than traditional GPU-based hyperscale cloud inference services. This speedup is transforming the user experience of AI applications, enabling real-time iteration and boosting model intelligence through greater computational capability.
About The Role
As the ML QA Lead, you will be instrumental in ensuring the quality of Cerebras software across all supported machine learning workloads and workflows. You will join the Machine Integration and Quality (MIQ) team, focusing on feature testing of software components, ML training accuracy, performance validation, and pre-deployment checks for customer workloads.
In this role, you will advocate for best practices in testing, promote effective debugging methodologies, and encourage strong communication across teams to deliver world-class products.
Responsibilities
- Ensure the quality of various software and hardware components of Cerebras solutions to guarantee the accuracy, performance, and usability of model training.
- Implement effective testing methodologies, improve team communication, and apply strong debugging skills.
- Uphold the highest quality standards across all components within the Cerebras environment.
- Automate workflows, establish testbeds, and develop tools for efficient monitoring and debugging.
- Develop innovative testing strategies that expose weaknesses in the system.

