About the job
About Us
Graphcore stands at the forefront of innovation in the field of Artificial Intelligence compute. Our mission is to develop cutting-edge hardware, software, and systems infrastructure that will facilitate the next wave of AI advancements and promote the widespread integration of AI solutions across all sectors.
As a member of the esteemed SoftBank Group, Graphcore is part of a select group of companies that are driving transformative technological changes. Our united vision aims to empower Artificial Super Intelligence and make its advantages universally accessible.
Our teams are composed of individuals from various backgrounds, each contributing a unique skill set and perspective. With a rich blend of AI research specialists, silicon designers, software engineers, and systems architects, we foster a culture of continuous learning and relentless innovation.
Job Summary
We are on the lookout for a Software Engineering Intern to join our dynamic team dedicated to the creation of high-performance machine learning (ML) kernels for next-generation AI hardware. In this impactful role, you will play a key part in developing optimized compute kernels, enabling a diverse array of ML operators that power applications ranging from convolutional neural networks (CNNs) to large language models (LLMs).
You will use low-level programming and hardware-aware optimization techniques to extract peak performance and efficiency from contemporary accelerators. This is a unique opportunity to work at the intersection of ML, numerical computing, and scalable systems.
The Team
Join Graphcore's growing Kernel Engineering team, which delivers high-performance compute libraries that help our customers get the most out of their AI hardware.
Responsibilities and Duties
- Assist in the design and implementation of kernels for linear algebra and tensor operations (GEMM, batched GEMM, convolutions, reductions, element-wise and fused operations) using C++.
- Profile and optimize kernels for next-generation AI hardware, focusing on threading, cache locality, memory layout, and kernel launch efficiency.
- Help validate performance and correctness through microbenchmarks, regression tests, and numerical accuracy checks.
- Debug issues, resolve bugs, and improve the quality and functionality of our products.
About You
You possess an open-minded, collaborative spirit and have a keen interest in performance optimization and memory-efficient designs. You are eager to join a team that values innovation and excellence.

