About the job
About Our Team
At OpenAI, our Hardware organization is pioneering the development of cutting-edge silicon and system-level solutions tailored to meet the distinctive needs of advanced AI workloads. We are dedicated to building the next generation of AI silicon, collaborating closely with software engineers and research partners to co-design hardware that integrates seamlessly with our AI models. Our mission includes not only delivering high-quality, production-grade silicon for OpenAI's supercomputing infrastructure but also creating custom design tools and methodologies that foster innovation and enable hardware optimized specifically for AI applications.
About the Role
We are on the lookout for a talented Research Hardware Co-Design Engineer to operate at the intersection of model research and silicon/system architecture. In this role, you will play a critical part in shaping the numerics, architecture, and technological strategies for the future of OpenAI's silicon in collaboration with both Research and Hardware teams.
Your responsibilities will include diagnosing discrepancies between theoretical performance and real-world measurements, writing quantization kernels, assessing the risks associated with numerics through model evaluations, quantifying system architecture trade-offs, and implementing innovative numeric RTL. This is a hands-on position for individuals who are passionate about tackling challenging problems, seeking practical solutions, and driving them to production. Strong prioritization and transparent communication skills are vital for success in this role.
Location: San Francisco, CA (Hybrid: 3 days/week onsite)
Relocation assistance available.
Key Responsibilities:
Enhance our roofline simulator to monitor evolving workloads and deliver analyses that quantify the impact of architectural decisions, supporting technology exploration.
Identify and resolve discrepancies between performance simulations and actual measurements; effectively communicate root causes, bottlenecks, and incorrect assumptions.
Develop emulation kernels for low-precision numerics and lossy compression techniques, equipping Research with the insights needed to balance efficiency with model quality.
Prototype numeric modules by advancing RTL through synthesis; either hand off innovative numeric solutions cleanly or occasionally take ownership of an RTL module from start to finish.
Proactively engage with new ML workloads, prototype them using rooflines and/or functional simulations, and initiate evaluations of new opportunities or risks.
Gain a holistic understanding of the transition from ML science to hardware optimization, breaking down this comprehensive objective into actionable short-term deliverables.
Foster collaborative relationships across diverse teams with varying goals and expertise, ensuring that progress remains unimpeded.
Clearly articulate design trade-offs with explicit assumptions and rationale.

