Join OpenAI’s Codex Team as a Senior Data Scientist
About Our Team
Codex is a first-party OpenAI developer product focused on agentic software engineering. Our mission is to build tools that help engineers design, write, test, and deploy code more efficiently and safely at scale. We work closely with research and product teams to turn advances in AI models into meaningful productivity gains for developers.
Role Overview
As a Senior Data Scientist on Codex, you will help evaluate and drive product-market fit for our AI-powered developer tools. You will define how we measure "developer productivity" across our products, run experiments on new coding models and user experiences, and identify where our technology helps or hinders developers across languages and tasks. Your findings will directly shape how the software industry evolves.
This position is based in San Francisco, CA, and follows a hybrid work model with 3 days per week in the office. We offer relocation assistance to new hires.
Your Responsibilities
Partner closely with the Codex product team to uncover opportunities to improve developer outcomes and drive growth.
Design and analyze A/B tests and staged rollouts of new coding models and product features.
Define and implement key metrics such as suggestion acceptance rate, edit distance, compile/test pass rate, task completion time, latency, and overall session productivity.
Build dashboards and analyses that let the team answer product questions on its own, segmented by language, framework, repository size, and task type.
Investigate failure modes and work with Research to identify targeted improvements, drawing on model quality signals, user feedback, and evaluations.
Ideal Candidate Profile
5+ years of experience in a quantitative role within a developer-focused or high-growth product environment.
Proficiency in SQL and Python, with a solid understanding of experimental design and causal inference methodologies.
Demonstrated experience defining product metrics that reflect real user value.
Strong communication skills, with the ability to work effectively with product managers, engineers, and designers and to influence product direction.
Preferred Qualifications
A strong programming background, with the ability to prototype, run simulations, and assess code quality.
Familiarity with IDE/extension telemetry or analytics related to developer tooling.
Experience with natural language processing (NLP), large language models (LLMs), or code models.

