About Our Team
The Applied team at OpenAI is dedicated to collaborating across research, engineering, product, and design disciplines to effectively deliver cutting-edge AI technology to both consumers and businesses. Our mission is to learn from real-world deployments, ensuring that the benefits of AI are distributed equitably while prioritizing responsible and safe use of this powerful technology. Here, safety takes precedence over unchecked growth.
About the Role:
We are looking for a Senior Data Engineer to lead the development of our data pipelines and foundational data tables at OpenAI. These pipelines power analyses, strengthen the safety systems that inform business decisions, support product growth, and help mitigate risks from malicious actors. The role also offers the chance to work closely with the researchers behind ChatGPT, helping them train new models for delivery to users. As we continue to grow rapidly, we place immense value on data-driven insights, and your contributions will be crucial to our ongoing success. If you have a passion for data and a desire to build impactful solutions, we want to hear from you!
Your Responsibilities Will Include:
- Designing, building, and maintaining our data pipelines, ensuring seamless integration of user event data into our data warehouse.
- Creating canonical datasets that track vital product metrics such as user growth, engagement, and revenue.
- Collaborating with diverse teams, including Infrastructure, Data Science, Product, Marketing, Finance, and Research, to understand their data needs and deliver effective solutions.
- Implementing robust, fault-tolerant systems for data ingestion and processing.
- Participating in architectural and engineering decisions related to data management, leveraging your extensive experience.
- Ensuring data security, integrity, and compliance with industry standards and company policies.
You Will Excel in This Role If You:
- Have 8+ years of software engineering experience, including at least 3 years as a data engineer.
- Have proficiency in at least one programming language relevant to Data Engineering, such as Python, Scala, or Java.
- Bring experience with distributed processing frameworks such as Hadoop or Flink, and with distributed storage systems (e.g., HDFS, S3).
- Are well-versed in ETL schedulers such as Airflow, Dagster, Prefect, or similar tools.