About the job
Join the Data Acquisition team at OpenAI, part of our Foundations organization, where you will play a central role in the data collection that powers our model training. Our team manages web crawling and GPTBot services and works closely with teams such as Data Processing, Architecture, and Scaling. We are seeking a talented Software Engineer who is passionate about data acquisition and eager to make a significant impact.
Key Responsibilities:
Lead and innovate engineering projects focused on data acquisition, including web crawling, data ingestion, and search functionalities.
Collaborate effectively with cross-functional teams, including Data Processing, Architecture, and Scaling, to ensure seamless data flow and operational efficiency.
Partner with the legal team to navigate compliance and data privacy challenges.
Design and implement highly scalable distributed systems capable of processing petabytes of data.
Architect algorithms for efficient data indexing and robust search capabilities.
Build and manage backend services for data storage, including key-value databases and data synchronization.
Implement solutions within a Kubernetes Infrastructure-as-Code environment and conduct regular system health checks.
Conduct experiments and analyze data to derive insights that enhance system performance.
Qualifications:
Bachelor's, Master's, or PhD in Computer Science or a related field.
4+ years of professional software development experience.
Familiarity with large-scale web crawlers is a plus.
Deep understanding of large stateful distributed systems and data processing methodologies.
Proficiency in Kubernetes and knowledge of Infrastructure-as-Code principles.
Eagerness to explore and implement new technologies and approaches.
Proven ability to manage multiple tasks and adapt to shifting priorities.
Excellent written and verbal communication skills.
About OpenAI:
At OpenAI, we are pioneers in AI research and deployment, dedicated to ensuring that advances in artificial intelligence benefit all of humanity. Our mission is to push the boundaries of AI capabilities while adhering to safe and responsible deployment practices. Join us in our commitment to harnessing the power of AI for positive global impact.

