
Software Engineer, Data Infrastructure - Research

OpenAI · San Francisco
On-site · Full-time





Qualifications

Strong background in engineering with expertise in distributed systems and data infrastructure. Proven experience in designing APIs and creating scalable solutions. Ability to troubleshoot complex performance issues and a commitment to improving user experience in data handling.

About the job

About Our Innovative Team

Join the Workload team at OpenAI, where we design and manage the infrastructure that drives the training and inference of large language models (LLMs) at unprecedented scale. Our systems unify model training and serving, abstracting away performance, parallelism, and execution details across large fleets of GPUs and accelerators. This foundation lets researchers concentrate on advancing model capabilities while we handle the scalability, efficiency, and reliability needed to bring these models to life.

Your Role and Responsibilities

We are seeking a talented engineer to design and implement the dataset infrastructure that will fuel OpenAI’s next-generation training stack. Your primary focus will be on creating standardized dataset interfaces, scaling pipelines across thousands of GPUs, and proactively identifying and addressing performance bottlenecks. Collaboration with multimodal researchers and infrastructure teams will be key to ensuring that our datasets are unified, efficient, and user-friendly.

Key Responsibilities Include:

  • Design and maintain standardized dataset APIs, including those for multimodal (MM) data that exceeds memory capacity.

  • Develop proactive testing and validation pipelines for dataset loading at GPU scale.

  • Work collaboratively to integrate datasets into training and inference pipelines, ensuring seamless user experiences.

  • Document and maintain dataset interfaces to ensure they are discoverable, consistent, and easily adoptable by other teams.

  • Establish validation systems to ensure datasets remain reproducible and unchanged once standardized.

  • Identify and troubleshoot performance bottlenecks in distributed dataset loading, such as stragglers that slow global training throughput.

  • Create visualization and inspection tools to highlight errors, bugs, or bottlenecks in datasets.

Ideal Candidate Profile

  • Possess strong engineering fundamentals and experience in distributed systems, data pipelines, or infrastructure.

  • Have a proven track record in building APIs, modular code, and scalable abstractions, with a user-centric approach to design.

  • Be adept at debugging performance issues across large-scale machine fleets.

  • Demonstrate a passion for advancing data infrastructure to enhance research capabilities.

About OpenAI

At OpenAI, we are dedicated to pioneering advancements in artificial intelligence. Our mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. With a team of world-class researchers and engineers, we are pushing the boundaries of AI technology to solve some of the most pressing challenges facing society today.
