Senior Software Engineer - Data Infrastructure
Experience Level
Senior
About Plaid
Plaid is at the forefront of the financial technology revolution, dedicated to making financial services accessible and efficient for all. We empower businesses to harness the potential of consumer-permissioned financial data, enabling them to deliver innovative solutions to their customers. Our culture thrives on collaboration, innovation, and a commitment to data integrity, ensuring that our team members are equipped with the best tools and resources to excel in their roles.
About Our Innovative Team
Join the Workload team at OpenAI, where we are at the forefront of designing and managing the cutting-edge infrastructure that drives the training and inference of large language models (LLMs) at an unprecedented scale. Our systems are engineered to harmonize the complex processes of model training and serving, abstracting performance, parallelism, and execution across extensive GPU and accelerator networks. This robust foundation allows researchers to concentrate on elevating model capabilities, while we take care of the scalability, efficiency, and reliability needed to bring these advanced models to life.

Your Role and Responsibilities
We are seeking a talented engineer to design and implement the dataset infrastructure that will fuel OpenAI's next-generation training stack. Your primary focus will be on creating standardized dataset interfaces, scaling pipelines across thousands of GPUs, and proactively identifying and addressing performance bottlenecks. Collaboration with multimodal researchers and infrastructure teams will be key to ensuring that our datasets are unified, efficient, and user-friendly.

Key Responsibilities Include:
- Design and maintain standardized dataset APIs, including those for multimodal (MM) data that exceeds memory capacity.
- Develop proactive testing and validation pipelines for dataset loading at GPU scale.
- Work collaboratively to integrate datasets into training and inference pipelines, ensuring seamless user experiences.
- Document and maintain dataset interfaces to ensure they are discoverable, consistent, and easily adoptable by other teams.
- Establish validation systems to assure datasets remain reproducible and unchanged once standardized.
- Identify and troubleshoot performance bottlenecks in distributed dataset loading, such as stragglers impacting global training speed.
- Create visualization and inspection tools to highlight errors, bugs, or bottlenecks in datasets.

Ideal Candidate Profile
- Possess strong engineering fundamentals and experience in distributed systems, data pipelines, or infrastructure.
- Have a proven track record in building APIs, modular code, and scalable abstractions, with a user-centric approach to design.
- Be adept at debugging performance issues across large-scale machine fleets.
- Demonstrate a passion for advancing data infrastructure to enhance research capabilities.
OpenEvidence
Position Overview
Join OpenEvidence as a Data Infrastructure Software Engineer, where you will engineer comprehensive systems that drive essential product and research operations. Your focus will be on optimizing performance, ensuring scalability, and enhancing accuracy, while enjoying the autonomy to manage the infrastructure that assists healthcare professionals in navigating complex clinical decisions in real time.

We value exceptional creators who thrive in versatile roles. Our engineers engage across various products and projects, taking ownership wherever they can make the most significant impact.

About OpenEvidence
OpenEvidence is the leading medical AI platform globally, utilized by over 40% of clinicians in the U.S. in just over a year through organic product-led growth. As a $12 billion company, our engineering team comprises 30 talented individuals from MIT, Harvard, and Stanford. We believe that groundbreaking products are born from a small group of exceptional builders, driven by focused goals and empowered to take ownership and act swiftly. We are expanding our team to capitalize on an unparalleled opportunity to set the standard for medical AI platforms.

If you are a top-tier engineer or scientist eager to push the boundaries and achieve tangible outcomes that affect millions of lives, we want to connect with you.

Our Culture
We expect our work to be performed at an elite level. The journey from concept to execution and scaling is akin to a professional sport, where excellence is non-negotiable. We believe that the creation of innovative technologies is only achievable through complete ownership. Significant achievements happen when individuals take the initiative to see them through.

Your Profile
This role is not for those seeking a 9-to-5 job or merely looking to write papers. If you are ready to dive into the trenches, tackle challenges head-on, and create something from scratch that could impact millions and drive substantial revenue, you might be the perfect fit.

We seek brilliant builders who are intelligent, ambitious, resourceful, self-reliant, detail-oriented, driven, hardworking, and humble. Does this sound rare? It is: we have found only 30 of them so far, and we are eager to discover more.
At Figma, we are expanding our team of dedicated creatives and innovators committed to making design accessible for everyone. Our platform empowers teams to transform ideas into reality—whether you're brainstorming, prototyping, converting designs into code, or utilizing AI for enhancements. From concept to product, Figma enables teams to optimize workflows, accelerate processes, and collaborate in real time from anywhere in the world. If you're passionate about shaping the future of design and teamwork, we invite you to join us!

The Data Platform team at Figma is responsible for constructing and managing the essential systems that drive analytics, AI/ML initiatives, and data-informed decision-making across our organization. We cater to a wide array of stakeholders, including AI researchers, machine learning engineers, data scientists, product engineers, and business teams that depend on data for insights and strategic planning. Our team is tasked with owning and scaling critical platforms such as the Snowflake data warehouse, ML Datalake, orchestration and pipeline infrastructure, and extensive data ingestion and processing systems, overseeing all data transactions that occur within these platforms.

Despite our small size, we tackle significant, high-impact challenges. In the upcoming years, we are focused on developing the data infrastructure layer for Figma's AI-driven products, enhancing cost and performance efficiencies across our data stack, scaling our ingestion and reverse ETL capabilities for new product applications, and reinforcing data quality, reliability, and compliance at every level. If you are enthusiastic about creating scalable, high-performance data platforms that empower teams across Figma, we would love to connect with you!

This is a full-time role that can be performed from one of our US hubs or remotely within the United States.
Airbnb, Inc.
Founded in 2007, Airbnb has transformed the way people experience travel, connecting over 5 million hosts with more than 2 billion guests worldwide. Our platform enables unique stays and authentic experiences, fostering connections with local communities.

The Team You Will Join:
As a pivotal member of the Data Warehouse Infrastructure team, you will help shape the backbone of Airbnb's big data capabilities, enabling hundreds of engineers to efficiently collect, manage, and analyze vast amounts of data. We leverage cutting-edge open-source technologies such as Hadoop, Spark, Trino, Iceberg, and Airflow.

Typical Responsibilities:
- Design and architect Airbnb's next-generation big data compute platform to enhance data ETL, analytics, and machine learning efforts.
- Oversee the platform's operations, focusing on improving reliability, performance, observability, and cost-effectiveness.
- Create high-quality, maintainable, and self-documenting code while engaging actively in code review processes.
- Contribute to open-source projects, making a significant impact on the industry.
Thinking Machines Lab
At Thinking Machines Lab, our vision is to enhance human potential by advancing collaborative general intelligence. We are dedicated to creating a future where individuals have the resources and knowledge to harness AI for their specific objectives and aspirations.

Our team comprises scientists, engineers, and innovators who have developed some of the most popular AI products, including ChatGPT and Character.ai, as well as influential open-weight models like Mistral, along with highly regarded open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We are seeking a talented engineer to enhance our data infrastructure. You will become part of a dynamic, high-impact team tasked with designing and scaling the foundational infrastructure for distributed training pipelines, multimodal data catalogs, and sophisticated processing systems that manage petabytes of data.

Our infrastructure is pivotal; it serves as the foundation for every groundbreaking achievement. You will collaborate directly with researchers to expedite experiments, develop novel datasets, optimize infrastructure efficiency, and derive essential insights from our data repositories.

If you are passionate about distributed systems, large-scale data mining, and open-source tools such as Spark, Kafka, Beam, Ray, and Delta Lake, and enjoy building innovative solutions from scratch, we encourage you to apply.

Note: This is an evergreen role that we keep open continuously for expressions of interest. We receive a high volume of applications, and while there may not always be an immediate position that aligns perfectly with your skills and experience, we encourage you to apply. We regularly review applications and reach out as new opportunities arise. You are welcome to reapply after gaining more experience, but please refrain from applying more than once every six months. We may also post specific roles for particular projects or team needs; in those cases, you are welcome to apply directly in addition to this evergreen role.
About Our Team
At OpenAI, our Data Platform team is at the heart of our innovative approaches to data management, powering essential product, research, and analytics workflows. We manage some of the largest Spark compute fleets in production, architect data lakes and metadata systems on Iceberg and Delta, and envision exabyte-scale architectures. Our high-throughput streaming platforms utilize Kafka and Flink, while our orchestration is powered by Airflow. We also support machine learning feature engineering tools such as Chronon. Our mission is to provide secure, reliable, and efficient data access at scale, thereby enhancing intelligent, AI-assisted data workflows.

Join us in building and maintaining these core platforms that are foundational to OpenAI's products, research, and analytics capabilities. We are not just scaling infrastructure; we are transforming the way people engage with data. Our vision includes intelligent interfaces and AI-powered workflows that make data interactions faster, more reliable, and intuitive.

About the Position
In this role, you will focus on constructing and managing data infrastructure that supports extensive compute fleets and storage systems optimized for high performance and scalability. You will be instrumental in designing, developing, and operating the next generation of data infrastructure at OpenAI. Your responsibilities will encompass scaling and securing big data compute and storage platforms, building and maintaining high-throughput streaming systems, ensuring low-latency data ingestion, and facilitating secure, governed data access for machine learning and analytics. You will also prioritize reliability and performance at extreme scales.

You will have complete ownership of the full lifecycle: from architecture to implementation, production operations, and on-call responsibilities.

You should be experienced with platforms such as Spark, Kafka, Flink, Airflow, Trino, or Iceberg. Familiarity with infrastructure tools like Terraform, along with expertise in debugging large-scale distributed systems, is essential. A passion for addressing data infrastructure challenges in the AI domain is a must.

This role is based in San Francisco, CA. We offer a hybrid work model requiring 3 days in the office each week and provide relocation assistance for new hires.

Responsibilities:
- Design, build, and maintain data infrastructure systems, including distributed compute, data orchestration, distributed storage, streaming infrastructure, and machine learning infrastructure, ensuring they are scalable, reliable, and secure.
- Ensure our data platform can scale significantly while maintaining reliability and efficiency.
- Enhance company productivity by empowering your fellow engineers and teammates through innovative data solutions.
Foxglove develops data infrastructure for robotics teams operating in real-world environments such as factories and warehouses. As robots leave the lab, engineers need reliable tools for analyzing data, diagnosing issues, and improving system performance. Foxglove delivers observability, visualization, and data management solutions designed to help teams manage large volumes of multimodal sensor data from deployed fleets.

Role overview
This Software Engineer - Robotics Data Infrastructure position centers on building and optimizing the systems behind Foxglove's products. The scope covers desktop and web visualization tools, backend services for data ingestion and streaming, and client libraries running directly on robots. Work ranges from enhancing decoding performance in Rust, to extending MCAP tooling in C++, to integrating new data sources with TypeScript, and occasionally working with customers to resolve performance issues.

What you will do
- Design, build, and deploy product features from start to finish, incorporating feedback from users.
- Work across the stack: from Rust and C++ libraries on devices, to backend cloud services, to browser-based visualization tools.
- Identify and address performance bottlenecks in data pipelines, including ingestion, decoding, streaming, and rendering.
- Contribute to MCAP and other open-source libraries used by the robotics community.
- Collaborate with customers and robotics engineers to gather requirements and validate new solutions.
- Maintain high engineering standards and help foster a culture of ownership within the team.
- Design systems for efficient storage and querying of petabyte-scale robotics data.

Requirements
- At least 5 years of experience developing production software.
- Strong proficiency in Rust, C++, and TypeScript, with a willingness to learn new languages or frameworks as needed.

Location
This position is based in San Francisco, CA.
At Plaid, we believe in the power of data-driven decision-making. Our data culture demands robust and scalable data systems that ensure accuracy and completeness. As a Senior Software Engineer focusing on Data Infrastructure, you will play a pivotal role in empowering teams across engineering, product, and business sectors to swiftly and securely extract valuable data insights. Your work will directly enhance our ability to serve customers effectively. You will be responsible for building and optimizing our data and machine learning infrastructure, allowing Plaid engineers to innovate and iterate on products built on consumer-permissioned financial data. Our Data Infrastructure engineers are experts in Data Warehousing, Data Lakehouse architecture, Spark, Workflow Orchestration, and Streaming technologies. You will enhance our existing data pipelines for performance and cost efficiency while creating intuitive abstractions that simplify the development process for other engineers at Plaid.
Cognition
Join our dynamic team at Cognition as a Research Engineer specializing in Infrastructure. In this role, you will be at the forefront of cutting-edge research, contributing to innovative solutions that shape the future of our infrastructure projects.

Your responsibilities will include conducting thorough research, analyzing data, and collaborating with cross-functional teams to implement effective strategies. We are looking for an individual who is passionate about technology and infrastructure, eager to solve complex problems, and ready to drive impactful results.
Fable Security
About Fable Security
At Fable Security, we recognize that AI-driven threats and human error pose significant risks to enterprise security. Cybercriminals exploit human behavior, which is responsible for over 70% of security breaches. Our mission is to empower individuals with the right tools, transforming them from targets into an active line of defense.

We have developed a human risk platform that effectively shapes employee behavior. Our user-friendly and scalable platform integrates complex employee data, identifies risky behaviors, and automatically delivers timely, relevant interventions where employees are most engaged—in real time.

Supported by renowned investors such as Redpoint Ventures and Greylock Partners, and founded by members of the Abnormal Security team, Fable is addressing one of cybersecurity's most pressing challenges within a multi-billion-dollar market. Our diverse team includes alumni from Meta, Twitter, and prestigious universities like Columbia, Stanford, and UCLA. As we experience rapid growth, this is a prime opportunity to contribute to and influence the future of security.

Why Join Us
- Help us build and scale the core data infrastructure that drives a groundbreaking product.
- Collaborate with engineering, data science, and product teams to operationalize data effectively at scale.
- Be part of a small, elite team where your contributions will have a significant impact.

As part of an early-stage company, every engineer plays a crucial role in shaping product functionality and evolution. You will define not only the technical architecture but also the company's data philosophy.

Your Role
As a Data Infrastructure Engineer, you will be responsible for the architecture, scalability, and reliability of our data platform. You will design and construct systems that support everything from real-time product functionality to internal analytics and machine learning processes, covering the spectrum from data ingestion to production-ready datasets. Additionally, you will establish best practices that underpin our data-driven products.

This role is highly cross-functional, requiring close collaboration with engineering, data, and product teams to ensure our data foundation evolves in tandem with our growth.

Responsibilities
- Design, develop, and sustain scalable data systems.
- Implement best practices for data architecture and management.
- Collaborate with cross-functional teams to facilitate data-driven decision-making.
About Our Team
At OpenAI, we are dedicated to ensuring that artificial general intelligence (AGI) serves and benefits all of humanity. A vital component of this mission involves developing models that genuinely understand and resonate with human preferences. Our Human Data team is instrumental in making this a reality.

The Human Data engineering team is responsible for creating sophisticated systems that facilitate scalable and high-quality human feedback, which is crucial for training and refining OpenAI's most advanced models. Our engineers work in close collaboration with top-tier researchers to implement alignment techniques, from initial experimental concepts to production-ready feedback loops.

Position Overview
We are seeking passionate software engineers to join the Human Data team, tasked with developing the platforms, prototypes, tools, and infrastructure essential for training, aligning, and evaluating our AI models. In this role, you will collaborate with researchers and cross-functional teams to actualize alignment concepts, influence the training of future models, and enhance how our models engage with the real world.

We are looking for individuals who thrive on technical ownership, enjoy working across the stack, and are eager to tackle complex challenges in a dynamic, impactful environment.

This position is based in San Francisco, CA, and follows a hybrid work model of three days in the office each week. We also provide relocation assistance for new hires.

Your Responsibilities
- Develop and maintain robust full-stack systems for feedback collection, data labeling, and evaluation pipelines while ensuring high levels of security.
- Convert experimental alignment research into scalable production infrastructure, including inference and model training systems.
- Design and enhance user-facing tools and backend services to support high-quality data workflows.
- Collaborate with researchers, engineers, and program leads to refine feedback loops and model interaction strategies.
- Lead infrastructure improvements that promote faster iteration and scaling across OpenAI's cutting-edge models, from internal research tools to production-level ChatGPT.

Qualifications
- Proven software engineering skills with experience in building scalable production systems.
- A strong preference for full-stack development with end-to-end ownership, from backend pipelines to user interfaces.
- Driven by high-impact projects and capable of navigating ambiguous challenges.
Innovating the Future of Software
As we approach 2026, the software industry is facing an unprecedented challenge: the "infinite software crisis." At Sazabi, we are dedicated to redefining how engineering teams support, maintain, and operate the rapid growth in application development.

Introducing Sazabi: the AI-native observability platform for agile engineering teams. Our platform empowers teams by providing a centralized solution to inquire about their production systems in natural language, visualize system activity automatically, and diagnose issues ten times faster. Say goodbye to tedious instrumentation, dashboard setup, and alert tuning: just straightforward answers.

We are proud to be backed by pioneers from leading AI organizations, including Vercel, Graphite, Daytona, Browserbase, LangChain, Mastra, Replit, and others.
Who We Are
Baton is Ryder's innovative product development division, dedicated to leveraging cutting-edge technologies to transform the transportation and logistics landscape. Managing over $10 billion in freight, our technology has a significant impact across the U.S. economy.

We are committed to creating and delivering software that not only meets but exceeds the needs of Ryder and its 50,000+ clients, which include some of the most recognized brands globally. Our projects range from user-centric applications to the robust data platform that will drive the future of Ryder's innovations.

Baton's mission: to enable a supply chain that operates on autopilot.

Since Ryder's acquisition of Baton in 2022, we have been operating with the agility of a startup while benefiting from the extensive reach of a Fortune 500 company. If you're passionate about tackling intricate challenges and making a real impact on the backbone of the American economy, you'll thrive with us.

Role: Software Engineer - Infrastructure
Department: Data Platform
Location: Hayes Valley, San Francisco, CA
About the Role
Anthropic is hiring a Software Engineer to support our Research Data Platform. This position is based in San Francisco, CA or New York City, NY. The team works closely with engineers and researchers to create data solutions that strengthen our research efforts. The work directly supports Anthropic's mission to develop safe and beneficial AI.
About the Team
Join the Frontier Systems team at OpenAI, where we design, implement, and maintain the world's largest supercomputers, essential for advancing our most groundbreaking model training initiatives. We transform data center blueprints into operational systems while crafting the software necessary for executing large-scale frontier model training. Our mission is to establish, stabilize, and ensure the reliability and efficiency of these hyperscale supercomputers throughout the training of our frontier models.

About the Role
We are seeking passionate engineers to manage the next generation of compute clusters that underpin OpenAI's frontier research. This position merges distributed systems engineering with practical infrastructure work across our expansive data centers. You will scale Kubernetes clusters to unprecedented levels, automate bare-metal setups, and create the software layer that simplifies the complexity of numerous nodes across various data centers.

Your work will be at the crossroads of hardware and software, where speed and reliability are paramount. Be prepared to oversee dynamic operations, swiftly identify and resolve pressing issues, and constantly elevate the standards for automation and uptime.

In this role, you will:
- Provision and scale extensive Kubernetes clusters, including automation for deployment, bootstrapping, and lifecycle management
- Create software abstractions that integrate multiple clusters and provide a cohesive interface for training workloads
- Oversee node deployment from bare metal to firmware upgrades, ensuring rapid, repeatable setups at scale
- Enhance operational metrics by reducing cluster restart times (e.g., from hours to minutes) and expediting firmware and OS upgrade cycles
- Integrate networking and hardware health systems to ensure end-to-end reliability across servers, switches, and data center infrastructure
- Develop monitoring and observability systems to identify issues early and maintain cluster stability under high load

You might thrive in this role if you:
- Have extensive experience operating or scaling Kubernetes clusters or similar container orchestration systems in high-growth or hyperscale environments
- Possess strong programming skills in languages relevant to cloud and infrastructure management
Watney Robotics
About Watney Robotics
At Watney Robotics, we are pioneers in developing autonomous robotic solutions aimed at enhancing critical infrastructure. Having recently secured $21 million in seed funding from leading investors such as Conviction, Abstract, and A*, we are collaborating with the world's largest hyperscalers to propel the expansion of data centers and streamline maintenance processes.

This is an extraordinary opportunity to join our team at a pivotal stage as we transition from prototype to large-scale production. Be part of a team that not only ships cutting-edge systems but also plays a crucial role in shaping the operational framework of an innovative robotics company.
OpenAI's research infrastructure group creates and maintains the backbone systems for advanced machine learning model training. This team often goes beyond conventional training methods, developing new infrastructure to support novel research at scale. Their work closely connects systems engineering with research progress, making it possible to run experiments that would otherwise be too slow or complex.

Role overview
The Research Infrastructure Engineer for Training Systems designs and improves the platforms that power large-scale ML training. This role bridges research concepts and the practical systems that make large model training possible. The work has a direct impact on model release timelines and requires building systems that perform reliably in demanding, real-world scenarios.

What you will do
- Build and maintain infrastructure for large-scale model training and experimentation
- Design APIs and interfaces to simplify complex training workflows and prevent misuse
- Enhance reliability, debuggability, and performance across training and data pipelines
- Troubleshoot issues involving Python, PyTorch, distributed systems, GPUs, networking, and storage
- Create tests, benchmarks, and diagnostic tools to catch regressions early

Requirements
- Interest in building systems that support new training methods, not just optimizing existing ones
- Strong instincts in systems engineering, especially regarding performance, reliability, and clean abstractions
- Experience designing APIs and interfaces for researchers and engineers
- Ability to work across ML research code and production infrastructure
- Enjoyment of evidence-based debugging using profiles, traces, logs, tests, and reproducible cases
Team and Platform Focus
The Compute Infrastructure team at OpenAI designs, builds, and maintains the systems that support AI research at scale. This work brings together accelerators, CPUs, networking, storage, data centers, orchestration software, agent infrastructure, developer tools, and observability. The aim is to create a reliable, unified experience for researchers and product teams across the company.

Projects span the full stack: capacity planning, cluster lifecycle management, bare-metal automation, and distributed systems. The team manages Kubernetes scheduling, system optimization, high-performance networking, storage, fleet health, reliability, workload profiling, benchmarking, and improvements to the developer experience. Even small improvements in communication, scheduling, hardware efficiency, or debugging can significantly accelerate research. OpenAI matches engineers to areas within Compute Infrastructure that align with their skills and interests.

Role Overview
This Software Engineer role centers on building and evolving the compute platform that supports OpenAI's research and products. Candidates may bring expertise in low-level systems, high-performance computing, distributed infrastructure, reliability, CaaS, agent infrastructure, developer platforms, tooling, or infrastructure user experience. The most important qualities are strong analytical skills, the ability to write resilient code, and a collaborative approach that helps colleagues move faster and with more confidence.

What You Will Work On
- Working close to hardware or at the user interaction layer
- Developing CaaS and agent infrastructure
- Managing control and data planes that connect the system
- Bringing new supercomputing capabilities online
- Optimizing training workloads through profiler traces and benchmarks
- Improving NCCL and collective communication
- Analyzing GPUs, NICs, topology, firmware, thermal dynamics, and failure modes
- Designing abstractions to unify diverse clusters into a single platform

Areas of Expertise
No one is expected to cover every area listed. Some engineers focus on system performance, kernel or runtime behavior, large-scale networking protocols, RDMA, NCCL, GPU hardware, benchmarking, scheduling, or hardware reliability. Others improve the platform's usability through APIs, tools, workflows, and developer experience. The team values strong engineering judgment and a drive to advance the field.
About Our Team
At DoorDash, data drives our success. Our Data Engineering team is pivotal in building robust database solutions tailored for diverse applications, including reporting, product analytics, marketing optimization, and financial reporting. By architecting pipelines, data structures, and data warehouse environments, we enable data-driven decision-making across the organization.

About the Role
We are seeking a talented Software Engineer II to join our team as a technical leader, responsible for scaling our data infrastructure, enhancing automation, and developing tools to support our expanding business needs.

What You Will Do
- Collaborate with business partners and stakeholders to gather and understand data requirements.
- Work alongside engineering, product teams, and external partners to ensure seamless data collection.
- Design, develop, and implement high-performance data models and pipelines for our Data Lake and Data Warehouse.
- Establish and execute data quality checks, conduct thorough QA, and implement monitoring routines.
- Enhance the reliability and scalability of our ETL processes.
- Manage a suite of data products that deliver accurate and trustworthy data.
- Support and onboard new engineers as they join our team.

What We Are Looking For
- 3+ years of professional experience in data engineering, business intelligence, or a related field.
- Proficiency in programming languages such as Python and Java.
- 3+ years of experience with ETL orchestration and workflow management tools, including Airflow, Flink, Oozie, and Azkaban, on AWS/GCP platforms.
- Strong understanding of database fundamentals, SQL, and distributed computing.
- 3+ years of experience with distributed data ecosystems (e.g., Spark, Hive, Druid, Presto) and streaming technologies like Kafka and Flink.
- Experience with Snowflake, Redshift, PostgreSQL, and/or other database management systems.
- Excellent communication skills with a proven ability to liaise with both technical and non-technical teams.
- Familiarity with reporting tools such as Tableau, Superset, and Looker.
- Ability to thrive in a fast-paced and dynamic environment.
About the Team
Join OpenAI's Privacy Engineering team, where we operate at the vital crossroads of Security, Privacy, Legal, and Core Infrastructure. Our mission is to develop cutting-edge data infrastructure and systems that empower our privacy, legal, and security teams to operate securely, swiftly, and at scale. We adhere to principles of defensibility by default, enabling impactful research, and fostering a robust security culture in preparation for transformative technologies.

About the Role
We are seeking a talented Software Engineer to design and implement technical systems that facilitate legal compliance workflows, including secure data processing and document review. In this role, you will collaborate closely with Legal, Security, IT, and engineering teams to translate legal processes into actionable technical workflows. This position is ideal for an engineer who is passionate about large-scale data challenges and understands the meticulousness required to ensure compliance.

This role is located in San Francisco, and we offer relocation assistance for qualified candidates.

Key Responsibilities:
- Design and maintain scalable data storage pipelines.
- Develop search and discovery services (e.g., Spark/Databricks, index layers, metadata catalogs) tailored to partner team requirements.
- Automate secure data transfers, including encryption, checksumming, and auditing of exports to reviewers.
- Establish secure compute environments that balance usability with stringent security controls.
- Implement monitoring and KPIs to ensure accountability for data holds and productions.
- Work cross-functionally to document SOPs, threat models, and chain-of-custody documentation that can withstand scrutiny.

Ideal Candidates Will:
- Possess practical experience in building or operating large-scale data-lake or backup systems (Azure, AWS, GCP).
- Be proficient with Terraform or Pulumi and CI/CD processes, and be capable of converting ad-hoc legal requests into repeatable pipelines.
- Be comfortable working with discovery workflows (legal holds, enterprise document collections, secure review), or eager to quickly gain expertise.
- Effectively communicate technical concepts, from storage governance to block-ID APIs, to interdisciplinary teams such as Legal and Engineering.