About the job
At Magic, we are committed to creating safe AGI that propels humanity forward in addressing the world’s most pressing challenges. We believe that the key to achieving safe AGI is through the automation of research and code generation, which enhances model performance and ensures alignment more reliably than human efforts alone. Our innovative approach integrates cutting-edge pre-training techniques, domain-specific reinforcement learning, ultra-long context, and advanced inference-time computation.
Role Overview
As a Software Engineer on our RL Research & Environments team, you will design and manage the data systems, evaluation frameworks, and training environments that extend model capabilities after pre-training.
This position centers on post-training: your responsibilities will include identifying capability gaps, creating specialized datasets, designing reward structures, and executing iterative training cycles that deliver noticeable improvements for users. You will own the infrastructure and experimental workflows that connect product objectives to measurable capability gains.
Our long-context models present unique post-training challenges, including long-horizon reasoning, maintaining coherence over extended tasks, optimizing context utilization, and enabling tool-assisted behaviors. You will develop systems that reveal failure modes, produce high-value training data, and facilitate rapid reinforcement learning iterations at scale.
This role offers the potential for growth, allowing you to take ownership of key capability domains, delve deeper into RL systems, or influence post-training strategies as we enhance the performance and reliability of long-context models.
Key Responsibilities
Design and construct post-training datasets leveraging synthetic generation, targeted data collection, and self-play methodologies.
Implement filtering, scoring, and mixture strategies for reinforcement learning and post-training datasets.
Develop and sustain evaluation frameworks that identify long-context failure modes.
Create reward signals and training environments aimed at specific capability advancements.
Conduct ablation studies across various data sources, reward configurations, and long-horizon task designs.
Enhance the reliability and observability of post-training data and environment pipelines.
Collaborate closely with Product and Research teams to translate capability objectives into quantifiable iteration cycles.
Ideal Candidate Profile
Solid foundation in software engineering principles.
Proven experience in developing or managing large-scale data or machine learning systems.
Strong analytical skills and a passion for tackling complex problems.