Senior Staff Engineer - Reinforcement Learning Infrastructure (Cybersecurity)

Bugcrowd | Remote - US

Remote | Full-time | $176.4K/yr - $242.6K/yr

Experience Level

Mid to Senior

Qualifications

The ideal candidate thrives on building high-performance systems that support innovative AI research. This position focuses on building the systems that produce reinforcement learning environments, not only on developing the environments themselves. You will architect pipelines that ingest software projects, analyze them with Bugcrowd’s Mayhem platform, and automatically generate training environments used by leading AI labs, including Anthropic, OpenAI, and Cohere. To excel in this role, candidates should have:

- A strong understanding of reinforcement learning workflows
- Experience building clean, reproducible Linux ML environments (containers, MCP, etc.)
- Expertise in systems engineering and software development

About the job

Job Summary

The Bugcrowd Reinforcement Learning and Reasoning Team is dedicated to advancing autonomous cybersecurity by creating realistic reinforcement learning environments tailored to foundation models. As a Staff Engineer, you will be at the forefront of developing and deploying reinforcement learning for AI. Your primary responsibility will be to design and build the infrastructure and tools that convert real-world vulnerability research into large-scale reinforcement learning environments for training state-of-the-art AI systems.

In this unique role, you will develop training environments that teach AI systems to hack and defend software. Your work will directly shape the capabilities of next-generation AI models. Rather than focusing on a single application, you will build the underlying infrastructure that generates thousands of environments for training frontier AI models.

Our team operates at the intersection of AI, security research, and systems engineering, crafting environments that enable models to acquire essential skills such as vulnerability detection, exploitation, and remediation.

About Bugcrowd

Bugcrowd is a pioneering leader in cybersecurity solutions, committed to helping organizations protect their digital assets through innovative crowdsourced security. Our extensive network of skilled ethical hackers collaborates with clients to identify vulnerabilities and provide actionable insights. As we continue to grow, we remain focused on leveraging advanced technology to stay ahead of emerging threats.

Similar jobs

Research Engineer with a Focus on Reinforcement Learning

firecrawl

Full-time|Hybrid|San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10)

Join firecrawl as a Research Engineer specializing in Reinforcement Learning (RL). In this role, you will leverage your expertise to conduct innovative research and develop advanced RL algorithms that push the boundaries of technology. Collaborate with a talented team of engineers and researchers to solve complex problems and contribute to groundbreaking projects.

Mar 18, 2026

Research Engineer - Reinforcement Learning Infrastructure

primeintellect

Full-time|On-site|San Francisco

Join primeintellect as a Research Engineer focused on Reinforcement Learning Infrastructure. In this role, you will be instrumental in advancing our cutting-edge AI technologies. You will collaborate with interdisciplinary teams to develop robust frameworks that enhance machine learning capabilities and drive innovation.

As a key player in our engineering team, you will work on designing, implementing, and optimizing systems that support reinforcement learning algorithms. Your contributions will directly impact the efficiency and effectiveness of our AI solutions.

Mar 27, 2026

Research Engineer - Reinforcement Learning

Prime Intellect

Full-time|On-site|San Francisco

Pioneering the Future of Open Superintelligence

At Prime Intellect, we are on a mission to build the open superintelligence ecosystem, encompassing cutting-edge agentic models alongside the infrastructure that enables individuals to create, train, and deploy them seamlessly. We unify global compute resources into an intuitive control plane, complemented by a comprehensive reinforcement learning post-training suite that includes dynamic environments, secure sandboxes, verifiable evaluations, and our asynchronous RL trainer. Our platform enables researchers, startups, and enterprises to run end-to-end reinforcement learning at unprecedented scale, adapting models to diverse tools, workflows, and deployment scenarios.

As a Research Engineer on our Reasoning team, you will be instrumental in driving our technical vision, particularly in test-time compute scaling research. If you thrive on using synthetic data to enhance LLM reasoning capabilities, we want to hear from you!

Learn more about the project in our write-up on decentralized training in the inference-compute paradigm.

Aug 19, 2024

Research Engineer, Reinforcement Learning Infrastructure and Reliability

Anthropic

Full-time|On-site|San Francisco, CA

Anthropic is hiring a Research Engineer focused on Reinforcement Learning Infrastructure and Reliability. This role is based in San Francisco, CA.

Role overview
This position centers on building and maintaining systems essential to AI research. The work supports Anthropic’s reinforcement learning efforts, with an emphasis on infrastructure stability and performance.

What you will do
- Collaborate with a team of specialists to develop and support key systems for AI research.
- Improve the reliability and efficiency of infrastructure supporting reinforcement learning projects.
- Apply technical expertise to advance Anthropic’s AI capabilities.

Team environment
Work alongside engineers and researchers dedicated to advancing AI reliability and performance. The team values collaboration and aims to enable new research while maintaining the stability of Anthropic’s core systems.

Apr 23, 2026

Infrastructure Research Engineer - Reinforcement Learning Systems

Thinking Machines Lab

Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We're dedicated to building a future where everyone can harness the power of AI to meet their unique needs and aspirations.

Our team comprises scientists, engineers, and innovators who have built some of the most widely used AI products, including ChatGPT and Character.ai, open-weight models like Mistral, and well-known open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We are seeking a talented Infrastructure Research Engineer to architect and develop the foundational systems that enable scalable, efficient training of large models with reinforcement learning.

This position sits at the intersection of research and large-scale systems engineering. It requires someone who understands both the algorithms behind reinforcement learning and the practicalities of distributed training and inference at scale. Your responsibilities will range from optimizing rollout and reward pipelines to improving the reliability, observability, and orchestration of systems. You will collaborate closely with researchers and infrastructure teams to make reinforcement learning stable, fast, and production-ready.

Note: This is an evergreen role that we keep open on an ongoing basis to collect expressions of interest. Because of the high volume of applications we receive, there may not always be an immediate opening that matches your skills and experience. We still encourage you to apply; we review applications continuously and reach out to candidates when new opportunities arise. You may reapply after gaining more experience, but please apply no more than once every six months. You may also see postings for specific roles that serve particular project or team needs; in those cases, you are welcome to apply to them directly alongside this evergreen role.

What You’ll Do
- Design, implement, and optimize the infrastructure that supports large-scale reinforcement learning and post-training workloads.
- Improve the reliability and scalability of the RL training pipeline, including distributed RL workloads and training throughput.
- Build shared monitoring and observability tools to ensure high uptime, debuggability, and reproducibility of RL systems.
- Work closely with researchers to translate algorithmic ideas into production-quality training pipelines.
- Develop evaluation and benchmarking infrastructure to assess model performance on helpfulness, safety, and factual accuracy.
- Publish and share insights through internal documentation, open-source libraries, or technical reports that advance scalable AI infrastructure.

Nov 27, 2025

Software Engineer - Reinforcement Learning Research & Environments

Magic.dev

Full-time|On-site|San Francisco

At Magic, we are committed to building safe AGI that propels humanity forward in addressing the world’s most pressing challenges. We believe the key to safe AGI is automating research and code generation, which can improve model performance and alignment more reliably than human effort alone. Our approach integrates cutting-edge pre-training techniques, domain-specific reinforcement learning, ultra-long context, and advanced inference-time computation.

Role Overview
As a Software Engineer on our RL Research & Environments team, you will design and manage the data systems, evaluation frameworks, and environment setups that improve model capabilities after pre-training.

This position focuses on post-training: identifying capability gaps, creating specialized datasets, designing reward structures, and running iterative training cycles that lead to significant improvements in user interactions. You will own the infrastructure and experimental workflows that connect product objectives to measurable capability gains.

Our long-context models present unique post-training challenges, including long-horizon reasoning, maintaining coherence over extended tasks, optimizing context utilization, and enabling tool-assisted behaviors. You will develop systems that reveal failure modes, produce high-value training data, and enable rapid reinforcement learning iterations at scale.

This role offers room to grow: you could take ownership of key capability areas, go deeper into RL systems, or shape post-training strategy as we improve the performance and reliability of long-context models.

Key Responsibilities
- Design and build post-training datasets using synthetic generation, targeted data collection, and self-play.
- Implement filtering, scoring, and mixture strategies for reinforcement learning and post-training datasets.
- Develop and maintain evaluation frameworks that identify long-context failure modes.
- Create reward signals and training environments aimed at specific capability improvements.
- Run ablation studies across data sources, reward configurations, and long-horizon task designs.
- Improve the reliability and observability of post-training data and environment pipelines.
- Collaborate closely with Product and Research teams to translate capability objectives into measurable iteration cycles.

Ideal Candidate Profile
- Solid foundation in software engineering principles.
- Proven experience developing or managing large-scale data or machine learning systems.
- Strong analytical skills and a passion for tackling complex problems.

Nov 8, 2024

Reinforcement Learning Environment Engineer

AfterQuery

Full-time|On-site|San Francisco

About AfterQuery
AfterQuery develops training data and evaluation frameworks that leading AI labs use to improve their models. The team partners with major research institutions to build datasets and run assessments that go beyond standard benchmarks. As a post-Series A company based in San Francisco, AfterQuery values contributions from every team member, and work here directly shapes the next generation of AI models.

Role Overview
The Reinforcement Learning Environment Engineer designs datasets and evaluation systems that influence how advanced AI models learn and improve. The role involves close collaboration with research teams, hands-on experimentation with new data collection methods, and the creation of metrics to track model progress. Work moves from theoretical analysis to practical experiments and feeds directly into large-scale model training.

What You Will Do
- Develop data segments that expose key failure modes in sectors such as finance, software engineering, and enterprise operations.
- Refine reward signals for Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) systems.
- Define quantitative metrics for dataset quality and diversity, and for their effects on model alignment and capability.
- Work closely with research teams to translate training objectives into concrete data requirements and evaluation criteria.

This position is based in San Francisco.

Apr 14, 2026

Reinforcement Learning Software Engineer

Preference Model

Full-time|On-site|San Francisco

About Us
At Preference Model, we develop the advanced training data that the evolution of artificial intelligence depends on. Today's AI models are powerful, but they often fall short across diverse applications because of limitations in their training data. We specialize in reinforcement learning environments that present AI with authentic research and engineering challenges, letting models iterate and learn through realistic feedback loops.

Our founding team comes from Anthropic’s data team, where we built the data infrastructure, tokenizers, and datasets that supported Claude. We collaborate with top AI research labs to bring AI closer to its transformative potential and are backed by a16z.

About the Role
As a Software Engineer on our team, your responsibilities will include:
- Designing and Developing Reinforcement Learning Environments: Architect comprehensive simulation platforms, including environment context, task definitions, and reward functions, that let AI agents learn and perform intricate tasks.
- Building Robust Training Infrastructure: Create scalable systems for post-training AI models, focusing on orchestration, performance optimization, and monitoring.
- Implementing Realistic Model Evaluations: Develop metrics for evaluating AI agent performance and build the infrastructure and tools needed to run these evaluations.
- Influencing Technical Strategy: Own architectural decisions, shape product roadmaps, and contribute significantly to our engineering culture as an early team member.

About You
You might be a great fit for this role if you have:
- Skill at using language models effectively.
- The ability to innovate and think outside the box.
- At least 4 years of software engineering experience, with a track record of owning projects.
- Proficiency in Python, Rust, or TypeScript, and the ability to work across the entire software stack.
- Hands-on experience with modern deployment practices, containerization, and cloud infrastructure (such as Kubernetes, AWS, or GCP).
- Strong problem-solving skills, demonstrated through algorithmic challenges or complex system design.

Nice-to-Haves
Preferred candidates will have experience in:
- Machine learning infrastructure or reinforcement learning.

Mar 18, 2026

Anthropic Fellows Program - Focus on Reinforcement Learning

Anthropic

Full-time|Hybrid|London, UK; Ontario, CAN; Remote-Friendly, United States; San Francisco, CA

Join the prestigious Anthropic Fellows Program, where you'll have the opportunity to delve into cutting-edge research in Reinforcement Learning. This program is designed for individuals passionate about advancing AI safety and developing innovative solutions. As a fellow, you will collaborate with a team of experts, engage in impactful projects, and contribute to a progressive research environment.

Apr 10, 2026

Research Engineer in Performance Reinforcement Learning

Anthropic

Full-time|On-site|San Francisco, CA

Join the innovative team at Anthropic as a Research Engineer specializing in Performance Reinforcement Learning. In this role, you will contribute to cutting-edge research that directly influences the development of advanced AI systems. Collaborate with a talented group of engineers and researchers, leveraging your expertise to enhance our algorithms and improve overall performance.

Mar 23, 2026

Machine Learning Engineer - Imitation & Reinforcement Learning for Robotics

Bedrock Robotics

Full-time|On-site|San Francisco, CA

Be Part of the Future of Autonomous Robotics

At Bedrock Robotics, we are taking AI from theory to practical application in the built environment. Our team is composed of seasoned professionals who were instrumental in the success of companies such as Waymo, Segment, and Uber Freight. We are at the forefront of deploying autonomous technology in heavy construction machinery, significantly improving the efficiency and safety of multi-billion-dollar infrastructure projects across the nation.

Backed by $350 million in funding, our mission is to address the urgent need for housing, data centers, and manufacturing facilities while also responding to the construction industry's labor shortages.

This position is where cutting-edge algorithms meet the practical world of construction. You will work alongside industry experts and top-tier engineers to tackle complex real-world challenges that cannot be simulated. If you are eager to apply advanced technology to impactful problems within a skilled team, we encourage you to apply.

Jan 31, 2026

AI Engineer - Specializing in Reinforcement Learning

yutori

Full-time|On-site|San Francisco, California, United States

At Yutori, we are revolutionizing how people engage with the online world by developing AI agents that can seamlessly manage everyday digital tasks. Our mission is to create a fully integrated, agent-first ecosystem, spanning everything from training proprietary models to designing intuitive generative product interfaces.

We are looking for a passionate and skilled AI Engineer to join our founding team and help build superhuman AI agents capable of performing actions across the web.

Our founders, Devi Parikh, Abhishek Das, and Dhruv Batra, bring decades of expertise in AI research and product development from their time at Meta, focusing on generative, multimodal, and embodied AI. Our diverse team blends advanced AI knowledge with innovative product design to execute Yutori's ambitious mission.

Supported by an exceptional group of investors, including Elad Gil, Sarah Guo, Jeff Dean, Fei-Fei Li, and others, Yutori is poised for remarkable growth.

Mar 26, 2025

Senior Deep Reinforcement Learning Engineer

DoorDash, Inc.

Full-time|On-site|San Francisco, CA

Join DoorDash as a Senior Deep Reinforcement Learning Engineer and play a pivotal role in revolutionizing the logistics and delivery industry through cutting-edge AI solutions. In this position, you will leverage your expertise in deep reinforcement learning to develop advanced algorithms that optimize our delivery processes and enhance customer experience.

Mar 26, 2026

Reinforcement Learning Environments Engineer (Contractor, Remote)

Preference Model

Contract|Remote|San Francisco

About Us
At Preference Model, we are building the next generation of training data to unlock the full potential of artificial intelligence. Today's models show remarkable capabilities, but they often fall short of their potential across diverse applications because of out-of-distribution tasks. We create reinforcement learning environments that let models tackle real-world research and engineering challenges, iterating and learning through realistic feedback loops.

Our founding team comes from Anthropic’s data team, where we developed data infrastructure, tokenizers, and datasets for Claude. We collaborate with leading AI labs to bring AI closer to its transformative potential and are backed by a16z.

About the Role
We are seeking talented Reinforcement Learning Environments Engineers to design and implement MLE environments. Your primary mission will be to help large language models (LLMs) develop stronger reasoning and a deeper understanding of modern machine learning concepts. This role is fully remote, requires at least 4 hours of overlap with PST, and requires English proficiency at the C1/C2 level.

Mar 18, 2026

Research Engineer/Research Scientist, RL/Reasoning

OpenAI

Full-time|Hybrid|San Francisco

About Our Team
Join the RL and Reasoning team at OpenAI, at the forefront of AI innovation. Our team is dedicated to advancing reinforcement learning research and has pioneered transformative projects, including o1 and o3. We are committed to pushing the limits of generative models while ensuring they can be deployed at scale.

About the Role
As a Research Engineer/Research Scientist at OpenAI, you will play a pivotal role in improving AI alignment and capabilities through state-of-the-art reinforcement learning techniques. Your contributions will be essential to training intelligent, aligned, and versatile agents that power various AI models.

We seek individuals with a solid foundation in reinforcement learning research, agile coding skills, and a passion for rapid iteration.

This position is located in San Francisco, CA, and follows a hybrid work model of three days in the office per week. We also provide relocation assistance for new hires.

You may excel in this role if:
- You are enthusiastic about being at the cutting edge of RL and language model research.
- You take initiative, owning ideas and driving them to completion.
- You value principled methodology, running simple experiments in controlled settings to draw trustworthy conclusions.
- You thrive in a fast-paced, technically complex environment where rapid iteration is essential.
- You are adept at navigating large ML codebases to debug and improve them.
- You have a deep understanding of machine learning and its applications.

About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI system capabilities while prioritizing safe deployment through our products. We see AI as a powerful tool that must be developed with safety and human needs at its core, and we embrace diverse perspectives to reflect the full spectrum of humanity.

We are proud to be an equal opportunity employer, welcoming applicants from all backgrounds without discrimination based on race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other legally protected characteristic.

May 14, 2025

Reinforcement Learning Environments Engineer Summer Intern

Preference Model

Internship|Remote|San Francisco

Location: San Francisco preferred; remote candidates are welcome to apply.
Duration: 10-12 weeks during Summer 2026.
Compensation: This is a paid internship.

About Us
At Preference Model, we are building the next era of training data to advance AI. Current models are impressive, but they often struggle across diverse applications because of out-of-distribution tasks. We focus on developing reinforcement learning (RL) environments where models can engage with complex research and engineering challenges, iterating and learning from realistic feedback.

Our founding team has extensive experience from Anthropic's data team, where we built the data infrastructure, tokenizers, and datasets that powered Claude. We collaborate with top AI labs to accelerate AI's progress toward its transformative potential and are backed by a16z.

About the Role
We are seeking talented PhD students and exceptional undergraduates to join us this summer to develop RL training environments for large language models.

What You'll Do
- Design and implement RL environments that assess LLM reasoning across ML, systems, and research problems.
- Write clean, production-quality Python code (not just notebooks).
- Use Docker to build reproducible environments and troubleshoot issues as they arise.
- Translate ML research papers and concepts into actionable training tasks.

Who We're Looking For
You are an undergraduate or PhD student in Computer Science, Machine Learning, Mathematics, Physics, or a related field. You write real code, not just research prototypes, and you enjoy reading ML literature in your spare time.

Must-Have Qualifications:
- Proficiency in Python.
- Understanding of large language models (LLMs), their strengths, and their limitations.
- Self-motivation and the ability to take feedback and iterate quickly.

Preferred Qualifications:
- Familiarity with the transformer architecture and experience with training or inference code.
- Experience writing CUDA kernels or doing low-level GPU programming.
- Deep knowledge of a particular research area (demonstrated by publications, public code, or strong coursework).
- A passion for continuous learning and research in AI.

Mar 18, 2026

Staff Machine Learning Research Scientist/Engineer, Agents

Scale AI

Full-time|$275K/yr - $350K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY

About Scale AI
At Scale AI, we are dedicated to accelerating the development of AI applications. Over the past eight years, we have established ourselves as the leading AI data foundry, supporting breakthroughs in fields such as generative AI, defense technology, and autonomous vehicles. Following our recent Series F funding round, we are intensifying our work on frontier data to pave the way toward Artificial General Intelligence (AGI). Our work with enterprise clients and governments has strengthened our model evaluation capabilities, allowing us to expand our offerings for both public and private evaluations.

About the ACE Team
The Agent Capabilities & Environments (ACE) team, part of Scale’s Research organization, brings together customer-focused Researchers and Applied AI Engineers. Our primary mission is to research agent environments and reinforcement learning reward signals, benchmark autonomous agent performance in real-world contexts, and develop robust data programs that improve the capabilities of large language models (LLMs). We build foundational tools and frameworks for evaluating models as agents, focusing on autonomous agents that interact dynamically with a wide range of external environments, including code repositories and GUIs.

About This Role
This position sits at the intersection of frontier AI research and practical application, concentrating on the data needed to develop state-of-the-art agents, including browser and software engineering agents. The ideal candidate will investigate the data landscape required to build intelligent, adaptable AI agents and will steer Scale's data strategy to drive innovation. The role demands expertise in LLM agents and planning algorithms as well as creative problem-solving to tackle novel challenges in data, interaction, and evaluation. You will contribute to influential research publications on agents, collaborate with customer researchers, and partner with the engineering team to turn these advances into scalable real-world solutions.

Mar 26, 2026

Senior Staff Engineer - Reinforcement Learning Infrastructure (Cybersecurity)

Bugcrowd

Full-time|$176.4K/yr - $242.6K/yr|Remote|Remote - US

At Bugcrowd, we are redefining the landscape of cybersecurity. Since our founding in 2012, we have been committed to empowering organizations to regain control and stay ahead of cyber threats. We combine the collective creativity and expertise of our clients and an elite network of hackers with our patented AI-driven Security Knowledge Platform™. Our diverse community of hackers excels at uncovering vulnerabilities and adapting quickly to the evolving threat landscape, including zero-day exploits. With our CrowdMatch™ technology, we provide scalable, tailored solutions to strengthen your security posture. Join us as we usher in a new era of crowdsourced security that outpaces cyber adversaries. For more information, visit www.bugcrowd.com. Headquartered in San Francisco and New Hampshire, Bugcrowd is backed by leading investors including General Catalyst, Rally Ventures, and Costanoa Ventures.

Mar 16, 2026

Senior Engineering Manager, Reinforcement Learning Environments

Handshake AI

Full-time|On-site|San Francisco, CA

About Handshake
Handshake is the leading career network for the AI economy, serving over 20 million knowledge workers, 1,600 educational institutions, and 1 million employers, including all Fortune 50 companies. Our platform is trusted for career discovery, recruiting, and professional development, with opportunities ranging from freelance AI training roles to full-time positions. Our unique value proposition is driving remarkable growth, and we expect to triple our Annual Recurring Revenue (ARR) by 2025.

Why is now the best time to join Handshake?
- Be a key player in shaping the future of careers in the AI economy, creating tangible impact for your community.
- Collaborate closely with leading AI research labs, Fortune 500 partners, and top educational institutions.
- Join a team that includes leaders from organizations such as Scale AI, Meta, xAI, Notion, Coinbase, and Palantir.
- Help build a rapidly growing business projected to generate billions in revenue.

About the Role
We are looking for an experienced Senior Engineering Manager to lead our Reinforcement Learning Environments (RLE) team.

The RLE team builds sandbox environments where frontier AI models can learn complete, end-to-end workflows. These environments replicate real-world professional fields such as software engineering, finance, and legal research, complete with realistic tools, constraints, and feedback mechanisms. Rather than learning from static examples, models perform practical tasks: navigating multi-step processes, using domain-specific tools, managing uncertainty, and optimizing for real-world outcomes.

Researchers use these environments and the data they produce to train state-of-the-art models with execution-based reinforcement learning, which focuses not just on predictions but on task completion, quality, and robustness in complex workflows.

As a Senior Engineering Manager, you will define the technical direction and long-term strategy of this platform. You will lead a growing team of 8-9 engineers and are expected to manage an Engineering Manager in the near future. This strategic role spans platform engineering, applied AI infrastructure, research tooling, and human-in-the-loop operational systems.

Location: San Francisco, CA | 5 days/week in-office

Feb 18, 2026

Senior Software Engineer - Reinforcement Learning Environments

Preference Model

Full-time|On-site|San Francisco

Preference Model develops reinforcement learning environments that mirror the complexity of real-world tasks. The company focuses on building diverse RL tasks and detailed reward structures, aiming to push the boundaries of artificial intelligence. The founding team brings experience from developing data infrastructure and datasets for Claude at Anthropic, and Preference Model works closely with top AI research labs.

Role overview
The Senior Software Engineer - Reinforcement Learning Environments position centers on designing and delivering RL environments that challenge and improve current AI models. This role involves leading complex projects, including multi-step workflows and realistic stakeholder interactions, within a large codebase. Engineers work directly with the founders and a small, collaborative team, delivering environments used for training advanced models at partner labs. The position provides significant autonomy, regular feedback, and support for professional development.

What you will do
- Design, build, and iterate on reinforcement learning tasks, taking them from concept through evaluation.
- Lead the development of sophisticated environments, focusing on complex workflows and coding standards.
- Interact with coding agents, review their outputs, and identify subtle failures.
- Analyze whether issues stem from model limitations or environment design, then redesign tasks to reveal deeper failure modes.
- Contribute to building and maintaining the core infrastructure and tools for the environments team.
- Mentor junior engineers as the team expands.

Location
This role is based in San Francisco.

Apr 24, 2026
