Reinforcement Learning Environment Reviewer

Preference ModelSan Francisco

On-site Full-time

Clicking Apply Now takes you to AutoApply where you can tailor your resume and apply.

Unlock Your Potential

Generate Job-Optimized Resume

One Click And Our AI Optimizes Your Resume to Match The Job Description.

Is Your Resume Optimized For This Role?

Find Out If You're Highlighting The Right Skills And Fix What's Missing

Experience Level

Experience

Qualifications

Required QualificationsProven experience in adversarial or constructive problem design: such as authoring competitive programming problems (ICPC, Codeforces, etc.), designing CTF challenges, or similar. Familiarity with reinforcement learning, reward mechanisms, and evaluation strategies. Ability to think critically and creatively to identify vulnerabilities in existing tasks.

About the job

About Us

At Preference Model, we are pioneering the next generation of training data to fuel the evolution of AI technology. Although today's models demonstrate significant capabilities, they often fall short in diverse applications due to many tasks being out of distribution. We create reinforcement learning (RL) environments where models face research and engineering challenges, allowing them to iterate and learn from realistic feedback loops.

Our founding team boasts experience from Anthropic’s data division, where we built data infrastructure, tokenizers, and datasets for Claude. Collaborating with leading AI labs, we aim to bring AI closer to its transformative potential, supported by a16z.

About the Role

Every RL environment we deploy must withstand a model actively attempting to exploit it. A task with a weak evaluation or an easily exploitable reward signal is counterproductive; it teaches the model to cheat instead of reason. We seek an individual dedicated to identifying these vulnerabilities before the model does.

We have learned that domain knowledge alone does not make an effective reviewer. The ideal candidate is someone who has engaged in adversarial thinking: designing challenging problems that are difficult to exploit, dismantling others’ tasks, or directly researching reward hacking.

Your Responsibilities

Review RL environments and training tasks for accuracy, robustness, and resistance to reward hacking.
Identify potential ways a model could exploit grading systems, manipulate evaluation criteria, or bypass intended reasoning.
Collaborate with environment authors to enhance grading systems, rectify reward signals, and redesign ineffective tasks.
Develop and maintain review standards and checklists as we scale from hundreds to thousands of tasks monthly.
Provide guidance on grader design during the planning phase of environments, ensuring quality before task construction.

Who We Are Looking For

You think like an attacker and have spent considerable time crafting problems that are challenging to exploit or deconstructing seemingly solid issues. A fundamental understanding of machine learning is essential, enabling you to anticipate model strategies, combined with enough engineering insight to assess whether a grader effectively tests its criteria.

About Preference Model

Preference Model is at the forefront of AI training data innovation, developing environments that facilitate learning and adaptive feedback for models through realistic challenges.

Similar jobs

1 - 20 of 5,517 Jobs

Search for Machine Learning Engineer Imitation Reinforcement Learning For Robotics

5,517 results

Select all on this page (20)

Apply

Machine Learning Engineer - Imitation & Reinforcement Learning for Robotics

Bedrock Robotics

Full-time|On-site|San Francisco, CA

Be Part of the Future of Autonomous RoboticsAt Bedrock Robotics, we are pioneering the transition of AI from theoretical frameworks to practical applications in the built environment. Our team is comprised of seasoned professionals who have been instrumental in the success of innovative companies such as Waymo, Segment, and Uber Freight. We are at the forefront of deploying autonomous technologies in heavy construction machinery, significantly enhancing the efficiency and safety of multi-billion dollar infrastructure projects across the nation.With backing from $350 million in funding, our mission is to address the urgent need for housing, data centers, and manufacturing facilities, while simultaneously responding to the construction industry's labor shortages.This position is where cutting-edge algorithms meet the practical world of construction. You will work alongside industry experts and top-tier engineers to tackle complex real-world challenges that cannot be simulated. If you are eager to leverage advanced technology for impactful problem-solving within a skilled team, we encourage you to apply.

Jan 31, 2026

Apply

Reinforcement Learning Environment Engineer

AfterQuery

Full-time|On-site|San Francisco

About AfterQuery AfterQuery develops training data and evaluation frameworks that leading AI labs use to improve their models. The team partners with major research institutions to build datasets and run assessments that go beyond standard benchmarks. As a post-Series A company based in San Francisco, AfterQuery values contributions from every team member. Work here directly shapes the next generation of AI models. Role Overview The Reinforcement Learning Environment Engineer designs datasets and evaluation systems that influence how advanced AI models learn and improve. This role involves close collaboration with research teams, hands-on experimentation with new data collection methods, and the creation of metrics to track model progress. Work moves from theoretical analysis to practical experiments, feeding directly into large-scale model training efforts. What You Will Do Develop data segments that expose key failure modes in sectors such as finance, software engineering, and enterprise operations. Refine reward signals for Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from Value Reinforcement (RLVR) systems. Define quantitative metrics for dataset quality, diversity, and their effects on model alignment and capability. Work closely with research teams to translate training objectives into concrete data requirements and evaluation criteria. This position is based in San Francisco.

Apr 14, 2026

Apply

Research Engineer with a Focus on Reinforcement Learning

firecrawl

Full-time|Hybrid|San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10)

Join firecrawl as a Research Engineer specializing in Reinforcement Learning (RL). In this role, you will leverage your expertise to conduct innovative research and develop advanced RL algorithms that push the boundaries of technology. Collaborate with a talented team of engineers and researchers to solve complex problems and contribute to groundbreaking projects.

Mar 18, 2026

Apply

Research Engineer - Reinforcement Learning Infrastructure

primeintellect

Full-time|On-site|San Francisco

Join primeintellect as a Research Engineer focused on Reinforcement Learning Infrastructure. In this role, you will be instrumental in advancing our cutting-edge AI technologies. You will collaborate with interdisciplinary teams to develop robust frameworks that enhance machine learning capabilities and drive innovation.As a key player in our engineering team, you will work on designing, implementing, and optimizing systems that support reinforcement learning algorithms. Your contributions will directly impact the efficiency and effectiveness of our AI solutions.

Mar 27, 2026

Apply

Reinforcement Learning Environments Engineer Summer Intern

Preference Model

Internship|Remote|San Francisco

Location: Preference for San Francisco, but remote candidates are welcome to apply.Duration: This internship will last for 10-12 weeks during Summer 2026.Compensation: This is a paid internship opportunity.About UsAt Preference Model, we are pioneering the next era of training data to fuel the advancement of AI technologies. While current models are impressive, they often struggle with diverse applications due to out-of-distribution tasks. Our focus is on developing reinforcement learning (RL) environments where models can engage with complex research and engineering challenges, iterating and learning from realistic feedback mechanisms.Our founding team boasts extensive experience from Anthropic's data division, where we built data infrastructure, tokenizers, and datasets that powered Claude. We collaborate with top AI labs to accelerate AI's journey toward its transformative potential and are proudly supported by a16z.About the RoleWe are seeking talented PhD students and exceptional undergraduate candidates to join us this summer in developing RL training environments tailored for large language models.What You'll DoDesign and implement RL environments to assess LLM reasoning across various ML, systems, and research problems.Produce clean, production-quality Python code (not just notebooks).Utilize Docker to create reproducible environments and troubleshoot issues as they arise.Translate ML research papers and concepts into actionable training tasks.Who We're Looking ForYou are either an undergraduate or a PhD student in Computer Science, Machine Learning, Mathematics, Physics, or a related discipline. You have a knack for writing real code beyond mere research prototypes and you enjoy reading ML literature in your spare time.Must-Have Qualifications:Proficient in Python programming.Understanding of large language models (LLMs), their strengths, and limitations.Self-motivated and capable of taking feedback to iterate quickly.Preferred Qualifications:Familiarity with transformer architecture and experience with training or inference code.Experience in writing CUDA kernels or engaging in low-level GPU programming.Deep knowledge in a particular research area (demonstrated by publications, public code, or strong coursework).A passion for continuous learning and research in the field of AI.

Mar 18, 2026

Apply

Software Engineer - Reinforcement Learning Research & Environments

Magic.dev

Full-time|On-site|San Francisco

At Magic, we are committed to creating safe AGI that propels humanity forward in addressing the world’s most pressing challenges. We believe that the key to achieving safe AGI is through the automation of research and code generation, which enhances model performance and ensures alignment more reliably than human efforts alone. Our innovative approach integrates cutting-edge pre-training techniques, domain-specific reinforcement learning, ultra-long context, and advanced inference-time computation.Role OverviewAs a Software Engineer on our RL Research & Environments team, you will be instrumental in designing and managing data systems, evaluation frameworks, and environmental setups that enhance model capabilities following pre-training.This position emphasizes post-training processes, where your responsibilities will include identifying capability gaps, creating specialized datasets, designing reward structures, and executing iterative training cycles that lead to significant improvements in user interactions. You will oversee the infrastructure and experimental workflows that bridge product objectives with measurable capability enhancements.Our long-context models present unique post-training challenges, including long-horizon reasoning, maintaining coherence over extended tasks, optimizing context utilization, and enabling tool-assisted behaviors. You will develop systems that reveal failure modes, produce high-value training data, and facilitate rapid reinforcement learning iterations at scale.This role offers the potential for growth, allowing you to take ownership of key capability domains, delve deeper into RL systems, or influence post-training strategies as we enhance the performance and reliability of long-context models.Key ResponsibilitiesDesign and construct post-training datasets leveraging synthetic generation, targeted data collection, and self-play methodologies.Implement filtering, scoring, and mixture strategies for reinforcement learning and post-training datasets.Develop and sustain evaluation frameworks that identify long-context failure modes.Create reward signals and training environments aimed at specific capability advancements.Conduct ablation studies across various data sources, reward configurations, and long-horizon task designs.Enhance the reliability and observability of post-training data and environment pipelines.Collaborate closely with Product and Research teams to translate capability objectives into quantifiable iteration cycles.Ideal Candidate ProfileSolid foundation in software engineering principles.Proven experience in developing or managing large-scale data or machine learning systems.Strong analytical skills and a passion for tackling complex problems.

Nov 8, 2024

Apply

Reinforcement Learning Environments Engineer (Contractor, Remote)

Preference Model

Contract|Remote|San Francisco

About UsAt Preference Model, we are pioneering the next generation of training data to unlock the full potential of artificial intelligence. While today's models show remarkable capabilities, they often fall short of their potential across diverse applications due to out-of-distribution tasks. We create Reinforcement Learning environments that allow models to tackle real-world research and engineering challenges, iterating and learning through realistic feedback loops.Our founding team comprises seasoned professionals from Anthropic’s data team, where we developed data infrastructure, tokenizers, and datasets for Claude. We collaborate with leading AI laboratories to drive AI closer to its transformative potential and are backed by a16z.About the RoleWe are seeking talented Reinforcement Learning Environments Engineers to design and implement MLE environments. Your primary mission will be to enable Large Language Models (LLMs) to acquire improved reasoning and advanced understanding of modern machine learning concepts. This role is fully remote with a requirement for at least 4 hours of overlap with PST and proficiency in English at a C1/C2 level.

Mar 18, 2026

Apply

AI Engineer - Specializing in Reinforcement Learning

yutori

Full-time|On-site|San Francisco, California, United States

At Yutori, we are revolutionizing the way individuals engage with the online world by developing AI agents that can seamlessly manage everyday digital tasks. Our mission is to create a fully integrated agent-first ecosystem, encompassing everything from training proprietary models to designing intuitive generative product interfaces.We invite a passionate and skilled AI Engineer to join our founding team and contribute to our vision of building superhuman AI agents capable of performing actions across the web.Our founders—Devi Parikh, Abhishek Das, and Dhruv Batra—bring decades of expertise in AI research and product development from their tenure at Meta, focusing on generative, multimodal, and embodied AI. Our diverse team blends advanced AI knowledge with innovative product design to execute Yutori's ambitious mission.Supported by an exceptional group of visionary investors—including Elad Gil, Sarah Guo, Jeff Dean, Fei-Fei Li, and others—Yutori is poised for remarkable growth and development.

Mar 26, 2025

Apply

Senior Deep Reinforcement Learning Engineer

DoorDash, Inc.

Full-time|On-site|San Francisco, CA

Join DoorDash as a Senior Deep Reinforcement Learning Engineer and play a pivotal role in revolutionizing the logistics and delivery industry through cutting-edge AI solutions. In this position, you will leverage your expertise in deep reinforcement learning to develop advanced algorithms that optimize our delivery processes and enhance customer experience.

Mar 26, 2026

Apply

Reinforcement Learning Environment Reviewer

Preference Model

Full-time|On-site|San Francisco

About UsAt Preference Model, we are pioneering the next generation of training data to fuel the evolution of AI technology. Although today's models demonstrate significant capabilities, they often fall short in diverse applications due to many tasks being out of distribution. We create reinforcement learning (RL) environments where models face research and engineering challenges, allowing them to iterate and learn from realistic feedback loops.Our founding team boasts experience from Anthropic’s data division, where we built data infrastructure, tokenizers, and datasets for Claude. Collaborating with leading AI labs, we aim to bring AI closer to its transformative potential, supported by a16z.About the RoleEvery RL environment we deploy must withstand a model actively attempting to exploit it. A task with a weak evaluation or an easily exploitable reward signal is counterproductive; it teaches the model to cheat instead of reason. We seek an individual dedicated to identifying these vulnerabilities before the model does.We have learned that domain knowledge alone does not make an effective reviewer. The ideal candidate is someone who has engaged in adversarial thinking: designing challenging problems that are difficult to exploit, dismantling others’ tasks, or directly researching reward hacking.Your ResponsibilitiesReview RL environments and training tasks for accuracy, robustness, and resistance to reward hacking.Identify potential ways a model could exploit grading systems, manipulate evaluation criteria, or bypass intended reasoning.Collaborate with environment authors to enhance grading systems, rectify reward signals, and redesign ineffective tasks.Develop and maintain review standards and checklists as we scale from hundreds to thousands of tasks monthly.Provide guidance on grader design during the planning phase of environments, ensuring quality before task construction.Who We Are Looking ForYou think like an attacker and have spent considerable time crafting problems that are challenging to exploit or deconstructing seemingly solid issues. A fundamental understanding of machine learning is essential, enabling you to anticipate model strategies, combined with enough engineering insight to assess whether a grader effectively tests its criteria.

Mar 18, 2026

Apply

Machine Learning Engineer: Evaluation at Bedrock Robotics | San Francisco

Bedrock Robotics

Full-time|On-site|San Francisco, CA

Join the forefront of technology as we revolutionize the construction industry with advanced autonomy.Be a Part of Innovation at Bedrock RoboticsAt Bedrock Robotics, we are transforming artificial intelligence from theoretical concepts into practical applications that enhance the world’s infrastructure. Our team comprises seasoned professionals who have been pivotal in launching industry leaders like Waymo, scaling Segment to a $3.2 billion acquisition, and driving Uber Freight to $5 billion in revenue. We are currently implementing autonomous systems in heavy construction machinery nationwide, expediting the timelines of multi-billion dollar infrastructure developments while significantly enhancing job site safety.With a robust funding of $350 million, we are rapidly addressing the escalating demand for housing, data centers, and manufacturing facilities, all while tackling the construction sector's increasing labor shortages. This is where innovative algorithms meet the realities of heavy machinery.If you are passionate about leveraging cutting-edge technology to tackle real-world challenges and wish to collaborate with a talented team of industry experts, we invite you to apply.Role: Machine Learning Engineer: EvaluationBedrock Robotics is on a mission to integrate autonomy into construction processes! We are seeking a driven engineer with substantial experience in evaluating complex machine learning systems in real-world scenarios. Your objective will be to convert the intricate dynamics of the built environment into actionable, AI-driven evaluations that enhance the adoption of our Bedrock Operators.The ideal candidate will have a proven track record in developing evaluation systems and executing statistical analyses to assess performance variations across system iterations. Your experience in iterating on complex machine learning systems in production environments will be crucial, as you navigate the intricacies involved.Your Responsibilities:Design and Maintain Evaluation Systems:Develop and maintain pipelines for performance measurement—encompassing both open-loop and closed-loop simulations, hardware-in-the-loop systems, and field data from Bedrock Operator-equipped machinery. Collaborate with other teams to glean insights earlier in the development cycle through optimized workflows.Develop Metrics:Align product objectives with system behaviors by translating real-world specifications into measurable indicators derived from logged data. Enable data-driven decision-making for everything from parameter adjustments to strategic program planning.

Jan 31, 2026

Apply

Machine Learning Systems Engineer - Robotics

Scale AI

Full-time|$218.4K/yr - $273K/yr|On-site|San Francisco, CA

At Scale AI, our Physical AI division is at the forefront of addressing data challenges in Robotics, Autonomous Vehicles, and Computer Vision. We invite you to join our team as a Machine Learning Systems Engineer, where you will play a pivotal role in applied research and the development of machine learning pipelines. Your focus will be on enhancing algorithms and pipelines for optimal performance on cloud-based GPU systems, empowering advancements in Physical AI research and applications.Your Role:As a Machine Learning Systems Engineer within the Physical AI team, you will design and implement robust platforms that ensure the scalable and efficient deployment of foundational models for physical agents. Your contributions will support groundbreaking research and production systems, facilitating internal discoveries and external applications in the fields of robotics and autonomous technology.We seek candidates who possess a strong foundation in machine learning coupled with extensive backend system design expertise. You will thrive in a collaborative environment, bridging the gap between Physical AI research and production engineering to expedite innovation across Scale AI.

Mar 26, 2026

Apply

Research Engineer - Reinforcement Learning

Prime Intellect

FullTime|On-site|San Francisco

Pioneering the Future of Open SuperintelligenceAt Prime Intellect, we are on a mission to construct the open superintelligence ecosystem, encompassing cutting-edge agentic models alongside the infrastructure that empowers individuals to create, train, and deploy them seamlessly. We unify global computational resources into an intuitive control plane, complemented by a comprehensive reinforcement learning post-training suite, including dynamic environments, secure sandboxes, verifiable evaluations, and our innovative asynchronous RL trainer. Our platform empowers researchers, startups, and enterprises to execute end-to-end reinforcement learning at unprecedented scales, allowing for the adaptation of models to diverse tools, workflows, and deployment scenarios.As a Research Engineer within our Reasoning team, you will be instrumental in driving our technological vision, particularly in the area of test-time compute scaling research. If you thrive on harnessing synthetic data to enhance LLM reasoning capabilities, we want to hear from you!Discover more about our exciting project by visiting our insight on decentralized training in the inference-compute paradigm.

Aug 19, 2024

Apply

Anthropic Fellows Program - Focus on Reinforcement Learning

Anthropic

Full-time|Hybrid|London, UK; Ontario, CAN; Remote-Friendly, United States; San Francisco, CA

Join the prestigious Anthropic Fellows Program, where you'll have the opportunity to delve into cutting-edge research in Reinforcement Learning. This program is designed for individuals passionate about advancing AI safety and developing innovative solutions. As a fellow, you will collaborate with a team of experts, engage in impactful projects, and contribute to a progressive research environment.

Apr 10, 2026

Apply

Software Engineer: Machine Learning Robotics Systems

Generalist

Full-time|On-site|San Francisco Bay Area (San Mateo) or Boston (Somerville)

About the RoleIn this exciting position, you will address comprehensive challenges to enhance the performance of our AI models deployed on robotic systems. Your responsibilities will include adding new features to our video processing data pipeline, updating our machine learning data loaders, training models to validate your modifications, and testing these changes in real-world robotic applications. This role requires the integration of numerous distributed Python services to achieve specific data processing and application tasks, alongside managing substantial cloud infrastructure for efficient business logic processing at scale.Your responsibilities will include:Conceptualizing and implementing innovative solutions to enhance system robustness, scalability, and speed.Revamping existing systems and services to accommodate significant future growth.Developing business logic to ensure our robots access the necessary data and that customers receive appropriate access to our robotic solutions.You may excel in this role if you:Possess extensive experience in building complex distributed applications or data pipelines at scale.Have a background in processing petabytes of data, especially video data.Demonstrate expertise in Python, with foundational knowledge in distributed infrastructure and solid understanding of modern machine learning principles.Have a robust foundation in contemporary ML techniques with experience in large-scale ML training and production deployments.Have familiarity with distributed cloud infrastructure and a deep understanding of cloud networking, permissions, and container orchestration (Kubernetes).About GeneralistAt Generalist, our mission is to realize the potential of general-purpose robots. We envision a future where industries and homes thrive on the collaboration between humans and machines. Our robots are designed to enhance productivity and efficiency.We focus on developing embodied foundation models, starting with dexterity, which necessitates pushing the boundaries of data, models, and hardware to enable robots to intelligently interact with their environments.Our company is deeply rooted in large-scale AI and robotics, with a team drawn from leading organizations like OpenAI, Boston Dynamics, and Google DeepMind, all committed to delivering groundbreaking advancements in AI technology.

Feb 12, 2026

Apply

Robot Learning Generalist

Generalist

Full-time|Remote|San Francisco Bay Area (San Mateo) or Boston (Somerville)

Are you passionate about the intersection of robotics and artificial intelligence? We are seeking a Robot Learning Generalist to join our innovative team. In this role, you will leverage your expertise to develop and implement advanced learning algorithms that enhance robotic capabilities. You will collaborate with a diverse group of engineers and researchers to push the boundaries of what robots can achieve in real-world scenarios.As a part of our team, you will have the opportunity to work on cutting-edge projects that have a tangible impact on the future of robotics. Your contributions will help shape the development of intelligent systems that can learn from their environment and adapt to new challenges.

Mar 24, 2026

Apply

Reinforcement Learning Software Engineer

Preference Model

Full-time|On-site|San Francisco

About UsAt Preference Model, we are at the forefront of developing advanced training data essential for the evolution of artificial intelligence. While today's AI models exhibit significant power, they often fall short in diverse applications due to limitations in their training data. We specialize in creating reinforcement learning environments that present AI with authentic research and engineering challenges, enabling them to iterate and learn through realistic feedback loops.Our founding team boasts experience from Anthropic’s data department, where we established the data infrastructure, tokenizers, and datasets that supported Claude. We collaborate with top-tier AI research labs to bring AI closer to its groundbreaking potential and are proudly backed by a16z.About the RoleAs a Software Engineer on our team, your responsibilities will include:Designing and Developing Reinforcement Learning Environments: Architect comprehensive simulation platforms that encompass environmental context, task definitions, and reward functions to facilitate AI agents' learning and performance of intricate tasks.Building Robust Training Infrastructure: Create scalable systems for post-training AI models, focusing on orchestration, performance optimization, and monitoring capabilities.Implementing Realistic Model Evaluations: Develop metrics for evaluating AI agent performance and establish the infrastructure and tools necessary for conducting these evaluations.Influencing Technical Strategy: Take charge of architectural decisions, impact product roadmaps, and contribute significantly to our engineering culture as an early-stage team member.About YouYou might be a great fit for this role if you possess the following qualities:Adept at leveraging language models effectively.Ability to innovate and think outside the box.A minimum of 4 years of software engineering experience, showcasing your ability to take ownership of projects.Proficiency in Python, Rust, or TypeScript, with the capability to work across the entire software stack.Hands-on experience with modern deployment practices, containerization, and cloud infrastructure (such as Kubernetes, AWS, or GCP).Strong problem-solving skills demonstrated through algorithmic challenges or complex system design tasks.Nice-to-HavesPreferred candidates will have experience in:Machine learning infrastructure or reinforcement learning.

Mar 18, 2026

Apply

Infrastructure Research Engineer - Reinforcement Learning Systems

Thinking Machines Lab

Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We're dedicated to crafting a future where everyone can harness the power of AI to meet their unique needs and aspirations.Our team comprises scientists, engineers, and innovators who have developed some of the most widely utilized AI products, including ChatGPT and Character.ai, as well as open-weight models like Mistral, in addition to renowned open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.About the RoleWe are seeking a talented Infrastructure Research Engineer to architect and develop the foundational systems that facilitate the scalable and efficient training of large models using reinforcement learning.This position exists at the crossroads of research and large-scale systems engineering, requiring a professional who not only comprehends the algorithms behind reinforcement learning but also appreciates the practicalities of distributed training and inference at scale. You will have a diverse set of responsibilities, from optimizing rollout and reward pipelines to enhancing the reliability, observability, and orchestration of systems. Collaboration with researchers and infrastructure teams will be essential to ensure reinforcement learning is stable, rapid, and production-ready.Note: This is an evergreen role that we maintain on an ongoing basis to express interest. Due to the high volume of applications we receive, there may not always be an immediate position that aligns perfectly with your skills and experience. We encourage you to apply, as we continuously review applications and reach out to candidates when new opportunities arise. You may reapply after gaining more experience, but please refrain from applying more than once every six months. Additionally, you may notice postings for specific roles that cater to unique project or team needs; in those circumstances, you are welcome to apply directly alongside this evergreen role.What You’ll DoDesign, implement, and optimize the infrastructure that supports large-scale reinforcement learning and post-training workloads.Enhance the reliability and scalability of the RL training pipeline, including distributed RL workloads and training throughput.Create shared monitoring and observability tools to ensure high uptime, debuggability, and reproducibility of RL systems.Work closely with researchers to translate algorithmic concepts into production-quality training pipelines.Develop evaluation and benchmarking infrastructure to assess model performance based on helpfulness, safety, and factual accuracy.Publish and disseminate insights through internal documentation, open-source libraries, or technical reports that contribute to the advancement of scalable AI infrastructure.

Nov 27, 2025

Apply

Research Scientist in Machine Learning and Robotics

Physical Intelligence

Full-time|On-site|San Francisco

Join us at Physical Intelligence as a Research Scientist, where you will be at the forefront of innovation in machine learning and robotics. We are in search of exceptional researchers across all experience levels who demonstrate a strong track record of impactful research results. Ideal candidates will possess a solid foundation in both practical implementation and theoretical frameworks, showcasing a blend of system-building capabilities and significant conceptual, algorithmic, or theoretical advancements. We value diverse backgrounds and encourage applications from both traditional academic researchers and those with unique, unconventional experiences.We are committed to fostering a diverse and inclusive workplace. In accordance with the San Francisco Fair Chance Ordinance, we welcome applications from qualified individuals with arrest and conviction records.

Aug 24, 2024

Apply

Machine Learning Engineer

Orchard

Full-time|On-site|San Francisco

Join Orchard as a Machine Learning Engineer and play a pivotal role in transforming data into actionable insights. In this dynamic position, you will leverage your expertise in machine learning algorithms and data analysis to develop innovative solutions that enhance our products and services.We are looking for a proactive team player who thrives in a fast-paced environment and possesses strong problem-solving skills. You will collaborate with cross-functional teams, engage with large datasets, and contribute to the design and implementation of machine learning models.

Mar 14, 2026

Create account — see all 5,517 results