Experience Level
Entry Level
Qualifications
Ideal candidates will possess a strong background in computer science or a related field, with proficiency in reinforcement learning algorithms, software development, and system architecture. Experience with AI frameworks and a deep understanding of machine learning principles are essential. Strong problem-solving skills and the ability to work collaboratively in a fast-paced environment are also crucial.
About the job
Anthropic is hiring a Research Engineer focused on Reinforcement Learning Infrastructure and Reliability. This role is based in San Francisco, CA.
Role overview
This position centers on building and maintaining systems essential to AI research. The work supports Anthropic’s reinforcement learning efforts, with an emphasis on infrastructure stability and performance.
What you will do
Collaborate with a team of specialists to develop and support key systems for AI research.
Improve the reliability and efficiency of infrastructure supporting reinforcement learning projects.
Apply technical expertise to advance Anthropic’s AI capabilities.
Team environment
Work alongside engineers and researchers dedicated to advancing AI reliability and performance. The team values collaboration and aims to enable new research while maintaining the stability of Anthropic’s core systems.
About Anthropic
Anthropic is at the forefront of AI development, dedicated to ensuring the safety and reliability of advanced AI systems. Our team comprises leading researchers and engineers who are passionate about pushing the boundaries of technology while prioritizing ethical considerations. Join us in shaping the future of AI!
Similar jobs
Full-time|$350K/yr - $475K/yr|On-site|San Francisco
At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We're dedicated to crafting a future where everyone can harness the power of AI to meet their unique needs and aspirations. Our team comprises scientists, engineers, and innovators who have developed some of the most widely utilized AI products, including ChatGPT and Character.ai, as well as open-weight models like Mistral, in addition to renowned open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.
About the Role
We are seeking a talented Infrastructure Research Engineer to architect and develop the foundational systems that facilitate the scalable and efficient training of large models using reinforcement learning. This position sits at the crossroads of research and large-scale systems engineering, requiring a professional who not only comprehends the algorithms behind reinforcement learning but also appreciates the practicalities of distributed training and inference at scale. You will have a diverse set of responsibilities, from optimizing rollout and reward pipelines to enhancing the reliability, observability, and orchestration of systems. Collaboration with researchers and infrastructure teams will be essential to ensure reinforcement learning is stable, rapid, and production-ready.
Note: This is an evergreen role that we maintain on an ongoing basis to express interest. Due to the high volume of applications we receive, there may not always be an immediate position that aligns perfectly with your skills and experience. We encourage you to apply, as we continuously review applications and reach out to candidates when new opportunities arise. You may reapply after gaining more experience, but please refrain from applying more than once every six months. Additionally, you may notice postings for specific roles that cater to unique project or team needs; in those circumstances, you are welcome to apply directly alongside this evergreen role.
What You'll Do
Design, implement, and optimize the infrastructure that supports large-scale reinforcement learning and post-training workloads.
Enhance the reliability and scalability of the RL training pipeline, including distributed RL workloads and training throughput.
Create shared monitoring and observability tools to ensure high uptime, debuggability, and reproducibility of RL systems.
Work closely with researchers to translate algorithmic concepts into production-quality training pipelines.
Develop evaluation and benchmarking infrastructure to assess model performance based on helpfulness, safety, and factual accuracy.
Publish and disseminate insights through internal documentation, open-source libraries, or technical reports that contribute to the advancement of scalable AI infrastructure.
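Reliability work on rollout and reward pipelines of the kind described in these listings often starts with something small: making transient worker failures visible and retriable instead of silently fatal. The sketch below is purely illustrative; the function names, retry policy, and the simulated failure are assumptions for demonstration, not anything from the posting.

```python
import logging
import time
from functools import wraps

def retriable(max_attempts=3, base_delay=0.5):
    """Retry a flaky pipeline stage with exponential backoff, logging each
    failure so transient rollout errors are observable, not swallowed."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    logging.warning("%s failed (attempt %d/%d): %s",
                                    fn.__name__, attempt, max_attempts, exc)
                    if attempt == max_attempts:
                        raise  # exhausted retries: surface the error
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

calls = 0

@retriable(max_attempts=3, base_delay=0.0)
def flaky_rollout():
    """Stand-in for a rollout worker that drops twice, then succeeds."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("worker dropped")
    return "trajectory"

print(flaky_rollout())  # succeeds on the third attempt
```

The decorator pattern keeps the retry policy in one place, so every pipeline stage gets the same logged, bounded-backoff behavior.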
Join Prime Intellect as a Research Engineer focused on Reinforcement Learning Infrastructure. In this role, you will be instrumental in advancing our cutting-edge AI technologies. You will collaborate with interdisciplinary teams to develop robust frameworks that enhance machine learning capabilities and drive innovation. As a key player in our engineering team, you will work on designing, implementing, and optimizing systems that support reinforcement learning algorithms. Your contributions will directly impact the efficiency and effectiveness of our AI solutions.
Full-time|Hybrid|San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10)
Join Firecrawl as a Research Engineer specializing in Reinforcement Learning (RL). In this role, you will leverage your expertise to conduct innovative research and develop advanced RL algorithms that push the boundaries of technology. Collaborate with a talented team of engineers and researchers to solve complex problems and contribute to groundbreaking projects.
Pioneering the Future of Open Superintelligence
At Prime Intellect, we are on a mission to construct the open superintelligence ecosystem, encompassing cutting-edge agentic models alongside the infrastructure that empowers individuals to create, train, and deploy them seamlessly. We unify global computational resources into an intuitive control plane, complemented by a comprehensive reinforcement learning post-training suite, including dynamic environments, secure sandboxes, verifiable evaluations, and our innovative asynchronous RL trainer. Our platform empowers researchers, startups, and enterprises to execute end-to-end reinforcement learning at unprecedented scales, allowing for the adaptation of models to diverse tools, workflows, and deployment scenarios.
As a Research Engineer within our Reasoning team, you will be instrumental in driving our technological vision, particularly in the area of test-time compute scaling research. If you thrive on harnessing synthetic data to enhance LLM reasoning capabilities, we want to hear from you! Discover more about our exciting project by visiting our insight on decentralized training in the inference-compute paradigm.
At Magic, we are committed to creating safe AGI that propels humanity forward in addressing the world's most pressing challenges. We believe that the key to achieving safe AGI is through the automation of research and code generation, which enhances model performance and ensures alignment more reliably than human efforts alone. Our innovative approach integrates cutting-edge pre-training techniques, domain-specific reinforcement learning, ultra-long context, and advanced inference-time computation.
Role Overview
As a Software Engineer on our RL Research & Environments team, you will be instrumental in designing and managing data systems, evaluation frameworks, and environmental setups that enhance model capabilities following pre-training. This position emphasizes post-training processes, where your responsibilities will include identifying capability gaps, creating specialized datasets, designing reward structures, and executing iterative training cycles that lead to significant improvements in user interactions. You will oversee the infrastructure and experimental workflows that bridge product objectives with measurable capability enhancements. Our long-context models present unique post-training challenges, including long-horizon reasoning, maintaining coherence over extended tasks, optimizing context utilization, and enabling tool-assisted behaviors. You will develop systems that reveal failure modes, produce high-value training data, and facilitate rapid reinforcement learning iterations at scale. This role offers the potential for growth, allowing you to take ownership of key capability domains, delve deeper into RL systems, or influence post-training strategies as we enhance the performance and reliability of long-context models.
Key Responsibilities
Design and construct post-training datasets leveraging synthetic generation, targeted data collection, and self-play methodologies.
Implement filtering, scoring, and mixture strategies for reinforcement learning and post-training datasets.
Develop and sustain evaluation frameworks that identify long-context failure modes.
Create reward signals and training environments aimed at specific capability advancements.
Conduct ablation studies across various data sources, reward configurations, and long-horizon task designs.
Enhance the reliability and observability of post-training data and environment pipelines.
Collaborate closely with Product and Research teams to translate capability objectives into quantifiable iteration cycles.
Ideal Candidate Profile
Solid foundation in software engineering principles.
Proven experience in developing or managing large-scale data or machine learning systems.
Strong analytical skills and a passion for tackling complex problems.
Full-time|$176.4K/yr - $242.6K/yr|Remote|Remote - US
At Bugcrowd, we are redefining the landscape of cybersecurity. Since our inception in 2012, we have been committed to empowering organizations to regain control and stay ahead of cyber threats. By harnessing the collective creativity and expertise of our clients and an elite network of hackers, we leverage our patented AI-driven Security Knowledge Platform™. Our diverse community of hackers excels in uncovering vulnerabilities, swiftly adapting to the evolving threat landscape, including zero-day exploits. With our innovative CrowdMatch™ technology, we provide scalable, tailored solutions to enhance your security posture. Join us as we usher in a new era of crowdsourced security that outpaces cyber adversaries. For more information, visit www.bugcrowd.com. Headquartered in San Francisco and New Hampshire, Bugcrowd is supported by leading investors including General Catalyst, Rally Ventures, and Costanoa Ventures.
Job Summary
The Bugcrowd Reinforcement Learning and Reasoning Team is dedicated to advancing autonomous cybersecurity through the creation of authentic reinforcement learning environments tailored for foundational model applications. As a Staff Engineer, you will be at the forefront of AI Reinforcement Learning development and implementation. Your primary responsibility will be to design and build the infrastructure and tools that convert real-world vulnerability research into extensive reinforcement learning environments for training state-of-the-art AI systems. In this unique role, you will develop training environments that instruct AI systems on hacking and defending software. Your contributions will directly impact the capabilities of next-generation AI models. Rather than focusing on a single application, you will create the underlying infrastructure that generates thousands of environments for training leading-edge AI technologies. Our team operates at the intersection of AI, security research, and systems engineering, crafting environments that enable models to acquire essential skills such as vulnerability detection, exploitation, and remediation.
OpenAI's research infrastructure group creates and maintains the backbone systems for advanced machine learning model training. This team often goes beyond conventional training methods, developing new infrastructure to support novel research at scale. Their work closely connects systems engineering with research progress, making it possible to run experiments that would otherwise be too slow or complex.
Role overview
The Research Infrastructure Engineer for Training Systems designs and improves the platforms that power large-scale ML training. This role bridges research concepts and the practical systems that make large model training possible. The work has a direct impact on model release timelines and requires building systems that perform reliably in demanding, real-world scenarios.
What you will do
Build and maintain infrastructure for large-scale model training and experimentation.
Design APIs and interfaces to simplify complex training workflows and prevent misuse.
Enhance reliability, debuggability, and performance across training and data pipelines.
Troubleshoot issues involving Python, PyTorch, distributed systems, GPUs, networking, and storage.
Create tests, benchmarks, and diagnostic tools to catch regressions early.
Requirements
Interest in building systems that support new training methods, not just optimizing existing ones.
Strong instincts in systems engineering, especially regarding performance, reliability, and clean abstractions.
Experience designing APIs and interfaces for researchers and engineers.
Ability to work across ML research code and production infrastructure.
Enjoys evidence-based debugging using profiles, traces, logs, tests, and reproducible cases.
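"Tests, benchmarks, and diagnostic tools to catch regressions early" can be made concrete with a tiny timing harness. The sketch below is a generic illustration, not OpenAI's tooling: the stand-in workload, repeat count, and 2x tolerance are all invented for demonstration.

```python
import time
import statistics

def benchmark(fn, *args, repeats=5):
    """Time a workload several times and report the median duration,
    which is more robust to scheduler noise than the mean."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def fake_step(n):
    # Stand-in for a training step: a CPU-bound reduction.
    return sum(i * i for i in range(n))

baseline = benchmark(fake_step, 50_000)
candidate = benchmark(fake_step, 50_000)
# A CI gate might fail the build when a change regresses throughput
# beyond a tolerance (here a generous 2x to absorb machine noise):
assert candidate < baseline * 2.0, "throughput regression detected"
```

In practice the baseline would be recorded from a known-good commit and the tolerance tuned per benchmark, but the gate structure is the same.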
WHO WE ARE
At Applied Compute, we specialize in creating Specific Intelligence for enterprises—agents that continually learn from a company's processes, data, expertise, and goals. Our mission is to develop a continual learning layer and platform that captures context, memory, and decision traces across organizations, fostering an environment where specialized agents perform real work effectively.
Why Join Us: We operate at a unique intersection of product development and advanced research. Our product team is building the platform for a new generation of digital coworkers, while our research team is pioneering advancements in post-training and reinforcement learning to enrich product experiences. Our applied research engineers collaborate closely with customers, deploying agents into production seamlessly. This blend of robust product focus, in-depth research, and real-world application is our approach to integrating AI into enterprises. We pride ourselves on being product-led, research-enabled, and forward-deployed.
Our Team: We are a diverse group of engineers, researchers, and operators, many of whom are former founders with experience in RL infrastructure at OpenAI, data foundations at Scale AI, and various systems across renowned firms like Two Sigma and Watershed. We collaborate with Fortune 50 clients and are proudly backed by reputable investors including Kleiner Perkins, Benchmark, Sequoia, Lux, and Greenoaks.
Who Thrives Here: We seek individuals passionate about applying innovative research and complex systems to solve real-world challenges. You should be adept at navigating new environments swiftly, whether it's a fresh codebase, a customer's data architecture, or an unfamiliar problem domain. Our team values collaboration with customers, emphasizing active listening and understanding their workflows. We find that former founders, individuals with extensive side projects, and those who demonstrate end-to-end ownership excel in our culture.
THE ROLE
In the role of Research Systems Engineer, you will train frontier-scale models and devise methodologies to implement continual learning in enterprise settings. Your responsibilities will include designing and executing large-scale experiments, investigating cutting-edge reinforcement learning techniques, and developing tools to gain insights into training processes. This position lies at the crossroads of research and systems engineering, where you will innovate algorithms alongside researchers and collaborate with infrastructure engineers to implement them on GPUs.
About AfterQuery
AfterQuery develops training data and evaluation frameworks that leading AI labs use to improve their models. The team partners with major research institutions to build datasets and run assessments that go beyond standard benchmarks. As a post-Series A company based in San Francisco, AfterQuery values contributions from every team member. Work here directly shapes the next generation of AI models.
Role Overview
The Reinforcement Learning Environment Engineer designs datasets and evaluation systems that influence how advanced AI models learn and improve. This role involves close collaboration with research teams, hands-on experimentation with new data collection methods, and the creation of metrics to track model progress. Work moves from theoretical analysis to practical experiments, feeding directly into large-scale model training efforts.
What You Will Do
Develop data segments that expose key failure modes in sectors such as finance, software engineering, and enterprise operations.
Refine reward signals for Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) systems.
Define quantitative metrics for dataset quality, diversity, and their effects on model alignment and capability.
Work closely with research teams to translate training objectives into concrete data requirements and evaluation criteria.
This position is based in San Francisco.
At Magic, we are driven by our mission to develop safe Artificial General Intelligence (AGI) that propels humanity forward in addressing the most critical challenges. We firmly believe that the future of safe AGI lies in automating research and code generation, allowing us to enhance models and tackle alignment issues more effectively than humans alone can manage. Our innovative approach combines cutting-edge pre-training, domain-specific reinforcement learning (RL), ultra-long context, and efficient inference-time computation to realize this vision.
Position Overview
As a Software Engineer within the Inference & RL Systems team, you will play a pivotal role in designing and managing the distributed systems that enable our models to function seamlessly in production, supporting extensive post-training workflows. This position operates at the intersection of model execution and distributed infrastructure, focusing on systems that influence inference latency, throughput, stability, and the reliability of RL and post-training loops. Our long-context models impose significant execution demands, including KV-cache scaling, managing memory constraints for lengthy sequences, batching strategies, long-horizon trajectory rollouts, and ensuring consistent throughput under real-world workloads. You will be responsible for the infrastructure that ensures both production inference and large-scale RL iterations are efficient and dependable.
Key Responsibilities
Craft and scale high-performance inference serving systems.
Optimize KV-cache management, batching methods, and scheduling processes.
Enhance throughput and latency for long-context tasks.
Develop and sustain distributed RL and post-training infrastructure.
Boost reliability across rollout, evaluation, and reward pipelines.
Automate fault detection and recovery mechanisms for serving and RL systems.
Analyze and eliminate performance bottlenecks across GPU, networking, and storage components.
Collaborate with Kernel and Research teams to ensure alignment between execution systems and model architecture.
Qualifications
Solid foundation in software engineering and distributed systems.
Proven experience in building or managing large-scale inference or training systems.
In-depth understanding of GPU execution constraints and memory trade-offs.
Experience troubleshooting performance issues in production machine learning systems.
Capability to analyze system-level trade-offs between latency, throughput, and cost.
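The KV-cache pressure these long-context listings mention comes from simple arithmetic: each token keeps one key and one value vector per layer per KV head for the life of the request. A back-of-the-envelope estimator follows; the model shape plugged in is hypothetical, chosen only to show the scale, and is not any particular company's architecture.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_param=2):
    """Estimate KV-cache memory for one request batch.

    Keys and values are each [batch, kv_heads, seq, head_dim] per layer,
    hence the leading factor of 2. bytes_per_param=2 assumes fp16/bf16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_param)

# Hypothetical shape: 32 layers, 8 KV heads of dim 128 (grouped-query
# attention), one 128K-token sequence, bf16 precision.
cache = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                       seq_len=131072, batch_size=1)
print(f"{cache / 2**30:.1f} GiB")  # → 16.0 GiB
```

Because the result scales linearly with sequence length and batch size, this arithmetic is what drives the batching and scheduling trade-offs the posting describes.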
About Us
At Preference Model, we are at the forefront of developing advanced training data essential for the evolution of artificial intelligence. While today's AI models exhibit significant power, they often fall short in diverse applications due to limitations in their training data. We specialize in creating reinforcement learning environments that present AI with authentic research and engineering challenges, enabling them to iterate and learn through realistic feedback loops.
Our founding team boasts experience from Anthropic's data department, where we established the data infrastructure, tokenizers, and datasets that supported Claude. We collaborate with top-tier AI research labs to bring AI closer to its groundbreaking potential and are proudly backed by a16z.
About the Role
As a Software Engineer on our team, your responsibilities will include:
Designing and Developing Reinforcement Learning Environments: Architect comprehensive simulation platforms that encompass environmental context, task definitions, and reward functions to facilitate AI agents' learning and performance of intricate tasks.
Building Robust Training Infrastructure: Create scalable systems for post-training AI models, focusing on orchestration, performance optimization, and monitoring capabilities.
Implementing Realistic Model Evaluations: Develop metrics for evaluating AI agent performance and establish the infrastructure and tools necessary for conducting these evaluations.
Influencing Technical Strategy: Take charge of architectural decisions, impact product roadmaps, and contribute significantly to our engineering culture as an early-stage team member.
About You
You might be a great fit for this role if you possess the following qualities:
Adept at leveraging language models effectively.
Ability to innovate and think outside the box.
A minimum of 4 years of software engineering experience, showcasing your ability to take ownership of projects.
Proficiency in Python, Rust, or TypeScript, with the capability to work across the entire software stack.
Hands-on experience with modern deployment practices, containerization, and cloud infrastructure (such as Kubernetes, AWS, or GCP).
Strong problem-solving skills demonstrated through algorithmic challenges or complex system design tasks.
Nice-to-Haves
Preferred candidates will have experience in machine learning infrastructure or reinforcement learning.
Full-time|Hybrid|London, UK; Ontario, CAN; Remote-Friendly, United States; San Francisco, CA
Join the prestigious Anthropic Fellows Program, where you'll have the opportunity to delve into cutting-edge research in Reinforcement Learning. This program is designed for individuals passionate about advancing AI safety and developing innovative solutions. As a fellow, you will collaborate with a team of experts, engage in impactful projects, and contribute to a progressive research environment.
Join the innovative team at Anthropic as a Research Engineer specializing in Performance Reinforcement Learning. In this role, you will contribute to cutting-edge research that directly influences the development of advanced AI systems. Collaborate with a talented group of engineers and researchers, leveraging your expertise to enhance our algorithms and improve overall performance.
Be Part of the Future of Autonomous Robotics
At Bedrock Robotics, we are pioneering the transition of AI from theoretical frameworks to practical applications in the built environment. Our team is comprised of seasoned professionals who have been instrumental in the success of innovative companies such as Waymo, Segment, and Uber Freight. We are at the forefront of deploying autonomous technologies in heavy construction machinery, significantly enhancing the efficiency and safety of multi-billion dollar infrastructure projects across the nation.
With backing from $350 million in funding, our mission is to address the urgent need for housing, data centers, and manufacturing facilities, while simultaneously responding to the construction industry's labor shortages.
This position is where cutting-edge algorithms meet the practical world of construction. You will work alongside industry experts and top-tier engineers to tackle complex real-world challenges that cannot be simulated. If you are eager to leverage advanced technology for impactful problem-solving within a skilled team, we encourage you to apply.
Full-time|$218.4K/yr - $273K/yr|On-site|San Francisco, CA; New York, NY
As AI continues to play a crucial role across various sectors, Scale AI is committed to accelerating the evolution of AI applications. For nearly a decade, we have been at the forefront of AI data solutions, driving significant innovations such as generative AI, defense technologies, and autonomous vehicles. With recent funding from Meta, we are intensifying our efforts to develop cutting-edge post-training algorithms essential for enhancing the performance of complex enterprise agents globally. The Enterprise ML Research Lab is at the forefront of this AI transformation. Our team is dedicated to creating a suite of proprietary research and resources tailored for our enterprise clientele. As a Machine Learning Systems Research Engineer, you will play a pivotal role in developing algorithms for our next-generation Agent Reinforcement Learning (RL) training platform, support large-scale training operations, and integrate state-of-the-art technologies to optimize our machine learning systems. You will collaborate with other ML Research Engineers and AI Architects on the Enterprise AI team to apply these training algorithms to various client use cases, from next-gen AI cybersecurity firewalls to foundational healthtech search models. If you are passionate about shaping the future of AI, we want to hear from you!
Full-time|$350K/yr - $475K/yr|On-site|San Francisco
At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We envision a future where everyone has access to the knowledge and tools necessary to harness AI for their unique needs and goals. Our team comprises scientists, engineers, and builders who have developed some of the most widely utilized AI products, such as ChatGPT and Character.ai, alongside open-weight models like Mistral, and popular open-source initiatives like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.
About the Position
We are seeking an Infrastructure Research Engineer to design and construct the foundational systems that facilitate the scalable and efficient training of large models for both deployment and research purposes. Your primary objective will be to streamline experimentation and training at Thinking Machines, enabling our research teams to concentrate on scientific advancements rather than system limitations. This role is a perfect match for an individual who possesses a strong blend of deep systems expertise and a keen interest in machine learning at scale. You will take full ownership of the training stack, ensuring that every GPU cycle contributes to scientific progress.
Note: This is an evergreen role that we keep open continuously to express interest. We receive numerous applications, and there may not always be an immediate role that aligns perfectly with your experience and skills. However, we encourage you to apply. We regularly review applications and reach out to candidates as new opportunities arise. Feel free to reapply as you gain more experience, but please avoid applying more than once every six months. We may also post specific roles for individual projects or team needs, in which case you are welcome to apply directly alongside this evergreen role.
Key Responsibilities
Design, implement, and optimize distributed training systems that scale across thousands of GPUs and nodes for extensive training workloads.
Develop high-performance optimizations to maximize throughput and efficiency.
Create reusable frameworks and libraries that enhance training reproducibility, reliability, and scalability for new model architectures.
Establish standards for reliability, maintainability, and security, ensuring systems remain robust under rapid iterations.
Collaborate with researchers and engineers to construct scalable infrastructure.
Publish and disseminate findings through internal documentation, open-source libraries, or technical reports that contribute to the advancement of scalable AI infrastructure.
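"Maximize throughput and efficiency" in large-scale training is commonly tracked as Model FLOPs Utilization (MFU): achieved training FLOPs divided by hardware peak. A minimal sketch follows, using the standard ~6N FLOPs-per-token approximation for a dense transformer's forward plus backward pass; the parameter count, token rate, and peak FLOP/s plugged in are hypothetical numbers for illustration only.

```python
def mfu(params, tokens_per_sec, peak_flops_per_sec):
    """Model FLOPs Utilization: achieved training FLOPs over hardware peak.

    Uses the common ~6 * params FLOPs-per-token approximation for a dense
    transformer forward+backward pass (ignores attention FLOPs, which adds
    a sequence-length-dependent correction).
    """
    achieved = 6 * params * tokens_per_sec
    return achieved / peak_flops_per_sec

# Hypothetical: a 7B-parameter model training at 4,000 tokens/s per GPU
# on hardware with ~1e15 bf16 FLOP/s peak.
print(f"MFU: {mfu(7e9, 4000, 1e15):.1%}")  # → MFU: 16.8%
```

Tracking MFU per run makes "efficiency" a single comparable number, so a regression from a new parallelism layout or dataloader shows up immediately.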
Full-time|On-site|San Francisco, California, United States
At Yutori, we are revolutionizing the way individuals engage with the online world by developing AI agents that can seamlessly manage everyday digital tasks. Our mission is to create a fully integrated agent-first ecosystem, encompassing everything from training proprietary models to designing intuitive generative product interfaces.
We invite a passionate and skilled AI Engineer to join our founding team and contribute to our vision of building superhuman AI agents capable of performing actions across the web.
Our founders—Devi Parikh, Abhishek Das, and Dhruv Batra—bring decades of expertise in AI research and product development from their tenure at Meta, focusing on generative, multimodal, and embodied AI. Our diverse team blends advanced AI knowledge with innovative product design to execute Yutori's ambitious mission.
Supported by an exceptional group of visionary investors—including Elad Gil, Sarah Guo, Jeff Dean, Fei-Fei Li, and others—Yutori is poised for remarkable growth and development.
Join DoorDash as a Senior Deep Reinforcement Learning Engineer and play a pivotal role in revolutionizing the logistics and delivery industry through cutting-edge AI solutions. In this position, you will leverage your expertise in deep reinforcement learning to develop advanced algorithms that optimize our delivery processes and enhance customer experience.
About Us
At Preference Model, we are pioneering the next generation of training data to unlock the full potential of artificial intelligence. While today's models show remarkable capabilities, they often fall short of their potential across diverse applications due to out-of-distribution tasks. We create Reinforcement Learning environments that allow models to tackle real-world research and engineering challenges, iterating and learning through realistic feedback loops.
Our founding team comprises seasoned professionals from Anthropic's data team, where we developed data infrastructure, tokenizers, and datasets for Claude. We collaborate with leading AI laboratories to drive AI closer to its transformative potential and are backed by a16z.
About the Role
We are seeking talented Reinforcement Learning Environments Engineers to design and implement MLE environments. Your primary mission will be to enable Large Language Models (LLMs) to acquire improved reasoning and advanced understanding of modern machine learning concepts. This role is fully remote with a requirement for at least 4 hours of overlap with PST and proficiency in English at a C1/C2 level.
Mar 18, 2026