Clicking Apply Now takes you to AutoApply where you can tailor your resume and apply.
Unlock Your Potential
Generate Job-Optimized Resume
One Click And Our AI Optimizes Your Resume to Match The Job Description.
Is Your Resume Optimized For This Role?
Find Out If You're Highlighting The Right Skills And Fix What's Missing
Experience Level
Entry Level
Qualifications
Minimum Qualifications:Proficient in Python programming with a focus on engineering-quality code. Experience with Docker and a production-oriented mindset, including debugging and ensuring reliability and speed of iteration. Solid understanding of LLMs and their current limitations. Ability to meet throughput expectations and respond efficiently to feedback. You May Be a Good Fit If:You possess a deep understanding of transformer architecture, as well as training and inference processes for modern LLMs, and have experience with inference libraries (e.g., vLLM, SGLang). Expertise in CUDA or Pallas kernel development, with a track record of optimizing complex neural modules for specific hardware. You have published research or contributed public code in an active deep learning or machine learning research area. You have strong foundational knowledge and diverse research interests, regularly reading and deeply understanding academic papers, and possess the creativity to apply them to RLVR challenges. Experience building complex interactive RL environments and insights into open-ended RL-based learning systems.
About the job
About Us
At Preference Model, we are pioneering the next generation of training data to unlock the full potential of artificial intelligence. While today's models show remarkable capabilities, they often fall short of their potential across diverse applications due to out-of-distribution tasks. We create Reinforcement Learning environments that allow models to tackle real-world research and engineering challenges, iterating and learning through realistic feedback loops.
Our founding team comprises seasoned professionals from Anthropic’s data team, where we developed data infrastructure, tokenizers, and datasets for Claude. We collaborate with leading AI laboratories to drive AI closer to its transformative potential and are backed by a16z.
About the Role
We are seeking talented Reinforcement Learning Environments Engineers to design and implement MLE environments. Your primary mission will be to enable Large Language Models (LLMs) to acquire improved reasoning and advanced understanding of modern machine learning concepts. This role is fully remote with a requirement for at least 4 hours of overlap with PST and proficiency in English at a C1/C2 level.
About Preference Model
Preference Model is at the forefront of AI innovation, dedicated to enhancing the capabilities of machine learning systems through advanced training data methodologies. Our experienced team is committed to collaboration with leading AI labs to push the boundaries of technology.
Similar jobs
1 - 20 of 7,990 Jobs
Search for Reinforcement Learning Engineer At Code Metal Ai Remote
Full-time|Remote|Remote — San Francisco, California, United States
Join Code Metal AI's elite team, comprised of talents from MIT, OpenAI, and other esteemed organizations, as we lead the charge in pioneering large language models (LLMs) and advanced code generation techniques. Our innovative projects engage with top-tier chip manufacturers, leveraging cutting-edge AI to tackle significant, real-world challenges.This position serves as a critical link between two essential domains:Production Responsibilities:Establish and uphold resilient distributed training systems utilizing PyTorch (2+ years of experience required).Design and execute scalable data curation and quality assurance pipelines to ensure high-quality training datasets.Create orchestration tools that streamline complex workflows for large-scale AI model training and evaluation.Research Responsibilities:Lead the innovation in developing evaluation frameworks and reinforcement learning solutions, emphasizing recent advancements in Reinforcement Learning with Human Feedback (RLHF).Engage with cutting-edge research through open-source contributions and potential publications, focusing on applying RLHF to LLMs, particularly in code generation tasks.Qualifications:Minimum of 2 years of experience in distributed training, preferably using PyTorch.Strong foundation in reinforcement learning, with recent RLHF experience being highly preferred.Demonstrated ability to construct data curation and quality assurance pipelines.Experience in developing evaluation frameworks.Ideally, familiarity with both data pipeline and orchestration aspects.Eligibility for TS/SCI clearance.Preferred Qualifications:Contributions to open-source AI or ML initiatives.Published research or experience in relevant fields.Hands-on experience implementing RLHF to LLMs, especially for code generation.Experience in large-scale synthetic data generation.Benefits:Comprehensive healthcare plan with 100% premium coverage, including medical, dental, and vision.401k plan with 5% matching contribution.Unlimited Paid Time Off, along with Sick leave and Public Holidays.Flexible hybrid work arrangement.Relocation assistance for eligible employees.
Full-time|On-site|San Francisco, California, United States
Join code-metal as a Senior Platform DevOps Engineer, where you will play a pivotal role in enhancing our cloud and on-premises infrastructure. You will be responsible for deploying, managing, and optimizing systems to ensure high availability and performance. This position offers an exciting opportunity to work with cutting-edge technologies and collaborate within a dynamic team.
Full-time|Remote|San Francisco, California, United States
About Code MetalAt Code Metal, we are pioneering the transformation of code translation in critical sectors, empowering partners in defense, automotive, and semiconductor industries to accelerate their journey from algorithm to silicon with unmatched reliability. We seek an innovative marketing leader to take charge of our positioning and demand generation efforts.The Role:In this pivotal position, you will be responsible for building our brand identity, shaping our narrative, enhancing our visibility, and formulating actionable product marketing strategies. Collaborating closely with our executive team, you will oversee product and content marketing, translating the complexities of Code Metal’s technology into compelling value propositions.This position is ideal for a hands-on strategist who excels at the intersection of advanced technology and storytelling. As the first dedicated marketing leader, you will initially function as a solo marketing team, developing scalable programs without the support of a large team.Key Responsibilities:Formulate and implement Code Metal’s marketing strategy in collaboration with sales leaders across the defense, automotive, aerospace, and semiconductor sectors.Work closely with senior leadership on branding and storytelling, ensuring these elements are reflected across our web presence, branding materials, and messaging.Design and execute a measurable product marketing strategy to drive engagement.Conduct thorough research to uncover potential customers, market opportunities, and industry trends.Establish partnerships and channel relationships, as well as define our event strategy.Lead inbound and outbound marketing campaigns aligned with our growth objectives.Why Choose Code Metal?Purpose-Driven Mission: Join us in accelerating innovation in mission-critical industries with proven AI solutions.Agile Environment: Work in tight feedback loops with small teams — set a strategy in the morning and execute it by evening.Ownership and Impact: Take the reins of our marketing, branding, and storytelling efforts without any spectators.
Full-time|On-site|San Francisco, California, United States
At Yutori, we are revolutionizing the way individuals engage with the online world by developing AI agents that can seamlessly manage everyday digital tasks. Our mission is to create a fully integrated agent-first ecosystem, encompassing everything from training proprietary models to designing intuitive generative product interfaces.We invite a passionate and skilled AI Engineer to join our founding team and contribute to our vision of building superhuman AI agents capable of performing actions across the web.Our founders—Devi Parikh, Abhishek Das, and Dhruv Batra—bring decades of expertise in AI research and product development from their tenure at Meta, focusing on generative, multimodal, and embodied AI. Our diverse team blends advanced AI knowledge with innovative product design to execute Yutori's ambitious mission.Supported by an exceptional group of visionary investors—including Elad Gil, Sarah Guo, Jeff Dean, Fei-Fei Li, and others—Yutori is poised for remarkable growth and development.
About UsAt Preference Model, we are pioneering the next generation of training data to unlock the full potential of artificial intelligence. While today's models show remarkable capabilities, they often fall short of their potential across diverse applications due to out-of-distribution tasks. We create Reinforcement Learning environments that allow models to tackle real-world research and engineering challenges, iterating and learning through realistic feedback loops.Our founding team comprises seasoned professionals from Anthropic’s data team, where we developed data infrastructure, tokenizers, and datasets for Claude. We collaborate with leading AI laboratories to drive AI closer to its transformative potential and are backed by a16z.About the RoleWe are seeking talented Reinforcement Learning Environments Engineers to design and implement MLE environments. Your primary mission will be to enable Large Language Models (LLMs) to acquire improved reasoning and advanced understanding of modern machine learning concepts. This role is fully remote with a requirement for at least 4 hours of overlap with PST and proficiency in English at a C1/C2 level.
About AfterQuery AfterQuery develops training data and evaluation frameworks that leading AI labs use to improve their models. The team partners with major research institutions to build datasets and run assessments that go beyond standard benchmarks. As a post-Series A company based in San Francisco, AfterQuery values contributions from every team member. Work here directly shapes the next generation of AI models. Role Overview The Reinforcement Learning Environment Engineer designs datasets and evaluation systems that influence how advanced AI models learn and improve. This role involves close collaboration with research teams, hands-on experimentation with new data collection methods, and the creation of metrics to track model progress. Work moves from theoretical analysis to practical experiments, feeding directly into large-scale model training efforts. What You Will Do Develop data segments that expose key failure modes in sectors such as finance, software engineering, and enterprise operations. Refine reward signals for Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from Value Reinforcement (RLVR) systems. Define quantitative metrics for dataset quality, diversity, and their effects on model alignment and capability. Work closely with research teams to translate training objectives into concrete data requirements and evaluation criteria. This position is based in San Francisco.
About UsAt Preference Model, we are at the forefront of developing advanced training data essential for the evolution of artificial intelligence. While today's AI models exhibit significant power, they often fall short in diverse applications due to limitations in their training data. We specialize in creating reinforcement learning environments that present AI with authentic research and engineering challenges, enabling them to iterate and learn through realistic feedback loops.Our founding team boasts experience from Anthropic’s data department, where we established the data infrastructure, tokenizers, and datasets that supported Claude. We collaborate with top-tier AI research labs to bring AI closer to its groundbreaking potential and are proudly backed by a16z.About the RoleAs a Software Engineer on our team, your responsibilities will include:Designing and Developing Reinforcement Learning Environments: Architect comprehensive simulation platforms that encompass environmental context, task definitions, and reward functions to facilitate AI agents' learning and performance of intricate tasks.Building Robust Training Infrastructure: Create scalable systems for post-training AI models, focusing on orchestration, performance optimization, and monitoring capabilities.Implementing Realistic Model Evaluations: Develop metrics for evaluating AI agent performance and establish the infrastructure and tools necessary for conducting these evaluations.Influencing Technical Strategy: Take charge of architectural decisions, impact product roadmaps, and contribute significantly to our engineering culture as an early-stage team member.About YouYou might be a great fit for this role if you possess the following qualities:Adept at leveraging language models effectively.Ability to innovate and think outside the box.A minimum of 4 years of software engineering experience, showcasing your ability to take ownership of projects.Proficiency in Python, Rust, or TypeScript, with the capability to work across the entire software stack.Hands-on experience with modern deployment practices, containerization, and cloud infrastructure (such as Kubernetes, AWS, or GCP).Strong problem-solving skills demonstrated through algorithmic challenges or complex system design tasks.Nice-to-HavesPreferred candidates will have experience in:Machine learning infrastructure or reinforcement learning.
Join primeintellect as a Research Engineer focused on Reinforcement Learning Infrastructure. In this role, you will be instrumental in advancing our cutting-edge AI technologies. You will collaborate with interdisciplinary teams to develop robust frameworks that enhance machine learning capabilities and drive innovation.As a key player in our engineering team, you will work on designing, implementing, and optimizing systems that support reinforcement learning algorithms. Your contributions will directly impact the efficiency and effectiveness of our AI solutions.
Be Part of the Future of Autonomous RoboticsAt Bedrock Robotics, we are pioneering the transition of AI from theoretical frameworks to practical applications in the built environment. Our team is comprised of seasoned professionals who have been instrumental in the success of innovative companies such as Waymo, Segment, and Uber Freight. We are at the forefront of deploying autonomous technologies in heavy construction machinery, significantly enhancing the efficiency and safety of multi-billion dollar infrastructure projects across the nation.With backing from $350 million in funding, our mission is to address the urgent need for housing, data centers, and manufacturing facilities, while simultaneously responding to the construction industry's labor shortages.This position is where cutting-edge algorithms meet the practical world of construction. You will work alongside industry experts and top-tier engineers to tackle complex real-world challenges that cannot be simulated. If you are eager to leverage advanced technology for impactful problem-solving within a skilled team, we encourage you to apply.
Location: Preference for San Francisco, but remote candidates are welcome to apply.Duration: This internship will last for 10-12 weeks during Summer 2026.Compensation: This is a paid internship opportunity.About UsAt Preference Model, we are pioneering the next era of training data to fuel the advancement of AI technologies. While current models are impressive, they often struggle with diverse applications due to out-of-distribution tasks. Our focus is on developing reinforcement learning (RL) environments where models can engage with complex research and engineering challenges, iterating and learning from realistic feedback mechanisms.Our founding team boasts extensive experience from Anthropic's data division, where we built data infrastructure, tokenizers, and datasets that powered Claude. We collaborate with top AI labs to accelerate AI's journey toward its transformative potential and are proudly supported by a16z.About the RoleWe are seeking talented PhD students and exceptional undergraduate candidates to join us this summer in developing RL training environments tailored for large language models.What You'll DoDesign and implement RL environments to assess LLM reasoning across various ML, systems, and research problems.Produce clean, production-quality Python code (not just notebooks).Utilize Docker to create reproducible environments and troubleshoot issues as they arise.Translate ML research papers and concepts into actionable training tasks.Who We're Looking ForYou are either an undergraduate or a PhD student in Computer Science, Machine Learning, Mathematics, Physics, or a related discipline. You have a knack for writing real code beyond mere research prototypes and you enjoy reading ML literature in your spare time.Must-Have Qualifications:Proficient in Python programming.Understanding of large language models (LLMs), their strengths, and limitations.Self-motivated and capable of taking feedback to iterate quickly.Preferred Qualifications:Familiarity with transformer architecture and experience with training or inference code.Experience in writing CUDA kernels or engaging in low-level GPU programming.Deep knowledge in a particular research area (demonstrated by publications, public code, or strong coursework).A passion for continuous learning and research in the field of AI.
Join DoorDash as a Senior Deep Reinforcement Learning Engineer and play a pivotal role in revolutionizing the logistics and delivery industry through cutting-edge AI solutions. In this position, you will leverage your expertise in deep reinforcement learning to develop advanced algorithms that optimize our delivery processes and enhance customer experience.
Full-time|$176.4K/yr - $242.6K/yr|Remote|Remote - US
At Bugcrowd, we are redefining the landscape of cybersecurity. Since our inception in 2012, we have been committed to empowering organizations to regain control and stay ahead of cyber threats. By harnessing the collective creativity and expertise of our clients and an elite network of hackers, we leverage our patented AI-driven Security Knowledge Platform™. Our diverse community of hackers excels in uncovering vulnerabilities, swiftly adapting to the evolving threat landscape, including zero-day exploits. With our innovative CrowdMatch™ technology, we provide scalable, tailored solutions to enhance your security posture. Join us as we usher in a new era of crowdsourced security that outpaces cyber adversaries. For more information, visit www.bugcrowd.com. Headquartered in San Francisco and New Hampshire, Bugcrowd is supported by leading investors including General Catalyst, Rally Ventures, and Costanoa Ventures.Job SummaryThe Bugcrowd Reinforcement Learning and Reasoning Team is dedicated to advancing autonomous cybersecurity through the creation of authentic reinforcement learning environments tailored for foundational model applications. As a Staff Engineer, you will be at the forefront of AI Reinforcement Learning development and implementation. Your primary responsibility will be to design and build the infrastructure and tools that convert real-world vulnerability research into extensive reinforcement learning environments for training state-of-the-art AI systems.In this unique role, you will develop training environments that instruct AI systems on hacking and defending software. Your contributions will directly impact the capabilities of next-generation AI models. Rather than focusing on a single application, you will create the underlying infrastructure that generates thousands of environments for training leading-edge AI technologies.Our team operates at the intersection of AI, security research, and systems engineering, crafting environments that enable models to acquire essential skills such as vulnerability detection, exploitation, and remediation.
About HandshakeHandshake is the premier career network tailored for the AI economy, serving over 20 million knowledge workers, 1,600 educational institutions, and 1 million employers, including all Fortune 50 companies. Our platform is trusted for career discovery, recruitment, and professional development, facilitating opportunities ranging from freelance AI training roles to full-time positions. Our unique value proposition is driving remarkable growth, with an expectation to triple our Annual Recurring Revenue (ARR) by 2025.Why is now the best time to join Handshake?Be a key player in shaping the future of careers within the AI economy, creating tangible impacts for your community.Collaborate closely with leading AI research labs, Fortune 500 partners, and top-tier educational institutions.Join a team enriched by leaders from renowned organizations such as Scale AI, Meta, xAI, Notion, Coinbase, and Palantir.Contribute to building a rapidly growing business projected to generate billions in revenue.About the RoleWe are looking for an experienced Senior Engineering Manager to lead our dynamic Reinforcement Learning Environments (RLE) team.The RLE team creates innovative sandbox environments where cutting-edge AI models can learn comprehensive, end-to-end workflows. These environments replicate real-world professional fields such as software engineering, finance, and legal research, complete with realistic tools, constraints, and feedback mechanisms. Rather than relying on static examples, models engage in practical tasks: navigating multi-step processes, utilizing domain-specific tools, managing uncertainty, and optimizing for real-world results.Researchers leverage these environments and the data produced to train state-of-the-art models using reinforcement learning based on execution—focusing not just on predictions but on task fulfillment, quality, and resilience in complex workflows.As a Senior Engineering Manager, you will define the technical direction and long-term strategy of this vital platform. You will lead a growing team of 8-9 engineers and are expected to manage an Engineering Manager in the near future. This strategic role intersects platform engineering, applied AI infrastructure, research tooling, and human-in-the-loop operational systems.Location: San Francisco, CA | 5 days/week in-office
Pioneering the Future of Open SuperintelligenceAt Prime Intellect, we are on a mission to construct the open superintelligence ecosystem, encompassing cutting-edge agentic models alongside the infrastructure that empowers individuals to create, train, and deploy them seamlessly. We unify global computational resources into an intuitive control plane, complemented by a comprehensive reinforcement learning post-training suite, including dynamic environments, secure sandboxes, verifiable evaluations, and our innovative asynchronous RL trainer. Our platform empowers researchers, startups, and enterprises to execute end-to-end reinforcement learning at unprecedented scales, allowing for the adaptation of models to diverse tools, workflows, and deployment scenarios.As a Research Engineer within our Reasoning team, you will be instrumental in driving our technological vision, particularly in the area of test-time compute scaling research. If you thrive on harnessing synthetic data to enhance LLM reasoning capabilities, we want to hear from you!Discover more about our exciting project by visiting our insight on decentralized training in the inference-compute paradigm.
Full-time|Hybrid|San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10)
Join firecrawl as a Research Engineer specializing in Reinforcement Learning (RL). In this role, you will leverage your expertise to conduct innovative research and develop advanced RL algorithms that push the boundaries of technology. Collaborate with a talented team of engineers and researchers to solve complex problems and contribute to groundbreaking projects.
Join Our TeamAt Liquid AI, we are not just creating AI models; we are revolutionizing the very fabric of intelligence. Originating from MIT, our objective is to develop efficient AI systems across all scales. Our Liquid Foundation Models (LFMs) excel in environments where others falter—on-device, at the edge, and under real-time constraints. We are not simply refining existing concepts; we are pioneering the future of AI.We recognize that exceptional talent drives remarkable technology. The Liquid team is a collective of elite engineers, researchers, and innovators dedicated to crafting the next generation of AI solutions. Whether you are designing model architectures, enhancing our development platforms, or facilitating enterprise integrations, your contributions will significantly influence the evolution of intelligent systems.While San Francisco and Boston are preferred locations, we welcome applicants from other regions within the United States.
Preference Model develops reinforcement learning environments that mirror the complexity of real-world tasks. The company focuses on building diverse RL tasks and detailed reward structures, aiming to push the boundaries of artificial intelligence. The founding team brings experience from developing data infrastructure and datasets for Claude at Anthropic, and Preference Model works closely with top AI research labs. Role overview The Senior Software Engineer - Reinforcement Learning Environments position centers on designing and delivering RL environments that challenge and improve current AI models. This role involves leading complex projects, including multi-step workflows and realistic stakeholder interactions, within a large codebase. Engineers work directly with the founders and a small, collaborative team, delivering environments used for training advanced models at partner labs. The position provides significant autonomy, regular feedback, and support for professional development. What you will do Design, build, and iterate on reinforcement learning tasks, taking them from concept through evaluation. Lead the development of sophisticated environments, focusing on complex workflows and coding standards. Interact with coding agents, review their outputs, and identify subtle failures. Analyze whether issues stem from model limitations or environment design, then redesign tasks to reveal deeper failure modes. Contribute to building and maintaining the core infrastructure and tools for the environments team. Mentor junior engineers as the team expands. Location This role is based in San Francisco.
About UsAt Preference Model, we are pioneering the next generation of training data to fuel the evolution of AI technology. Although today's models demonstrate significant capabilities, they often fall short in diverse applications due to many tasks being out of distribution. We create reinforcement learning (RL) environments where models face research and engineering challenges, allowing them to iterate and learn from realistic feedback loops.Our founding team boasts experience from Anthropic’s data division, where we built data infrastructure, tokenizers, and datasets for Claude. Collaborating with leading AI labs, we aim to bring AI closer to its transformative potential, supported by a16z.About the RoleEvery RL environment we deploy must withstand a model actively attempting to exploit it. A task with a weak evaluation or an easily exploitable reward signal is counterproductive; it teaches the model to cheat instead of reason. We seek an individual dedicated to identifying these vulnerabilities before the model does.We have learned that domain knowledge alone does not make an effective reviewer. The ideal candidate is someone who has engaged in adversarial thinking: designing challenging problems that are difficult to exploit, dismantling others’ tasks, or directly researching reward hacking.Your ResponsibilitiesReview RL environments and training tasks for accuracy, robustness, and resistance to reward hacking.Identify potential ways a model could exploit grading systems, manipulate evaluation criteria, or bypass intended reasoning.Collaborate with environment authors to enhance grading systems, rectify reward signals, and redesign ineffective tasks.Develop and maintain review standards and checklists as we scale from hundreds to thousands of tasks monthly.Provide guidance on grader design during the planning phase of environments, ensuring quality before task construction.Who We Are Looking ForYou think like an attacker and have spent considerable time crafting problems that are challenging to exploit or deconstructing seemingly solid issues. A fundamental understanding of machine learning is essential, enabling you to anticipate model strategies, combined with enough engineering insight to assess whether a grader effectively tests its criteria.
Anthropic is hiring a Research Engineer focused on Reinforcement Learning Infrastructure and Reliability. This role is based in San Francisco, CA. Role overview This position centers on building and maintaining systems essential to AI research. The work supports Anthropic’s reinforcement learning efforts, with an emphasis on infrastructure stability and performance. What you will do Collaborate with a team of specialists to develop and support key systems for AI research. Improve the reliability and efficiency of infrastructure supporting reinforcement learning projects. Apply technical expertise to advance Anthropic’s AI capabilities. Team environment Work alongside engineers and researchers dedicated to advancing AI reliability and performance. The team values collaboration and aims to enable new research while maintaining the stability of Anthropic’s core systems.
Full-time|On-site|San Francisco, CA | New York City, NY
Role Overview Anthropic is looking for a Full-Stack Software Engineer with a focus on reinforcement learning. This role centers on building applications that use reinforcement learning to advance artificial intelligence. Work closely with skilled teammates to create software that makes a real impact. Location San Francisco, CA or New York City, NY
Apr 14, 2026
Sign in to browse more jobs
Create account — see all 7,990 results
Tailoring 0 resumes…
Tailoring 0 resumes…
We'll move completed jobs to Ready to Apply automatically.