Research Engineer Frontier Evals Environments jobs in San Francisco – Browse 5,456 openings on RoboApply Jobs

Research Engineer Frontier Evals Environments jobs in San Francisco

Open roles matching “Research Engineer Frontier Evals Environments” in San Francisco. 5,456 active listings on RoboApply Jobs.


1 - 20 of 5,456 Jobs
OpenAI
Full-time|On-site|San Francisco

About Our Team
Join the Frontier Evaluations & Environments team at OpenAI, where we build transformative model environments that pave the way for safe artificial general intelligence (AGI) and artificial superintelligence (ASI). Our team constructs ambitious evaluation environments that both measure and enhance the capabilities of our models, creating self-improvement loops that inform our training, safety, and deployment strategies. Notable open-source evaluations include GDPval, SWE-bench Verified, MLE-bench, PaperBench, and SWE-Lancer. We have also run frontier evaluations for models including GPT-4o, o1, o3, GPT-4.5, ChatGPT Agent, and GPT-5. If you are passionate about experiencing firsthand the rapid advancement of our models and guiding them toward positive impact, this is the opportunity for you.

Your Role
We are searching for exceptional research engineers eager to push the limits of our frontier models. Ideal candidates will play a vital role in shaping our empirical understanding of AI capabilities across a broad spectrum and will take ownership of specific projects from conception to execution.

Key Responsibilities:
- Design and implement ambitious reinforcement learning environments that maximize our models' potential.
- Assess frontier model capabilities, skills, and behaviors.
- Create innovative methodologies for the automated exploration of model behaviors.
- Guide training processes for our largest model training initiatives, gaining insight into the future of AI.
- Collaborate with cross-functional teams to align model evaluations with organizational objectives.

Apr 13, 2025
OpenAI
Full-time|On-site|San Francisco

About Our Team
The Frontier Evaluations team develops pioneering model assessments that propel advancement toward safe Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI). The group crafts ambitious evaluations to quantitatively assess and guide our models while establishing self-improvement cycles that influence our training, safety, and deployment strategies. Our open-source evaluations include SWE-bench Verified, MLE-bench, PaperBench, and SWE-Lancer. The team has also run frontier evaluations for models such as GPT-4o, o1, o3, GPT-4.5, ChatGPT Agent, and GPT-5. If you are passionate about being at the forefront of AI advancement and guiding its ethical development, this is the team for you.

About You
We are looking for exceptional research engineers eager to challenge the boundaries of frontier models in the finance sector. You will help shape AI evaluations focused on financial reasoning and associated competencies while managing distinct threads of this initiative from conception to execution.

In This Role, You Will:
- Identify model capabilities, skills, and behaviors vital to financial operations, and develop methods to accurately measure performance in these areas.
- Own a research agenda aimed at uncovering significant model capabilities, particularly in financial reasoning, and design evaluations to quantify them.
- Continuously enhance evaluations of frontier AI models to gauge the extent of cutting-edge capabilities.

We Expect You To:
- Demonstrate a strong background in research engineering, particularly in AI and finance.
- Work effectively within a cross-functional team environment.
- Bring exceptional analytical and problem-solving skills.

Aug 30, 2025
AfterQuery
Full-time|On-site|San Francisco

About AfterQuery
AfterQuery partners with leading AI labs to advance training data and evaluation frameworks. The team builds high-signal datasets and runs thorough evaluations that go beyond standard benchmarks. As a post-Series A, early-stage company in San Francisco, AfterQuery gives each team member room to shape the future of AI models.

Role Overview: Research Scientist - Frontier Data
This role focuses on designing datasets and developing evaluation systems that influence how top AI models are trained and assessed. Working closely with research teams at major AI labs, the scientist explores new data collection techniques, investigates where models fall short, and sets up metrics to track progress. The work is hands-on and experimental, moving quickly from hypothesis to live testing and directly impacting large-scale model training.

Key Responsibilities
- Design data slices and analyze data structures to uncover model weaknesses in areas like finance, software development, and enterprise operations.
- Build and refine evaluation rubrics and reward signals for RLHF and RLVR training approaches.
- Study annotator behavior and run experiments to improve model capabilities across different domains.
- Develop quantitative frameworks to measure dataset quality, diversity, and their effect on model alignment and performance.
- Work with research teams to turn training objectives into concrete data and evaluation needs.

What We Look For
- Experience as an undergraduate or master's research student (PhD not required).
- Background or internships with RL environments or AI safety and benchmarking organizations (e.g., METR, Artificial Analysis) is a strong plus.
- Genuine interest in how data structure, selection, and quality affect model outcomes.
- Demonstrated skill in designing experiments, acting quickly, and extracting insights from complex data.
- Comfort working across sectors such as finance, software engineering, and policy.
- Strong quantitative background and familiarity with LLM training pipelines, RLHF/RLVR methods, or evaluation frameworks.
- A hands-on mindset focused on building practical solutions.
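The dataset quality and diversity measurement described above can be illustrated with a toy pair of metrics; this is a sketch under assumed inputs, not AfterQuery's actual framework: normalized label entropy as a diversity proxy and an exact-duplicate rate as a quality proxy.

```python
from collections import Counter
import math

def diversity_entropy(labels):
    """Normalized Shannon entropy of domain labels: 1.0 means examples
    are spread evenly across domains, 0.0 means one domain dominates."""
    counts = Counter(labels)
    n = len(labels)
    if len(counts) < 2:
        return 0.0
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(len(counts))  # divide by max possible entropy

def duplicate_rate(examples):
    """Fraction of examples that exactly duplicate an earlier one."""
    return 1.0 - len(set(examples)) / len(examples)

labels = ["finance", "finance", "software", "enterprise"]
print(round(diversity_entropy(labels), 3))  # 0.946
print(duplicate_rate(["q1", "q1", "q2", "q3"]))  # 0.25
```

A production framework would add semantic deduplication and per-slice coverage, but even these two numbers catch collapsed or copy-heavy datasets early.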

Apr 14, 2026
OpenAI
Full-time|On-site|San Francisco

About Our Team
The Frontier Systems team at OpenAI is responsible for creating, deploying, and maintaining some of the world's largest supercomputers, which are pivotal for training our most advanced AI models. We transform sophisticated data center designs into operational systems and develop the software infrastructure necessary for extensive frontier model training. Our goal is to ensure these hyperscale supercomputers operate reliably and efficiently in support of groundbreaking AI research.

About the Role
As a key member of the Frontier Systems team, you will design the critical infrastructure that keeps our supercomputers running seamlessly for pioneering AI research. You'll address system-level challenges and implement automation that minimizes disruptions during large-scale training, with end-to-end ownership of your projects. This position is ideal for individuals who excel at diagnosing complex system issues and crafting automation strategies that proactively resolve problems across a vast fleet of machines.

Your Responsibilities Include:
- Enhancing system health checks to maintain the stability of our hyperscale supercomputers during model training.
- Conducting in-depth investigations into hardware failures and system-level bugs to uncover root causes.
- Developing automation tools that monitor and resolve issues across thousands of systems, enabling uninterrupted research progress.

You May Be a Great Fit If You Possess:
- 7+ years of hands-on experience in software engineering.
- Strong proficiency in Python and shell scripting.
- Expertise in analyzing complex data sets using SQL, PromQL, Pandas, or other relevant tools.
- Experience creating reproducible analyses.
- A solid balance of skills in both building and operationalizing systems.

Prior experience with hardware is not a prerequisite for this role.

Preferred Qualifications:
- Familiarity with the intricacies of hardware components, protocols, and Linux tools (e.g., PCIe, InfiniBand, networking, power management, kernel performance tuning).
- Experience with system optimization and performance tuning.
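The fleet health-check responsibility described in this listing can be sketched as a simple triage pass over per-node metrics; the node names, metric fields, and thresholds below are illustrative assumptions, not OpenAI's actual tooling.

```python
def triage_nodes(node_metrics, ecc_limit=0, temp_limit_c=85.0):
    """Flag nodes whose health metrics breach simple thresholds so an
    automated remediation loop can cordon them before a training run."""
    unhealthy = []
    for node, m in sorted(node_metrics.items()):
        reasons = []
        if m["gpu_ecc_errors"] > ecc_limit:
            reasons.append("ecc")
        if m["gpu_temp_c"] > temp_limit_c:
            reasons.append("thermal")
        if reasons:
            unhealthy.append((node, reasons))
    return unhealthy

fleet = {
    "node-001": {"gpu_ecc_errors": 0, "gpu_temp_c": 71.0},
    "node-002": {"gpu_ecc_errors": 3, "gpu_temp_c": 69.5},  # ECC errors
    "node-003": {"gpu_ecc_errors": 0, "gpu_temp_c": 91.2},  # running hot
}
print(triage_nodes(fleet))  # [('node-002', ['ecc']), ('node-003', ['thermal'])]
```

At hyperscale the same shape applies, but the metrics would come from a time-series store (e.g., via PromQL) and the output would feed an automated cordon/repair queue rather than a print statement.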

May 9, 2025
Intrinsic Safety
Research Engineer, Evals

Full-time|On-site|San Francisco

Role Overview
At Intrinsic Safety, we are pioneering the development of AI systems capable of making critical decisions in high-stakes environments such as risk investigations, fraud detection, and identity verification. Our team in San Francisco is at the forefront of tackling complex challenges where traditional AI solutions often fall short.

We are searching for a Research Engineer to play a pivotal role in shaping our model evaluation strategies. You will be responsible for creating benchmarks, datasets, and evaluation frameworks that accurately assess our systems' performance in real-world scenarios. This position bridges research, product development, and engineering, focusing on rigorous evaluations that reflect actual customer workflows and identify key failure points to propel the next generation of AI advancements.

Mar 31, 2026
Sentry
Full-time|$240K/yr - $280K/yr|Hybrid|San Francisco, California

About Sentry
At Sentry, we are committed to transforming the way developers build software. With a mission to eradicate poor software experiences, we empower developers to create better applications more efficiently. Backed by over $217 million in funding and trusted by more than 100,000 organizations, including Disney, Microsoft, and Atlassian, we are at the forefront of performance monitoring and error tracking, with tools that let companies focus on product development rather than bug fixes. We embrace a hybrid work environment across our global offices, with Mondays, Tuesdays, and Thursdays designated as in-office collaboration days. If you are passionate about creating solutions that enhance the digital experience, join us in developing the next wave of software monitoring tools.

About the Role
As a Senior Software Engineer on Sentry's AI/ML team, you will construct the evaluation infrastructure that assesses the accuracy, reliability, and performance of our AI systems in real-world scenarios. This position is essential for ensuring that our debugging agents and AI-driven features operate correctly, safely, and predictably as they scale. You will design datasets, benchmarks, and test harnesses that convert vague AI behavior into quantifiable metrics, enabling the team to deploy AI solutions with confidence.

In This Role You Will
- Develop and implement robust evaluation frameworks to assess accuracy, reliability, regressions, and edge cases within AI systems.
- Generate and manage high-quality datasets, golden test cases, and benchmarks based on real production data.
- Create automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and workflows.
- Collaborate closely with applied AI engineers and product leaders to establish clear definitions of success and translate them into measurable criteria.
- Oversee the evaluation lifecycle for significant AI projects, from initial experimentation to ongoing production monitoring.

You'll Love This Job If You
- Have a strong commitment to accuracy, rigor, and measurement in AI systems.
- Enjoy transforming ambiguous product objectives and model behaviors into precise tests and metrics.
- Take pleasure in building foundational infrastructure that facilitates rapid iteration and boosts team confidence.
- Thrive in collaborative environments and relish the opportunity to influence model design through effective evaluation.
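The golden-test-case workflow this listing describes can be sketched as a minimal harness; `run_eval`, `toy_model`, and the exact-match metric are illustrative stand-ins, not Sentry's actual API.

```python
def exact_match(pred, gold):
    """Simplest possible metric: case-insensitive exact match."""
    return pred.strip().lower() == gold.strip().lower()

def run_eval(model_fn, golden_cases, metric=exact_match):
    """Run a model over golden (input, expected) pairs and report
    aggregate accuracy plus the individual failures for triage."""
    failures = []
    for inp, expected in golden_cases:
        out = model_fn(inp)
        if not metric(out, expected):
            failures.append({"input": inp, "expected": expected, "got": out})
    n = len(golden_cases)
    return {"accuracy": (n - len(failures)) / n, "failures": failures}

# A stand-in "model" so the sketch runs end to end.
def toy_model(prompt):
    return "IndexError" if "list index" in prompt else "unknown"

golden = [
    ("Traceback: list index out of range", "IndexError"),
    ("Traceback: division by zero", "ZeroDivisionError"),
]
report = run_eval(toy_model, golden)
print(report["accuracy"])  # 0.5 on this tiny set
```

The same shape scales up by swapping in LLM-judge or semantic-similarity metrics and wiring the failure list into a regression dashboard, which is what turns "vague AI behavior" into trackable numbers.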

Jan 28, 2026
LangChain
Full-time|$125K/yr - $145K/yr|On-site|San Francisco, CA

About Us
At LangChain, we are dedicated to making intelligent agents a fundamental part of everyday technology. Our mission is to provide the essential tools for agent engineering in practical applications, enabling developers to transition seamlessly from initial prototypes to production-ready AI agents that organizations can depend on. Starting as a suite of widely adopted open-source tools, we have expanded to a comprehensive platform for building, evaluating, deploying, and managing AI agents at scale. Our platforms, including LangChain, LangGraph, LangSmith, and Agent Builder, are trusted by teams developing real AI solutions in both startups and established enterprises, powering AI initiatives for companies such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, and 35% of the Fortune 500. With $125M raised in Series B funding from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are at an exciting juncture where every team member can make a significant impact on our products. Join us at LangChain, where your contributions can reshape the technology landscape.

About the Role
In-person, 5 days a week in San Francisco. We are seeking a Fullstack Engineer to join our LangSmith product team, focusing on our commercial AI observability and evaluation platform. In this position, you will develop new features and capabilities for the platform while collaborating closely with enterprise clients, developer end-users, and internal stakeholders.

Your Responsibilities:
- Design and implement critical product features utilizing our Go, Python, and TypeScript stack.
- Work in close partnership with product and design teams to refine features and enhance the product roadmap.
- Drive project timelines effectively while maintaining high engineering standards through clean, maintainable, and well-tested code.

To Succeed in This Role:
- 2+ years of experience in software engineering, particularly with complex platform products.
- Fullstack engineering experience with Go or Python on the backend and React + TypeScript on the frontend.
- Strong understanding of database systems, especially Postgres and Redis.
- Experience designing and scaling APIs, ideally in high-performance environments.

Aug 15, 2025
OpenAI
Full-time|Hybrid|San Francisco

Join Our Robotics Team
At OpenAI, our Robotics team is pioneering general-purpose robotics and striving toward artificial general intelligence (AGI) in dynamic, real-world environments. By integrating advanced hardware and software, we explore a diverse array of robotic configurations, aiming to combine high-level AI capabilities with the physical constraints of real-world systems to enhance everyday lives.

Your Role as a Simulation Environments Engineer
We are seeking a talented Simulation Environments Engineer to develop the tools and infrastructure for creating comprehensive, realistic virtual environments for robotics research and evaluation. This position emphasizes building systems that empower researchers and engineers to define, visualize, generate, and validate task environments efficiently and at scale. You will design processes for importing and validating third-party content, develop procedural and randomized scenario generators, and produce user-friendly tools that make environment creation swift, repeatable, and testable. This role sits at the convergence of game-engine expertise, asset engineering, and extensive simulation infrastructure.

Location and Work Schedule
This position is based in San Francisco, CA, requiring in-person collaboration three days a week.

Key Responsibilities
- Develop interactive and programmatic tools for researchers to describe, preview, and validate scenes and tasks, enabling quick and consistent scenario authoring.
- Establish content pipelines to curate, convert, optimize, and quality-check visual and collision assets from both external collections and internal resources, ensuring consistent behavior across engines and tasks.
- Create robust importers and adapters to integrate environments and setups from platforms such as Isaac, Unity, Unreal, and Omniverse into our simulation pipelines, maintaining fidelity and performance.
- Develop frameworks for procedural generation and controlled randomization (covering visual, physical, and kinematic aspects) to provide models with systematic, measurable variations in conditions.
- Define and uphold quality standards for environments (including visual fidelity, collision accuracy, and physical realism) and implement validation tools to ensure environments meet established realism and coverage objectives.
- Link environment tools to CI/CD processes, presubmit checks, large-scale simulation farms, and model evaluation pipelines to enable automatic testing and large-scale execution of environments.
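The "controlled randomization" responsibility above can be sketched as a seeded parameter sweep; the parameter names and ranges below are illustrative assumptions, not OpenAI's simulation pipeline.

```python
import random

def randomized_scene(seed, base_friction=0.8, base_mass=1.0):
    """One scene variant with bounded, seeded randomization of visual and
    physical parameters; the same seed always reproduces the same scene."""
    rng = random.Random(seed)
    return {
        "seed": seed,
        "friction": base_friction * rng.uniform(0.7, 1.3),
        "object_mass_kg": base_mass * rng.uniform(0.5, 2.0),
        "light_intensity": rng.uniform(0.2, 1.0),
        "camera_jitter_deg": rng.gauss(0.0, 2.0),
    }

def scenario_sweep(n, seed0=0):
    """A systematic sweep: n reproducible variants for a benchmark run."""
    return [randomized_scene(seed0 + i) for i in range(n)]

sweep = scenario_sweep(100)
```

Seeding each variant is what makes the randomization "controlled": a scene where a policy fails can be regenerated exactly from its seed for debugging and regression evaluation.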

Aug 22, 2025
Anthropic
On-site|San Francisco, CA

About Anthropic
At Anthropic, we are dedicated to developing AI systems that are reliable, interpretable, and controllable. Our mission is to ensure that AI benefits individuals and society as a whole. We are a rapidly expanding team of researchers, engineers, policy experts, and business leaders united in creating safe and beneficial AI systems.

About the Team
The Frontier Red Team (FRT) is a specialized technical research group within Anthropic's Policy division. Our mission is to enhance global safety in the age of advanced AI by thoroughly understanding the capabilities of these systems and developing effective defenses against potential threats. In 2026, we are concentrating on research aimed at ensuring the safety of self-improving, highly autonomous AI systems, particularly those with cyberphysical capabilities. Explore our previous work on cyberdefense, robotics, and Project Vend. This is groundbreaking research with the potential for significant impact.

About the Role
As a Research Engineer on our team, you will tackle the critical challenge of defending against the potential adversarial use of powerful, autonomous, self-improving AI systems. Your role will involve constructing and evaluating model organisms of autonomous systems and developing the defensive mechanisms necessary to counteract them. This work lies at the intersection of AI capabilities research, security, and policy; what we discover will directly influence how Anthropic and the wider world prepare for advanced AI. This is applied research with substantial implications: your contributions will inform decisions at the highest levels of the company, aid public demonstrations that shape policy discussions, and help develop technical defenses that could be crucial as AI systems evolve.

What You Will Do
- Design and construct autonomous AI systems capable of utilizing tools and operating in varied environments, creating model organisms that enhance our understanding of and defenses against advanced adversarial AI.
- Develop evaluations and training environments to influence agent behavior in beneficial ways.
- Create defensive agents that can detect, disrupt, or outmaneuver adversarial AI systems in realistic scenarios.
- Integrate Claude with hardware platforms (e.g., robotics, physical systems) to assess cyberphysical risks and defenses.

Jan 29, 2026
Roam
Full-time|$165.8K/yr - $353.3K/yr|On-site|San Francisco

About Roam
Roam is an Applied AI laboratory dedicated to developing World Models for immersive 3D environments. Our mission is to elevate models from the traditional 2D pixel space into the vibrant, fully interactive dimensions of virtual worlds. Our work spans everything necessary for perception, reasoning, and creation in 3D, with a wide range of applications in simulation, design, and entertainment. Our primary focus is gaming: crafting and understanding 3D environments for interactive entertainment, empowering millions of users to explore, create, and share captivating worlds together. Much as TikTok democratized content creation through user-friendly tools, we aspire to enable everyone to transform their ideas into games without requiring any technical expertise.

Since our founding earlier this year, we have:
- Secured partnerships with over 1,000 gaming and livestream creators (with follower counts ranging from 100K to 10M) who will introduce Roam games to a combined audience of over 250 million fans.
- Built a 12-person applied AI team of former Moonshot engineers (specializing in post-training agents and diffusion model training), seasoned technical founders, and engineers who have shipped over 100 games for companies like Voodoo and Supersonic, as well as mobile applications from inception to 2 million daily active users.
- Raised an undisclosed investment round from backers of Google DeepMind, Bytedance, Anduril, Together AI, and Polymarket, along with angel investors from OpenAI, Anthropic, xAI, DeepMind, and Cognition, as well as prominent content creators.
- Acquired a consumer product studio with over 11.5 million downloads.

About the Role
In this position, you will operate at the crossroads of applied AI and large-scale systems, concentrating on world generation pipelines, data-efficient training methodologies, and optimization techniques for real-time generation. Your contributions will facilitate the creation, enjoyment, and monetization of highly engaging games and 3D environments. Our ongoing research areas include:
- Distributed multi-agent orchestration systems for cohesive world synthesis, encompassing terrain generation, asset population, and behavior trees.
- Reinforcement learning pipelines aimed at adaptive, open-ended game generation featuring hierarchical action spaces and dynamic difficulty adjustment.
- Agentic code generation and diffusion-based tools for producing studio-quality simulated environments.

We're looking for engineers who possess:
- A strong background in applied AI and systems engineering.

Jun 1, 2025
OpenAI
Full-time|On-site|San Francisco

About Our Team
The Safety Systems organization at OpenAI ensures that our most advanced AI models are developed and deployed responsibly. We engineer evaluations, safeguards, and safety frameworks to help our models operate as intended in real-world applications. The Preparedness team plays a crucial role within the Safety Systems organization, guided by OpenAI's Preparedness Framework. While frontier AI models have the potential to benefit humanity, they also introduce significant risks; the Preparedness team anticipates and prepares for catastrophic risks associated with advanced AI models to ensure that AI fosters positive change.

Our mission includes:
- Monitoring and predicting the evolving capabilities of frontier AI systems, particularly regarding risks that could have catastrophic consequences.
- Establishing concrete procedures, infrastructure, and partnerships to effectively mitigate these risks and safely manage the development of powerful AI systems.

Preparedness integrates capability assessment, evaluations, internal red teaming, and mitigations for frontier models, along with overall coordination on AGI preparedness. This fast-paced work holds significant importance for both the company and society.

About the Role
As models become increasingly capable, transitioning from tools that assist humans to agents that can autonomously plan, execute, and adapt in the real world, cybersecurity emerges as a critical frontier. The same systems that boost productivity can also enable increased exploitation. As a Researcher focusing on cybersecurity risks, you will design and implement a comprehensive mitigation strategy to address severe cyber misuse across OpenAI's products. This position demands strong technical expertise and extensive collaboration across teams to ensure that safeguards are enforceable, scalable, and effective. You will contribute to the development of robust protections that evolve alongside our products, model capabilities, and attacker behaviors.

Key Responsibilities
- Develop and implement mitigation strategies for model-enabled cybersecurity threats.
- Collaborate with cross-functional teams to ensure effective risk management.
- Continuously assess and iterate on security measures to adapt to new challenges.

Feb 19, 2026
Scale AI
Full-time|$179.4K/yr - $224.3K/yr|On-site|San Francisco, CA; New York, NY

Join Scale AI as an AI Research Engineer on our Enterprise Evaluations team. This pivotal role is integral to our goal of providing the industry's leading Generative AI Evaluation Suite. You will contribute to the foundational systems that guarantee the safety, dependability, and ongoing improvement of LLM-driven workflows and agents for enterprise clients. The ideal candidate has a robust understanding of large language models, a passion for solving intricate evaluation problems, and the ability to excel in a fast-evolving research environment. We seek an engineer who can innovate, stays informed about the latest research in AI evaluation, and is enthusiastic about incorporating cutting-edge research ideas into our workflows to create top-tier evaluation systems.

Mar 26, 2026
OpenAI
Full-time|On-site|San Francisco

About the Team
Join the Frontier Systems team at OpenAI, where we design, implement, and maintain the world's largest supercomputers, essential for advancing our most groundbreaking model training initiatives. We transform data center blueprints into operational systems while crafting the software necessary for executing large-scale frontier model trainings. Our mission is to establish, stabilize, and ensure the reliability and efficiency of these hyperscale supercomputers throughout the training of our frontier models.

About the Role
We are seeking passionate engineers to manage the next generation of compute clusters that underpin OpenAI's frontier research. This position merges distributed systems engineering with practical infrastructure work across our expansive data centers. You will scale Kubernetes clusters to unprecedented levels, automate bare-metal setups, and create the software layer that simplifies the complexity of numerous nodes across various data centers. Your work will sit at the crossroads of hardware and software, where speed and reliability are paramount. Be prepared to oversee dynamic operations, swiftly identify and resolve pressing issues, and constantly raise the bar for automation and uptime.

In this role, you will:
- Provision and scale extensive Kubernetes clusters, including automation for deployment, bootstrapping, and lifecycle management.
- Create software abstractions that integrate multiple clusters and provide a cohesive interface for training workloads.
- Oversee node deployment from bare metal to firmware upgrades, ensuring rapid, repeatable setups at scale.
- Enhance operational metrics by reducing cluster restart times (e.g., from hours to minutes) and expediting firmware and OS upgrade cycles.
- Integrate networking and hardware health systems to ensure end-to-end reliability across servers, switches, and data center infrastructure.
- Develop monitoring and observability systems to identify issues early and maintain cluster stability under high loads.

You might thrive in this role if you:
- Have extensive experience operating or scaling Kubernetes clusters or similar container orchestration systems in high-growth or hyperscale environments.
- Possess strong programming skills in languages relevant to cloud and infrastructure management.

Nov 7, 2024
Scale AI, Inc.
Full-time|$197.4K/yr - $246.8K/yr|On-site|San Francisco, CA; New York, NY

Research Scientist, Frontier Risk Evaluations
At Scale AI, we are at the forefront of data and evaluation services for pioneering AI technologies. Our mission is to ensure the safe and effective deployment of AI systems by bridging the gap between advanced AI research and global policy frameworks. With the launch of Scale Labs, we are assembling a dedicated team focused on policy research to empower governments and industry leaders with scientific insights regarding AI risks and capabilities. This team addresses complex challenges in agent robustness, AI control mechanisms, and risk assessment to build a comprehensive understanding of AI risks while promoting responsible adoption across sectors. We are eager to welcome skilled researchers who are passionate about shaping the future of AI.

As a Research Scientist specializing in Frontier Risk Evaluations, you will design evaluation metrics, harnesses, and datasets to assess the risks of cutting-edge AI systems. Your role may involve:
- Developing harnesses to evaluate AI models for potential security vulnerabilities and other high-risk behaviors.
- Collaborating with government entities and research labs to design evaluations that mitigate risks posed by advanced AI technologies.
- Publishing evaluation methodologies and drafting technical reports aimed at informing policymakers.

Mar 26, 2026
AfterQuery
Full-time|On-site|San Francisco

About AfterQuery
AfterQuery develops training data and evaluation frameworks that leading AI labs use to improve their models. The team partners with major research institutions to build datasets and run assessments that go beyond standard benchmarks. As a post-Series A company based in San Francisco, AfterQuery values contributions from every team member; work here directly shapes the next generation of AI models.

Role Overview
The Reinforcement Learning Environment Engineer designs datasets and evaluation systems that influence how advanced AI models learn and improve. This role involves close collaboration with research teams, hands-on experimentation with new data collection methods, and the creation of metrics to track model progress. Work moves from theoretical analysis to practical experiments, feeding directly into large-scale model training efforts.

What You Will Do
- Develop data segments that expose key failure modes in sectors such as finance, software engineering, and enterprise operations.
- Refine reward signals for Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) systems.
- Define quantitative metrics for dataset quality, diversity, and their effects on model alignment and capability.
- Work closely with research teams to translate training objectives into concrete data requirements and evaluation criteria.

This position is based in San Francisco.

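The RLVR reward signals mentioned above differ from RLHF in that the reward comes from an automatic, verifiable checker rather than a learned preference model. A minimal sketch, with a hypothetical checker for a task whose ground-truth answer is known:

```python
def verifiable_reward(model_answer, check):
    """RLVR-style reward: 1.0 if an automatic checker verifies the
    answer, else 0.0; no learned reward model is involved."""
    try:
        return 1.0 if check(model_answer) else 0.0
    except Exception:
        return 0.0  # a crashing checker counts as a failed verification

# Hypothetical checker for an arithmetic task with answer "42".
def checks_arithmetic(answer):
    return answer.strip() == "42"

print(verifiable_reward("42", checks_arithmetic))         # 1.0
print(verifiable_reward("forty-two", checks_arithmetic))  # 0.0
```

In practice the checker might run a unit-test suite against generated code or compare against an exact numeric answer; the engineering work in roles like this one is building checkers that are hard to reward-hack.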

Apr 14, 2026
Anthropic
Full-time|On-site|San Francisco, CA

Join Anthropic as a Research Engineer focusing on Economic Research. In this role, you will leverage your analytical skills to conduct in-depth economic analysis and contribute to innovative projects aimed at enhancing our understanding of economic models and their implications.

Mar 12, 2026
OpenAI
Full-time|Hybrid|San Francisco

About the Team
Join the Post-Training team at OpenAI, where we refine and elevate pre-trained models for deployment in ChatGPT, our API, and future products. Collaborating closely with various research and product teams, we conduct crucial research that prepares our models for real-world deployment to millions of users, ensuring they are safe, efficient, and reliable.

About the Role
As a Research Engineer / Scientist, you will spearhead the research and development of enhancements to our models. Our work intersects reinforcement learning and product development, aiming to create cutting-edge solutions. We seek passionate individuals with robust machine learning engineering skills and research experience, particularly with innovative and powerful models. The ideal candidate is driven by a commitment to product-oriented research. This position is located in San Francisco, CA, and follows a hybrid work model requiring three days in the office each week. Relocation assistance is available for new employees.

In this role, you will:
- Lead and execute a research agenda aimed at enhancing model capabilities and performance.
- Work collaboratively with research and product teams to empower customers to optimize their models.
- Develop robust evaluation frameworks to monitor and assess modeling advancements.
- Design, implement, test, and debug code across our research stack.

You may excel in this role if you:
- Possess a deep understanding of machine learning and its applications.
- Have experience with relevant models and methodologies for evaluating model improvements.
- Are adept at navigating large ML codebases for debugging purposes.
- Thrive in a fast-paced and technically intricate environment.

About OpenAI
OpenAI is an AI research and deployment organization dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We are committed to pushing the boundaries of AI capabilities while prioritizing safety and human-centric values in our products, and to embracing the diverse perspectives, voices, and experiences that represent the full spectrum of humanity as we strive for a future where AI is a powerful ally for everyone.

Dec 1, 2025
OpenAI
Full-time|Hybrid|San Francisco

About Our Team
Join the RL and Reasoning team at OpenAI, at the forefront of AI innovation. Our team is dedicated to advancing reinforcement learning research and has pioneered transformative projects, including o1 and o3. We are committed to pushing the limits of generative models while ensuring their scalable deployment.

About the Role
As a Research Engineer / Research Scientist at OpenAI, you will play a pivotal role in improving AI alignment and capabilities through state-of-the-art reinforcement learning techniques. Your contributions will be essential to training intelligent, aligned, and versatile agents that power various AI models.

We seek individuals with a solid foundation in reinforcement learning research, agile coding skills, and a passion for rapid iteration.

This position is located in San Francisco, CA, and follows a hybrid work model of three days in the office per week. We also provide relocation assistance for new hires.

You may excel in this role if:
- You are enthusiastic about being at the cutting edge of RL and language model research.
- You take initiative, owning ideas and driving them to fruition.
- You value principled methodologies, conducting simple experiments in controlled environments to draw trustworthy conclusions.
- You thrive in a fast-paced, complex technical environment where rapid iteration is essential.
- You are adept at navigating extensive ML codebases to troubleshoot and improve them.
- You possess a deep understanding of machine learning and its applications.

About OpenAI
OpenAI is a pioneering AI research and deployment organization committed to ensuring that general-purpose artificial intelligence serves the greater good for humanity. We strive to push the boundaries of AI system capabilities while prioritizing safe deployment through our products. We recognize AI as a powerful tool that must be developed with safety and human-centric principles, embracing diverse perspectives that reflect the full spectrum of humanity.

We are proud to be an equal opportunity employer, welcoming applicants from all backgrounds without discrimination based on race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other legally protected characteristic.

May 14, 2025
OpenAI
Full-time|On-site|San Francisco

About Our Team
Join the Frontier Systems team at OpenAI, where we develop, deploy, and maintain some of the world's largest supercomputers used for pioneering model training. Our expertise in turning data center designs into fully operational systems enables us to build the software needed to run expansive frontier model trainings. Our mission is to stand up, stabilize, and ensure the dependability and efficiency of these hyperscale supercomputers throughout the training of our most advanced models.

About This Position
As a Software Engineer on the Frontier Systems team focusing on power management, you will play a pivotal role in enabling our research. Large-scale supercomputers have significant power demands, and your expertise will be essential in optimizing power management to maximize computational efficiency. This role is vital for keeping our research supercomputing fleet running smoothly, ensuring both reliability and grid-level power consistency.

Our team gives talented engineers substantial autonomy and ownership, allowing for impactful contributions. You will conduct thorough system-level investigations and develop automated solutions, tackling complex issues with depth and precision while building scalable automation for detection and remediation.

Your responsibilities will include:
- Design and implement system-level and software-level solutions to optimize power consumption in large-scale supercomputers, ensuring efficient and reliable operations.
- Develop automation to monitor power consumption patterns during training workloads, and create algorithms to stabilize power fluctuations and safeguard grid reliability.
- Collaborate with researchers and engineers on tools for real-time monitoring, detection, and resolution of power-related hardware and system issues.
- Work cross-functionally to translate complex electrical system requirements into executable code, driving ongoing improvements in our power management strategy.
- Lead the creation of power throttling mechanisms at the IT system level, dynamically adjusting power usage based on workload demands and infrastructure constraints.
- Partner with hardware design teams to fold system-level power control requirements into hardware design, ensuring software-driven power management and hardware functionality work together seamlessly.
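The power throttling described in this listing can be pictured as a simple control loop. The sketch below is a minimal, hypothetical illustration: the `PowerCap` type, the proportional `gain`, and all wattage values are invented for this example, and a real system would read device telemetry (for instance via NVML or BMC sensors) rather than hard-coded numbers.

```python
# Hypothetical sketch of a workload-aware power-capping step.
# All names and values are illustrative, not OpenAI's implementation.

from dataclasses import dataclass

@dataclass
class PowerCap:
    floor_w: float    # minimum safe power limit per device (watts)
    ceiling_w: float  # maximum rated power limit per device (watts)

def next_power_limit(draw_w: float, target_w: float, current_limit_w: float,
                     cap: PowerCap, gain: float = 0.5) -> float:
    """Proportional controller: nudge the device power limit toward the
    target draw, clamped to the device's safe operating range."""
    error = target_w - draw_w
    proposed = current_limit_w + gain * error
    return max(cap.floor_w, min(cap.ceiling_w, proposed))

# Simulated step: a device drawing 680 W against a 600 W grid target.
cap = PowerCap(floor_w=200.0, ceiling_w=700.0)
limit = next_power_limit(draw_w=680.0, target_w=600.0,
                         current_limit_w=700.0, cap=cap)
print(limit)  # → 660.0 (limit throttled toward the target)
```

Running such a loop per device, on each telemetry tick, is one way to damp the fleet-wide power swings that large synchronized training workloads produce.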

Oct 31, 2024
OpenAI
Full-time|On-site|San Francisco

About Our Team
Join the dynamic Support Automation team at OpenAI, where we use cutting-edge AI to tackle real-world challenges and automate processes across the organization. Our mission is to improve productivity from customer operations to engineering by building a suite of automation tools that empower our team members. We create products that prioritize quality and reliability through rapid prototyping and reusable solutions applicable across OpenAI. In short, our team harnesses OpenAI's technology to improve our internal processes, and you will have the unique opportunity to work with both public and pre-released technologies to achieve this goal.

About the Role
We are seeking a Backend Software Engineer with substantial experience in machine learning and large language models to help design and implement evaluation infrastructure that measures the effectiveness of OpenAI's support automation efforts. This role is highly technical and collaborative: you will build resilient systems and backend services that underpin knowledge creation, access, and application throughout OpenAI, working closely with Data Science and Research teams to scale evaluations effectively.

Key responsibilities:
- Design and implement evaluation pipelines that are dependable, reproducible, and scalable.
- Build infrastructure for continuous evaluation monitoring, including regression and drift monitoring, and establish robust feedback loops to improve support automation.
- Create, maintain, and support backend services and APIs that enable intelligent automation and knowledge systems.
- Integrate and organize data across internal platforms, optimizing it for downstream systems and AI workflows.
- Collaborate closely with data, research, and engineering teams to integrate OpenAI models into impactful workflows.
- Own the full development lifecycle of new backend systems and internal platform features.
- Design for scalability and maintainability while iterating rapidly on new ideas.

Ideal candidate profile:
- 4+ years of backend engineering experience at product-focused companies (excluding internships).
- Proficiency in designing and building reliable backend systems, with a strong understanding of machine learning principles.
- Experience collaborating across teams to drive project success and impact.
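The regression monitoring mentioned in this listing amounts to comparing per-suite scores between a baseline evaluation run and a candidate run. The sketch below is a hypothetical illustration: the suite names, scores, and the tolerance value are invented, and a real pipeline would load results from stored evaluation runs rather than inline dictionaries.

```python
# Hypothetical sketch of a regression check between two eval runs.
# Suite names, scores, and tolerance are illustrative only.

def detect_regressions(baseline: dict[str, float],
                       candidate: dict[str, float],
                       tolerance: float = 0.02) -> list[str]:
    """Return the eval suites whose pass rate dropped by more than
    `tolerance` relative to the baseline run."""
    regressed = []
    for suite, base_score in baseline.items():
        new_score = candidate.get(suite)
        if new_score is not None and base_score - new_score > tolerance:
            regressed.append(suite)
    return sorted(regressed)

baseline = {"resolution_rate": 0.41, "routing_accuracy": 0.17, "triage": 0.88}
candidate = {"resolution_rate": 0.43, "routing_accuracy": 0.12, "triage": 0.87}
print(detect_regressions(baseline, candidate))  # → ['routing_accuracy']
```

Running a check like this on every new model or prompt revision, and alerting on a non-empty result, is one simple way to close the feedback loop the listing describes.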

Dec 23, 2025
