Experience Level
Senior
Qualifications
Master's degree in Education, Social Sciences, or a related field. 5+ years of experience in research and evaluation within educational contexts. Strong analytical skills with proficiency in statistical software. Excellent communication and presentation skills.
About the job
Join aiedu as a Senior Lead in Research & Evaluation, where you will drive impactful research initiatives that shape educational practices and policies. In this role, you will lead a team of researchers in designing and executing comprehensive evaluations that inform our strategic direction. Your expertise will be critical in analyzing data, generating insights, and communicating findings to stakeholders.
About aiedu
aiedu is an innovative leader in educational research, committed to enhancing learning experiences through data-driven insights. Our team of experts is dedicated to making a positive impact in the education sector.
About Cartesia
At Cartesia, we are on a mission to revolutionize artificial intelligence by creating interactive, ubiquitous intelligence that operates seamlessly wherever you are. Current AI models struggle to continuously process and reason over extensive streams of data, including a year’s worth of audio, video, and text. Our innovative team is developing advanced model architectures to overcome these challenges.
Founded by PhDs from the Stanford AI Lab who pioneered State Space Models, we blend deep expertise in model innovation with a design-focused engineering approach. With backing from top-tier investors such as Index Ventures and Lightspeed Venture Partners, along with a network of industry-leading advisors, we are pushing the boundaries of AI.
About the Role
Join our New Horizons Evaluations team as the Evaluations Lead, where you will redefine how we measure progress in interactive machine intelligence. You will create evaluation frameworks that assess not only what models know but also how they reason, remember, and engage over time. This multifaceted role bridges research, product development, and infrastructure to establish metrics and systems that articulate the essence of “intelligence” in the next wave of AI. Ideal candidates will possess a blend of scientific rigor and technical prowess, alongside a genuine curiosity about user interactions with intelligent systems. Your contributions will be pivotal in shaping Cartesia’s model development, focusing on deeper qualities such as understanding, naturalness, and adaptability in real-world applications.
Your Impact
Define and identify essential model capabilities and behaviors for next-generation evaluations.
Develop and implement comprehensive evaluation pipelines with robust statistical analysis and transparent reporting.
Collaborate closely with model training and research teams to integrate evaluation systems into the model development process.
Design and prototype user studies and behavioral experiments to ground evaluations in practical use.
Full-time|Remote-Friendly (Travel Required)|San Francisco, CA; New York City, NY
Anthropic is looking for a Research Engineer focused on model evaluations. This position involves research and development to assess and strengthen the performance of AI models. Teams are based in San Francisco and New York City, and the role supports remote work with required travel.
Key responsibilities
Design and implement evaluations for Anthropic's AI models
Collaborate with team members to enhance model performance
Contribute to research that pushes the boundaries of AI systems
Location
Remote-friendly (travel required); San Francisco, CA; New York City, NY
Full-time|$280K/yr - $380K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY
As a premier data and evaluation partner for cutting-edge AI firms, Scale AI is committed to enhancing the evaluation and benchmarking of large language models (LLMs). We are developing industry-leading LLM evaluations that set new benchmarks for model performance assessment. Our mission is to create rigorous, scalable, and equitable evaluation methodologies that propel the next evolution of AI capabilities.
Our Research teams collaborate with top AI laboratories to provide high-quality data and expedite advancements in Generative AI research. As the Tech Lead/Manager of the LLM Evaluations Research team, you will guide a skilled team of research scientists and engineers dedicated to crafting and applying innovative evaluation methodologies, metrics, and benchmarks that assess the strengths and weaknesses of our advanced LLMs. This pivotal role involves designing and executing a strategic roadmap that establishes best practices in data-driven AI development, thus accelerating the development of the next generation of generative AI models in collaboration with leading foundational model labs.
Full-time|$280K/yr - $380K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY
At Scale AI, we are the premier partner for data and evaluation in the rapidly evolving field of artificial intelligence. Our commitment to advancing the assessment and benchmarking of large language models (LLMs) positions us at the forefront of AI innovation. We are dedicated to creating leading-edge LLM evaluation methodologies that set new benchmarks for model performance. Our research teams collaborate with the top AI laboratories in the industry to provide high-quality data, accelerate progress in generative AI research, and inform what excellence looks like in this domain.
As a Staff Machine Learning Research Scientist on our LLM Evals team, you will spearhead the creation of novel evaluation methodologies, metrics, and benchmarks to assess the strengths and weaknesses of cutting-edge LLMs. Your work will shape our internal strategies and influence the broader AI research community, making this role essential for establishing best practices in data-driven AI development.
abundant seeks a Research Lead based in San Francisco. This position steers research activities that help shape the company’s direction. The Research Lead partners with colleagues to analyze data, draw meaningful insights, and support projects where research has a clear business impact.
Key responsibilities
Plan, manage, and execute research initiatives from start to finish
Work with team members to analyze data and spot important trends
Turn research results into practical recommendations for the business
Support projects that guide company strategy
Collaboration and impact
This role involves close teamwork and communication across departments. Research findings directly inform business decisions and contribute to the company’s ongoing growth.
Full-time|$197.4K/yr - $246.8K/yr|On-site|San Francisco, CA; New York, NY
Join Scale AI as a Research Scientist — Frontier Risk Evaluations
At Scale AI, we are at the forefront of data and evaluation services for pioneering AI technologies. Our mission is to ensure the safe and effective deployment of AI systems by bridging the gap between advanced AI research and global policy frameworks. With the launch of Scale Labs, we are assembling a dedicated team focused on policy research to empower governments and industry leaders with scientific insights regarding AI risks and functionalities.
This team addresses complex challenges in agent robustness, AI control mechanisms, and risk assessments to facilitate a comprehensive understanding of AI risks, while promoting its responsible adoption across various sectors. We are eager to welcome skilled researchers who are passionate about shaping the future of AI.
As a Research Scientist specializing in Frontier Risk Evaluations, you will be responsible for designing evaluation metrics, harnesses, and datasets to assess the risks associated with cutting-edge AI systems. Your role may involve:
Developing harnesses to evaluate AI models for potential security vulnerabilities and other high-risk behaviors.
Collaborating with government entities and research labs to design evaluations that mitigate risks posed by advanced AI technologies.
Publishing evaluation methodologies and drafting technical reports aimed at informing policymakers.
Our Mission
At Reflection AI, we are committed to creating open superintelligence that is accessible to everyone. Our team is dedicated to developing open weight models tailored for individuals, agents, enterprises, and nation states. Our diverse group of AI experts comes from prestigious organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.
About the Role
As a Research Program Manager (RPM) at Reflection AI, you will play a pivotal role in leading and collaborating with our research and infrastructure teams to expedite the advancement of cutting-edge model development. You will not merely track projects; you will be a catalyst for clarity in uncertain situations, facilitate decision-making processes, and ensure cohesive integration across multiple teams.
This is a crucial position where you will spearhead the establishment of model evaluations and safety protocols from the ground up. You will define evaluation frameworks, construct the operational infrastructure for model safety, and create processes that seamlessly connect evaluations within the model development lifecycle. You will be laying the foundation for how Reflection AI interacts with the broader safety ecosystem. This is quintessential 0-to-1 work.
Possessing a proactive, first-responder mindset, you will take initiative to address challenges head-on, assess situations, and drive resolutions collaboratively.
What You'll Do
Develop the essential infrastructure for model evaluations and safety. Formulate evaluation frameworks, outline tooling requirements, and establish operational processes that will guide our assessment of model capabilities, risks, and readiness for deployment.
Establish model safety operations as a core function, including setting workflows, review schedules, and decision-making frameworks that link safety evaluations to the model development and release processes.
Collaborate with research and engineering leads throughout the pre-training, mid-training, and post-training phases to integrate safety and evaluation checkpoints into the development workflow in a manner that is thorough yet efficient.
Lead the scoping and prioritization of evaluation science and infrastructure investments, partnering with technical leads to determine which aspects to develop internally and which to adopt from external sources.
Full-time|$225K/yr - $275K/yr|Hybrid|London, England, United Kingdom; New York, New York, United States; San Francisco, California, United States
Who We Are
Lightning AI, the innovative force behind PyTorch Lightning, was established in 2019 to create a seamless end-to-end platform for developing, training, and deploying artificial intelligence systems. Our mission is to facilitate the transition from research to production effortlessly.
In partnership with Voltage Park, a leading neocloud and AI Factory, Lightning AI merges developer-centric software with optimized, large-scale computing solutions. We empower teams with the necessary tools for experimentation, training, and production inference while ensuring built-in security, observability, and control.
We cater to individual researchers, emerging startups, and large enterprises alike. With a global presence, our offices are located in New York City, San Francisco, Seattle, and London, backed by top-tier investors including Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
About AfterQuery
AfterQuery partners with leading AI labs to advance training data and evaluation frameworks. The team builds high-signal datasets and runs thorough evaluations that go beyond standard benchmarks. As a post-Series A, early-stage company in San Francisco, AfterQuery gives each team member room to shape the future of AI models.
Role Overview: Research Scientist - Frontier Data
This role focuses on designing datasets and developing evaluation systems that influence how top AI models are trained and assessed. Working closely with research teams at major AI labs, the scientist explores new data collection techniques, investigates where models fall short, and sets up metrics to track progress. The work is hands-on and experimental, moving quickly from hypothesis to live testing and directly impacting large-scale model training.
Key Responsibilities
Design data slices and analyze data structures to uncover model weaknesses in areas like finance, software development, and enterprise operations.
Build and refine evaluation rubrics and reward signals for RLHF and RLVR training approaches.
Study annotator behavior and run experiments to improve model capabilities across different domains.
Develop quantitative frameworks to measure dataset quality, diversity, and their effect on model alignment and performance.
Work with research teams to turn training objectives into concrete data and evaluation needs.
What We Look For
Experience as an undergraduate or master’s research student (PhD not required).
Background or internships with RL environments or AI safety and benchmarking organizations (e.g., METR, Artificial Analysis) is a strong plus.
Genuine interest in how data structure, selection, and quality affect model outcomes.
Demonstrated skill in designing experiments, acting quickly, and extracting insights from complex data.
Comfort working across sectors such as finance, software engineering, and policy.
Strong quantitative background and familiarity with LLM training pipelines, RLHF/RLVR methods, or evaluation frameworks.
A hands-on mindset focused on building practical solutions.
Pluralis Research is at the forefront of Protocol Learning—an innovative decentralized approach to training and deploying AI models that democratizes access to this technology for individuals, rather than just large corporations. By aggregating computing resources from numerous contributors, incentivizing participation, and ensuring no single entity can dominate the model's complete weights, we are forging a truly open and collaborative pathway to cutting-edge AI.
Role Overview
We are seeking a passionate Developer Relations Lead to serve as the crucial technical liaison between Pluralis's research initiatives and the broader machine learning and systems communities. In this role, you will transform complex, groundbreaking research (including distributed training, communication-efficient model parallelism, and fault-tolerant optimization) into clear, engaging, and accessible content for researchers, engineers, and innovators.
This position is not merely a traditional marketing role. We are looking for an individual who can digest our research papers, grasp the underlying architecture, and convey these insights effectively through blog posts, conference presentations, or social media updates. You will shape our technical narrative and become the face of Pluralis's contributions within the community.
About Us
At 1mind, we are revolutionizing the way revenue teams engage with clients through our innovative platform that deploys multimodal Superhumans. These advanced digital agents combine human-like appearance and communication skills with extensive technical and product knowledge, allowing them to conduct unlimited conversations around the clock. Our Superhumans are designed to qualify leads, schedule meetings, deliver presentations, provide interactive demos, address objections, identify pain points, construct value models, and facilitate customer onboarding. They seamlessly integrate into existing workflows, enhancing productivity and driving measurable revenue growth while improving the overall buyer experience.
Job Description
We are seeking a dynamic AI Research Lead to spearhead our research initiatives at 1mind. This pivotal position will significantly influence our company's direction and success. In this role, you will conduct exploratory research focused on post-training methodologies for sales and go-to-market (GTM) strategies, developing models that elucidate human selling behaviors and relationship-building processes. Collaborating closely with our CTO, you will have the autonomy to shape the research agenda, build a dedicated team, and disseminate your findings. This is an applied research role with immediate impact; our agents are currently operational in customer environments, producing high-quality data from thousands of authentic buyer interactions.
If you have a background in cutting-edge research and post-training work at a leading lab, and you are eager to create solutions that will be implemented in production environments utilizing unique datasets, we want to hear from you.
About Our Team
Join the Safety Systems team at OpenAI, where we are dedicated to ensuring that our cutting-edge models are deployed safely in the real world, positively impacting society. We are at the forefront of OpenAI's mission to develop and implement safe Artificial General Intelligence (AGI), emphasizing a culture of trust, transparency, and responsibility in AI.
The Safety Research team is focused on advancing our capabilities to implement robust and safe behaviors in AI models and systems. As we make strides in AI capabilities, our safety approaches must evolve to effectively address the changing landscape of risks. This vigilance is essential not only for preventing harmful misuse but also for ensuring that potential misalignments do not result in adverse outcomes. Our research is grounded in current methodologies while also being adaptable to future systems.
As we expand our team, we are looking for innovative research methods that enhance safety for AGI and beyond. This includes exploratory research into improving safety common sense and generalizable reasoning, developing evaluations to identify misalignment or hidden objectives of AI, and creating new strategies to support human oversight during long-term tasks.
About the Role
In your capacity as a Technical Lead, you will spearhead our strategic initiatives aimed at mitigating potential risks arising from misalignment or significant errors. Your responsibilities will encompass:
Establishing visionary goals and milestones for new research endeavors, alongside crafting rigorous evaluations to monitor progress.
Leading or driving research into new exploratory areas to validate the feasibility and scalability of our safety approaches.
Collaborating across safety research and related teams to ensure that diverse technical strategies converge to deliver robust safety outcomes.
We seek individuals with a proven track record in practical research concerning safety and alignment, particularly within the realms of AI and large language models (LLMs), who have successfully led substantial research initiatives in the past.
This role is situated in San Francisco, CA, with a hybrid work model of three days in the office each week. We also offer relocation assistance to new employees.
Full-time|Hybrid|Remote-Friendly (Travel Required)|San Francisco, CA; New York City, NY
Join Anthropic as a Research Lead for Training Insights, where you'll spearhead innovative research initiatives that shape the future of AI training methodologies. As part of our dynamic team, you will collaborate with cross-functional experts to extract meaningful insights from training data, driving improvements in AI models. Your expertise will be vital in enhancing our understanding of AI performance and guiding strategic decisions.
The Center for AI Safety (CAIS) is at the forefront of research and advocacy dedicated to addressing the pressing risks posed by artificial intelligence on a societal scale. Our mission encompasses tackling the most challenging aspects of AI through rigorous technical research, fostering field-building initiatives, and engaging in impactful policy discussions, in collaboration with our sister organization in Washington D.C., the Center for AI Safety Action Fund.
Because limited public awareness of AI safety is the most significant obstacle we face, we are excited to see a rapid increase in public interest in this critical area. Content related to AI safety—including videos, articles, and social media posts—is now reaching millions and fueling a growing curiosity. To capitalize on this momentum, CAIS is assembling a dedicated public engagement team aimed at maximizing the visibility and importance of AI safety.
We are seeking a Communications Lead who will serve as the vital link between complex research and public communication. In this role, you will transform intricate technical work into accessible materials such as explainers, FAQs, and briefs. Your responsibilities will include ensuring the accuracy and clarity of our claims while upholding high editorial standards across all outputs. You will report directly to the Head of Public Engagement.
Role Overview
At Intrinsic Safety, we are pioneering the development of AI systems capable of making critical decisions in high-stakes environments such as risk investigations, fraud detection, and identity verification. Our dedicated team in San Francisco is at the forefront of tackling complex challenges where traditional AI solutions often fall short.
We are in search of a Research Engineer to play a pivotal role in shaping our model evaluation strategies. You will be responsible for creating benchmarks, datasets, and evaluation frameworks that accurately assess our systems’ performance in real-world scenarios. This position bridges research, product development, and engineering, focusing on rigorous evaluations that reflect actual customer workflows and identify key failure points to propel the next generation of AI advancements.
About Suno
At Suno, we are revolutionizing the way original music is created and discovered. Our innovative platform features Suno Studio, a cutting-edge generative audio workstation, and Hooks, an exciting new method for creating and sharing captivating videos that bring music to life. Fueled by a leading music-creation model, our platform empowers individuals to turn their inspiration into music, making creativity accessible to all. With Suno, your ideas transform into sound, and every song is uniquely yours — allowing everyone to experience the joy of creation.
Recognized as one of the fastest-growing AI startups globally, we are supported by prestigious investors including Menlo Ventures, Lightspeed Venture Partners, Matrix, NVentures, and Hallwood Media.
Founded in 2023 by passionate Harvard alumni, Suno has already empowered over 100 million users to craft their original songs. In 2025, we successfully raised a $250 million Series C funding round to fuel our product innovation and accelerate our international expansion.
About the Role
We are seeking a dedicated Senior Qualitative UX Researcher to champion the voice of the user within Suno's diverse user base, particularly focusing on our power users and professional creators. In this role, you will delve into how musicians and serious creators engage with Suno, identifying what captivates them and what gaps need to be filled. You will collaborate closely with Product Managers, designers, and engineers, ensuring user feedback is integrated into product development from the ground up.
As part of our agile teams, you will be responsible for planning, executing, and synthesizing research swiftly, aligning with the dynamic pace of our product organization. We value timely insights that can directly influence decision-making, so this opportunity is perfect for those who thrive in a high-ownership, fast-paced environment.
Explore the song we created about this role: https://suno.com/s/6utW9leBXUGxAVqB
What You’ll Do
Conduct comprehensive qualitative research — including interviews, usability tests, and diary studies — across mobile, web, and platform experiences.
Create and distribute surveys that complement qualitative insights, providing necessary validation for research hypotheses.
Work collaboratively with Product, Design, and Engineering teams to define research objectives, identify key learning goals, and quickly transform findings into actionable recommendations.
Deliver research outcomes promptly to inform product development decisions.
About Strava
Strava is the ultimate app for active individuals, boasting a community of over 180 million athletes across more than 185 countries. More than just a fitness tracker, Strava is a platform where people come together to achieve their goals, from forming new habits to breaking personal records. Whether you're into cycling, running, or any other sport, Strava is your go-to app. Join us and embark on your journey with Strava today.
Our mission is straightforward: to inspire people to lead their most active lives. We believe in the transformative power of movement to connect and propel individuals forward.
About This Role
The Research & Insights team at Strava is dedicated to guiding user-centered decision-making that drives our business and product innovation. We convert user signals into strategic insights, validating opportunities and creating experiences that empower our users to thrive, ensuring they remain loyal members of the Strava community.
We are seeking a Senior UX/Product Researcher to spearhead the comprehensive research program for User Lifecycle. The User Lifecycle team is responsible for the entire Strava membership experience, from the first app download to lifelong community engagement. Your role will involve collaborating with cross-functional teams to conduct impactful research that informs product strategy and ensures decisions are grounded in data-driven insights. You will report directly to the Director of Research and Insights.
What You’ll Do:
Collaborate with stakeholders and the research team to identify, prioritize, and lead research initiatives that significantly impact company and product strategy, user satisfaction, and overall business outcomes.
Lead high-impact research projects while working closely with cross-functional partners, including Design, Product Management, Data Science, Engineering, and Marketing.
Identify optimal methods for sharing research insights, establishing trust-based relationships with colleagues to foster collaboration and innovation. You will influence cross-functional partners to act on research findings by providing clear, actionable recommendations and relevant business context.
Embrace a diversity of perspectives while maintaining a strong personal viewpoint that effectively cuts through complexity.
Full-time|$166K/yr|Remote|San Francisco, CA or Remote, US
About Descript
Descript is a pioneering, AI-driven collaborative platform designed to simplify video creation, editing, and sharing for everyone. Our mission is to empower teams to edit videos effortlessly and provide powerful tools for our dedicated video teams.
As a product-focused and customer-centric organization, we prioritize craftsmanship, speed, and clarity in thinking. User research is integral to our process, helping us understand our users deeply and enabling teams to cultivate better product intuition, ultimately leading to superior products for our customers.
About the Role
We are seeking a Lead UX Researcher to influence product decisions and establish user research practices throughout Descript.
In this senior individual contributor role, you will take ownership of complex problem spaces from inception to conclusion, while also enhancing how teams connect with users by providing improved tools, systems, and practices. This encompasses foundational and ethnographic research, enhancing our methods of synthesizing and disseminating insights, and empowering PMs and designers to conduct high-quality research independently.
Descript caters to a diverse array of creators and enterprises, each with unique workflows and objectives. This position is crucial for fostering a deep understanding of our users and ensuring that this understanding consistently informs product decisions.
If you are enthusiastic about joining a customer-focused culture, driving impactful research, and amplifying its influence across a product organization, we would love to connect with you.
At World Labs, we are pioneering the development of Large World Models—advanced AI systems designed to comprehend, reason, and engage with the physical environment. Our initiatives are at the cutting edge of spatial intelligence, robotics, and multimodal AI, with an objective to empower machines to perceive and operate effectively in intricate real-world settings.
We are curating a global team of researchers, engineers, and innovators dedicated to transcending the existing boundaries of artificial intelligence. If you are eager to work on transformative technology that will redefine machine perception and enhance human-AI interaction, this role is tailored for you.
About World Labs:
World Labs is an AI research and development company committed to creating spatially intelligent systems capable of modeling, reasoning, and acting in the real world. We envision a future where AI transcends text or pixels to thrive in three-dimensional, dynamic environments, and we are constructing the foundational models that will make this a reality.
Our team unites expertise in machine learning, robotics, computer vision, simulation, and systems engineering. We operate with the agility of a startup combined with the vision of a research lab, tackling long-term challenges that require creativity, rigor, and resilience.
Our mission is to develop the most advanced world models and leverage them to empower individuals, industries, and society.
Role Overview:
We are seeking a Tech Lead for 3D Modeling & Reconstruction to establish technical direction and drive execution for our essential 3D modeling initiatives. This position is suited for someone with a strong background in research science (RS) or research engineering (RE) who has made significant contributions to the field of 3D reconstruction and/or modeling—evidenced by academic publications, widely adopted open-source projects, or large-scale production systems.
This is a hands-on leadership role where you will merge profound technical expertise with the ability to guide a high-impact team, influencing both the research roadmap and the production systems that integrate contemporary 3D modeling techniques into tangible products. You will collaborate closely with research, engineering, and product teams to convert innovative concepts into dependable, scalable solutions.
Join aiedu as a Senior Lead in Research & Evaluation, where you will drive impactful research initiatives that shape educational practices and policies. In this role, you will lead a team of researchers in designing and executing comprehensive evaluations that inform our strategic direction. Your expertise will be critical in analyzing data, generating insights, and communicating findings to stakeholders.
About CartesiaAt Cartesia, we are on a mission to revolutionize artificial intelligence by creating interactive, ubiquitous intelligence that operates seamlessly wherever you are. Current AI models struggle to continuously process and reason over extensive streams of data, including a year’s worth of audio, video, and text. Our innovative team is developing advanced model architectures to overcome these challenges.Founded by PhDs from the Stanford AI Lab who pioneered State Space Models, we blend deep expertise in model innovation with a design-focused engineering approach. With backing from top-tier investors such as Index Ventures and Lightspeed Venture Partners, along with a network of industry-leading advisors, we are pushing the boundaries of AI.About the RoleJoin our New Horizons Evaluations team as the Evaluations Lead, where you will redefine how we measure progress in interactive machine intelligence. You will create evaluation frameworks that assess not only what models know but also how they reason, remember, and engage over time. This multifaceted role bridges research, product development, and infrastructure to establish metrics and systems that articulate the essence of “intelligence” in the next wave of AI. Ideal candidates will possess a blend of scientific rigor and technical prowess, alongside a genuine curiosity about user interactions with intelligent systems. 
Your contributions will be pivotal in shaping Cartesia’s model development, focusing on deeper qualities such as understanding, naturalness, and adaptability in real-world applications.
Your Impact
Define and identify essential model capabilities and behaviors for next-generation evaluations.
Develop and implement comprehensive evaluation pipelines with robust statistical analysis and transparent reporting.
Collaborate closely with model training and research teams to integrate evaluation systems into the model development process.
Design and prototype user studies and behavioral experiments to ground evaluations in practical use.
Full-time | Remote-Friendly (Travel Required) | San Francisco, CA | New York City, NY
Anthropic is looking for a Research Engineer focused on model evaluations. This position involves research and development to assess and strengthen the performance of AI models. Teams are based in San Francisco and New York City, and the role supports remote work with required travel.
Key responsibilities
Design and implement evaluations for Anthropic's AI models
Collaborate with team members to enhance model performance
Contribute to research that pushes the boundaries of AI systems
Location
Remote-friendly (travel required)
San Francisco, CA
New York City, NY
Full-time | $280K/yr - $380K/yr | On-site | San Francisco, CA; Seattle, WA; New York, NY
As a premier data and evaluation partner for cutting-edge AI firms, Scale AI is committed to enhancing the evaluation and benchmarking of large language models (LLMs). We are developing industry-leading LLM evaluations that set new benchmarks for model performance assessment. Our mission is to create rigorous, scalable, and equitable evaluation methodologies that propel the next evolution of AI capabilities.
Our Research teams collaborate with top AI laboratories to provide high-quality data and expedite advancements in Generative AI research. As the Tech Lead/Manager of the LLM Evaluations Research team, you will guide a skilled team of research scientists and engineers dedicated to crafting and applying innovative evaluation methodologies, metrics, and benchmarks that assess the strengths and weaknesses of our advanced LLMs. This pivotal role involves designing and executing a strategic roadmap that establishes best practices in data-driven AI development, thus accelerating the development of the next generation of generative AI models in collaboration with leading foundational model labs.
Full-time | $280K/yr - $380K/yr | On-site | San Francisco, CA; Seattle, WA; New York, NY
At Scale AI, we are the premier partner for data and evaluation in the rapidly evolving field of artificial intelligence. Our commitment to advancing the assessment and benchmarking of large language models (LLMs) positions us at the forefront of AI innovation. We are dedicated to creating leading-edge LLM evaluation methodologies that set new benchmarks for model performance. Our research teams collaborate with the top AI laboratories in the industry to provide high-quality data, accelerate progress in generative AI research, and inform what excellence looks like in this domain. As a Staff Machine Learning Research Scientist on our LLM Evals team, you will spearhead the creation of novel evaluation methodologies, metrics, and benchmarks to assess the strengths and weaknesses of cutting-edge LLMs. Your work will shape our internal strategies and influence the broader AI research community, making this role essential for establishing best practices in data-driven AI development.
abundant seeks a Research Lead based in San Francisco. This position steers research activities that help shape the company’s direction. The Research Lead partners with colleagues to analyze data, draw meaningful insights, and support projects where research has a clear business impact.
Key responsibilities
Plan, manage, and execute research initiatives from start to finish
Work with team members to analyze data and spot important trends
Turn research results into practical recommendations for the business
Support projects that guide company strategy
Collaboration and impact
This role involves close teamwork and communication across departments. Research findings directly inform business decisions and contribute to the company’s ongoing growth.
Full-time | $197.4K/yr - $246.8K/yr | On-site | San Francisco, CA; New York, NY
Join Scale AI as a Research Scientist — Frontier Risk Evaluations
At Scale AI, we are at the forefront of data and evaluation services for pioneering AI technologies. Our mission is to ensure the safe and effective deployment of AI systems by bridging the gap between advanced AI research and global policy frameworks. With the launch of Scale Labs, we are assembling a dedicated team focused on policy research to empower governments and industry leaders with scientific insights regarding AI risks and functionalities.
This team addresses complex challenges in agent robustness, AI control mechanisms, and risk assessments to facilitate a comprehensive understanding of AI risks, while promoting its responsible adoption across various sectors. We are eager to welcome skilled researchers who are passionate about shaping the future of AI.
As a Research Scientist specializing in Frontier Risk Evaluations, you will be responsible for designing evaluation metrics, harnesses, and datasets to assess the risks associated with cutting-edge AI systems. Your role may involve:
Developing harnesses to evaluate AI models for potential security vulnerabilities and other high-risk behaviors.
Collaborating with government entities and research labs to design evaluations that mitigate risks posed by advanced AI technologies.
Publishing evaluation methodologies and drafting technical reports aimed at informing policymakers.
Our Mission
At Reflection AI, we are committed to creating open superintelligence that is accessible to everyone. Our team is dedicated to developing open weight models tailored for individuals, agents, enterprises, and nation states. Our diverse group of AI experts comes from prestigious organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.
About the Role
As a Research Program Manager (RPM) at Reflection AI, you will play a pivotal role in leading and collaborating with our research and infrastructure teams to expedite the advancement of cutting-edge model development. You will not merely track projects; you will be a catalyst for clarity in uncertain situations, facilitate decision-making processes, and ensure cohesive integration across multiple teams.
This is a crucial position where you will spearhead the establishment of model evaluations and safety protocols from the ground up. You will define evaluation frameworks, construct the operational infrastructure for model safety, and create processes that seamlessly connect evaluations within the model development lifecycle. You will be laying the foundation for how Reflection AI interacts with the broader safety ecosystem. This is quintessential 0-to-1 work.
Possessing a proactive, first-responder mindset, you will take initiative to address challenges head-on, assess situations, and drive resolutions collaboratively.
What You'll Do
Develop the essential infrastructure for model evaluations and safety.
Formulate evaluation frameworks, outline tooling requirements, and establish operational processes that will guide our assessment of model capabilities, risks, and readiness for deployment.
Establish model safety operations as a core function, including setting workflows, review schedules, and decision-making frameworks that link safety evaluations to the model development and release processes.
Collaborate with research and engineering leads throughout the pre-training, mid-training, and post-training phases to integrate safety and evaluation checkpoints into the development workflow in a manner that is thorough yet efficient.
Lead the scoping and prioritization of evaluation science and infrastructure investments, partnering with technical leads to determine which aspects to develop internally and which to adopt from external sources.
Full-time | $225K/yr - $275K/yr | Hybrid | London, England, United Kingdom; New York, New York, United States; San Francisco, California, United States
Who We Are
Lightning AI, the innovative force behind PyTorch Lightning, was established in 2019 to create a seamless end-to-end platform for developing, training, and deploying artificial intelligence systems. Our mission is to facilitate the transition from research to production effortlessly.
In partnership with Voltage Park, a leading neocloud and AI Factory, Lightning AI merges developer-centric software with optimized, large-scale computing solutions. We empower teams with the necessary tools for experimentation, training, and production inference while ensuring built-in security, observability, and control.
We cater to individual researchers, emerging startups, and large enterprises alike. With a global presence, our offices are located in New York City, San Francisco, Seattle, and London, backed by top-tier investors including Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
About AfterQuery
AfterQuery partners with leading AI labs to advance training data and evaluation frameworks. The team builds high-signal datasets and runs thorough evaluations that go beyond standard benchmarks. As a post-Series A, early-stage company in San Francisco, AfterQuery gives each team member room to shape the future of AI models.
Role Overview: Research Scientist - Frontier Data
This role focuses on designing datasets and developing evaluation systems that influence how top AI models are trained and assessed. Working closely with research teams at major AI labs, the scientist explores new data collection techniques, investigates where models fall short, and sets up metrics to track progress. The work is hands-on and experimental, moving quickly from hypothesis to live testing and directly impacting large-scale model training.
Key Responsibilities
Design data slices and analyze data structures to uncover model weaknesses in areas like finance, software development, and enterprise operations.
Build and refine evaluation rubrics and reward signals for RLHF and RLVR training approaches.
Study annotator behavior and run experiments to improve model capabilities across different domains.
Develop quantitative frameworks to measure dataset quality, diversity, and their effect on model alignment and performance.
Work with research teams to turn training objectives into concrete data and evaluation needs.
What We Look For
Experience as an undergraduate or master’s research student (PhD not required).
Background or internships with RL environments or AI safety and benchmarking organizations (e.g., METR, Artificial Analysis) is a strong plus.
Genuine interest in how data structure, selection, and quality affect model outcomes.
Demonstrated skill in designing experiments, acting quickly, and extracting insights from complex data.
Comfort working across sectors such as finance, software engineering, and policy.
Strong quantitative background and familiarity with LLM training pipelines, RLHF/RLVR methods, or evaluation frameworks.
A hands-on mindset focused on building practical solutions.
Pluralis Research is at the forefront of Protocol Learning—an innovative decentralized approach to training and deploying AI models that democratizes access to this technology for individuals, rather than just large corporations. By aggregating computing resources from numerous contributors, incentivizing participation, and ensuring no single entity can dominate the model's complete weights, we are forging a truly open and collaborative pathway to cutting-edge AI.
Role Overview
We are seeking a passionate Developer Relations Lead to serve as the crucial technical liaison between Pluralis's research initiatives and the broader machine learning and systems communities. In this role, you will transform complex, groundbreaking research (including distributed training, communication-efficient model parallelism, and fault-tolerant optimization) into clear, engaging, and accessible content for researchers, engineers, and innovators.
This position is not merely a traditional marketing role. We are looking for an individual who can digest our research papers, grasp the underlying architecture, and convey these insights effectively through blog posts, conference presentations, or social media updates. You will shape our technical narrative and become the face of Pluralis's contributions within the community.
About Us
At 1mind, we are revolutionizing the way revenue teams engage with clients through our innovative platform that deploys multimodal Superhumans. These advanced digital agents combine human-like appearance and communication skills with extensive technical and product knowledge, allowing them to conduct unlimited conversations around the clock. Our Superhumans are designed to qualify leads, schedule meetings, deliver presentations, provide interactive demos, address objections, identify pain points, construct value models, and facilitate customer onboarding. They seamlessly integrate into existing workflows, enhancing productivity and driving measurable revenue growth while improving the overall buyer experience.
Job Description
We are seeking a dynamic AI Research Lead to spearhead our research initiatives at 1mind. This pivotal position will significantly influence our company's direction and success. In this role, you will conduct exploratory research focused on post-training methodologies for sales and go-to-market (GTM) strategies, developing models that elucidate human selling behaviors and relationship-building processes. Collaborating closely with our CTO, you will have the autonomy to shape the research agenda, build a dedicated team, and disseminate your findings. This is an applied research role with immediate impact; our agents are currently operational in customer environments, producing high-quality data from thousands of authentic buyer interactions.
If you have a background in cutting-edge research and post-training work at a leading lab, and you are eager to create solutions that will be implemented in production environments utilizing unique datasets, we want to hear from you.
About Our Team
Join the Safety Systems team at OpenAI, where we are dedicated to ensuring that our cutting-edge models are deployed safely in the real world, positively impacting society. We are at the forefront of OpenAI's mission to develop and implement safe Artificial General Intelligence (AGI), emphasizing a culture of trust, transparency, and responsibility in AI.
The Safety Research team is focused on advancing our capabilities to implement robust and safe behaviors in AI models and systems. As we make strides in AI capabilities, our safety approaches must evolve to effectively address the changing landscape of risks. This vigilance is essential not only for preventing harmful misuse but also for ensuring that potential misalignments do not result in adverse outcomes. Our research is grounded in current methodologies while also being adaptable to future systems.
As we expand our team, we are looking for innovative research methods that enhance safety for AGI and beyond. This includes exploratory research into improving safety common sense and generalizable reasoning, developing evaluations to identify misalignment or hidden objectives of AI, and creating new strategies to support human oversight during long-term tasks.
About the Role
In your capacity as a Technical Lead, you will spearhead our strategic initiatives aimed at mitigating potential risks arising from misalignment or significant errors.
Your responsibilities will encompass:
Establishing visionary goals and milestones for new research endeavors, alongside crafting rigorous evaluations to monitor progress.
Leading or driving research into new exploratory areas to validate the feasibility and scalability of our safety approaches.
Collaborating across safety research and related teams to ensure that diverse technical strategies converge to deliver robust safety outcomes.
We seek individuals with a proven track record in practical research concerning safety and alignment, particularly within the realms of AI and large language models (LLMs), who have successfully led substantial research initiatives in the past.
This role is situated in San Francisco, CA, with a hybrid work model of three days in the office each week. We also offer relocation assistance to new employees.
Full-time | Hybrid | Remote-Friendly (Travel Required) | San Francisco, CA | New York City, NY
Join Anthropic as a Research Lead for Training Insights, where you'll spearhead innovative research initiatives that shape the future of AI training methodologies. As part of our dynamic team, you will collaborate with cross-functional experts to extract meaningful insights from training data, driving improvements in AI models. Your expertise will be vital in enhancing our understanding of AI performance and guiding strategic decisions.
The Center for AI Safety (CAIS) is at the forefront of research and advocacy dedicated to addressing the pressing risks posed by artificial intelligence on a societal scale. Our mission encompasses tackling the most challenging aspects of AI through rigorous technical research, fostering field-building initiatives, and engaging in impactful policy discussions, in collaboration with our sister organization in Washington D.C., the Center for AI Safety Action Fund.
Because limited public awareness of AI safety is the most significant obstacle we face, we are excited to see a rapid increase in public interest in this critical area. Content related to AI safety—including videos, articles, and social media posts—is now reaching millions and fueling a growing curiosity. To capitalize on this momentum, CAIS is assembling a dedicated public engagement team aimed at maximizing the visibility and importance of AI safety.
We are seeking a Communications Lead who will serve as the vital link between complex research and public communication. In this role, you will transform intricate technical work into accessible public communication materials such as explainers, FAQs, and briefs. Your responsibilities will include ensuring the accuracy and clarity of our claims while upholding high editorial standards across all outputs. You will report directly to the Head of Public Engagement.
Role Overview
At Intrinsic Safety, we are pioneering the development of AI systems capable of making critical decisions in high-stakes environments such as risk investigations, fraud detection, and identity verification. Our dedicated team in San Francisco is at the forefront of tackling complex challenges where traditional AI solutions often fall short.
We are in search of a Research Engineer to play a pivotal role in shaping our model evaluation strategies. You will be responsible for creating benchmarks, datasets, and evaluation frameworks that accurately assess our systems’ performance in real-world scenarios. This position bridges research, product development, and engineering, focusing on rigorous evaluations that reflect actual customer workflows and identify key failure points to propel the next generation of AI advancements.
About Suno
At Suno, we are revolutionizing the way original music is created and discovered. Our innovative platform features Suno Studio, a cutting-edge generative audio workstation, and Hooks, an exciting new method for creating and sharing captivating videos that bring music to life. Fueled by a leading music-creation model, our platform empowers individuals to turn their inspiration into music, making creativity accessible to all. With Suno, your ideas transform into sound, and every song is uniquely yours — allowing everyone to experience the joy of creation.
Recognized as one of the fastest-growing AI startups globally, we are supported by prestigious investors including Menlo Ventures, Lightspeed Venture Partners, Matrix, NVentures, and Hallwood Media.
Founded in 2023 by passionate Harvard alumni, Suno has already empowered over 100 million users to craft their original songs. In 2025, we successfully raised a $250 million Series C funding round to fuel our product innovation and accelerate our international expansion.
About the Role
We are seeking a dedicated Senior Qualitative UX Researcher to champion the voice of the user within Suno's diverse user base, particularly focusing on our power users and professional creators. In this role, you will delve into how musicians and serious creators engage with Suno, identifying what captivates them and what gaps need to be filled. You will collaborate closely with Product Managers, designers, and engineers, ensuring user feedback is integrated into product development from the ground up.
As part of our agile teams, you will be responsible for planning, executing, and synthesizing research swiftly, aligning with the dynamic pace of our product organization.
We value timely insights that can directly influence decision-making, so this opportunity is perfect for those who thrive in a high-ownership, fast-paced environment.
Explore the song we created about this role: https://suno.com/s/6utW9leBXUGxAVqB
What You’ll Do
Conduct comprehensive qualitative research — including interviews, usability tests, and diary studies — across mobile, web, and platform experiences.
Create and distribute surveys that complement qualitative insights, providing necessary validation for research hypotheses.
Work collaboratively with Product, Design, and Engineering teams to define research objectives, identify key learning goals, and quickly transform findings into actionable recommendations.
Deliver research outcomes promptly to inform product development decisions.
About Strava
Strava is the ultimate app for active individuals, boasting a community of over 180 million athletes across more than 185 countries. More than just a fitness tracker, Strava is a platform where people come together to achieve their goals, from forming new habits to breaking personal records. Whether you're into cycling, running, or any other sport, Strava is your go-to app. Join us and embark on your journey with Strava today.
Our mission is straightforward: to inspire people to lead their most active lives. We believe in the transformative power of movement to connect and propel individuals forward.
About This Role
The Research & Insights team at Strava is dedicated to guiding user-centered decision-making that drives our business and product innovation. We convert user signals into strategic insights, validating opportunities and creating experiences that empower our users to thrive, ensuring they remain loyal members of the Strava community.
We are seeking a Senior UX/Product Researcher to spearhead the comprehensive research program for User Lifecycle. The User Lifecycle team is responsible for the entire Strava membership experience, from the first app download to lifelong community engagement. Your role will involve collaborating with cross-functional teams to conduct impactful research that informs product strategy and ensures decisions are grounded in data-driven insights.
You will report directly to the Director of Research and Insights.
What You’ll Do:
Collaborate with stakeholders and the research team to identify, prioritize, and lead research initiatives that significantly impact company and product strategy, user satisfaction, and overall business outcomes.
Lead high-impact research projects while working closely with cross-functional partners, including Design, Product Management, Data Science, Engineering, and Marketing.
Identify optimal methods for sharing research insights, establishing trust-based relationships with colleagues to foster collaboration and innovation. You will influence cross-functional partners to act on research findings by providing clear, actionable recommendations and relevant business context.
Embrace a diversity of perspectives while maintaining a strong personal viewpoint that effectively cuts through complexity.
Full-time | $166K/yr | Remote | San Francisco, CA or Remote, US
About Descript
Descript is a pioneering, AI-driven collaborative platform designed to simplify video creation, editing, and sharing for everyone. Our mission is to empower teams to edit videos effortlessly and provide powerful tools for our dedicated video teams.
As a product-focused and customer-centric organization, we prioritize craftsmanship, speed, and clarity in thinking. User research is integral to our process, helping us understand our users deeply and enabling teams to cultivate better product intuition, ultimately leading to superior products for our customers.
About the Role
We are seeking a Lead UX Researcher to influence product decisions and establish user research practices throughout Descript.
In this senior individual contributor role, you will take ownership of complex problem spaces from inception to conclusion, while also enhancing how teams connect with users by providing improved tools, systems, and practices. This encompasses foundational and ethnographic research, enhancing our methods of synthesizing and disseminating insights, and empowering PMs and designers to conduct high-quality research independently.
Descript caters to a diverse array of creators and enterprises, each with unique workflows and objectives. This position is crucial for fostering a deep understanding of our users and ensuring that this understanding consistently informs product decisions.
If you are enthusiastic about joining a customer-focused culture, driving impactful research, and amplifying its influence across a product organization, we would love to connect with you.
At World Labs, we are pioneering the development of Large World Models—advanced AI systems designed to comprehend, reason, and engage with the physical environment. Our initiatives are at the cutting edge of spatial intelligence, robotics, and multimodal AI, with an objective to empower machines to perceive and operate effectively in intricate real-world settings.
We are curating a global team of researchers, engineers, and innovators dedicated to transcending the existing boundaries of artificial intelligence. If you are eager to work on transformative technology that will redefine machine perception and enhance human-AI interaction, this role is tailored for you.
About World Labs:
World Labs is an AI research and development company committed to creating spatially intelligent systems capable of modeling, reasoning, and acting in the real world. We envision a future where AI transcends text or pixels to thrive in three-dimensional, dynamic environments, and we are constructing the foundational models that will make this a reality.
Our team unites expertise in machine learning, robotics, computer vision, simulation, and systems engineering. We operate with the agility of a startup combined with the vision of a research lab, tackling long-term challenges that require creativity, rigor, and resilience.
Our mission is to develop the most advanced world models and leverage them to empower individuals, industries, and society.
Role Overview:
We are seeking a Tech Lead for 3D Modeling & Reconstruction to establish technical direction and drive execution for our essential 3D modeling initiatives.
This position is suited for someone with a strong background in research science (RS) or research engineering (RE) who has made significant contributions to the field of 3D reconstruction and/or modeling—evidenced by academic publications, widely adopted open-source projects, or large-scale production systems.
This is a hands-on leadership role where you will merge profound technical expertise with the ability to guide a high-impact team, influencing both the research roadmap and the production systems that integrate contemporary 3D modeling techniques into tangible products. You will collaborate closely with research, engineering, and product teams to convert innovative concepts into dependable, scalable solutions.
Feb 17, 2026