About the job
Reflection AI develops open-weight models with the goal of making superintelligence broadly accessible. The team draws on backgrounds from DeepMind, OpenAI, Google Brain, Meta, and Anthropic, and serves a wide range of users including individuals, enterprises, and government organizations.
Role overview
This Machine Learning Engineer position focuses on post-training and evaluation within the Applied AI group in San Francisco. The main responsibility is to fine-tune and evaluate Reflection AI’s open-weight models for enterprise customers, adapting them to specific domains and tasks using real customer data. The work covers the entire process: preparing and cleaning datasets, running fine-tuning workflows, building evaluation systems, and deploying models into production. Collaboration is central, both with clients to understand their needs and with research colleagues to advance model capabilities.
What you will do
- Fine-tune open-weight models for customer use cases, including dataset preparation, configuring training (such as supervised fine-tuning (SFT), preference optimization, and reinforcement fine-tuning), and iterating based on evaluation feedback.
- Design and maintain evaluation infrastructure: create evaluation suites, curate test sets, set baselines, and measure improvements on key customer tasks.
- Prepare training data from raw customer sources by assessing data quality, cleaning and formatting, identifying noisy or adversarial samples, and building reproducible data pipelines.
- Troubleshoot training and inference by analyzing loss curves, diagnosing data issues, and identifying problematic training dynamics.
- Deploy fine-tuned models in hybrid environments (public cloud, VPC, on-premises) to ensure reliable, high-performance inference in production.
- Help develop playbooks, evaluation benchmarks, and best practices for fine-tuning and evaluation as the team's approach matures.
Requirements
- Hands-on experience in applied machine learning, especially fine-tuning language models. This includes preparing datasets, running training loops, evaluating results, and deploying models. Familiarity with supervised fine-tuning (SFT), direct preference optimization (DPO), reinforcement learning from human feedback (RLHF), or related techniques is required.
- Strong understanding of evaluation methods, with the ability to design evaluations, interpret training metrics, and accurately assess model performance.
Location
San Francisco