Experience Level
Entry Level
Qualifications
The ideal candidate will have a strong background in data science or a related field, with proficiency in statistical analysis and machine learning. A Bachelor's degree is required, and experience with Python or R is preferred. You should possess excellent problem-solving skills and the ability to work collaboratively in a fast-paced environment.
About the job
Join Baseten as a Post-Training Research Scientist, where you will play a vital role in advancing our machine learning capabilities. In this position, you will have the opportunity to conduct innovative research, analyze data, and contribute to the development of cutting-edge technologies. Your work will directly impact our projects and enhance the performance of our models.
About Baseten
Baseten is a leading technology company based in San Francisco, specializing in machine learning and artificial intelligence solutions. We are dedicated to innovation and providing our clients with the tools they need to succeed in a rapidly evolving digital landscape.
Similar jobs
Search for Staff Machine Learning Research Engineer Agent Post Training Enterprise Genai
Full-time|$218.4K/yr - $273K/yr|On-site|San Francisco, CA; New York, NY
Artificial Intelligence is increasingly becoming a pivotal element across all sectors of society. At Scale AI, we are committed to accelerating the evolution of AI applications. For nearly a decade, we have been the premier AI data foundry, propelling groundbreaking advancements in areas such as generative AI, defense applications, and autonomous vehicles. Following our recent investment from Meta, we are intensifying our efforts to develop advanced post-training algorithms that are essential for sophisticated agents in enterprises worldwide.
The Enterprise ML Research Lab is at the forefront of this AI revolution, leveraging a suite of proprietary research, tools, and resources to support our enterprise clients. As a Staff Machine Learning Research Engineer focusing on Agent Post-training, you will be instrumental in creating our next-generation Agent Reinforcement Learning training platform. Your work will enable the training of top-tier Agents that deliver state-of-the-art results in real-world enterprise applications.
You will incorporate cutting-edge research into our training framework, empowering ML Research Engineers on the Enterprise AI team to deploy use cases ranging from next-generation AI cybersecurity firewalls to training foundational healthtech search models. If you are passionate about shaping the future of the GenAI movement, we welcome your application!
Full-time|$252K/yr - $315K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY
At Scale AI, we collaborate with leading AI laboratories to supply high-quality data and foster advancements in Generative AI research. We seek innovative Research Scientists and Research Engineers with a strong focus on post-training techniques for Large Language Models (LLMs), including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and reward modeling. This position emphasizes optimizing data curation and evaluation processes to boost LLM performance across text and multimodal formats. In this pivotal role, you will pioneer new methods to enhance the alignment and generalization of extensive generative models. You will work closely with fellow researchers and engineers to establish best practices in data-driven AI development. Additionally, you will collaborate with top foundation model labs, providing critical technical and strategic insights for the evolution of next-generation generative AI models.
OpenAI is hiring a Software Engineer for Post-Training Research in San Francisco. This position centers on improving the performance and capabilities of advanced machine learning models after their initial training phase.
Role overview
Work closely with a skilled team to explore new ways of strengthening AI systems. The focus is on researching and developing methods that push the boundaries of what these models can achieve once training is complete.
Collaboration
Expect to contribute to ongoing research efforts and share insights with colleagues who are passionate about advancing AI. Teamwork and knowledge exchange are key parts of this role.
Location
This position is based in San Francisco.
Full-time|$275K/yr - $350K/yr|On-site|San Francisco, CA; Seattle, WA; New York, NY
About Scale AI
At Scale AI, we are dedicated to propelling the advancement of AI applications. Over the past eight years, we have established ourselves as the premier AI data foundry, supporting groundbreaking innovations in fields such as generative AI, defense technologies, and autonomous vehicles. Following our recent Series F funding round, we are intensifying our efforts to harness frontier data, paving the way toward achieving Artificial General Intelligence (AGI). Our work with enterprise clients and governments has enhanced our model evaluation capabilities, allowing us to expand our offerings for both public and private evaluations.
About the ACE Team
The Agent Capabilities & Environments (ACE) team, a vital part of Scale’s Research organization, unites customer-focused Researchers and Applied AI Engineers. Our primary mission is to conduct research on agent environments and reinforcement learning reward signals, benchmark autonomous agent performance in real-world contexts, and develop robust data programs aimed at enhancing the capabilities of Large Language Models (LLMs). We are committed to creating foundational tools and frameworks for evaluating models as agents, focusing on autonomous agents that interact dynamically with a wide range of external environments, including code repositories and GUI interfaces.
About This Role
This position sits at the cutting edge of AI research and its practical applications, concentrating on the data types necessary for the development of state-of-the-art agents, including browser and software engineering agents. The ideal candidate will investigate the data landscape required to propel intelligent and adaptable AI agents, steering the data strategy at Scale to foster innovation. This role demands not only expertise in LLM agents and planning algorithms but also creative problem-solving skills to tackle novel challenges pertaining to data, interaction, and evaluation. You will contribute to influential research publications on agents, collaborate with customer researchers, and partner with the engineering team to transform these advancements into scalable real-world solutions.
Full-time|$350K/yr - $475K/yr|On-site|San Francisco
At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We strive to build a future where everyone has access to the knowledge and tools essential for making AI work effectively for their unique objectives.
Our team comprises scientists, engineers, and innovators who have contributed to some of the most widely adopted AI products, including ChatGPT and Character.ai, as well as notable open-weight models like Mistral and popular open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.
About the Role
The Post-Training Researcher position is pivotal to our roadmap. It serves as a crucial connection between raw model intelligence and a system that is genuinely beneficial, safe, and collaborative for human users.
This role uniquely combines fundamental research with practical engineering, as we do not differentiate between these functions internally. Candidates will be expected to produce high-performance code and analyze technical reports. This position is ideal for individuals who relish both deep theoretical inquiry and hands-on experimentation, aiming to influence the foundational aspects of AI learning.
Note: This position is classified as an 'evergreen role', meaning we continuously accept applications in this research domain. Given the high volume of applications, an immediate match for your skills and experience may not always be available. However, we encourage you to apply; we regularly review submissions and reach out as new opportunities arise. You are welcome to apply again after gaining more experience, but we ask that you refrain from applying more than once every six months. Additionally, specific postings for singular roles may be available for distinct projects or team needs, in which case you are welcome to apply directly in conjunction with this evergreen role.
What You’ll Do
Develop and Optimize Recipes: Refine post-training recipes, encompassing various datasets, training stages, and hyperparameters, while assessing their impact on multiple performance metrics.
Iterate on Evaluations: Engage in a continuous process of defining evaluation metrics, optimizing them, and recognizing their limitations. You will be accountable for enhancing performance metrics and ensuring they are meaningful.
Debug and Analyze: During the fine-tuning of training configurations, you may encounter results that appear inconsistent. You will be responsible for troubleshooting and cultivating a deeper understanding to apply to subsequent challenges.
Scale and Investigate: Assess and expand the capabilities of our models while exploring potential improvements.
Role overview
OpenAI is looking for a Researcher focused on Agentic Post-Training, based in San Francisco. This role centers on analyzing and improving how AI systems behave after their initial training. The goal is to broaden the capabilities of AI and refine how models respond in complex situations.
What you will do
Study and assess agentic behaviors in trained AI models
Create new approaches to strengthen these behaviors after training
Collaborate with a talented team on projects that shape the future of artificial intelligence research
Collaboration and impact
This position involves hands-on research with other specialists at OpenAI. The work directly supports the advancement of AI capabilities and helps define new benchmarks for agentic performance in artificial intelligence.
Join Baseten as a Post-Training Research Engineer and contribute to groundbreaking advancements in machine learning and AI. In this role, you will leverage your engineering skills to analyze and enhance models post-training, ensuring optimal performance and efficiency.
Full-time|On-site|San Francisco Bay Area (San Mateo) or Boston (Somerville)
About the Role
In the realm of machine learning, pretraining lays the foundation for a general model, while post-training refines that model, enhancing its utility, controllability, safety, and performance in real-world applications. As a Post-Training Research Scientist, you will transform large pretrained robot models into production-ready systems through methodologies such as fine-tuning, reinforcement learning, steering, human feedback, task specialization, evaluation, and on-robot validation at scale. This position offers a unique opportunity for individuals from diverse backgrounds to evolve into full-stack ML roboticists, adept at swiftly identifying challenges across machine learning and control domains. This is where innovative research converges with practical implementation.
Your Responsibilities Include:
Crafting fine-tuning and adaptation strategies tailored for specific robotic tasks and embodiments.
Developing methodologies to enhance reliability, robustness, and controllability of robotic systems.
Establishing evaluation frameworks to assess real-world robot performance beyond just offline metrics.
Collaborating with ML infrastructure teams to optimize inference-time performance, including latency, stability, and memory usage.
Utilizing advanced techniques such as imitation learning, reinforcement learning, distillation, synthetic data, and curriculum learning.
Bridging the gap between model outputs and tangible outcomes in the physical world.
You Might Excel in This Role If You:
Possess experience in fine-tuning large models for downstream applications, including RLHF, imitation learning, reinforcement learning, distillation, and domain adaptation.
Have a background in embodied AI, robotics, or real-world machine learning systems.
Demonstrate a strong commitment to evaluation, benchmarking, and failure analysis.
Are comfortable troubleshooting and debugging across the entire ML stack, from analyzing loss curves to understanding robot behavior.
Enjoy rapid iteration and thrive on real-world feedback loops.
Aspire to connect foundational models with practical deployment scenarios.
About Generalist
At Generalist, we are dedicated to realizing the vision of general-purpose robots. We envision a future where industries and homes benefit from collaborative interactions between humans and machines, enabling us to achieve more than ever before. Our focus is on building embodied foundation models, starting with dexterity, and advancing the frontiers of data, models, and hardware to empower robots to intelligently engage with their environments.
Overview
Pluralis Research is at the forefront of Protocol Learning, innovating a decentralized approach to train and deploy AI models that democratizes access beyond just well-funded corporations. By aggregating computational resources from diverse participants, we incentivize collaboration while safeguarding against centralized control of model weights, paving the way for a truly open and cooperative environment for advanced AI.
We are seeking a talented Machine Learning Training Platform Engineer to design, develop, and scale the core infrastructure that powers our decentralized ML training platform. In this role, you will have ownership over essential systems including infrastructure orchestration, distributed computing, and service integration, facilitating ongoing experimentation and large-scale model training.
Responsibilities
Multi-Cloud Infrastructure: Create resource management systems that provision and orchestrate computing resources across AWS, GCP, and Azure using infrastructure-as-code tools like Pulumi or Terraform. Manage dynamic scaling, state synchronization, and concurrent operations across hundreds of diverse nodes.
Distributed Training Systems: Design fault-tolerant infrastructure for distributed machine learning, including GPU clusters, NVIDIA runtime, S3 checkpointing, large dataset management and streaming, health monitoring, and resilient retry strategies.
Real-World Networking: Develop systems that simulate and manage real-world network conditions (such as bandwidth shaping, latency injection, and packet loss) while accommodating dynamic node churn and ensuring efficient data flow across workers with varying connectivity, as our training occurs on consumer nodes and non-co-located infrastructure.
Why Join Achira?
Become part of an elite team comprising scientists, machine learning researchers, and engineers dedicated to transforming the predictability of the physical microcosm and revolutionizing drug discovery.
Explore uncharted territories: we are on a mission to innovate next-generation model architectures that merge AI with chemistry.
Engage in large-scale operations: harness massive computational resources, extensive datasets, and ambitious objectives.
Take ownership of significant projects from inception to deployment on large-scale infrastructures.
Thrive in a culture that values precision, speed, execution, and a proactive mindset.
About the Position
At Achira, we are committed to developing state-of-the-art foundation models that tackle the most complex challenges in simulation for drug discovery and beyond. Our atomistic foundation simulation models (FSMs) serve as world models of the physical microcosm, incorporating machine learning interaction potentials (MLIPs), neural network potentials (NNPs), and various generative models.
We are seeking a Machine Learning Research Engineer (MLRE) who excels at the intersection of advanced machine learning and rigorous research methodologies. Collaborate closely with our research scientists to design and enhance intelligent training systems that propel us beyond contemporary architectures into a new era of ML-driven molecular modeling.
Your mission is clear yet ambitious: to establish the foundational frameworks for training atomistic simulation models at scale. This entails a deep dive into architecture, data, optimizers, losses, training metrics, and representation learning, all while constructing high-performance systems that maximize the potential of our models. In this role, you will be instrumental in creating a blueprint for pretraining FSMs similar to today’s large-scale generative AI systems, making a significant impact on drug discovery.
At Achira, you will have the chance to pioneer models that comprehend and simulate the physical world at an atomic level, achieving unprecedented speed and accuracy.
Full-time|$350K/yr - $475K/yr|On-site|San Francisco
At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We envision a future where everyone can harness the knowledge and tools necessary for AI to serve their unique needs and aspirations. Our team comprises scientists, engineers, and builders who have developed some of the most widely utilized AI products, such as ChatGPT and Character.ai, as well as open-weight models like Mistral and popular open-source projects including PyTorch, OpenAI Gym, Fairseq, and Segment Anything.
About the Role
The role of a Post-Training Researcher is pivotal to our strategic vision. This position serves as the essential link between raw model intelligence and a practical, safe, and collaborative system for human users.
Our research in post-training data sits at the intersection of human insights and machine learning. By integrating human and synthetic data techniques alongside innovative methodologies, we capture the subtleties of human behavior to inform and guide our models. We investigate and model the mechanisms that derive value for individuals, enabling us to articulate, predict, and enhance human preferences, behaviors, and satisfaction. Our objective is to translate research concepts into actionable data through meticulously planned data labeling and collection initiatives, while also understanding the science behind high-quality data that effectively trains our models. Additionally, we develop and assess quantitative metrics to evaluate the success and impact of our data and training strategies.
Beyond execution, we explore new paradigms for human-AI interaction and scalable oversight, experimenting with optimal ways for humans to supervise, guide, and collaborate with models. This interdisciplinary role merges research, data operations, and technical implementation, pushing the boundaries of aligned, human-centered AI systems.
This position combines foundational research and practical engineering, as we do not differentiate between these roles internally. You will be expected to write high-performance code and comprehend technical reports. This role is perfect for individuals who thrive on deep theoretical exploration and hands-on experimentation, eager to shape the foundational aspects of AI learning.
Note: This is an evergreen role that we maintain continuously to express interest in this research area. We receive a high volume of applications, and while there may not always be an immediate fit for your skills and experience, we encourage you to apply. We regularly review applications and reach out to candidates as new opportunities arise. You are welcome to reapply after gaining more experience, but please limit applications to once every six months. You may also notice postings for specific roles for targeted positions.
Full-time|$218.4K/yr - $273K/yr|On-site|San Francisco, CA; New York, NY
As AI continues to play a crucial role across various sectors, Scale AI is committed to accelerating the evolution of AI applications. For nearly a decade, we have been at the forefront of AI data solutions, driving significant innovations in areas such as generative AI, defense technologies, and autonomous vehicles. With recent funding from Meta, we are intensifying our efforts to develop cutting-edge post-training algorithms essential for enhancing the performance of complex enterprise agents globally. The Enterprise ML Research Lab is at the forefront of this AI transformation. Our team is dedicated to creating a suite of proprietary research and resources tailored for our enterprise clientele. As a Machine Learning Systems Research Engineer, you will play a pivotal role in developing algorithms for our next-generation Agent Reinforcement Learning (RL) training platform, supporting large-scale training operations, and integrating state-of-the-art technologies to optimize our machine learning systems. You will collaborate with other ML Research Engineers and AI Architects on the Enterprise AI team to apply these training algorithms to various client use cases, from next-gen AI cybersecurity firewalls to foundational healthtech search models. If you are passionate about shaping the future of AI, we want to hear from you!
About the Team
Join the innovative Post-Training team at OpenAI, where we focus on refining and elevating pre-trained models for deployment in ChatGPT, our API, and future products. Collaborating closely with various research and product teams, we conduct crucial research that prepares our models for real-world deployment to millions of users, ensuring they are safe, efficient, and reliable.
About the Role
As a Research Engineer / Scientist, you will spearhead the research and development of enhancements to our models. Our work intersects reinforcement learning and product development, aiming to create cutting-edge solutions.
We seek passionate individuals with robust machine learning engineering skills and research experience, particularly with innovative and powerful models. The ideal candidate will be driven by a commitment to product-oriented research.
This position is located in San Francisco, CA, and follows a hybrid work model requiring three days in the office each week. Relocation assistance is available for new employees.
In this role, you will:
Lead and execute a research agenda aimed at enhancing model capabilities and performance.
Work collaboratively with research and product teams to empower customers to optimize their models.
Develop robust evaluation frameworks to monitor and assess modeling advancements.
Design, implement, test, and debug code across our research stack.
You may excel in this role if you:
Possess a deep understanding of machine learning and its applications.
Have experience with relevant models and methodologies for evaluating model improvements.
Are adept at navigating large ML codebases for debugging purposes.
Thrive in a fast-paced and technically intricate environment.
About OpenAI
OpenAI is a pioneering AI research and deployment organization dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We are committed to pushing the boundaries of AI capabilities while prioritizing safety and human-centric values in our products. Our mission is to embrace diverse perspectives, voices, and experiences that represent the full spectrum of humanity, as we strive for a future where AI is a powerful ally for everyone.
Join Cartesia: Pioneering AI Innovation
At Cartesia, we are on a mission to redefine the landscape of artificial intelligence. Our goal is to create the next generation of AI that is interactive, ubiquitous, and capable of continuous reasoning across vast streams of audio, video, and text data. With an impressive foundation built on our pioneering work in State Space Models (SSMs) at the Stanford AI Lab, our team is uniquely positioned to advance model architectures that will make on-device reasoning a reality.
Backed by prominent investors like Index Ventures and Lightspeed Venture Partners, along with a network of 90+ advisors, including top experts in AI, we are committed to pushing the boundaries of model innovation and systems engineering.
About the Role
We believe that the next significant advancement in model intelligence will stem from enhanced post-training methods and alignment strategies. As a Post-Training Researcher, you will be at the forefront of developing systems and methodologies that ensure our multimodal models are not just adaptive, but also aligned with human intentions.
In this role, you will collaborate across machine learning research, alignment, and infrastructure, crafting innovative techniques for preference optimization, model evaluation, and feedback-driven learning. You will investigate how feedback signals can enhance reasoning capabilities across various modalities while establishing the necessary infrastructure to scale and improve these processes.
Your contributions will be pivotal in shaping the learning and improvement trajectory of Cartesia’s foundational models, ultimately enhancing their connection with users.
Your Impact
Lead research initiatives aimed at enhancing the capabilities and alignment of multimodal models.
Create cutting-edge post-training methods and evaluation frameworks to assess model advancements.
Collaborate closely with research, product, and platform teams to establish best practices for specialized model development.
Design, debug, and scale experimental systems to ensure reliability and reproducibility throughout training cycles.
Convert research insights into production-ready systems that enhance model reasoning, consistency, and alignment with human values.
Full-time|$116K/yr - $170K/yr|Hybrid|Cambridge, MA USA; San Francisco, CA USA
Your Role at Lila Sciences
We are in search of a talented Machine Learning Research Engineer with a focus on LLM post-training. In this pivotal role, you will architect and oversee large-scale training systems, enhance the performance of extensive models, and incorporate state-of-the-art methodologies to boost efficiency and throughput.
Key Responsibilities
Develop Ray-based distributed training infrastructure for LLMs and multi-modal models.
Implement performance optimizations for large-scale model training, including training and optimization workflows such as SFT, MoE, and long-context scaling.
Manage the orchestration of leading-edge and open-source LLMs alongside intricate compute-intensive tools.
Create scalable pipelines for data preprocessing and experiment orchestration, utilizing tools for efficient data loading, pipeline parallelism, and optimizer tuning.
Establish system-level performance benchmarks and debugging utilities.
Full-time|$250K/yr - $450K/yr|On-site|San Francisco
About AfterQuery
AfterQuery builds training data and evaluation frameworks used by leading AI labs around the world. The team partners with advanced research groups to create high-quality datasets and run detailed evaluations that go beyond standard benchmarks. As a small, post-Series A company based in San Francisco, every team member plays a key role in shaping how future AI models learn and improve.
Role Overview
The Post-Training Research Scientist focuses on proving the impact of AfterQuery's datasets. This work involves designing and running training experiments to isolate how specific data influences model performance. Projects span Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) post-training, with an emphasis on measuring effects on capability, generalization, and alignment. Working closely with partner labs, the scientist turns data into clear, verifiable results: showing exactly how a dataset leads to measurable improvements under defined conditions. The work is experimental and directly shapes the value of AfterQuery's products.
What You Will Do
Run controlled SFT and RL experiments to measure how datasets affect model outcomes.
Quantify gains in areas like reasoning, tool use, long-horizon tasks, and specialized workflows.
Share findings with partner labs to support sales and demonstrate value.
Work with internal subject matter experts to improve data quality based on experimental results.
What We Look For
Strong background in LLM training and evaluation methods.
Curiosity about how data structure, selection, and quality shape model behavior.
Skill in designing experiments, executing quickly, and drawing practical insights from complex results.
Comfort working across fields such as finance, software engineering, and policy.
Focus on real-world implementation, not just theory.
Research experience at the undergraduate or master's level is preferred; a PhD is not required.
Compensation
$250,000 - $450,000 total compensation plus equity
Full-time|$218K/yr - $273K/yr|On-site|San Francisco, CA; New York, NY
At Scale AI, we are at the forefront of the AI revolution, providing the essential data infrastructure that empowers organizations to create and implement robust AI applications. Our partnerships with top enterprises and government entities accelerate their AI goals through innovative data annotation platforms, generative AI solutions, and comprehensive enterprise AI capabilities.
Discover the General Agents Team
The General Agents team, an integral part of Scale's Enterprise division, is dedicated to developing advanced general agents tailored for diverse customer applications. We operate at the cutting edge of agent technology, transforming sophisticated reasoning and agentic capabilities into dependable, production-ready systems that deliver substantial economic benefits. Our agents are designed for scalability, focusing on recurring enterprise challenges, with a strong emphasis on generalization, extensibility, and widespread deployment.
Your Impact in This Role
As a Senior/Staff Machine Learning Engineer on the General Agents team, you will be pivotal in architecting, building, and deploying production-grade AI agents that address significant enterprise challenges. Your role will encompass the entire agent lifecycle, from system design and model evaluation to deployment and iterative refinement, effectively merging cutting-edge agent techniques with the practicalities of real-world customer settings.
You will:
Create and implement comprehensive agent systems that integrate LLM reasoning, memory, tool usage, and control logic to tackle recurring enterprise challenges.
Develop scalable and reliable agent architectures that can adapt to a variety of customer data and tools.
Establish evaluation frameworks, datasets, environments, and metrics to assess agent performance, reliability, and business outcomes in live settings.
Collaborate with product managers, clients, data annotators, and engineering teams to translate enterprise needs into robust agent designs.
Transition cutting-edge agent techniques (e.g., planning, multi-step reasoning, tool utilization, multi-agent collaboration) into maintainable and observable systems.
Oversee the deployment, monitoring, and iterative enhancement of agent systems, including failure analysis and continuous improvement based on actual usage.
Guide the technical direction and architectural practices for general agent development, with increased scope and leadership at the Staff level.
Overview
Pluralis Research is at the forefront of innovation in Protocol Learning, specializing in the collaborative training of foundational models. Our approach ensures that no single participant ever has or can obtain a complete version of the model. This initiative aims to create community-driven, collectively owned frontier models that operate on self-sustaining economic principles.
We are seeking experienced Senior or Staff Machine Learning Engineers with over 5 years of expertise in distributed systems and large-scale machine learning training. In this role, you will design and implement a groundbreaking substrate for training distributed ML models that function effectively over consumer-grade internet connections.
Full-time|$190.9K/yr - $232.8K/yr|On-site|San Francisco, California
P-1285
About This Role
Join Databricks as a Staff Software Engineer specializing in GenAI inference, where you will spearhead the architecture, development, and optimization of the inference engine that powers the Databricks Foundation Model API. Your role will be crucial in bridging cutting-edge research with real-world production requirements, ensuring exceptional throughput, minimal latency, and scalable solutions. You will work across the entire GenAI inference stack, including kernels, runtimes, orchestration, memory management, and integration with various frameworks and orchestration systems.
What You Will Do
Take full ownership of the architecture, design, and implementation of the inference engine, collaborating on a model-serving stack optimized for large-scale LLM inference.
Work closely with researchers to integrate new model architectures or features, such as sparsity, activation compression, and mixture-of-experts into the engine.
Lead comprehensive optimization efforts focused on latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators.
Establish and uphold standards for building and maintaining instrumentation, profiling, and tracing tools to identify performance bottlenecks and drive optimizations.
Design scalable solutions for routing, batching, scheduling, memory management, and dynamic loading tailored to inference workloads.
Guarantee reliability, reproducibility, and fault tolerance in inference pipelines, including capabilities for A/B testing, rollbacks, and model versioning.
Collaborate cross-functionally to integrate with federated and distributed inference infrastructure, ensuring effective orchestration across nodes, load balancing, and minimizing communication overhead.
Foster collaboration with cross-functional teams, including platform engineers, cloud infrastructure, and security/compliance professionals.
Represent the team externally through benchmarks, whitepapers, and contributions to open-source projects.
What We Look For
A BS/MS/PhD in Computer Science or a related discipline.
A solid software engineering background with 6+ years of experience in performance-critical systems.
A proven ability to own complex system components and influence architectural decisions from conception to execution.
A deep understanding of ML inference internals, including attention mechanisms, MLPs, recurrent modules, quantization, and sparse operations.
Hands-on experience with CUDA, GPU programming, and essential libraries (cuBLAS, cuDNN, NCCL, etc.).
A strong foundation in distributed systems design, including RPC frameworks, queuing, RPC batching, sharding, and memory partitioning.
Demonstrated proficiency in diagnosing and resolving performance bottlenecks across multiple layers (kernel, memory, networking, scheduler).
Jan 30, 2026