Senior AI Infrastructure Engineer - Model Serving Platform

Scale AI | San Francisco, CA; New York, NY

On-site Full-time $216.2K/yr - $270.3K/yr

Experience Level

Senior

Qualifications

Key Responsibilities: Develop and sustain fault-tolerant, high-performance systems capable of serving LLM workloads at scale. Create an internal platform to facilitate LLM capability discovery. Collaborate with researchers and engineers to integrate and enhance models for both production and research purposes. Conduct architecture and design reviews to ensure adherence to best practices in system design and scalability. Implement monitoring and observability solutions to guarantee system health and performance. Lead projects from inception to completion in a cross-functional team setting.

Preferred Qualifications: 5+ years of experience in developing large-scale, high-performance backend systems. Proficient programming skills in languages such as Python, Go, Rust, or C++. Familiarity with LLM serving and routing principles including rate limiting, token streaming, load balancing, and budgeting. Understanding of LLM capabilities and concepts such as reasoning, tool calling, and prompt templates. Experience with containerization and orchestration tools like Docker and Kubernetes. Knowledge of cloud infrastructures such as AWS and GCP, along with infrastructure as code practices (e.g., Terraform). Demonstrated ability to independently tackle complex challenges in a fast-paced environment.

Desirable Skills: Experience with contemporary LLM serving frameworks like vLLM, SGLang, TensorRT-LLM, or text-generation-inference.

About the job

The ideal candidate will possess a solid foundation in machine learning principles coupled with extensive experience in backend system architecture. You will thrive in a collaborative environment that bridges research and engineering, working diligently to provide seamless experiences for our customers and accelerating innovation across the organization.

About Scale AI

Scale AI is at the forefront of advancing artificial intelligence technology, providing a robust platform that empowers organizations to harness the power of machine learning. Our commitment to innovation and excellence drives us to create systems that enhance productivity and foster groundbreaking research.

Similar jobs

Senior AI Infrastructure Engineer - Model Serving Platform

Scale AI

Full-time|$216.2K/yr - $270.3K/yr|On-site|San Francisco, CA; New York, NY

Join our dynamic Machine Learning Infrastructure team as a Senior AI Infrastructure Engineer, where you will play a pivotal role in designing and constructing platforms that ensure the scalable, reliable, and efficient serving of Large Language Models (LLMs). Our innovative platform supports a range of cutting-edge research and production systems, catering to both internal and external applications across diverse environments. The ideal candidate will possess a solid foundation in machine learning principles coupled with extensive experience in backend system architecture. You will thrive in a collaborative environment that bridges research and engineering, working diligently to provide seamless experiences for our customers and accelerating innovation across the organization.

Mar 26, 2026

Staff Software Engineer, Model Serving

Databricks

Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California

At Databricks, we are dedicated to empowering data teams to tackle the most challenging problems in the world — from realizing the future of transportation to fast-tracking medical innovations. We accomplish this by developing and operating the premier data and AI infrastructure platform, enabling our customers to harness profound data insights for business enhancement. Our Model Serving product equips organizations with a cohesive, scalable, and governed solution for deploying and managing AI/ML models — ranging from traditional machine learning to intricate proprietary large language models. It ensures real-time, low-latency inference, governance, monitoring, and lineage. As the adoption of AI surges, Model Serving stands as a fundamental component of the Databricks platform, allowing customers to operationalize models at scale with robust SLAs and cost efficiency. In the role of Staff Engineer, you will significantly influence both the product experience and the core infrastructure of Model Serving. Your responsibilities will include designing and constructing systems that facilitate high-throughput, low-latency inference across CPU and GPU workloads, steering architectural strategies, and collaborating extensively with platform, product, infrastructure, and research teams to create an exceptional serving platform.

Jan 30, 2026

Senior Software Engineer, Model Serving

Databricks

Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California

At Databricks, we are dedicated to empowering data teams to tackle some of the most challenging issues of our time—from realizing the future of transportation to speeding up medical innovations. We achieve this by developing and maintaining the premier data and AI infrastructure platform, allowing our clients to leverage profound data insights to enhance their operations. Our Model Serving product equips organizations with a cohesive, scalable, and governed platform for deploying and overseeing AI/ML models, spanning traditional ML to specialized large language models. It provides real-time, low-latency inference, governance, monitoring, and lineage capabilities. With the rapid rise of AI adoption, Model Serving stands as a fundamental component of the Databricks platform, enabling clients to operationalize models efficiently and cost-effectively at scale. As a Senior Engineer, your role will be pivotal in transforming both the product experience and the underlying infrastructure of Model Serving. You will design and create systems enabling high-throughput, low-latency inference across CPU and GPU workloads, influence architectural strategies, and work closely with platform, product, infrastructure, and research teams to deliver an exceptional serving platform.

Jan 30, 2026

Staff Software Engineer, Foundation Model Serving

Databricks

Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California

At Databricks, we are driven by our commitment to empower data teams in tackling the world's most challenging problems, from transforming transportation solutions to accelerating medical advancements. Our mission revolves around constructing and maintaining the world's premier data and AI infrastructure platform, enabling our clients to harness deep data insights for enhanced business outcomes.

Foundation Model Serving represents the API product designed for hosting and serving advanced AI model inference, catering to both open-source models like Llama, Qwen, and GPT OSS, as well as proprietary models such as Claude and OpenAI GPT. We welcome engineers who have experience managing high-scale operational systems, including customer-facing APIs, Edge Gateways, or ML Inference services, even if they do not have a background in ML or AI. A passion for developing LLM APIs and runtimes at scale is essential.

As a Staff Engineer, you will play a pivotal role in defining both the product experience and the underlying infrastructure. You will be tasked with designing and building systems that facilitate high-throughput, low-latency inference on GPU workloads with cutting-edge models. Your influence will extend to architectural direction, working closely with platform, product, infrastructure, and research teams to deliver an exceptional foundation model API product.

The impact you will have: Design and implement core systems and APIs that drive Databricks Foundation Model Serving, ensuring scalability, reliability, and operational excellence. Collaborate with product and engineering leaders to outline the technical roadmap and long-term architecture for workload serving. Make architectural decisions to enhance performance, throughput, autoscaling, and operational efficiency for GPU serving workloads. Contribute directly to critical components within the serving infrastructure, from systems like vLLM and SGLang to developing token-based rate limiters and optimizers, ensuring seamless and efficient operations at scale. Work cross-functionally with product, platform, and research teams to transform customer requirements into dependable and high-performing systems. Establish best practices for code quality, testing, and operational readiness while mentoring fellow engineers through design reviews and technical support. Represent the team in inter-departmental technical discussions, influencing Databricks’ wider AI platform strategy.

Jan 30, 2026

Senior Manager, Engineering - Model Serving

Databricks

Full-time|$217K/yr - $312.2K/yr|On-site|San Francisco, California

At Databricks, we are dedicated to empowering data teams to tackle the most challenging global issues—whether it's transforming transportation or speeding up medical advancements. We achieve this by constructing and managing the world's leading data and AI infrastructure platform, enabling our clients to leverage deep data insights for business enhancement. The Model Serving product at Databricks offers enterprises a cohesive, scalable, and governed platform for deploying and managing AI/ML models—from conventional ML to sophisticated, proprietary large language models. It facilitates real-time, low-latency inference while providing governance, monitoring, and lineage capabilities. As AI adoption surges, Model Serving becomes a central component of the Databricks platform, allowing customers to operationalize models efficiently and cost-effectively. As a Senior Engineering Manager, you will lead a team responsible for both the product experience and the underlying infrastructure of Model Serving. This role involves shaping user-facing features while architecting for scalability, extensibility, and performance across CPU and GPU inference. You will collaborate closely with various teams across the platform, product, infrastructure, and research domains.

Feb 1, 2026

Lead Engineer for AI Infrastructure in Platform Engineering

Plasmidsaurus

Full-time|On-site|San Francisco

Plasmidsaurus helps scientists worldwide by streamlining sequencing. Researchers from leading institutions and companies rely on this platform daily. With a global network of labs, the company delivers fast, affordable sequencing results, and has recently expanded into RNA-seq to broaden its genomics reach. The team is focused on building a universal sequencing platform designed for efficiency and global scale.

Role overview

The Lead Engineer for AI Infrastructure in Platform Engineering sets both technical direction and management strategy for the company’s compute, data, AI, and security infrastructure. This position oversees the entire sequencing operation, from laboratory devices to data delivery.

What you will do

Oversee core services that coordinate laboratory devices, including robots, sequencers, and on-premises Linux servers, as well as the data ingestion pipeline. Develop cloud infrastructure and data pipelines for storing, processing, and delivering terabytes of sequencing data. Design systems to manage millions of bioinformatics tasks, handling queue management, workflow orchestration, and scheduling. Build AI infrastructure and internal tools to support autonomous systems, including:

Quality Scientist Agents: Monitor operations, detect anomalies, and escalate quality or reliability concerns.
Logistics Agents: Coordinate global transportation of samples to labs and carriers.
Bioinformatics Coding Agents: Run adaptive analyses on varied sample types with different data distributions.

Culture

The team values initiative and a strong sense of ownership. High agency and responsibility shape how work gets done at Plasmidsaurus.

Apr 28, 2026

Lead Software Engineer for Model Serving Platform

Sciforium

Full-time|On-site|San Francisco

At Sciforium, we're at the forefront of AI infrastructure innovation, dedicated to developing cutting-edge multimodal AI models and a proprietary, high-efficiency model serving platform. With significant multi-million-dollar backing and direct collaboration from AMD, including hands-on support from AMD engineers, our team is rapidly expanding to construct the comprehensive stack that fuels leading-edge AI models and real-time applications.

About the Role

Join us in a unique opportunity to architect and spearhead the development of Sciforium's next-generation model serving platform, the powerhouse that will deliver a multimodal, high-performance foundation model to market. As a senior technical leader, you will not only craft core components but also mentor and guide fellow engineers, shaping engineering direction, standards, and quality of execution.

You'll delve into the entire AI stack: from GPU kernels and quantized execution paths to distributed serving, scheduling, and the APIs that drive real-time AI applications. If you relish deep systems work, thrive on ownership, and aspire to lead engineers in constructing foundational AI infrastructure, this role places you at the heart of Sciforium's mission and growth.

Your Responsibilities

Steer the technical direction of the model serving platform, overseeing architectural decisions and engineering execution. Develop core serving components such as execution runtimes, batching, scheduling, and distributed inference systems. Create high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes. Collaborate with ML researchers to transition new multimodal models into production while ensuring low-latency, scalable inference. Construct Python APIs and services that make model capabilities accessible to downstream applications. Mentor and assist other engineers through code reviews, design discussions, and direct technical support. Lead performance profiling, benchmarking, and observability initiatives across the inference stack. Guarantee high reliability and maintainability through rigorous testing, monitoring, and adherence to engineering best practices. Diagnose and resolve intricate issues spanning GPU, runtime, and service layers.

Dec 6, 2025

Senior AI Infrastructure Engineer - Training Platform

Scale AI

Full-time|On-site|San Francisco, CA; Seattle, WA; New York, NY

Scale AI is seeking a Senior AI Infrastructure Engineer to help build and refine the company’s Training Platform. This position centers on designing, implementing, and improving infrastructure that supports machine learning teams as they train and deploy models.

Role overview

This engineer will work closely with colleagues across different functions to create solutions that make AI systems more efficient. The focus is on enabling faster, more reliable model training and deployment.

Key responsibilities

Design and build infrastructure for AI model training
Implement and optimize systems to support machine learning workflows
Collaborate with teams throughout the company to improve platform capabilities

Locations

This role is based in San Francisco, Seattle, or New York.

Apr 29, 2026

Senior Software Engineer - Model Training

Baseten

Full-time|On-site|San Francisco

ABOUT BASETEN

At Baseten, we are at the forefront of enabling transformative AI solutions for some of the world's leading companies, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our innovative platform combines cutting-edge AI research, adaptable infrastructure, and developer-friendly tools to facilitate the production of advanced models. Recently, we celebrated our rapid growth with a successful $300M Series E funding round from notable investors like BOND, IVP, Spark Capital, Greylock, and Conviction. We invite you to join our dynamic team and contribute to the evolution of AI product deployment.

THE ROLE

As a Senior Software Engineer specializing in Model Training at Baseten, you will play a pivotal role in constructing the infrastructure essential for the large-scale training and fine-tuning of foundational AI models. Your responsibilities will include designing and implementing distributed training systems, optimizing GPU utilization, and establishing scalable pipelines that empower Baseten and our clientele to adapt models with efficiency and reliability. This role demands a high level of technical expertise and hands-on involvement: you will be responsible for critical components of our training stack, collaborate with product and infrastructure teams to identify customer needs, and drive advancements in scalable training infrastructure.

EXAMPLE WORK:

Training open-source models that surpass GPT-5 capabilities for a leading digital insurer
Exploring specialized, continuously learning models as the future of AI
Overview of our training documentation
Research initiatives we've undertaken

RESPONSIBILITIES

Design, construct, and sustain distributed training infrastructures for large foundation models
Develop scalable pipelines for fine-tuning and training across diverse GPU/accelerator clusters
Enhance training performance through optimization of algorithms and infrastructure
Collaborate closely with cross-functional teams to align technical solutions with business objectives
Stay abreast of advancements in the field of machine learning and AI to continually improve our training processes

Aug 29, 2025

Senior AI Infrastructure Engineer

Hyperbolic Labs

Full-time|On-site|San Francisco, CA

Join Our Mission

At Hyperbolic Labs, we are dedicated to democratizing artificial intelligence by eliminating barriers to computing power through our Open-Access AI Cloud. We aggregate global computing resources to provide an innovative GPU marketplace and AI inference service, making AI affordable and accessible for everyone. As pioneers at the crossroads of AI and open-source technology, we envision a future where AI innovation is driven by imagination, not resource limitations. We invite forward-thinking individuals who share our vision of making AI universally accessible, secure, and cost-effective to join us in crafting a platform that empowers innovators to realize their groundbreaking AI projects.

As we gear up for expansion following our Series A funding, our team, led by co-founders with PhDs in AI, Mathematics, and Computer Science, is set to transform the landscape of computing.

The Role

We are on the lookout for a Senior Infrastructure Engineer to drive the development and scaling of Hyperbolic's GPU Cloud Marketplace. In this pivotal role, you will create a multi-tenancy provisioning and virtualization solution that transforms raw GPUs from diverse global suppliers into a programmable, orchestrated resource pool serving thousands of AI developers and researchers. You will work at the forefront of cloud infrastructure, building the core orchestration layer that allows our platform to deliver cost savings of up to 75% compared to traditional cloud providers.

Mar 26, 2026

Senior Software Engineer - Platform Infrastructure

Discord Inc.

Full-time|$196K/yr - $220.5K/yr|On-site|San Francisco Bay Area

At Discord, we connect over 200 million users monthly for diverse experiences, with gaming being the predominant activity. Our platform supports more than 90% of our users in enjoying games, collectively logging 1.5 billion hours each month across various titles. As we shape the future of gaming, our mission is to enhance interactions before, during, and after gaming sessions.

The Platform Infrastructure teams are pivotal in constructing and upholding the essential systems that energize Discord's core functionalities. We manage systems that process hundreds of thousands of requests per second and handle tens of billions of transactions daily, enabling seamless connections for millions of users. By developing foundational platform components, we empower internal developers to deploy new features swiftly and securely, ensuring Discord remains reliable, efficient, and scalable.

As a Senior Software Engineer on our team, you will play a crucial role in continuously refining our codebase, processes, and infrastructure, directly impacting user interactions on Discord!

Feb 3, 2026

Infrastructure and Platform Engineer at pear-vc | San Francisco

Amari AI

Full-time|On-site|San Francisco

About the Role

Join the innovative team at Known as an Infrastructure and Platform Engineer, where you will take the lead in managing and enhancing our core infrastructure and platform systems. Your work will be crucial in powering AI-driven matching, voice, and scheduling functionalities. You will be responsible for everything from cloud infrastructure and data orchestration to performance monitoring and model deployment support, designing and scaling systems that ensure Known operates swiftly, reliably, and securely.

In this pivotal role, you will collaborate closely with the founding team, comprising experts in AI/ML, product development, and design, to establish Known’s technical foundation. You will play a key role in shaping our architecture, engineering culture, and best practices right from the start. This position is perfect for a practical builder who thrives in early-stage environments and is passionate about taking projects from concept to production.

Oct 9, 2025

Engineering Leader for Platform and Infrastructure

Peregrine Technologies

Full-time|$282K/yr - $363K/yr|On-site|San Francisco, CA

Supported by premier investors from Silicon Valley, Peregrine Technologies empowers public safety organizations, government entities, federal agencies, and private institutions to tackle societal challenges with unmatched speed and precision. Our AI-driven platform transforms isolated and unconnected data into actionable operational intelligence, swiftly surfacing critical information that enables better, faster decision-making, thereby enhancing outcomes at every interaction. Currently, Peregrine serves hundreds of clients across more than 30 states and two countries, impacting over 125 million individuals, and we are poised to extend our influence into enterprise sectors and globally.

Our Team

As a cohesive engineering unit, we firmly believe that empathy enhances our solutions. Observing how users interact with our products is pivotal in guiding us toward the right solutions. Engineers will have the opportunity to collaborate closely with our onsite team to grasp the diverse use cases that Peregrine addresses.

We are on the lookout for an Engineering Manager to join our core engineering teams. You will collaborate cross-functionally with design and product management to develop robust, scalable, and user-centered systems. Our teams face a range of challenges, from enabling real-time collaboration on detailed maps to constructing high-scale backend architectures capable of processing billions of data points.

We value both ownership and collaboration: you will take full responsibility for significant features while working closely with fellow engineers to drive projects to fruition. We hold that humility and empathy are vital for crafting the right solutions: you will engage directly with our deployment team and users as we iterate to tackle their challenges. Creativity and perseverance are essential in realizing our vision.

Role

This position is central to the strategic execution of Peregrine's platform. You will define how our core systems scale, perform, and evolve as Peregrine continues its rapid growth and strengthens its impact across public safety, government, and enterprise sectors.

As a senior platform leader, your role transcends mere system management; you will establish the technical direction, build your team, and create the operational framework that empowers every product team at Peregrine to progress with speed, safety, and assurance. Your contributions will directly influence system reliability and performance.

Mar 18, 2026

Senior Software Engineer, Infrastructure at Retell AI | San Francisco

Retell AI

Full-time|On-site|San Francisco Bay Area

Join the Revolution at Retell AI

Retell AI is pioneering the future of call centers through innovative voice AI, driven by first principles thinking.

In just 18 months since our inception, we have empowered thousands of businesses with our AI voice agents, transforming how sales, support, and logistics calls are managed, work that previously required extensive human teams. Supported by prestigious investors such as Y Combinator and Alt Capital, we've rapidly scaled from $5M ARR to an impressive $36M ARR with a compact yet dynamic team of 20.

Our ambition for 2026 is to create a revolutionary customer experience platform, where entire contact centers are powered by AI. Moving beyond basic automation, we aim to develop intelligent AI “workers” that serve as frontline agents, QA analysts, and managers, continuously enhancing customer interactions without the need for constant human oversight.

As we expand, we are seeking passionate engineers who are eager to solve challenging technical problems, act swiftly, and make a significant impact in one of the fastest-growing voice AI startups. Let’s shape the future together.

Aug 12, 2025

Senior Software Engineer - Self-Serve Intelligence

Lyft, Inc.

Full-time|$185K/yr - $222K/yr|On-site|San Francisco, CA

Lyft’s Self-Serve Intelligence team builds the systems that help riders and drivers resolve issues on their own. Part of the Safety & Customer Care organization, this group focuses on backend services, APIs, and AI-powered products that let customers get help without waiting for an agent. The team’s work includes AI Assist (such as AI Agents), automations, and self-service workflows, all designed to make support fast and reliable.

Role overview

As a Senior Software Engineer on this team, the main responsibility is to design, build, deploy, and maintain backend systems and AI-driven tools that handle customer problems automatically. These solutions use Generative AI and automation to deliver scalable, dependable self-service experiences for millions of Lyft riders and drivers.

What you will do

Design and develop backend services and APIs for AI-powered self-service products
Build and maintain AI Agents and automation tools that resolve customer issues without agent involvement
Oversee the full development lifecycle: system design, prototyping, deployment, and ongoing operations
Work closely with product managers, designers, data scientists, and operations teams to deliver robust solutions
Focus on reliability, scalability, and operational excellence in all systems

Location

This role is based in San Francisco, CA.

Apr 17, 2026

Senior Software Engineer, Infrastructure & Platform

AfterQuery

Full-time|On-site|San Francisco

Senior Software Engineer, Infrastructure & Platform

Role Overview

In the role of Senior Software Engineer, Infrastructure & Platform at AfterQuery, you will take on the exciting challenge of designing and constructing the essential infrastructure that drives our innovative data generation, evaluation, and agentic systems.

Your responsibilities will include developing shared platforms that empower our engineering and research teams to execute large-scale human-in-the-loop workflows, evaluation harnesses, and automated data pipelines essential for training cutting-edge AI models.

This position demands a high level of technical expertise and offers extensive ownership. You will be responsible for architecting and building the foundational infrastructure relied upon by numerous engineers, ensuring that systems are scalable, reliable, and capable of handling high-throughput workloads.

Collaboration with the founding team will be key as you define system architecture, establish best engineering practices, and create the infrastructure that supports the evolution of AI development.

Mar 8, 2026

AI Platform Engineer - Infrastructure at Brain Co. | San Francisco Bay Area

Brain Co.

Full-time|On-site|San Francisco Bay Area

About Brain Co.

At Brain Co., we are at the forefront of artificial intelligence, developing innovative systems that facilitate mission-critical operations for some of the world's leading institutions. Our cutting-edge platform operates in high-security, high-stakes environments, where reliability, performance, and robust engineering practices are paramount.

As an AI Platform Engineer specializing in Infrastructure, you will be instrumental in building and scaling the foundational platform that supports AI systems used in essential sectors, including government, energy, and healthcare. You will work within dynamic environments that span both cloud and on-premises settings, directly influencing our platform's reliability and performance, ensuring we meet the high standards required by our clients.

This role is pivotal within our Infrastructure/Platform team. You will collaborate closely with engineering, AI/ML, and product teams to design scalable architectures, enhance our environments, optimize deployment processes, and guarantee the robustness necessary for enterprise and sovereign applications.

Sep 29, 2025

Infrastructure & Platform Engineer

mlabs

Full-time|$185K/yr - $400K/yr|On-site|San Francisco, California, United States

Join Our Team as an Infrastructure & Platform Engineer

We are seeking a talented Infrastructure & Platform Engineer to join our dynamic team at mlabs in San Francisco. As a rapidly growing technology company, we are at the cutting edge of the crypto derivatives market, an industry that generates tens of billions in annual revenue. Our exchange is one of the fastest-growing platforms for crypto derivatives, and we are committed to enhancing our offerings to meet the evolving needs of our users.

Your mission will be to develop the next critical feature: Multi-Asset Margin, which will streamline how users post collateral directly on-chain, thus improving trading efficiency. You will work alongside our Infrastructure & Platform team, focusing on designing and managing our high-performance systems that deliver exceptional speed and reliability.

Key Responsibilities: Design and implement robust scripts and services that ensure optimal performance in real-time environments. Manage and deploy computing resources and containers for tailored services and integrations. Automate scaling, load balancing, and congestion control for both compute and database layers. Establish and maintain CI/CD pipelines for streamlined deployments and continuous delivery. Monitor and optimize system performance across multiple metrics to enhance throughput and resilience. Develop and maintain indexing and explorer services for fast, real-time data access. Provision and optimize diverse database systems, including time-series, relational, key-value, and in-memory databases.

Mar 5, 2026

Senior Software Engineer, Managed AI - AI Model Lifecycle

Crusoe

Full-time|$172.4K/yr - $209K/yr|On-site|San Francisco, CA - US

At Crusoe, we are on a mission to accelerate the convergence of energy and intelligence. We are building a powerful engine that enables individuals to innovate boldly with AI, all while upholding principles of scalability, speed, and sustainability.

Join us in spearheading the AI revolution through sustainable technology. At Crusoe, you will be at the forefront of meaningful innovation, making a significant impact while collaborating with a team dedicated to shaping the future of responsible, transformative cloud infrastructure.

About the Role: As a Senior Software Engineer on the Model Lifecycle team, you will play a pivotal role in developing a managed platform that supports the entire application development lifecycle, with an emphasis on harnessing the power of Machine Learning models, particularly Large Language Models (LLMs).

Your Responsibilities: Design and maintain systems for fine-tuning large foundational models (SFT, PEFT, LoRA, adapters), ensuring multi-node orchestration, checkpointing, failure recovery, and cost-effective scaling. Create and manage end-to-end training pipelines for Large Language Models. Implement components for distillation and reinforcement learning pipelines, focusing on preference optimization, policy optimization, and reward modeling. Develop and sustain the core agent execution infrastructure. Implement features for dataset, model, and experiment management, emphasizing versioning, lineage, evaluation, and reproducible fine-tuning.

Collaboration and Impact: Collaborate closely with Senior Engineers, Principal Engineers, and various product and platform teams to implement systems abstractions and APIs. Engage in technical discussions surrounding training runtimes, scheduling, storage, and overall model lifecycle management. Bring 4-5+ years of industry experience, demonstrating a strong track record of successfully leading a diverse portfolio of initiatives. Participate in and contribute to the open-source LLM ecosystem. This position involves taking significant ownership of core system components.

Your Qualifications: Engineering Fundamentals: Bachelor's degree in Computer Science, Engineering, or a related discipline. Proven experience in software engineering with a focus on AI models and machine learning.

Feb 9, 2026

Senior Software Engineer, Model Data Platform

Waymo LLC

Full-time|$250K/yr - $334.5K/yr|Hybrid|Mountain View, CA USA; San Francisco, CA USA

Waymo is a pioneering company in autonomous driving technology, dedicated to becoming the world’s most trusted driver. Originating from the Google Self-Driving Car Project in 2009, Waymo has established the Waymo Driver—The World’s Most Experienced Driver™—with a mission to enhance mobility access and save lives lost in traffic accidents. The Waymo Driver powers our fully autonomous ride-hailing service and can be integrated across various vehicle platforms and applications. Having completed over ten million rider-only trips, our technology has driven more than 100 million miles on public roads and tens of billions in simulations across over 15 U.S. states.

The Perception team is responsible for developing systems that learn the spatial-temporal representations and semantic meanings of the environment surrounding our autonomous vehicles (AVs). We collaborate closely with downstream teams to optimize and integrate our work into the Waymo Driver, conduct research to solve real-world challenges, and work alongside research teams at Alphabet. With access to millions of miles of diverse driving data from various sensors, we empower engineers like you to (1) create methods for efficient continuous learning from extensive real-world data, (2) develop scalable models and training methodologies, (3) analyze real-world behaviors to create systems that can navigate complexities, and (4) optimize models for both onboard and offboard hardware.

In this hybrid role, you will report to a Technical Lead Manager.

Apr 13, 2026
