Experience Level: Manager
Qualifications
The ideal candidate will possess strong leadership skills and a passion for driving technological innovation. We seek individuals who are not only technically adept but also excel in strategic thinking and team collaboration.
About the job
Join fal as we revolutionize the generative-media infrastructure landscape. Our mission is to enhance model inference performance, enabling creative experiences on an unprecedented scale. We are seeking a Staff Technical Lead for Inference & ML Performance, an individual who possesses a unique blend of deep technical knowledge and strategic foresight. In this pivotal role, you will lead a talented team dedicated to building and optimizing cutting-edge inference systems. If you're ready to influence the future of inference performance in a fast-paced and rapidly growing environment, we want to hear from you.
Why This Role Matters
In this role, you will play a crucial part in shaping the future of fal’s inference engine, ensuring that our generative models consistently deliver outstanding performance. Your contributions will directly affect our capacity to swiftly provide innovative creative solutions to a diverse clientele, from individual creators to global brands.
Your Responsibilities
Define and steer the technical direction, guiding your team across various domains including kernels, applied performance, ML compilers, and distributed inference to develop high-performance solutions.
About fal
fal is at the forefront of generative-media infrastructure, dedicated to pushing the limits of model inference performance. By leveraging advanced technologies, we empower creators and brands to deliver seamless and impactful creative experiences.
Similar jobs
Full-time | On-site | San Francisco, CA | New York City, NY
Role overview
Anthropic seeks a Technical Program Manager to support the Cloud Inference team. This position centers on steering technical projects that influence the development of cloud inference solutions. The role is located in either San Francisco, CA or New York City, NY.
What you will do
- Oversee complex initiatives that move Anthropic’s cloud inference technologies forward
- Collaborate with engineers and partner teams to ensure delivery of dependable solutions
- Organize and synchronize work across different functions to achieve project objectives and deadlines
Join Anthropic as an Engineering Manager to lead our innovative Cloud Inference team utilizing AWS technologies. In this pivotal role, you will drive efforts to enhance the efficiency and scalability of our cloud systems while ensuring robust performance and reliability. Your leadership will inspire a talented team of engineers to solve complex challenges, implement best practices, and foster a culture of continuous improvement.
Join the Sora Team at OpenAI
The Sora team is at the forefront of developing multimodal capabilities within OpenAI’s foundational models. We are a dynamic blend of research and product development, committed to integrating sophisticated multimodal functionalities into our AI offerings. Our focus is on delivering solutions that are not only reliable and intuitive but also resonate with our mission to foster broad societal benefits.
Your Role as Inference Technical Lead
We are seeking a talented GPU Inference Engineer to enhance model-serving efficiency for Sora. This pivotal position will empower you to spearhead initiatives aimed at optimizing inference performance and scalability. You will collaborate closely with our researchers to design and develop models that are optimized for inference, directly contributing to the success of our projects.
Your contributions will be vital in advancing the team’s overarching objectives, allowing leadership to concentrate on high-impact initiatives by establishing a robust technical foundation.
Key Responsibilities:
- Enhance model serving, inference performance, and overall system efficiency through focused engineering efforts.
- Implement optimizations targeting kernel and data movement to boost system throughput and reliability.
- Collaborate with research and product teams to ensure our models operate effectively at scale.
- Design, construct, and refine essential serving infrastructure to meet Sora’s growth and reliability demands.
You Will Excel in This Role If You:
- Possess deep knowledge of model performance optimization, particularly at the inference level.
- Have a strong foundation in kernel-level systems, data movement, and low-level performance tuning.
- Are passionate about scaling high-performing AI systems that address real-world, multimodal challenges.
- Thrive in ambiguous situations, setting technical direction, and driving complex projects to fruition.
This role is based in San Francisco, CA. We follow a hybrid work model requiring 3 in-office days per week and offer relocation assistance to new hires.
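The kernel- and data-movement work this posting describes typically starts with careful GPU measurement rather than guesswork. Below is a minimal, illustrative PyTorch sketch (not from the posting; the op and tensor sizes are hypothetical placeholders) showing how such timings are commonly taken with CUDA events, which avoid counting host-side overhead:

    import torch

    def time_gpu(fn, *args, warmup=3, iters=10):
        """Time a GPU operation with CUDA events."""
        for _ in range(warmup):
            fn(*args)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            fn(*args)
        end.record()
        torch.cuda.synchronize()  # wait for all queued kernels to finish
        return start.elapsed_time(end) / iters  # milliseconds per call

    # Hypothetical example: compare a matmul in fp32 vs fp16 at a fixed size.
    a = torch.randn(4096, 4096, device="cuda")
    b = torch.randn(4096, 4096, device="cuda")
    print("fp32:", time_gpu(torch.matmul, a, b), "ms")
    print("fp16:", time_gpu(torch.matmul, a.half(), b.half()), "ms")

Comparing precisions at a fixed size is one quick way to judge whether an op is compute-bound or bandwidth-bound, the kind of signal that guides the data-movement optimizations mentioned above.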
Full-time | $200K/yr - $400K/yr | Remote | San Francisco
At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, revolutionizing AI progress by making inference both more accessible and efficient. Our founding team consists of the original creators and key maintainers of vLLM, positioning us uniquely at the nexus of cutting-edge models and advanced hardware.
Role Overview
We are seeking a passionate inference runtime engineer eager to explore and expand the frontiers of LLM and diffusion model serving. As models evolve and grow in complexity with new architectures like mixture-of-experts and multimodal designs, the demand for innovative solutions in our inference engine intensifies. This role places you at the heart of vLLM, where you will enhance model execution across a variety of hardware platforms and architectures. Your contributions will have a direct influence on the future of AI inference.
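Since this posting centers on vLLM, which is open source, a brief usage sketch may help orient readers. This follows the library's offline inference interface as documented at the time of writing; the model name is just a small placeholder:

    from vllm import LLM, SamplingParams

    # Load a small model; vLLM handles batching and KV-cache paging internally.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    prompts = ["The key bottleneck in LLM inference is"]
    for out in llm.generate(prompts, params):
        print(out.outputs[0].text)

The engine's value is in what this hides: continuous batching, paged KV-cache management, and hardware-specific kernels, which is the layer the role described above would work on.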
Join our innovative team at Anthropic as a Software Engineer specializing in Cloud Inference Safeguards. In this role, you will play a crucial part in developing and enhancing the systems that ensure the robustness and security of our cloud-based inference services. You will collaborate with cross-functional teams to design, implement, and maintain scalable solutions that meet our high standards for reliability and performance.
Crusoe is looking for a Staff Technical Program Manager focused on Infrastructure as a Service (IaaS) in San Francisco, CA. This position takes a central role in guiding cross-functional teams as they build and launch technical solutions for the company.
Role overview
The Staff Technical Program Manager leads efforts across departments, making sure projects move forward smoothly from planning through deployment. Managing schedules, coordinating resources, and keeping all stakeholders informed are key parts of this job.
What you will do
- Direct and align cross-functional teams to deliver IaaS projects
- Oversee project timelines and resource allocation
- Maintain clear communication with stakeholders to support company goals
Requirements
- Experience managing technical programs or projects, especially in infrastructure or cloud services
- Strong organizational and communication skills
- Ability to coordinate across multiple teams and priorities
Role overview
Decagon seeks a Technical Program Manager based in San Francisco to coordinate work across multiple teams and deliver new technical solutions. This position guides projects from initial planning through completion, ensuring schedules stay on track and technical milestones are achieved.
What you will do
- Lead cross-functional teams to meet project goals
- Track project progress and adjust plans when necessary
- Share updates and collect feedback from stakeholders
- Clarify and verify technical requirements throughout each project
- Help advance Decagon’s strategic programs through strong program management
Full-time | $204K/yr - $247K/yr | On-site | San Francisco, CA - US
At Crusoe, we are on a mission to enhance the availability of energy and intelligence. We are developing the driving force behind a future where individuals can harness the power of AI without compromising on scale, speed, or sustainability.
Join the AI revolution with sustainable technology at Crusoe. This is your chance to lead impactful innovations, contribute to meaningful projects, and collaborate with a team dedicated to pioneering responsible and transformative cloud infrastructure.
Role Overview:
As an integral member of the Crusoe Managed AI Services team, you will oversee the entire product lifecycle for our Managed Inference services. From conceptualization and strategic planning to execution and market introduction, you will be the driving force behind our inference service offerings. Your ability to translate market demands and technical details into succinct product specifications and narratives will be crucial in fostering business growth for Crusoe Cloud.
This position is a Staff-level individual contributor role that offers considerable autonomy and influence. You will act as a senior product owner for a pivotal segment of our platform, collaborating closely with engineering, infrastructure, and go-to-market teams to expand and enhance Crusoe’s inference capabilities as the organization evolves.
This is a unique opportunity to shape and develop a foundational product area within a rapidly growing and innovative company.
Key Responsibilities:
- Lead the complete product lifecycle for Crusoe’s Managed Inference services, encompassing roadmap creation, execution, and iterative improvements.
- Convert customer feedback, market insights, and technical limitations into clear product requirements and prioritization strategies.
- Collaborate effectively with Engineering, Infrastructure, and Platform teams to provide scalable and dependable inference services.
- Influence product decisions regarding performance, reliability, cost-effectiveness, and user experience for developers.
- Establish and monitor success metrics for inference services operating in production environments.
- Work alongside go-to-market teams to facilitate product launches, brand positioning, and customer engagement.
- Articulate product strategy and decisions clearly to cross-functional partners and leadership.
About Our Team
Join the Future of Computing Research team at OpenAI, an innovative applied research group within the Consumer Devices division. Our mission is to pioneer new methods and models that contribute to our overarching goal of developing Artificial General Intelligence (AGI) for the betterment of humanity.
Role Overview
As the Inference Technical Lead, you will collaborate with world-class machine learning researchers and top-notch design talents to push the boundaries of model capabilities. This position is stationed in San Francisco, CA, offering a hybrid work model that includes 4 days in the office, along with relocation assistance for new hires.
Key Responsibilities
- Assess and select silicon platforms, including GPUs, NPUs, and specialized accelerators, for the deployment of OpenAI models on-device and at the edge.
- Collaborate closely with research teams to co-design model architectures that satisfy real-world constraints such as latency, memory, power, and bandwidth.
- Conduct system performance analyses to identify trade-offs in model design, memory hierarchy, compute throughput, and hardware capabilities.
- Work hand-in-hand with hardware vendors and internal infrastructure teams to launch new accelerators, ensuring efficient execution of transformer workloads.
- Lead a team of engineers in implementing the low-level inference stack, encompassing kernel development and runtime systems.
- Navigate challenges to transform emerging research capabilities into scalable solutions.
Ideal Candidate Profile
- Proven experience in evaluating or deploying workloads on GPUs, NPUs, or other specialized accelerators.
- Strong understanding of transformer model performance characteristics, including attention mechanisms, KV-cache behaviors, and memory bandwidth requirements.
- Experience designing or optimizing high-performance computing systems, such as inference engines, distributed runtimes, or hardware-aware ML pipelines.
- Background in building or leading teams focused on low-level performance-critical software, including CUDA kernels, compilers, or ML runtimes.
- Demonstrated ability to thrive in a fast-paced, innovative environment.
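The "KV-cache behaviors and memory bandwidth requirements" this profile asks for come down to simple arithmetic. A back-of-envelope sketch, assuming illustrative, roughly Llama-2-7B-like dimensions (these figures are assumptions, not from the posting):

    # Back-of-envelope KV-cache sizing for a decoder-only transformer.
    layers      = 32
    kv_heads    = 32      # no grouped-query attention in this example
    head_dim    = 128
    dtype_bytes = 2       # fp16/bf16

    bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # 2 = K and V
    print(bytes_per_token / 1024, "KiB per token")          # 512.0 KiB

    seq_len, batch = 4096, 8
    total = bytes_per_token * seq_len * batch
    print(total / 2**30, "GiB of KV-cache")                 # 16.0 GiB

At fp16 this works out to roughly half a megabyte per token for a 7B-class model without grouped-query attention, which is why long contexts and large batches quickly become dominated by KV-cache traffic rather than weights.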
At Gimlet Labs, we are pioneering the development of the first heterogeneous neocloud designed specifically for AI workloads. As the demand for AI systems surges, traditional homogeneous infrastructures face critical limits in power, capacity, and cost. Our innovative platform effectively decouples AI workloads from their hardware foundations, intelligently partitioning tasks and orchestrating them to the most suitable hardware for optimal performance and efficiency. This strategy fosters heterogeneous systems that span multiple vendors and generations, including cutting-edge accelerators, enabling significant enhancements in performance and cost-effectiveness at scale.
In addition to this foundational work, Gimlet is establishing a robust neocloud for agentic workloads. Our clients benefit from deploying and managing their workloads via stable, production-ready APIs, without the need to navigate hardware selection or performance optimization intricacies.
We collaborate with foundation labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI datacenters.
We are currently seeking a Member of Technical Staff specializing in ML systems and inference. In this pivotal role, you will be responsible for designing and constructing inference systems that facilitate the execution of complete models in real production environments. You will operate at the intersection of model architecture and system performance to ensure that inference processes are swift, predictable, and scalable.
This position is perfect for engineers with a deep understanding of modern model execution and a passion for optimizing latency, throughput, and memory utilization across the entire inference lifecycle.
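The "intelligent partitioning" described here can be pictured with a toy example. The sketch below greedily assigns consecutive model layers to devices by memory footprint; the device names and sizes are invented for illustration and say nothing about Gimlet's actual scheduler:

    def partition(layer_mem_gib, devices):
        """Greedily pack consecutive layers onto devices; raises if the model doesn't fit."""
        plan = []
        queue = list(devices)
        name, free = queue.pop(0)
        for i, need in enumerate(layer_mem_gib):
            while need > free:
                if not queue:
                    raise RuntimeError("model does not fit on the given devices")
                name, free = queue.pop(0)
            plan.append((i, name))
            free -= need
        return plan

    # Hypothetical heterogeneous pool spanning vendors/generations.
    devices = [("H100-80G", 80.0), ("A100-40G", 40.0), ("L40S-48G", 48.0)]
    layers = [1.5] * 80  # hypothetical uniform per-layer footprint, in GiB
    print(partition(layers, devices)[-1])  # which device holds the final layer

Real placement additionally weighs interconnect bandwidth, per-device throughput, and cost, but memory feasibility is the usual starting constraint.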
Full-time | $150K/yr - $200K/yr | On-site | San Francisco
Astranis Space Technologies Corp. is at the forefront of satellite innovation, crafting advanced satellites designed for high orbits that extend humanity's reach into the cosmos. Our satellites deliver dedicated and secure communications networks to a diverse clientele, including large enterprises, sovereign governments, and the U.S. military. With five satellites successfully in orbit and numerous launches on the horizon, we are addressing a backlog of over $1 billion in commercial contracts.
As a trusted partner in satellite communications, we cater to clients with rigorous demands for reliability, data security, network visibility, and tailored solutions. Backed by over $750 million in funding from leading investors such as Andreessen Horowitz, Blackrock, and Fidelity, our team of 450 engineers and entrepreneurs operates from a 153,000 sq. ft. state-of-the-art headquarters in Northern California.
Technical Program Manager, USG Programs
As a Technical Program Manager focused on U.S. Government programs, you will be pivotal in steering the execution of Astranis's satellite initiatives from initial concept to final delivery. Reporting directly to the Director of USG Programs, you will be responsible for program outcomes related to schedule, budget, risk management, and customer engagement. This role demands a combination of programmatic leadership and technical expertise, engaging in hands-on problem solving while maintaining strategic oversight. You will conduct trade studies to enhance mission system architectures, facilitate coordination across spacecraft and ground segments, and ensure all technical, contractual, and mission requirements are fulfilled. Additionally, you will help refine and scale our USG program execution frameworks and cross-functional processes as our portfolio grows.
Key Responsibilities
- Oversee government-related program execution from inception to completion, encompassing small internal R&D projects to comprehensive multi-spacecraft missions.
- Establish, manage, and communicate program schedules, budgets, and resource allocations to guarantee successful project delivery.
- Lead the risk management process, including documentation, tracking of mitigation strategies, and approval of risk closures.
- Ensure that programs meet all technical and programmatic specifications by monitoring and reporting key performance indicators.
- Conduct regular internal and external program reviews, emphasizing schedule trends, risks, and progress against significant milestones.
- Act as the primary liaison for government programs, leading design reviews, reporting, and milestone readiness assessments.
- Collaborate with engineering, operations, and mission assurance teams to ensure alignment on requirements, interfaces, and design development.
Full-time | $200K/yr - $240K/yr | Hybrid | San Francisco, California
About Us:
Motive empowers physical operations with innovative tools designed to enhance safety, productivity, and profitability. For the first time, safety, operations, and finance teams can manage their drivers, vehicles, equipment, and fleet expenditures through a unified system. With cutting-edge AI technology, the Motive platform delivers comprehensive visibility and control, significantly minimizing manual workloads through automation and task simplification.
Serving nearly 100,000 clients, from Fortune 500 corporations to small businesses, Motive operates across industries such as transportation and logistics, construction, energy, field services, manufacturing, agriculture, food and beverage, retail, and the public sector.
Discover more at gomotive.com.
Role Overview:
We are looking for a dynamic leader to establish and steer our Technical Program Management team. This strategic, high-impact position will be pivotal in shaping our execution across the engineering domain as we create state-of-the-art custom hardware and software solutions. You will lead a team of Technical Program Managers, experts in embedded software and feature integration, who facilitate the daily execution of our Connected Devices Engineering initiatives.
This role requires exceptional cross-functional leadership, acting as a vital link between engineering, product, customer-facing, and business stakeholders. You will gain executive-level visibility and be responsible for ensuring that our most crucial programs are delivered on schedule while continuously enhancing our operational efficiency. In addition to keeping programs on track, you will foster a culture of open communication, proactive problem-solving, and seamless collaboration among full-stack software, product, and Quality Assurance teams.
Key Responsibilities:
- Develop and expand a high-performing Technical Program Management organization by attracting, nurturing, and retaining top talent.
- Ensure impeccable execution of mission-critical programs by upholding schedule integrity, proactively identifying and resolving blockers, clarifying requirements, and aligning all stakeholders.
- Influence strategic direction by shaping product roadmaps, driving data-informed feature prioritization, and owning end-to-end schedule commitments with full accountability for delivery.
- Promote operational excellence by designing, implementing, and continually refining scalable processes and frameworks that enhance efficiency and productivity.
At Magic, we are driven by our mission to develop safe Artificial General Intelligence (AGI) that propels humanity forward in addressing the most critical challenges. We firmly believe that the future of safe AGI lies in automating research and code generation, allowing us to enhance models and tackle alignment issues more effectively than humans alone can manage. Our innovative approach combines cutting-edge pre-training, domain-specific reinforcement learning (RL), ultra-long context, and efficient inference-time computation to realize this vision.
Position Overview
As a Software Engineer within the Inference & RL Systems team, you will play a pivotal role in designing and managing the distributed systems that enable our models to function seamlessly in production, supporting extensive post-training workflows.
This position operates at the intersection of model execution and distributed infrastructure, focusing on systems that influence inference latency, throughput, stability, and the reliability of RL and post-training loops.
Our long-context models impose significant execution demands, including KV-cache scaling, managing memory constraints for lengthy sequences, batching strategies, long-horizon trajectory rollouts, and ensuring consistent throughput under real-world workloads. You will be responsible for the infrastructure that ensures both production inference and large-scale RL iterations are efficient and dependable.
Key Responsibilities
- Craft and scale high-performance inference serving systems.
- Optimize KV-cache management, batching methods, and scheduling processes.
- Enhance throughput and latency for long-context tasks.
- Develop and sustain distributed RL and post-training infrastructure.
- Boost reliability across rollout, evaluation, and reward pipelines.
- Automate fault detection and recovery mechanisms for serving and RL systems.
- Analyze and eliminate performance bottlenecks across GPU, networking, and storage components.
- Collaborate with Kernel and Research teams to ensure alignment between execution systems and model architecture.
Qualifications
- Solid foundation in software engineering and distributed systems.
- Proven experience in building or managing large-scale inference or training systems.
- In-depth understanding of GPU execution constraints and memory trade-offs.
- Experience troubleshooting performance issues in production machine learning systems.
- Capability to analyze system-level trade-offs between latency, throughput, and cost.
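The "batching strategies" and KV-cache budgeting in this description can be illustrated with a toy continuous-batching loop. This is a deliberately simplified sketch, not Magic's system; real engines add paged KV-caches, preemption, and separate prefill/decode scheduling:

    from collections import deque

    TOKEN_BUDGET = 4096  # total KV-cache token slots available at any time

    def serve(requests):
        """requests: list of (prompt_len, max_new_tokens) tuples; returns decode steps."""
        assert all(p + n <= TOKEN_BUDGET for p, n in requests), "request exceeds budget"
        waiting = deque(requests)
        running = []  # each entry: [current_len, remaining_new_tokens]
        steps = 0
        while waiting or running:
            # Worst-case reservation: tokens held now plus tokens still to come.
            used = sum(cur + rem for cur, rem in running)
            while waiting and used + sum(waiting[0]) <= TOKEN_BUDGET:
                p, n = waiting.popleft()
                running.append([p, n])
                used += p + n
            for r in running:  # one decode step: every running request emits a token
                r[0] += 1
                r[1] -= 1
            running = [r for r in running if r[1] > 0]  # retire finished requests
            steps += 1
        return steps

    print(serve([(512, 128), (1024, 64), (256, 256)]), "decode steps")

Admitting new requests as others retire, rather than waiting for a whole batch to drain, is what keeps GPU utilization high under mixed-length long-context workloads.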
Sigma Computing builds cloud-scale analytics and business intelligence tools that keep the familiar feel of a spreadsheet. The platform helps business professionals, non-technical users, and data teams explore, analyze, visualize, and collaborate on data throughout their organizations.
About the Temp-to-Hire Program
This program is designed for early-career Technical Program Managers interested in the operational side of engineering, with a focus on Infrastructure and Data Services. The role combines technical knowledge with project management skills. Success in this position relies on a proactive approach and the ability to work with cross-functional teams to improve execution and meet goals. A strong interest in building quality products and supporting customer satisfaction is important.
Note: This is a temp-to-hire position. The initial engagement lasts three months as a temporary employee. Full-time conversion is possible, depending on performance during that period.
Full-time | Remote | San Francisco, CA, New York, NY, Portland, OR, or Remote within Canada or United States
As Mercury continues to expand, our revenue platforms increasingly operate at the nexus of product launches, data systems, go-to-market strategies, and external partnerships. In a landscape where every task holds significance, the greatest threat is not a lack of effort but a stagnation of momentum. We are seeking a Senior Technical Program Manager for our Revenue Technology team who will ensure that intricate, multi-team projects not only progress but do so efficiently: in the right order, with clear dependencies highlighted, and with tangible advancement visible on a weekly basis. This role transcends mere status updates; we need a hands-on technical TPM adept enough to resolve issues directly, whether that involves Linear, Salesforce, data workflows, or collaboration across Product, Engineering, Data, and Revenue teams. In this position, you will report to the Head of Platforms & Infrastructure and play a pivotal role in maintaining execution focus, allowing specialized technical roles to function at their highest capacity.
*Mercury is a fintech company and not an FDIC-insured bank. Banking services are provided through Choice Financial Group and Column N.A., Members FDIC.
Full-time | Remote | San Francisco, CA | New York City, NY
Anthropic is seeking a Technical Program Manager for Research Initiatives to coordinate and deliver advanced projects in artificial intelligence. This position is based in San Francisco, CA or New York City, NY.
Role overview
This role centers on managing research projects that push the boundaries of AI. The Technical Program Manager will oversee cross-functional teams, keeping projects on track and ensuring research goals are met.
What you will do
- Lead and organize research initiatives focused on artificial intelligence
- Coordinate teams from different disciplines to achieve project objectives
- Track project timelines and deliverables, ensuring milestones are reached
Requirements
- Experience managing technical or research-focused projects
- Ability to work with cross-functional teams
- Strong organizational and communication skills
On-site | San Francisco, CA | New York City, NY | Seattle, WA
About Anthropic
At Anthropic, our mission is to develop AI systems that are safe, interpretable, and controllable. We believe in harnessing AI for the greater good of our users and society at large. Our dynamic team comprises dedicated researchers, engineers, policy experts, and business leaders who collaborate to create beneficial AI systems.
About the Role
The Cloud Inference team is responsible for scaling and optimizing Claude to cater to a vast array of developers and enterprise clients across platforms such as AWS, GCP, Azure, and future cloud service providers (CSPs). We manage the complete lifecycle of Claude on each cloud platform, from API integration and intelligent request routing to inference execution, capacity management, and daily operations.
Our engineers wield significant influence, driving multiple key revenue streams while optimizing one of Anthropic's most valuable resources: compute power. As we expand to additional cloud providers, the intricacies of efficiently managing inference across diverse platforms with varying hardware, networking frameworks, and operational models grow substantially. We seek engineers adept at navigating these variances, developing strong abstractions that are effective across providers, and making informed infrastructure choices that keep us cost-effective at scale.
Your contributions will enhance the operational scale of our services, expedite our capacity to launch cutting-edge models and innovative features to clients across all platforms, and ensure our large language models (LLMs) adhere to stringent safety, performance, and security standards.
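One way to picture the "strong abstractions that are effective across providers" mentioned here is a provider-agnostic backend interface with per-cloud implementations behind it. Everything below is hypothetical, a sketch of the general pattern rather than Anthropic's actual design; all class and method names are invented:

    from abc import ABC, abstractmethod
    from dataclasses import dataclass

    @dataclass
    class InferenceRequest:
        prompt: str
        max_tokens: int

    class InferenceBackend(ABC):
        """Provider-agnostic surface; per-cloud quirks live in subclasses."""
        @abstractmethod
        def capacity_tokens_per_sec(self) -> float: ...
        @abstractmethod
        def submit(self, req: InferenceRequest) -> str: ...

    class AwsBackend(InferenceBackend):
        def capacity_tokens_per_sec(self) -> float:
            return 50_000.0  # in practice this would be measured, not hard-coded
        def submit(self, req: InferenceRequest) -> str:
            return f"aws handled: {req.prompt[:20]}..."

    def route(req: InferenceRequest, backends: list[InferenceBackend]) -> str:
        # Simplest possible policy: send to the backend with the most headroom.
        best = max(backends, key=lambda b: b.capacity_tokens_per_sec())
        return best.submit(req)

    print(route(InferenceRequest("Hello from the router", 128), [AwsBackend()]))

The point of the pattern is that routing, capacity, and operational logic are written once against the interface, while hardware and networking differences stay contained in each provider's implementation.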
Mapbox is looking for a Technical Program Manager, Enterprise to manage high-impact programs within the enterprise group. This role centers on keeping projects on track and aligning teams across the organization.
Key Responsibilities
- Oversee enterprise programs from initial planning to final delivery, collaborating with engineering, product, and design teams.
- Set clear program goals and monitor progress toward milestones.
- Facilitate communication and manage expectations among stakeholders from different teams.
- Spot and address issues that could slow down or hinder project success.
Requirements
- Strong interest in technology and comfort working in complex organizations.
- Experience coordinating across teams and managing large-scale programs or initiatives.
Role overview
Pano AI seeks a Senior Technical Program Manager - Operations to join the team in San Francisco, California. This position plays a key role in shaping operational strategies and guiding cross-functional teams to deliver impactful projects. The focus is on refining processes and driving greater efficiency across the organization.
What you will do
- Coordinate and lead teams from different departments to achieve project objectives
- Manage operational projects from initial planning stages through to completion
- Spot opportunities to streamline workflows and enhance efficiency
- Use technical program management skills to address operational challenges
Requirements
- Background in managing technical programs within operational settings
- Strong collaboration skills with diverse stakeholders
- Demonstrated success in delivering projects and improving processes