About Our Team
The Inference team at OpenAI is dedicated to translating our cutting-edge research into accessible, transformative technology for consumers, enterprises, and developers. By leveraging our advanced AI models, we enable users to reach new levels of innovation and productivity. Our primary focus is improving model inference efficiency and accelerating research progress through optimized inference capabilities.
About the Role
We are seeking talented engineers to expand and optimize OpenAI's inference infrastructure, specifically targeting emerging GPU platforms. This role encompasses a wide range of responsibilities, from low-level kernel optimization to high-level distributed execution. You will collaborate closely with our research, infrastructure, and performance teams to ensure seamless operation of our largest models on cutting-edge hardware.
This position offers a unique opportunity to influence and advance OpenAI’s multi-platform inference capabilities, with a strong emphasis on optimizing performance for AMD accelerators.
Your Responsibilities Include:
Overseeing the deployment, accuracy, and performance of the OpenAI inference stack on AMD hardware.
Integrating our internal model-serving infrastructure (e.g., vLLM, Triton) into diverse GPU-backed systems.
Debugging and optimizing distributed inference workloads across memory, network, and compute layers.
Validating the correctness, performance, and scalability of model execution on large GPU clusters.
Collaborating with partner teams to design and optimize high-performance GPU kernels for accelerators using HIP, Triton, or other performance-focused frameworks.
Working with partner teams to develop, integrate, and fine-tune collective communication libraries (e.g., RCCL) to parallelize model execution across multiple GPUs.
Ideal Candidates Will:
Possess experience in writing or porting GPU kernels using HIP, CUDA, or Triton, with a strong focus on low-level performance.
Be familiar with collective communication libraries such as NCCL/RCCL and understand their role in high-throughput model serving.
Have experience with distributed inference systems and be adept at scaling models across multiple accelerators.
Enjoy tackling end-to-end performance challenges across hardware, system libraries, and orchestration layers.
Be eager to join a dynamic, agile team focused on building innovative infrastructure from the ground up.