Experience Level
Entry Level
Qualifications
Proficiency in programming languages such as Python, C++, or Java
Strong understanding of algorithms and data structures
Experience with machine learning frameworks (e.g., TensorFlow, PyTorch)
Ability to analyze and optimize code for performance
Excellent problem-solving skills and attention to detail
Strong communication skills and ability to work in a team-oriented environment
About the job
This Software Engineer position at OpenAI focuses on inference and performance optimization. Based in San Francisco, the role centers on increasing the speed and efficiency of advanced AI systems. Collaboration with experienced engineers is a key part of the work, with an emphasis on refining AI performance.
What you will do
Work on optimizing the performance of AI inference systems
Collaborate with other engineers to improve efficiency and speed
Contribute to solutions that enhance AI system capabilities
Location
This role is based in San Francisco.
About OpenAI
OpenAI is a leading artificial intelligence research lab focused on developing safe and beneficial AI technologies. Our mission is to ensure that advanced AI is aligned with human values and benefits all of humanity. At OpenAI, you will work alongside some of the brightest minds in the field, contributing to groundbreaking projects that have a real-world impact.
Similar jobs
OpenAI is seeking a Software Engineer in San Francisco to focus on improving productivity by optimizing model performance. This position centers on developing solutions that make machine learning models more efficient and effective.
Role overview
This role involves working closely with teams across different functions to identify and address areas where model performance can be improved. The aim is to deliver changes that have a measurable impact on both systems and workflows.
What you will do
Collaborate with engineers and other specialists to enhance model efficiency
Develop and implement solutions that improve the effectiveness of machine learning systems
Contribute to projects that streamline processes and drive productivity gains
Impact
Your work will help shape improvements in how models operate and how teams at OpenAI achieve their goals. The changes you help deliver will support more effective use of resources and better outcomes for the organization.
OpenAI is seeking a Performance Modeling Engineer based in San Francisco. This role centers on building and improving models that enhance the performance and efficiency of AI systems. The work directly supports the technical backbone of OpenAI’s products.
Key responsibilities
Develop and refine models aimed at optimizing the performance of AI systems
Collaborate with engineers and data scientists to tackle technical challenges as they arise
Contribute to projects that improve the efficiency of large-scale AI infrastructure
Role overview
This position offers the chance to work on foundational technology that underpins OpenAI’s products. The focus is on practical improvements and close teamwork with technical colleagues to advance the capabilities and efficiency of AI at scale.
Role overview
The Performance Modeling Engineer II position at OpenAI centers on building and applying performance models to enhance the efficiency of advanced AI systems. Based in San Francisco, this role contributes to the reliability and speed of OpenAI’s technologies.
What you will do
Develop and implement performance models for AI systems
Collaborate with data scientists and engineers to refine performance metrics
Support the efficiency and rigorous standards of OpenAI’s technologies
ABOUT BASETEN
Baseten is at the forefront of AI technology, empowering leading-edge companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer to seamlessly integrate advanced AI models into their operations. Our unique blend of applied AI research, adaptable infrastructure, and intuitive developer tools enables innovators to bring their most ambitious AI products to life. With our recent $300M Series E funding from top-tier investors such as BOND, IVP, Spark Capital, Greylock, and Conviction, we are poised for rapid growth. Join us in shaping the platform that engineers rely on to deploy transformative AI solutions.
THE ROLE
Are you driven by a passion for enhancing artificial intelligence applications? We are seeking a proactive Software Engineer specializing in ML performance to join our energetic team. This position is perfect for backend engineers who thrive in a fast-paced startup environment and are eager to make substantial contributions to the realm of Large Language Model (LLM) Inference. If you're enthusiastic about optimizing open-source ML models, we can't wait to hear from you!
EXAMPLE INITIATIVES
As a member of our Model Performance team, you will have the opportunity to work on exciting projects, including:
Baseten Embeddings Inference: the quickest embeddings solution available
The Baseten Inference Stack
Driving model performance optimization
RESPONSIBILITIES
Develop, refine, and implement advanced techniques (quantization, speculative decoding, KV cache reuse, chunked prefill, and LoRA) for ML model inference and infrastructure.
Conduct thorough investigations into the codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to troubleshoot and resolve ML performance issues.
Scale and apply optimization techniques across a diverse array of ML models, with a focus on large language models.
Role overview
The Performance Modeling Lead at OpenAI works from San Francisco and takes on both technical and leadership responsibilities. This position centers on developing new modeling methods that enhance performance across a variety of applications. Alongside direct technical contributions, the role involves guiding a team and shaping project direction.
What you will do
Develop and improve modeling strategies to raise performance metrics for multiple projects.
Use expertise in data analysis, machine learning, and optimization to address complex problems.
Lead and mentor a team, supporting their technical development and ensuring strong project outcomes.
ABOUT BASETEN
At Baseten, we are at the forefront of AI innovation, providing critical inference solutions for leading AI companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our platform combines advanced AI research, adaptable infrastructure, and intuitive developer tools, empowering organizations to deploy state-of-the-art models effectively. With rapid growth and a recent $300M Series E funding round backed by top-tier investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we invite you to join our mission in building the platform of choice for engineers delivering AI products.
THE ROLE
As a member of Baseten’s Model Performance (MP) team, you will play a pivotal role in ensuring our platform’s model APIs are not only fast and reliable but also cost-effective. Your primary focus will be on developing and optimizing the infrastructure that supports our hosted API endpoints for cutting-edge open-source models. This role involves working with distributed systems, model serving, and enhancing the developer experience. You will collaborate with a small, dynamic team at the intersection of product development, model performance, and infrastructure, defining how developers interact with AI models on a large scale.
RESPONSIBILITIES
Design, develop, and maintain the Model APIs surface, focusing on advanced inference features such as structured outputs (JSON mode, grammar-constrained generation), tool/function calling, and multi-modal serving.
Profile and optimize TensorRT-LLM kernels, analyze CUDA kernel performance, create custom CUDA operators, and enhance memory allocation patterns for maximum efficiency across multi-GPU setups.
Implement performance improvements across various runtimes based on a deep understanding of their internals, including speculative decoding, guided generation for structured outputs, and custom scheduling algorithms for high-performance serving.
Develop robust benchmarking frameworks to evaluate real-world performance across diverse model architectures, batch sizes, sequence lengths, and hardware configurations.
Enhance performance across runtimes (e.g., TensorRT, TensorRT-LLM) through techniques such as speculative decoding, quantization, batching, and KV-cache reuse.
Integrate deep observability mechanisms (metrics, traces, logs) and establish repeatable benchmarks to assess speed, reliability, and quality.
About Our Team
Join the Inference team at OpenAI, where we leverage cutting-edge research and technology to deliver exceptional AI products to consumers, enterprises, and developers. Our mission is to empower users to harness the full potential of our advanced AI models, enabling unprecedented capabilities. We prioritize efficient and high-performance model inference while accelerating research advancements.
About the Role
We are seeking a passionate Software Engineer to optimize some of the world's largest and most sophisticated AI models for deployment in high-volume, low-latency, and highly available production and research environments.
Key Responsibilities
Collaborate with machine learning researchers, engineers, and product managers to transition our latest technologies into production.
Work closely with researchers to enable advanced research initiatives through innovative engineering solutions.
Implement new techniques, tools, and architectures that enhance the performance, latency, throughput, and effectiveness of our model inference stack.
Develop tools to identify bottlenecks and instability sources, designing and implementing solutions for priority issues.
Optimize our code and Azure VM fleet to maximize every FLOP and GB of GPU RAM available.
You Will Excel in This Role If You:
Possess a solid understanding of modern machine learning architectures and an intuitive grasp of performance optimization strategies, especially for inference.
Take ownership of problems end-to-end, demonstrating a willingness to acquire any necessary knowledge to achieve results.
Bring at least 5 years of professional software engineering experience.
Have or can quickly develop expertise in PyTorch, NVIDIA GPUs, and relevant optimization software stacks (such as NCCL, CUDA), along with HPC technologies like InfiniBand, MPI, and NVLink.
Have experience architecting, building, monitoring, and debugging production distributed systems, with bonus points for working on performance-critical systems.
Have successfully rebuilt or significantly refactored production systems multiple times to accommodate rapid scaling.
Are self-driven, enjoying the challenge of identifying and addressing the most critical problems.
At Hover, we empower individuals to design, enhance, and safeguard their cherished properties. Utilizing proprietary AI technology built on over a decade of real property data, we provide answers to pressing questions such as “What will it look like?” and “What will it cost?” Homeowners, contractors, and insurance professionals depend on Hover to receive fully measured, accurate, and interactive 3D models of any property—achieved through a smartphone scan in mere minutes.
We are driven by curiosity, purpose, and a collective commitment to our customers, communities, and each other. At Hover, we believe the most innovative ideas stem from diverse perspectives, and we take pride in fostering an inclusive, high-performance culture that encourages growth, accountability, and excellence. Supported by leading investors like Google Ventures and Menlo Ventures, and trusted by industry leaders including Travelers, State Farm, and Nationwide, we are transforming how people perceive and interact with their environments.
Why Join Hover?
At Hover, 3D models are not just a feature; they are the essence of our product. Each scan and data point we process empowers homeowners, insurers, and contractors to make informed, data-driven decisions. We are seeking a Software Engineer who has a passion for geometry, automation, and making a tangible impact in the real world. In this role, you will design and implement systems that convert customer-captured imagery into meticulously accurate 3D models, enhancing the scalability and precision of Hover’s modeling pipeline. You will work collaboratively with designers and engineers across frontend, backend, computer vision, and DevOps to bring innovative capabilities to fruition, blending technical expertise with strong communication and cross-functional collaboration.
The 3D Modeling Pipeline team develops the tools essential for our in-house operations to transform customer-captured scans into highly detailed, accurate 3D models of buildings. This team is also responsible for creating the pipeline and systems that process 3D data through both automated and manual steps, as well as exporting data into customer-facing formats.
Your Contributions Will Include:
Owning and evolving backend systems that convert raw scan data into exact 3D models, ensuring timely delivery to key ecosystem partners like Xactimate and Cotality.
Building and refining internal modeling tools that enable teams to efficiently generate, validate, and optimize high-quality 3D data.
Collaborating with machine learning and computer vision engineers to implement new algorithms into production, bridging research with practical applications.
Enhancing customer and partner experiences by improving how Hover’s 3D outputs integrate with downstream workflows and external platforms.
Promoting innovation and ongoing enhancement across our modeling pipeline.
ABOUT BASETEN
At Baseten, we empower the most innovative AI companies—such as Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer—by providing a robust platform for mission-critical inference. Our unique combination of applied AI research, adaptable infrastructure, and cutting-edge developer tools allows companies at the forefront of AI to deploy state-of-the-art models seamlessly. Having recently secured a $300M Series E funding round from notable investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we are poised for rapid growth. Join us in creating the essential platform for engineers to launch AI products.
THE ROLE
Are you driven to push the boundaries of artificial intelligence while leading a team of talented engineers? We are seeking a Technical Lead Manager with a focus on machine learning performance and inference. This position is perfect for an individual with a strong engineering foundation who is eager to guide and mentor a team while remaining actively engaged in hands-on technology work. If you excel in a dynamic startup atmosphere and are excited to tackle both leadership and technical challenges, we invite you to apply.
EXAMPLE INITIATIVES
As a member of our Model Performance team, you will work on projects such as:
Baseten Embeddings Inference: the fastest embeddings solution available
The Baseten Inference Stack
Driving model performance optimization
RESPONSIBILITIES
Lead, mentor, and manage a team of engineers dedicated to developing and optimizing ML model inference and performance.
Oversee technical strategy and architectural decisions, fostering improvements across our engineering organization.
Collaborate with cross-functional teams to ensure the seamless integration and scalability of ML models in production settings.
Drive innovation in model performance and advocate for best practices within the team.
Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California
At Databricks, we are dedicated to empowering data teams to tackle the most challenging problems in the world — from realizing the future of transportation to fast-tracking medical innovations. We accomplish this by developing and operating the premier data and AI infrastructure platform, enabling our customers to harness profound data insights for business enhancement.
Our Model Serving product equips organizations with a cohesive, scalable, and governed solution for deploying and managing AI/ML models — ranging from traditional machine learning to intricate proprietary large language models. It ensures real-time, low-latency inference, governance, monitoring, and lineage. As the adoption of AI surges, Model Serving stands as a fundamental component of the Databricks platform, allowing customers to operationalize models at scale with robust SLAs and cost efficiency.
In the role of Staff Engineer, you will significantly influence both the product experience and the core infrastructure of Model Serving. Your responsibilities will include designing and constructing systems that facilitate high-throughput, low-latency inference across CPU and GPU workloads, steering architectural strategies, and collaborating extensively with platform, product, infrastructure, and research teams to create an exceptional serving platform.
ABOUT BASETEN
At Baseten, we are at the forefront of enabling transformative AI solutions for some of the world's leading companies, including Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our innovative platform combines cutting-edge AI research, adaptable infrastructure, and developer-friendly tools to facilitate the production of advanced models. Recently, we celebrated our rapid growth with a successful $300M Series E funding round from notable investors like BOND, IVP, Spark Capital, Greylock, and Conviction. We invite you to join our dynamic team and contribute to the evolution of AI product deployment.
THE ROLE
As a Senior Software Engineer specializing in Model Training at Baseten, you will play a pivotal role in constructing the infrastructure essential for the large-scale training and fine-tuning of foundational AI models. Your responsibilities will include designing and implementing distributed training systems, optimizing GPU utilization, and establishing scalable pipelines that empower Baseten and our clientele to adapt models with efficiency and reliability. This role demands a high level of technical expertise and hands-on involvement: you will be responsible for critical components of our training stack, collaborate with product and infrastructure teams to identify customer needs, and drive advancements in scalable training infrastructure.
EXAMPLE WORK
Training open-source models that surpass GPT-5 capabilities for a leading digital insurer
Exploring specialized, continuously learning models as the future of AI
Overview of our training documentation
Research initiatives we've undertaken
RESPONSIBILITIES
Design, construct, and sustain distributed training infrastructure for large foundation models
Develop scalable pipelines for fine-tuning and training across diverse GPU/accelerator clusters
Enhance training performance through optimization of algorithms and infrastructure
Collaborate closely with cross-functional teams to align technical solutions with business objectives
Stay abreast of advancements in machine learning and AI to continually improve our training processes
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California
At Databricks, we are dedicated to empowering data teams to tackle some of the most challenging problems in the world. We achieve this by creating and managing a leading data and AI infrastructure platform that enables our clients to leverage deep data insights for business enhancement. Our commitment to pushing the limits of data and AI technology is matched by our focus on resilience, security, and scalability, which are essential for our customers' success on our platform.
Databricks operates one of the largest-scale software platforms, comprising millions of virtual machines that generate terabytes of logs and process exabytes of data daily. Given our scale, we frequently encounter cloud hardware, network, and operating system faults, and our software must adeptly protect our customers from these issues.
As a Senior Performance Engineer, you will collaborate with various teams throughout the organization to assess product and feature performance, pinpoint performance bottlenecks, and partner with engineers to address performance and scalability challenges. This includes setting performance goals for different software releases, guiding teams in developing performance benchmarks, conducting competitive benchmark analyses for various Databricks products, and performing in-depth analyses to identify and resolve performance issues.
On-site|San Francisco, CA | New York City, NY | Seattle, WA | Washington, DC
Join Anthropic as a Senior Software Engineer focused on developing cutting-edge AI-driven cybersecurity products. In this dynamic role, you will collaborate with a passionate team to prototype and build innovative solutions that enhance security applications. Your work will bridge research, product development, and customer engagement, allowing you to contribute significantly to the future of cybersecurity.
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California
At Databricks, we are dedicated to empowering data teams to tackle some of the most challenging issues of our time — from realizing the future of transportation to speeding up medical innovations. We achieve this by developing and maintaining the premier data and AI infrastructure platform, allowing our clients to leverage profound data insights to enhance their operations.
Our Model Serving product equips organizations with a cohesive, scalable, and governed platform for deploying and overseeing AI/ML models, spanning traditional ML to specialized large language models. It provides real-time, low-latency inference, governance, monitoring, and lineage capabilities. With the rapid rise of AI adoption, Model Serving stands as a fundamental component of the Databricks platform, enabling clients to operationalize models efficiently and cost-effectively at scale.
As a Senior Engineer, your role will be pivotal in transforming both the product experience and the underlying infrastructure of Model Serving. You will design and create systems enabling high-throughput, low-latency inference across CPU and GPU workloads, influence architectural strategies, and work closely with platform, product, infrastructure, and research teams to deliver an exceptional serving platform.
Join our talented team at Canva as a Senior Software Engineer specializing in Video Performance. We are looking for an innovative and solutions-oriented engineer who is passionate about optimizing video experiences for our users. In this role, you will collaborate with cross-functional teams to enhance performance, develop new features, and implement best practices in video engineering.
Join Cloudflare as a Senior Software Engineer specializing in Network Performance & Reliability! In this role, you'll be at the forefront of enhancing the performance and stability of our global network, ensuring our customers benefit from unparalleled speed and reliability. You'll collaborate with experts across various teams to design and implement innovative solutions that optimize network operations.
Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California
At Databricks, we are driven by our commitment to empower data teams in tackling the world's most challenging problems — from transforming transportation solutions to accelerating medical advancements. Our mission revolves around constructing and maintaining the world's premier data and AI infrastructure platform, enabling our clients to harness deep data insights for enhanced business outcomes.
Foundation Model Serving is the API product for hosting and serving advanced AI model inference, covering open-source models like Llama, Qwen, and GPT OSS as well as proprietary models such as Claude and OpenAI GPT. We welcome engineers who have experience managing high-scale operational systems, including customer-facing APIs, edge gateways, or ML inference services, even if they do not have a background in ML or AI. A passion for developing LLM APIs and runtimes at scale is essential.
As a Staff Engineer, you will play a pivotal role in defining both the product experience and the underlying infrastructure. You will be tasked with designing and building systems that facilitate high-throughput, low-latency inference on GPU workloads with cutting-edge models. Your influence will extend to architectural direction, working closely with platform, product, infrastructure, and research teams to deliver an exceptional foundation model API product.
The impact you will have:
Design and implement core systems and APIs that drive Databricks Foundation Model Serving, ensuring scalability, reliability, and operational excellence.
Collaborate with product and engineering leaders to outline the technical roadmap and long-term architecture for workload serving.
Make architectural decisions to enhance performance, throughput, autoscaling, and operational efficiency for GPU serving workloads.
Contribute directly to critical components within the serving infrastructure, from systems like vLLM and SGLang to token-based rate limiters and optimizers, ensuring seamless and efficient operations at scale.
Work cross-functionally with product, platform, and research teams to transform customer requirements into dependable and high-performing systems.
Establish best practices for code quality, testing, and operational readiness while mentoring fellow engineers through design reviews and technical support.
Represent the team in inter-departmental technical discussions, influencing Databricks’ wider AI platform strategy.
Join Cloudflare as a Software Engineer dedicated to enhancing our network performance and reliability. In this dynamic role, you will collaborate with cross-functional teams to develop innovative software solutions that optimize our network infrastructure and ensure high availability and performance for our users. Your contributions will directly impact millions of users worldwide, making the internet a safer place for everyone.
Full-time|$250K/yr - $334.5K/yr|Hybrid|Mountain View, CA USA; San Francisco, CA USA
Waymo is a pioneering company in autonomous driving technology, dedicated to becoming the world’s most trusted driver. Originating from the Google Self-Driving Car Project in 2009, Waymo has established the Waymo Driver—The World’s Most Experienced Driver™—with a mission to enhance mobility access and save lives lost in traffic accidents. The Waymo Driver powers our fully autonomous ride-hailing service and can be integrated across various vehicle platforms and applications. Having completed over ten million rider-only trips, our technology has driven more than 100 million miles on public roads across over 15 U.S. states, plus tens of billions of miles in simulation.
The Perception team is responsible for developing systems that learn the spatial-temporal representations and semantic meanings of the environment surrounding our autonomous vehicles (AVs). We collaborate closely with downstream teams to optimize and integrate our work into the Waymo Driver, conduct research to solve real-world challenges, and work alongside research teams at Alphabet. With access to millions of miles of diverse driving data from various sensors, we empower engineers like you to (1) create methods for efficient continuous learning from extensive real-world data, (2) develop scalable models and training methodologies, (3) analyze real-world behaviors to create systems that can navigate complexities, and (4) optimize models for both onboard and offboard hardware.
In this hybrid role, you will report to a Technical Lead Manager.
Apr 13, 2026