Experience Level
Entry Level
Qualifications
- Proven experience in software development, specifically in building scalable applications
- Strong proficiency in programming languages such as Java, Python, or JavaScript
- Experience with cloud technologies and microservices architecture
- Excellent problem-solving skills and a passion for technology
- Ability to work collaboratively in a team-oriented environment
About the job
Join Pinterest as a Staff Software Engineer in the Agent Foundations team, where you will be instrumental in developing and enhancing our core engineering capabilities. You will collaborate with cross-functional teams to create innovative solutions that drive our platform forward. Your expertise will contribute to shaping the future of our product and ensuring a seamless user experience.
About Pinterest, Inc.
Pinterest is a visual discovery engine that empowers users to find inspiration and ideas for their projects and interests. With a diverse team passionate about creativity and innovation, we strive to create a positive impact on the world through our platform.
Similar jobs
Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California
At Databricks, we are driven by our commitment to empower data teams to tackle the world's most challenging problems, from transforming transportation to accelerating medical advancements. Our mission is to build and maintain the world's premier data and AI infrastructure platform, enabling our clients to harness deep data insights for better business outcomes.

Foundation Model Serving is the API product for hosting and serving advanced AI model inference, covering both open-source models such as Llama, Qwen, and GPT OSS, and proprietary models such as Claude and OpenAI GPT. We welcome engineers who have managed high-scale operational systems, including customer-facing APIs, edge gateways, or ML inference services, even without a background in ML or AI. A passion for developing LLM APIs and runtimes at scale is essential.

As a Staff Engineer, you will play a pivotal role in defining both the product experience and the underlying infrastructure. You will design and build systems that deliver high-throughput, low-latency inference on GPU workloads with cutting-edge models. Your influence will extend to architectural direction, working closely with platform, product, infrastructure, and research teams to deliver an exceptional foundation model API product.

The impact you will have:
- Design and implement core systems and APIs that drive Databricks Foundation Model Serving, ensuring scalability, reliability, and operational excellence.
- Collaborate with product and engineering leaders to define the technical roadmap and long-term architecture for workload serving.
- Make architectural decisions that improve performance, throughput, autoscaling, and operational efficiency for GPU serving workloads.
- Contribute directly to critical components of the serving infrastructure, from systems like vLLM and SGLang to token-based rate limiters and optimizers, ensuring efficient operation at scale.
- Work cross-functionally with product, platform, and research teams to turn customer requirements into dependable, high-performing systems.
- Establish best practices for code quality, testing, and operational readiness while mentoring fellow engineers through design reviews and technical support.
- Represent the team in inter-departmental technical discussions, influencing Databricks' wider AI platform strategy.
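The token-based rate limiting mentioned in this listing can be sketched as a token bucket that budgets LLM tokens rather than request counts. This is a hypothetical minimal version, not Databricks' implementation; the class name, parameters, and single-process design are all assumptions for illustration:

```python
import time

class TokenBucketLimiter:
    """Rate-limit LLM traffic by tokens per second rather than by requests.

    Hypothetical sketch: a real serving stack would also need per-tenant
    buckets, distributed state, and separate prompt/completion accounting.
    """

    def __init__(self, tokens_per_sec: float, burst: float):
        self.rate = tokens_per_sec      # sustained token budget per second
        self.capacity = burst           # maximum tokens that can accumulate
        self.available = burst          # start with a full bucket
        self.last = time.monotonic()

    def try_acquire(self, n_tokens: int) -> bool:
        """Return True if a request costing n_tokens fits the current budget."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity.
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.rate)
        self.last = now
        if n_tokens <= self.available:
            self.available -= n_tokens
            return True
        return False
```

Budgeting by tokens instead of requests matters for LLM APIs because a single request with a long prompt can cost orders of magnitude more compute than a short one.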
Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California
At Databricks, we are dedicated to empowering data teams to tackle the most challenging problems in the world — from realizing the future of transportation to fast-tracking medical innovations. We accomplish this by developing and operating the premier data and AI infrastructure platform, enabling our customers to harness profound data insights for business enhancement. Our Model Serving product equips organizations with a cohesive, scalable, and governed solution for deploying and managing AI/ML models — ranging from traditional machine learning to intricate proprietary large language models. It ensures real-time, low-latency inference, governance, monitoring, and lineage. As the adoption of AI surges, Model Serving stands as a fundamental component of the Databricks platform, allowing customers to operationalize models at scale with robust SLAs and cost efficiency. In the role of Staff Engineer, you will significantly influence both the product experience and the core infrastructure of Model Serving. Your responsibilities will include designing and constructing systems that facilitate high-throughput, low-latency inference across CPU and GPU workloads, steering architectural strategies, and collaborating extensively with platform, product, infrastructure, and research teams to create an exceptional serving platform.
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California
At Databricks, we are dedicated to empowering data teams to tackle some of the most challenging issues of our time—from realizing the future of transportation to speeding up medical innovations. We achieve this by developing and maintaining the premier data and AI infrastructure platform, allowing our clients to leverage profound data insights to enhance their operations. Our Model Serving product equips organizations with a cohesive, scalable, and governed platform for deploying and overseeing AI/ML models, spanning traditional ML to specialized large language models. It provides real-time, low-latency inference, governance, monitoring, and lineage capabilities. With the rapid rise of AI adoption, Model Serving stands as a fundamental component of the Databricks platform, enabling clients to operationalize models efficiently and cost-effectively at scale. As a Senior Engineer, your role will be pivotal in transforming both the product experience and the underlying infrastructure of Model Serving. You will design and create systems enabling high-throughput, low-latency inference across CPU and GPU workloads, influence architectural strategies, and work closely with platform, product, infrastructure, and research teams to deliver an exceptional serving platform.
Full-time|$194K/yr - $267.3K/yr|Hybrid|San Francisco, California
Okta secures identity for both people and AI, providing trusted infrastructure that helps organizations adapt to change. The company addresses complex, real-world problems with practical solutions and values urgency, excellence, and teamwork.

The Developer Foundations team is looking for a Staff Software Engineer based in San Francisco, California. This engineer will play a key role in scaling Okta's systems and accelerating software delivery. The position centers on refining the company's approach to Continuous Delivery and driving improvements in engineering velocity and productivity across multiple teams. Collaboration is central in this role: the Staff Software Engineer works closely with engineers, architects, operations, program management, and quality assurance. The team values fresh ideas and solutions that have a direct impact on internal developer experience.

What you will do:
- Create high-quality internal tools and automation that support continuous delivery and boost developer productivity.
- Design and implement Continuous Delivery pipelines for a range of projects, using technologies such as Java, Jenkins, AWS, Docker, Python, Node, iOS, Ruby, Bash, and Go.
- Develop proofs of concept, guide technology decisions, contribute to internal frameworks, and participate in design and code reviews.
- Roll out solutions to internal users in phases, monitor adoption, gather feedback, and refine approaches to fit team needs.
- Maintain pre-production infrastructure on AWS, focusing on monitoring, backup and restore, SLA management, cost control, and deployment processes.
Full-time|$217K/yr - $312.2K/yr|On-site|San Francisco, California
At Databricks, we are dedicated to empowering data teams to tackle the most challenging global issues—whether it's transforming transportation or speeding up medical advancements. We achieve this by constructing and managing the world's leading data and AI infrastructure platform, enabling our clients to leverage deep data insights for business enhancement. The Model Serving product at Databricks offers enterprises a cohesive, scalable, and governed platform for deploying and managing AI/ML models—from conventional ML to sophisticated, proprietary large language models. It facilitates real-time, low-latency inference while providing governance, monitoring, and lineage capabilities. As AI adoption surges, Model Serving becomes a central component of the Databricks platform, allowing customers to operationalize models efficiently and cost-effectively. As a Senior Engineering Manager, you will lead a team responsible for both the product experience and the underlying infrastructure of Model Serving. This role involves shaping user-facing features while architecting for scalability, extensibility, and performance across CPU and GPU inference. You will collaborate closely with various teams across the platform, product, infrastructure, and research domains.
Join Prima Mente
At Prima Mente, we are pioneers in biology-focused artificial intelligence. Our mission is to generate unique datasets, develop versatile biological foundation models, and translate scientific breakthroughs into real-world clinical applications. Our primary focus is understanding the brain in depth, safeguarding it from neurological disorders, and enhancing its function in health. Our team of AI researchers, experimentalists, clinicians, and operational experts is located in London, San Francisco, and Dubai.

Your Role: Foundation Models for Biology
As a Machine Learning Engineer, you will be instrumental in the design, implementation, and scaling of foundational AI models and infrastructure for multi-omics at unprecedented scale. Your contributions will drive advances in scientific understanding and lead to groundbreaking applications in medicine and biology.

Key Responsibilities:
- Develop high-performance machine learning algorithms optimized for large-scale applications, ensuring reliability and efficiency.
- Design, implement, and maintain experimentation pipelines that allow rapid iteration, precise assessment, and reproducible research results.
- Refactor prototype research code into clean, maintainable, and efficient repositories ready for production deployment.
- Create fast data processing workflows that handle extensive datasets to speed up research and model development.
- Engage in experimental design, focusing on high-impact experiments that yield the greatest signal-to-noise ratio.

Growth Expectations:
- In 1 month, you will run initial experiments with state-of-the-art machine learning models, review and apply advanced research papers, and improve existing code for efficiency and precision.
- By 3 months, you will own a prototype model architecture, demonstrating notable algorithmic enhancements, and contribute to methods for large-scale data ingestion and training.
- Within 6 months, you will have significantly shaped the implementation of a high-performance foundation model, incorporating key algorithmic optimizations that improve scalability and throughput, and published internal benchmarks demonstrating substantial effects.
Full-time|On-site|San Francisco, Boulder, or Austin
Join Our Mission
Foundation is looking for exceptional engineers to propel our mission of modernizing the homebuilding industry and improving the journey of purchasing, selling, and owning homes.

About Us
Backed by $6.8 million from top venture capitalists, including Y Combinator, Foundation is formed by a team of former Opendoor professionals dedicated to revolutionizing residential real estate. Our flagship product is a pioneering customer experience platform tailored for homebuilders, akin to "Shopify for Homebuilders." We collaborate with large-scale homebuilders to provide a cutting-edge digital customer experience, significantly boosting customer satisfaction and team productivity. In just two years, we've achieved notable product-market fit and rapid growth, supported solely by contract design.

Our Growth Phases
We are currently navigating the first of three interconnected growth phases:
- Innovative, AI-driven vertical SaaS for homebuilding: public-scale potential in its own right.
- The enterprise ecosystem for real estate: enabling collaboration among adjacent trillion-dollar industries including lending, title, home insurance, and home services.
- The AI-native home operating system and interface: leveraging the network effects of the enterprise ecosystem to deliver transformative AI experiences for homebuyers and homeowners. Discover, purchase, and manage your home seamlessly with Foundation.

Your Role
In this pivotal role, you will help build the most reliable and modern customer experience platform in the homebuilding sector, transitioning traditional workflows into the AI era with insight, pragmatism, and empathy. You will work closely with product and customer teams, developing across the full stack, collaborating with design and go-to-market teams, and owning significant portions of the platform from initial discovery to production refinement. Our focus is on real businesses, real revenue, and real users who require robust and reliable tools.
About Our Team
The Applied Foundations team at OpenAI is at the forefront of safeguarding our technology against diverse adversarial threats. Our primary mission is to ensure the integrity and security of our platforms as they grow. We defend against financial misuse, large-scale attacks, and other forms of exploitation that could compromise user experience or destabilize our operations. The Integrity Foundations team lays the groundwork and infrastructure to support this critical mission.

About the Role
At OpenAI, we aim to advance artificial intelligence in a manner that is safe, reliable, and aligned with broader societal values. The Software Engineer role in Applied Foundations is vital to maintaining the dependability of our platforms. You will play a key role in developing strong defenses against the variety of adversarial behaviors that threaten our ecosystem. You will collaborate with our entire engineering team to design and implement systems that detect and prevent abuse, promote user safety, and mitigate risks across our platform. You will be at the forefront of our initiatives to responsibly and sustainably harness the vast potential of AI.

Key Responsibilities:
- Design and enhance systems that identify and prevent various forms of abuse, including financial fraud, botting, and scripting.
- Work with cross-functional teams to create solutions that defend against adversarial attacks while preserving an optimal user experience.
- Assist in responding to active incidents on the platform and develop new tools and infrastructure to address fundamental challenges.

You will excel in this role if you:
- Have a minimum of 3 years of professional experience in software engineering.
- Have experience setting up and maintaining production backend services and data pipelines.
- Exhibit a humble attitude, a willingness to support colleagues, and a commitment to team success.
- Are self-directed and enjoy devising solutions to complex problems.
- Take ownership of issues from start to finish and are eager to acquire any knowledge needed to achieve your goals.
- Have a passion for AI safety in production environments and the skills to build effective software systems.
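One common building block of the abuse detection described in this listing is sliding-window rate analysis: flag accounts whose request rate suddenly spikes, as botting and scripting often do. The sketch below is a toy illustration only; the class name and thresholds are invented, and production anti-abuse systems combine many signals beyond rate:

```python
import time
from collections import defaultdict, deque

class BurstDetector:
    """Flag accounts whose request count in a sliding time window exceeds a cap.

    Hypothetical sketch of one abuse-detection signal; real systems would
    combine rate with payload, device, and behavioral features.
    """

    def __init__(self, max_requests: int, window_sec: float):
        self.max_requests = max_requests
        self.window_sec = window_sec
        self.events = defaultdict(deque)  # account_id -> timestamps in window

    def record(self, account_id, now=None):
        """Record one request; return True if the account now looks abusive."""
        now = time.monotonic() if now is None else now
        q = self.events[account_id]
        q.append(now)
        # Evict timestamps that have fallen out of the sliding window.
        while q and now - q[0] > self.window_sec:
            q.popleft()
        return len(q) > self.max_requests
```

Using a `deque` keeps both the append and the eviction at the window's trailing edge O(1) per request, which matters when every platform request passes through the check.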
Full-time|$185K/yr - $222K/yr|On-site|San Francisco, CA
Lyft's Self-Serve Intelligence team builds the systems that help riders and drivers resolve issues on their own. Part of the Safety & Customer Care organization, this group focuses on backend services, APIs, and AI-powered products that let customers get help without waiting for an agent. The team's work includes AI Assist (such as AI Agents), automations, and self-service workflows, all designed to make support fast and reliable.

Role overview
As a Senior Software Engineer on this team, the main responsibility is to design, build, deploy, and maintain backend systems and AI-driven tools that handle customer problems automatically. These solutions use Generative AI and automation to deliver scalable, dependable self-service experiences for millions of Lyft riders and drivers.

What you will do:
- Design and develop backend services and APIs for AI-powered self-service products
- Build and maintain AI Agents and automation tools that resolve customer issues without agent involvement
- Oversee the full development lifecycle: system design, prototyping, deployment, and ongoing operations
- Work closely with product managers, designers, data scientists, and operations teams to deliver robust solutions
- Focus on reliability, scalability, and operational excellence in all systems

Location
This role is based in San Francisco, CA.
About Granica
Granica is a pioneering AI research and infrastructure company dedicated to creating reliable and steerable representations of enterprise data. We build trust through Crunch, a policy-driven health layer designed to keep extensive tabular datasets efficient, reliable, and reversible. From this foundation, we are developing Large Tabular Models: systems that learn cross-column and relational structure to provide trustworthy answers and automation, complete with built-in provenance and governance.

Our Mission
The current limitations of AI are not solely due to model design but also to the inefficiencies of the data that supports it. At scale, every redundant byte, poorly organized dataset, and inefficient data path contributes to significant costs, latency, and energy waste. Granica's mission is to eliminate these inefficiencies. We leverage cutting-edge research in information theory, probabilistic modeling, and distributed systems to create self-optimizing data infrastructure that continuously improves how information is represented and used by AI. Our engineering team collaborates closely with the Granica Research group led by Prof. Andrea Montanari of Stanford University, merging advances in information theory and learning efficiency with large-scale distributed systems. We believe the next major breakthrough in AI will come from innovations in efficient systems rather than simply larger models.

What You Will Create:
- Global Metadata Substrate: design and refine the global metadata and transactional substrate that enables atomic consistency and schema evolution across exabyte-scale data systems.
- Adaptive Engines: architect systems that self-optimize, reorganizing and compressing data according to access patterns to achieve unprecedented efficiency gains.
- Intelligent Data Layouts: invent new encoding and layout strategies that push the theoretical limits of signal per byte read.
- Autonomous Compute Pipelines: spearhead the development of distributed compute platforms that scale predictively and stay reliable even under extreme load and failure conditions.
- Research to Production: partner with Granica Research to turn advances in compression and probabilistic modeling into production-ready, industry-leading systems.
- Latency as Intelligence: push systems forward by optimizing for latency as a key aspect of intelligence.
Full-time|On-site|CA - San Francisco; WA - Seattle; UT - Cottonwood Heights
Join SoFi as a Senior Software Engineer on our Data Foundations team, where you will play a pivotal role in shaping our data architecture and enhancing our data-driven capabilities. You will work closely with cross-functional teams to develop robust data solutions that empower our business decisions and improve customer experiences.

As a Senior Software Engineer, you will leverage your expertise in data engineering, software development, and cloud technologies to build scalable data pipelines and maintain high-quality data infrastructure. Your contributions will directly impact our ability to deliver innovative financial solutions.
Join Us as Our First Marketing Lead
Foundation is looking for our inaugural marketing leader to propel our vision of revolutionizing homebuilding and enhancing the journey of buying, selling, and owning a home.

About Foundation
With approximately $6.8M in backing from top-tier venture capitalists, including Y Combinator, Foundation is composed of a dynamic team formerly from Opendoor, dedicated to reshaping the future of residential real estate. Our flagship product is a cutting-edge customer experience platform designed specifically for homebuilders: think of it as the "Shopify for Homebuilders." We collaborate with large-scale homebuilders to deliver a modern digital experience, significantly boosting customer satisfaction and team productivity. In just two years, we've achieved remarkable product-market fit and impressive growth, all without a dedicated marketing team.

Our Growth Journey
We are currently navigating the first of three interconnected growth phases:
- AI-Driven SaaS for Homebuilding: a transformative opportunity with public-scale potential.
- Real Estate Enterprise Ecosystem: homebuilders drive this ecosystem, which fosters collaboration among adjacent trillion-dollar sectors such as lending, title, home insurance, and retail.
- AI-Native Home Operating System: enabling seamless home buying and ownership through our platform.

Your Role as Our First Marketer
We seek a hands-on, results-driven marketer passionate about transforming a key sector of the U.S. economy and redefining marketing in the age of AI.

Key Responsibilities
You will be pivotal in steering Foundation's next growth phases by integrating AI with marketing and real estate innovation. Your primary objectives will include:
- Accelerating Growth: drive rapid expansion of our core AI-driven product line for homebuilders.
Full-time|On-site|San Francisco / Tel Aviv / Zurich
Tzafon is at the forefront of machine intelligence, operating as a cutting-edge foundation model lab dedicated to building scalable computing systems. With offices in San Francisco, Zurich, and Tel Aviv, we have secured over $12 million in funding to propel our mission of expanding the boundaries of machine intelligence. Our team comprises engineers and scientists with deep expertise in ML infrastructure and research, founded by distinguished IOI and IMO medalists, PhD holders, and alumni of top tech firms such as Google DeepMind, Character, and NVIDIA. We specialize in training models and building infrastructure for swarms of agents that automate tasks across real-world environments.

In this role, you'll work across our product and post-training teams to deploy Large Action Models that deliver results. Your responsibilities will include building evaluations, benchmarks, and fine-tuning pipelines, as well as defining optimal model behavior and achieving it at scale.
Full-time|$191K/yr - $223K/yr|On-site|United States
Founded in 2007, Airbnb has transformed the way people travel and connect. From humble beginnings with just three guests in a San Francisco home, we have grown to a global community of over 5 million hosts who have welcomed more than 2 billion guest arrivals across nearly every country. Our hosts provide unique accommodations and experiences that foster authentic connections with local communities.

Join Our Talented Community
As a member of Airbnb's App Foundation team, you will collaborate across platforms to create high-quality, performant capabilities that enhance nearly all features within the Guest and Host ecosystems. Our primary focus is developing App Product Frameworks, Insights & Logging, Performance & App Health, and Feature Architecture. We work hand in hand with Product, Design, Platform (iOS/Android/Web), Analytics Infrastructure, Data Platform, and other Product Foundation teams to establish cohesive paved paths and standards at scale. We pride ourselves on a culture that values technical excellence, pragmatic decision-making, robust ownership, and a steadfast commitment to improving both developer and user experiences through foundational work.

Your Impact:
- Collaborate with cross-functional partners (design and product) to explore, shape, and implement new product experiences from ideation to large-scale execution.
- Develop efficient and reusable backend capabilities that prioritize quality while ensuring performance and scalability.
- Lead initiatives that significantly enhance Guest and Host experiences by improving app responsiveness and ensuring reliable performance across critical backend systems affecting millions.
- Establish a performance roadmap by identifying bottlenecks, prioritizing impactful work, and delivering improvements across services, data access patterns, and infrastructure.
- Elevate performance engineering standards by creating tools, benchmarks, and guardrails that prevent regressions and integrate performance considerations into team workflows.
- Influence architecture and standards across Airbnb's backend ecosystem to improve observability, efficiency, and adaptability of systems.

A Day in Your Life
Join us in making a difference for millions of users worldwide by building the backend foundations that power the Airbnb experience.
Join Us in Revolutionizing AI Infrastructure
At Meter, we are pioneering the application of cutting-edge AI technology to transform the way the internet is built, monitored, and managed. Our vertical integration spans the entire enterprise networking stack, from hardware and firmware to operating systems and operations. This unique position gives us comprehensive visibility and control over the entire stack through a single API, along with a proprietary dataset unmatched in the industry, paving the way for complete end-to-end automation. Our solutions are already in use by Fortune 500 companies, educational institutions, manufacturing facilities, and cloud-scale clients.

We are assembling a founding core engineering team dedicated to developing and training models that can understand these systems, improve operational efficiency, predict failures, and resolve issues proactively. In essence, you will help build the decision-making framework that underpins the infrastructure of the modern world. You will collaborate closely with our founders, playing a key role in shaping the future of one of the most impactful applications of models available today.

Learn more about us at meter.ai.
Full-time|$216.2K/yr - $270.3K/yr|On-site|San Francisco, CA; New York, NY
Join our dynamic Machine Learning Infrastructure team as a Senior AI Infrastructure Engineer, where you will play a pivotal role in designing and building platforms that ensure the scalable, reliable, and efficient serving of Large Language Models (LLMs). Our platform supports a range of cutting-edge research and production systems, serving both internal and external applications across diverse environments.

The ideal candidate will combine a solid foundation in machine learning principles with extensive experience in backend system architecture. You will thrive in a collaborative environment that bridges research and engineering, working to provide seamless experiences for our customers and to accelerate innovation across the organization.
Join Our Team as a Customer Success Manager/Lead!
At Foundation, we're on a mission to revolutionize the homebuilding industry and enhance the experience of buying, selling, and owning homes. We are looking for a dedicated Customer Success Manager to join our team and drive our core business forward.

About Us
With $6.8M in funding from top-tier venture capitalists, including Y Combinator, Foundation comprises a talented team formerly from Opendoor, focused on transforming residential real estate. Our flagship product is a cutting-edge customer experience platform tailored for homebuilders: envision it as the 'Shopify for Homebuilders'. We partner with large-scale homebuilders to provide a modern digital experience that significantly boosts customer satisfaction and team productivity. Within just two years, we have achieved remarkable growth and established a strong product-market fit.

The Role
As a Customer Success Manager, you will play a pivotal role in fostering and expanding relationships with our diverse portfolio of clients. You will be the primary contact for our customers post-onboarding, ensuring they maximize their use of our platform and derive long-term value. This position requires a proactive approach to managing multiple accounts, identifying risks and opportunities, and collaborating closely with our Product, Engineering, and Operations teams. This is an ideal opportunity for someone who thrives in a fast-paced environment, enjoys tackling challenges, and seeks a meaningful role at a scaling startup. You will have the autonomy to influence our customer engagement strategies and drive significant impact.
ABOUT BASETEN
At Baseten, we are at the forefront of AI innovation, providing critical inference solutions for leading AI companies like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. Our platform combines advanced AI research, adaptable infrastructure, and intuitive developer tools, empowering organizations to deploy state-of-the-art models effectively. With rapid growth and a recent $300M Series E funding round backed by top-tier investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we invite you to join our mission of building the platform of choice for engineers delivering AI products.

THE ROLE
As a member of Baseten's Model Performance (MP) team, you will play a pivotal role in ensuring our platform's model APIs are not only fast and reliable but also cost-effective. Your primary focus will be developing and optimizing the infrastructure that supports our hosted API endpoints for cutting-edge open-source models. The role involves working with distributed systems, model serving, and the developer experience. You will collaborate with a small, dynamic team at the intersection of product development, model performance, and infrastructure, defining how developers interact with AI models at scale.

RESPONSIBILITIES:
- Design, develop, and maintain the Model APIs surface, focusing on advanced inference features such as structured outputs (JSON mode, grammar-constrained generation), tool/function calling, and multi-modal serving.
- Profile and optimize TensorRT-LLM kernels, analyze CUDA kernel performance, create custom CUDA operators, and improve memory allocation patterns for maximum efficiency across multi-GPU setups.
- Implement performance improvements across runtimes based on a deep understanding of their internals, including speculative decoding, guided generation for structured outputs, and custom scheduling algorithms for high-performance serving.
- Develop robust benchmarking frameworks to evaluate real-world performance across diverse model architectures, batch sizes, sequence lengths, and hardware configurations.
- Enhance performance across runtimes (e.g., TensorRT, TensorRT-LLM) through techniques such as speculative decoding, quantization, batching, and KV-cache reuse.
- Integrate deep observability mechanisms (metrics, traces, logs) and establish repeatable benchmarks to assess speed, reliability, and quality.
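The benchmarking work described in this listing typically starts with a harness that sweeps a workload parameter (here, batch size) and reports latency percentiles and throughput. The sketch below is a minimal, hypothetical version: the `benchmark` and `infer` names are invented, and a real framework would target live endpoints, vary sequence lengths, and warm up before measuring:

```python
import statistics
import time

def benchmark(infer, batch_sizes, iters=50):
    """Sweep batch sizes against an inference callable and report latency stats.

    `infer` stands in for a real model endpoint: it accepts a batch of
    prompts and returns outputs. Timing a stub only demonstrates the
    harness shape, not real GPU behavior.
    """
    results = {}
    for bs in batch_sizes:
        batch = ["prompt"] * bs
        latencies = []
        for _ in range(iters):
            start = time.perf_counter()
            infer(batch)
            latencies.append(time.perf_counter() - start)
        latencies.sort()
        results[bs] = {
            "p50_ms": statistics.median(latencies) * 1e3,
            # Index the sorted latencies for an empirical 95th percentile.
            "p95_ms": latencies[int(0.95 * (iters - 1))] * 1e3,
            "throughput_rps": bs / statistics.median(latencies),
        }
    return results
```

Reporting percentiles rather than means matters for serving workloads, since tail latency, not average latency, is what users and SLAs feel.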
About Our Team
Join the Inference team at OpenAI, where we leverage cutting-edge research and technology to deliver exceptional AI products to consumers, enterprises, and developers. Our mission is to empower users to harness the full potential of our advanced AI models, enabling unprecedented capabilities. We prioritize efficient, high-performance model inference while accelerating research advances.

About the Role
We are seeking a passionate Software Engineer to optimize some of the world's largest and most sophisticated AI models for deployment in high-volume, low-latency, highly available production and research environments.

Key Responsibilities:
- Collaborate with machine learning researchers, engineers, and product managers to bring our latest technologies into production.
- Work closely with researchers to enable advanced research initiatives through innovative engineering solutions.
- Implement new techniques, tools, and architectures that improve the performance, latency, throughput, and effectiveness of our model inference stack.
- Develop tools to identify bottlenecks and sources of instability, then design and implement solutions for priority issues.
- Optimize our code and Azure VM fleet to make the most of every FLOP and GB of GPU RAM available.

You will excel in this role if you:
- Possess a solid understanding of modern machine learning architectures and an intuitive grasp of performance optimization, especially for inference.
- Take ownership of problems end to end, with a willingness to acquire any knowledge needed to get results.
- Bring at least 5 years of professional software engineering experience.
- Have, or can quickly develop, expertise in PyTorch, NVIDIA GPUs, and relevant optimization software stacks (such as NCCL and CUDA), along with HPC technologies like InfiniBand, MPI, and NVLink.
- Have experience architecting, building, monitoring, and debugging production distributed systems, with bonus points for work on performance-critical systems.
- Have rebuilt or significantly refactored production systems multiple times to accommodate rapid scaling.
- Are self-driven and enjoy identifying and addressing the most critical problems.
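The bottleneck-identification tooling this listing mentions usually begins with per-stage timing: attribute wall-clock time to each step of the inference pipeline and rank the stages. The sketch below is a toy stand-in with invented names (`StageTimer`, stage labels); production stacks would use distributed tracing and GPU-side timers instead of host-side wall clocks:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StageTimer:
    """Accumulate wall-clock time per pipeline stage to surface bottlenecks.

    Hypothetical sketch: host-side timing like this misses GPU-async work,
    which real inference profilers account for with device-side events.
    """

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and add it to the named stage's total."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def slowest(self):
        """Return stage names ordered from most to least accumulated time."""
        return sorted(self.totals, key=self.totals.get, reverse=True)
```

A harness like this makes the first question of any latency investigation ("which stage is actually slow?") cheap to answer before reaching for heavier profilers.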
Feb 6, 2025