GPU Performance Engineer

Genmo

San Francisco HQ

On-site Full-time

Experience Level

Mid to Senior

Qualifications

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
A minimum of 5 years of systems programming experience, with at least 3 years specializing in GPU optimization.
Expert-level proficiency with GPU profiling tools including Nsight Systems and nvprof.
Strong skills in CUDA programming and experience in developing production kernels.
In-depth knowledge of GPU architecture including memory hierarchy, streaming multiprocessors (SMs), and warps.
Proven track record of achieving substantial performance improvements (5-10x).
Experience with Python and C++ in production environments.

About the job

We are looking for a skilled GPU Performance Engineer who can extract maximum performance from our H100 infrastructure and fine-tune our model serving stack to achieve unparalleled efficiency. If you are passionate about optimizing performance, particularly at the microsecond level, and thrive on pushing hardware to its limits, this is the perfect opportunity for you.

Key Responsibilities

Utilize advanced profiling tools such as Nsight Systems and nvprof to analyze and enhance GPU workloads.
Develop high-performance CUDA and Triton kernels to optimize essential model functions.
Reduce cold start latency from seconds to mere milliseconds in our serving infrastructure.
Optimize memory access patterns, implement kernel fusion, and maximize GPU utilization.
Collaborate closely with machine learning engineers to optimize model implementations.
Diagnose and resolve performance issues throughout the application and hardware stack.
Implement custom memory pooling and allocation strategies to enhance performance.
Promote performance optimization techniques and foster a culture of excellence across teams.

About Genmo

Genmo is a pioneering research lab focused on developing open and advanced models for video generation. Our commitment to innovation aims to unlock new possibilities in artificial intelligence, particularly in the realm of AGI.

Similar jobs

Technical Internship in AI and GPU Optimization

Wafer

Internship|On-site|San Francisco

About the Role

We're excited to invite you to join Wafer as a Spring Intern, where you will play a crucial role in shaping the future of AI infrastructure and GPU optimization. As part of our innovative team, you will work closely with full-time engineers to define our technical strategies and contribute to the development of the essential systems that drive our GPU optimization platform.

Your Responsibilities

Design and implement scalable infrastructure for AI model training and inference tasks.
Guide the team in making technical decisions and architectural choices.

Qualifications We Seek

Essential Technical Skills

GPU Fundamentals: A strong grasp of GPU architectures, CUDA programming, and parallel computing methodologies.
Deep Learning Frameworks: Skilled in PyTorch, TensorFlow, or JAX, especially for GPU-accelerated applications.
Knowledge of LLM/AI: Solid foundation in large language models, including training, fine-tuning, prompting, and evaluation.
Systems Engineering: Proficient in C++, Python, and potentially Rust/Go for developing tools around CUDA.

Preferred Background

Publications or contributions to open-source projects related to inference, GPU computing, or ML/AI are advantageous.
Hands-on experience in conducting large-scale experiments, benchmarking, and performance optimization.

Oct 15, 2025

Technical Intern - GPU Optimization and AI Infrastructure

Wafer

Internship|On-site|San Francisco

About the Role

We invite you to join our innovative team at Wafer as a Technical Intern, where you will have the opportunity to shape the future of inference, GPU optimization, and AI infrastructure. As a full-time engineer, you will collaborate closely with our team to define our technical direction and develop the core systems that drive our GPU optimization platform.

Your Responsibilities

Design and implement scalable infrastructure for AI model training and inference.
Make pivotal technical decisions and influence architectural choices.

Oct 15, 2025

Technical Staff Engineer - GPU Optimization at Wafer | San Francisco

Wafer

Full-time|On-site|San Francisco

About the Position

At Wafer, we are on a mission to enhance the intelligence per watt by developing AI systems that can self-optimize. Our journey begins with GPU kernels, and we aim to revolutionize every aspect of ML systems and AI infrastructure. We are a compact, dynamic team of four, supported by renowned investors including Fifty Years, Y Combinator, Jeff Dean, and Woj Zaremba, co-founder of OpenAI. We are seeking passionate engineers eager to innovate at the convergence of AI agents and systems programming.

In this role, you will collaborate closely with our founding team to create the systems that power our GPU optimization platform. Your projects will range from the agent framework that refines kernels to the profiling infrastructure that interfaces with NCU and ROCprofiler, as well as the compiler tools that scrutinize PTX and SASS.

Feb 4, 2026

Technical Staff Member - GPU Performance & Kernel Optimization

Gimlet Labs

Full-time|On-site|San Francisco

At Gimlet Labs, we are pioneering the first heterogeneous neocloud tailored for AI workloads. As the demand for AI systems grows, traditional infrastructure faces significant limitations in terms of power, capacity, and cost. Our innovative platform addresses these challenges by decoupling AI workloads from the hardware, intelligently partitioning tasks, and directing each component to the most suitable hardware for optimal performance and efficiency. This method allows for the creation of heterogeneous systems that span multiple vendors and generations of hardware, including the latest cutting-edge accelerators, achieving substantial improvements in performance and cost-effectiveness.

Building upon this robust foundation, Gimlet is developing a production-grade neocloud designed for agentic workloads. Our customers can effortlessly deploy and manage their workloads with stable, production-ready APIs, eliminating the complexities of hardware selection, placement, or low-level performance optimization.

We collaborate with foundational labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI data centers.

We are currently seeking a dedicated Member of Technical Staff specializing in kernels and GPU performance. In this role, you will work closely with accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behaviors, design and optimize kernels, and ensure consistent performance across both established and emerging hardware.

This position is perfect for engineers who thrive on deep performance analysis, enjoy exploring hardware trade-offs, and are passionate about transforming theoretical peak performance into tangible real-world outcomes.

Mar 10, 2026

Technical Staff Member - GPU Performance Engineering at Liquid AI | San Francisco

Liquid AI

Full-time|On-site|San Francisco

Join the Innovative Team at Liquid AI

Founded as a spin-off from MIT’s CSAIL, Liquid AI is at the forefront of developing cutting-edge AI systems that operate seamlessly across various platforms, including data center accelerators and on-device hardware. Our technology is designed to ensure low latency, efficient memory usage, privacy, and reliability. We collaborate with leading enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services as we rapidly scale our operations. We are seeking talented individuals who are passionate about technology and innovation.

Your Role in Our Team

As a GPU Performance Engineer, your expertise will be critical in enhancing our models and workflows beyond the capabilities of standard frameworks. You will be responsible for designing and deploying custom CUDA kernels, conducting hardware-level profiling, and transforming research concepts into production code that yields tangible improvements in our pipelines (training, post-training, and inference). Our dynamic team values initiative and ownership, and we are looking for a candidate who thrives on tackling complex challenges related to memory hierarchies, tensor cores, and profiling outputs.

While San Francisco and Boston are preferred, we welcome applications from other locations.

Jul 29, 2025

Technical Intern (Member of Technical Staff) at Wafer | San Francisco

Wafer

Internship|On-site|San Francisco

About the Internship

Embark on an exciting journey with Wafer as a Technical Intern, where you'll play a pivotal role in shaping the future of inference, GPU optimization, and AI infrastructure. As a full-time engineer, you will collaborate closely with our talented team to define our technical strategy and develop the foundational systems that drive our GPU optimization platform.

Your Responsibilities

Design and build scalable infrastructure for training and inference of AI models.
Make key architectural decisions and lead technical direction.

Oct 15, 2025

GPU Performance Engineer

Genmo

Full-time|On-site|San Francisco HQ

At Genmo, we are at the forefront of advancing artificial intelligence through innovative research in video generation. Our mission is to construct open, cutting-edge models that will ultimately contribute to the realization of Artificial General Intelligence (AGI). As part of our dynamic team, you will play a pivotal role in redefining the future of AI and expanding the horizons of video creation.

Jul 17, 2025

GPU Performance Engineer - Member of Technical Staff

Reka

Full-time|Remote|US, UK, Singapore, Remote

Join our dynamic team at Reka as a GPU Performance Engineer, where you will leverage your expertise in Python and large-scale model training to enhance our training infrastructure. You will play a pivotal role in optimizing model performance, contributing to critical technical decisions, and improving our post-training processes, including reinforcement learning and fine-tuning. Your contributions will also focus on enhancing the efficiency and scalability of our model serving infrastructure.

Jan 8, 2026

Go-to-Market Champion for GPU & AI Infrastructure

Impossible Cloud

Full-time|Hybrid|On-site/ Hybrid / Remote

Group: Impossible Cloud / Impossible Cloud Network (ICN)
Focus: Integrating Enterprise Storage with Decentralized GPU Orchestration

Our Mission

At Impossible Cloud, we are transforming enterprise storage through our patented decentralized object storage technology, delivering a high-performance, cost-effective infrastructure. We aim to expand this foundation by creating a next-generation AI-first platform that integrates storage, compute, and GPU functionalities.

We are looking for a dynamic and hands-on Go-to-Market Champion specializing in AI and GPU Infrastructure to accelerate Impossible Cloud's position in the market for Agentic AI infrastructure. This is an exceptional opportunity to join a rapidly growing AI infrastructure company during a critical phase, owning the GTM strategy from development to scaling a successful sales organization.

In this role, you will collaborate closely with founders, Product, Marketing, and Customer Success teams to transform our viral product into a reliable, scalable revenue machine for enterprises. Our culture thrives on relentless innovation, accountability, and ownership, where each team member is dedicated to excellence and urgency in their work.

Key Responsibilities

- Develop and execute Impossible Cloud’s global Go-to-Market (GTM) strategy, focusing on market segmentation, value propositions, pricing, and packaging for GPU cloud and AI infrastructure tailored to enterprises, startups, and research entities.
- Create scalable customer acquisition and retention strategies through direct sales, channels, and partnerships, enhancing commercial enablement and managing the customer journey (both commercial and technical).
- Build and lead a high-performing global GTM team encompassing presales, direct sales, partnerships, solutions engineering, marketing, and customer success, while developing playbooks and performance metrics to instill a culture of customer focus and excellence.
- Work closely with Product and Engineering to align GTM strategies with the product roadmap, integrating direct customer insights, and gathering market intelligence to anticipate trends in AI and cloud technology adoption.
- Identify, negotiate, and lead strategic partnerships with AI firms, ISVs, integrators, and cloud marketplaces, while engaging with Enterprise and AI Native clients as a trusted advisor.

Feb 26, 2026

Energy Optimization Engineering Internship

Redwood Materials

Internship|$41/hr - $54/hr|On-site|San Francisco, California, United States

About Redwood Materials

Redwood Materials is at the forefront of localizing a global battery supply chain that integrates recovery, reuse, and recycling, ensuring critical minerals remain in circulation while propelling the energy transition. Founded in 2017, we are pioneering low-cost, large-scale energy storage and manufacturing battery materials in the U.S. for the first time, all sourced from batteries that are currently in circulation.

Essential Duties: As an Energy Optimization Engineering Intern, you will play a crucial role in developing the predictive "intelligence layer" designed to manage energy for AI Data Centers and microgrids. Under the mentorship of senior engineers, you will assist in constructing and validating time-series forecasting models for GPU power loads and market prices, integrating these insights into Mixed-Integer Programming (MIP) prototypes. You will work alongside cloud software teams to test these "forecast-informed" algorithms in a cloud-native setting, aiding in the simulation and backtesting of energy management strategies. Your goal is to enhance the accuracy and efficiency of our Energy Management System (EMS), gaining practical experience in "value-stacking" and real-world energy optimization. This internship is for Summer 2026.

Responsibilities Include:

Mar 18, 2026

GPU Kernel Engineer

Baseten

Full-time|On-site|San Francisco

ABOUT BASETEN

At Baseten, we empower the world's leading AI firms (such as Cursor, Notion, and OpenEvidence) by delivering mission-critical inference solutions. Our unique blend of applied AI research, robust infrastructure, and user-friendly developer tools enables AI pioneers to effectively deploy groundbreaking models. With our recent achievement of a $300M Series E funding round supported by esteemed investors like BOND and IVP, we're on an exciting growth trajectory. Join our dynamic team and contribute to the platform that drives the next generation of AI products.

THE ROLE

We are looking for an experienced Senior GPU Kernel Engineer to join our innovative team at the forefront of AI acceleration. In this role, your programming expertise will directly enhance the performance of cutting-edge machine learning models. You'll be responsible for developing highly efficient GPU kernels that optimize computational processes, allowing for transformative AI applications.

You'll thrive in a fast-paced, intellectually challenging environment where your technical skills are pivotal. Your contributions will directly affect production systems that serve millions of users across various platforms. This position offers exceptional opportunities for career advancement for engineers enthusiastic about low-level optimization and impactful systems engineering.

EXAMPLE INITIATIVES

As part of our Model Performance team, you will engage in projects like:

Baseten Embeddings Inference: The quickest embeddings solution available
The Baseten Inference Stack
Enhancing model performance optimization

RESPONSIBILITIES

Core Engineering Responsibilities

Design and develop high-performance GPU kernels for essential machine learning operations, including matrix multiplications and attention mechanisms.
Collaborate with cross-functional teams to drive performance improvements and implement optimizations.
Debug and refine kernel code to achieve maximal efficiency and reliability.
Stay abreast of the latest advancements in GPU technology and machine learning frameworks.

Jul 17, 2025

Global GPU Commodity Manager

Andromeda Cluster

Full-time|Remote|Global Remote / San Francisco, CA

Location: North America Remote / San Francisco · Full-Time

About Andromeda

Founded by Nat Friedman and Daniel Gross, Andromeda Cluster provides early-stage startups with access to scaled AI infrastructure, once exclusive to hyperscalers. Our journey began with a single managed cluster that rapidly gained demand, leading us to develop a robust system, network, and orchestration layer to democratize AI infrastructure.

Today, we partner with leading AI labs, data centers, and cloud providers to efficiently deliver compute resources wherever needed. Our platform expertly routes training and inference jobs across global supply chains, promoting flexibility and efficiency in one of the fastest-growing markets in the world.

Our vision is to create a liquidity layer for global AI compute, and we are on the lookout for bright minds in AI infrastructure, research, and engineering to join our expanding team.

The Opportunity

We are seeking a dedicated Global GPU Commodity Manager to enhance the supply and demand matching on our platform. This role is an Individual Contributor position reporting to the Head of Infrastructure. The Infrastructure team is pivotal to our operations, responsible for acquiring and facilitating compute resources across the organization while collaborating closely with compute providers, sales, and technical teams to align supply with demand.

With a solid foundation established with our providers, we are now scaling to expand our network and liquidity, broaden our service offerings, and accelerate our growth trajectory.

What You'll Do

Match incoming leads from the sales team to internal and external market capacity.
Maximize utilization of compute resources.
Source and onboard new compute suppliers globally.
Identify capacity based on customer requirements and market trends.
Resolve customer and supplier challenges in a fast-paced environment.
Analyze technical and commercial differences between suppliers to optimize our capacity funnel.
Develop a proactive compute strategy driven by market intelligence.
Negotiate costs with suppliers and other vendors.
Create and implement processes around capacity planning.

Mar 25, 2026

GPU Kernel Engineer

Sciforium

Full-time|On-site|San Francisco

At Sciforium, we are at the forefront of AI infrastructure, innovating next-generation multimodal AI models and a proprietary high-efficiency serving platform. With substantial funding and direct collaboration from AMD, supported by their engineers, our team is rapidly expanding to develop the complete stack that powers cutting-edge AI models and real-time applications.

About the Role

We are on the lookout for a talented GPU Kernel Engineer who is eager to explore and maximize performance on modern accelerators. In this role, you will be responsible for designing and optimizing custom GPU kernels that drive our advanced large-scale AI systems. You will navigate the hardware-software stack, engaging in low-level kernel development and integrating optimized operations into high-level machine learning frameworks for large-scale training and inference.

This position is perfect for someone who excels at the intersection of GPU programming, systems engineering, and state-of-the-art AI workloads, and aims to contribute significantly to the efficiency and scalability of our machine learning platform.

Key Responsibilities

Develop, implement, and enhance custom GPU kernels utilizing C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas.
Profile and fine-tune the end-to-end performance of machine learning operations, particularly for large-scale LLM training and inference.
Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and our proprietary internal runtimes.
Create performance models, pinpoint bottlenecks, and deliver kernel-level enhancements that significantly boost AI workloads.
Collaborate with machine learning researchers, distributed systems engineers, and model-serving teams to optimize computational performance across the entire stack.
Engage closely with hardware vendors (NVIDIA/AMD) and stay updated on the latest GPU architecture and compiler/toolchain advancements.
Contribute to the development of tools, documentation, benchmarking suites, and testing frameworks ensuring correctness and performance reproducibility.

Must-Haves

5+ years of industry or research experience in GPU kernel development or high-performance computing.
Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related discipline.
Strong programming proficiency in C++, Python, and familiarity with machine learning frameworks.

Dec 6, 2025

Bioinformatics Software Engineer (GPU Accelerated)

Prima Mente

Full-time|On-site|San Francisco

Join Prima Mente: A Leader in Biology AI

At Prima Mente, we are redefining the frontier of biology through artificial intelligence. Our mission is to generate unique datasets, develop versatile biological foundation models, and translate groundbreaking discoveries into impactful research and clinical outcomes. With a commitment to understanding the complexities of the brain, we aim to shield it from neurological diseases while enhancing overall health. Our diverse team of AI researchers, experimentalists, clinicians, and operational experts operates across London, San Francisco, and Dubai.

Role Overview: GPU/CPU-Accelerated Bioinformatics

We are seeking a skilled Bioinformatics Software Engineer to architect and implement scalable production pipelines for processing multi-omics data. The successful candidate will enable rapid transitions from hypothesis to patent-ready solutions in a matter of months.

Key Responsibilities:

Design and implement bioinformatics pipelines optimized for GPU/CPU utilization, using tools like Flyte and Nextflow, capable of processing over 1,000 samples at scale.
Optimize performance and cost efficiency by leveraging GPU/CPU acceleration where it provides the greatest benefit.
Collaborate with experimental and machine learning teams to validate computational results and align processing with model requirements.
Foster and manage collaborations with academic and industrial research partners.

Growth Expectations

1 Month: You will be deploying your workflows on GPU/CPU-accelerated cloud infrastructure to process multi-omic experiments, while building relationships with AI/ML and wet lab teams to understand their requirements.
3 Months: Your optimized pipelines will be processing thousands of samples with substantial speed enhancements and reduced costs, yielding publication and patent-ready outcomes.
6 Months: Your automated pipelines will support daily AI model training, and you will co-design experiments alongside AI/ML engineers, leading technical execution on external collaborations.

Your Profile

You are passionate about pushing the boundaries of AI and biology. As an engineer rather than an analyst, you thrive on enhancing performance and efficiency while architecting robust systems. You are comfortable making rapid technical decisions and iterating quickly.

Desired Qualifications

Experience in bioinformatics, computational biology, or a related field.
Proficiency in software engineering, particularly in developing scalable data processing pipelines.
Strong understanding of multi-omics data and methods.
Familiarity with GPU/CPU acceleration techniques.
Excellent communication and collaboration skills.

Mar 2, 2026

Member of Technical Staff - GPU Infrastructure

Prime Intellect

Full-time|On-site|San Francisco

Join Our Mission to Build Open Superintelligence Infrastructure

At Prime Intellect, we are pioneering the development of an open superintelligence stack that encompasses cutting-edge agentic models and the infrastructure that empowers anyone to create, train, and deploy these advanced AI systems. Our innovative approach aggregates and orchestrates global computational resources into a cohesive control plane, complemented by a comprehensive reinforcement learning (RL) post-training toolkit that includes environments, secure sandboxes, verifiable evaluations, and our asynchronous RL trainer. We provide researchers, startups, and enterprises with the capabilities to execute end-to-end reinforcement learning at unparalleled scale, adapting models to real-world tools, workflows, and deployment scenarios.

As a Solutions Architect for GPU Infrastructure, you will be the technical authority responsible for translating customer needs into robust, production-ready systems designed to train the world’s most sophisticated AI models.

With a recent funding round raising $15 million (totaling $20 million) led by Founders Fund, alongside contributions from Menlo Ventures and illustrious angels such as Andrej Karpathy (Tesla, OpenAI), Tri Dao (Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Huggingface), and Emad Mostaque (Stability AI), we are poised for significant growth and innovation.

Key Technical Responsibilities

This role requires a blend of deep technical knowledge and hands-on implementation skills. Your contributions will be crucial in:

Customer Architecture & Design

Collaborating with clients to comprehend workload specifications and architect optimal GPU cluster solutions.
Drafting technical proposals and conducting capacity planning for clusters ranging from 100 to over 10,000 GPUs.
Formulating deployment strategies for large language model (LLM) training, inference, and high-performance computing (HPC) tasks.
Delivering architectural recommendations to both technical teams and executive stakeholders.

Infrastructure Deployment & Optimization

Implementing and configuring orchestration frameworks such as SLURM and Kubernetes for distributed workloads.
Establishing high-performance networking through InfiniBand, RoCE, and NVLink interconnects.
Enhancing GPU utilization, memory management, and inter-node communication.
Setting up parallel file systems (Lustre, BeeGFS, GPFS) to maximize I/O efficiency.
Tuning system performance, from kernel parameters to CUDA configurations.

Production Operations & Support

Ensuring the reliability and performance of GPU infrastructure through continuous monitoring and support.
Collaborating with cross-functional teams to troubleshoot and optimize operational workflows.
Documenting processes and creating training materials for team members and clients.

Aug 30, 2025

Product Manager, GPU Infrastructure NPI

Fluidstack

Full-time|$150K/yr - $250K/yr|On-site|San Francisco, CA

About Fluidstack

Fluidstack is at the forefront of building groundbreaking infrastructure designed for the future of intelligence. We collaborate with premier AI research labs, government entities, and leading enterprises like Mistral, Poolside, Black Forest Labs, and Meta to deliver compute solutions at unparalleled speeds.

Our mission is to expedite the realization of Artificial General Intelligence (AGI). Our team is dedicated, passionate, and driven to create world-class infrastructure, treating our clients' success as our own. If you possess a strong sense of purpose, a dedication to excellence, and the willingness to work diligently to transform the future of intelligence, we welcome you to join us in shaping what lies ahead.

About the Role

We are seeking a Product Manager to spearhead New Product Introduction (NPI) for our GPU infrastructure. You will collaborate with our datacenter, infrastructure, and networking teams to launch new GPU SKUs and compute solutions. Your role will involve defining the frameworks through which Fluidstack assesses, qualifies, and brings new GPU generations to market, from NVIDIA Blackwell and Rubin to AMD MI300X and future accelerators. This highly cross-functional position demands strong technical acumen, adept vendor relationship management, and a clear understanding of how hardware capabilities align with customer workload requirements. By doing so, you will help ensure that Fluidstack remains a leader in providing optimal compute options tailored for training, inference, and specialized AI workloads.

Key Responsibilities

Manage the NPI roadmap for GPU SKUs, including evaluation criteria, qualification timelines, and market strategies for new hardware generations.
Collaborate with datacenter teams to establish requirements for power delivery (HVDC/LVDC), cooling systems (liquid vs. air), rack architecture, and the physical infrastructure necessary for next-gen GPUs.
Engage with infrastructure engineers to validate hardware performance across essential metrics: training throughput (MFU), inference latency (TTFT, TBT), memory bandwidth, and interconnect topology (NVLink, InfiniBand).
Foster vendor relationships with NVIDIA, AMD, and emerging XPU providers, conducting in-depth technical discussions, negotiating supply agreements, and overseeing early access programs.
Define product specifications for system configurations: single-GPU instances, multi-GPU nodes, full rack deployments, and megacluster architectures.
Analyze customer workload profiles to identify the optimal GPU mix: H100 for large model training, L40S for inference, B200 for frontier research, and MI300X for cost-sensitive workloads.
Develop business cases for new SKU introductions.

Mar 3, 2026

Performance Engineer - Member of Technical Staff, Kernel Engineering

Inferact

Full-time|$200K/yr - $400K/yr|Remote|San Francisco

At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, significantly enhancing the speed and reducing the cost of AI inference. Our founders, the visionaries behind vLLM, have spent years bridging the gap between advanced models and cutting-edge hardware.

About the Role

We are seeking a skilled performance engineer dedicated to maximizing the computational efficiency of modern accelerators. In this role, you'll develop kernels and implement low-level optimizations that position vLLM as the fastest inference engine globally. Your contributions will be pivotal as your code will execute across a broad spectrum of hardware accelerators, from NVIDIA GPUs to the latest silicon innovations. You'll collaborate closely with hardware vendors to ensure we fully leverage the capabilities of each new generation of chips.

Jan 22, 2026

Senior HPC & GPU Infrastructure Engineer

Sciforium

Full-time|On-site|San Francisco

At Sciforium, we are at the forefront of AI infrastructure, pioneering advanced multimodal AI models and an innovative, high-efficiency serving platform. With substantial backing from AMD and a dedicated team of engineers, we are rapidly expanding our capabilities to support the next generation of frontier AI models and real-time applications.

About the Role

We are looking for a highly skilled Senior HPC & GPU Infrastructure Engineer who will be responsible for ensuring the health, reliability, and performance of our GPU compute cluster. As the primary custodian of our high-density accelerator environment, you will serve as the crucial link between hardware operations, distributed systems, and machine learning workflows. This position encompasses a range of responsibilities, from hands-on Linux systems engineering and GPU driver setup to maintaining the ML software stack (CUDA/ROCm, PyTorch, JAX, vLLM). If you are passionate about optimizing hardware performance, enjoy troubleshooting GPUs at scale, and aspire to create world-class AI infrastructure, we would love to hear from you.

Your Responsibilities

1. System Health & Reliability (SRE)

On-Call Response: Be the primary responder for system outages, GPU failures, node crashes, and other cluster-wide incidents, ensuring rapid issue resolution to minimize downtime.
Cluster Monitoring: Develop and maintain monitoring protocols for GPU health, thermal behavior, PCIe/NVLink topology issues, memory errors, and general system load.
Vendor Liaison: Collaborate with data center personnel, hardware vendors, and on-site technicians for repairs, RMA processing, and physical maintenance of the cluster.

2. Linux & Network Administration

OS Management: Oversee the installation, patching, and maintenance of Linux distributions (Ubuntu / CentOS / RHEL), ensuring consistent configuration, kernel tuning, and automation for large node fleets.
Security & Access Controls: Set up VPNs, iptables/firewalls, SSH hardening, and network routing to secure our computing infrastructure.
Identity & Storage Management: Manage LDAP/FreeIPA/AD for user identity and administer distributed file systems like NFS, GPFS, or Lustre.

3. GPU & ML Stack Engineering

Deployment & Bring-Up: Spearhead the deployment of new GPU nodes, including BIOS configuration and software integration to ensure optimal performance.

Jan 7, 2026

Multimodal AI Model Optimization Research Engineer

Tavus

Full-time|On-site|San Francisco (London/Europe - OK)

Tavus – Multimodal AI Model Optimization Research Engineer

At Tavus, we are pioneering the human aspect of AI technology. Our objective is to make human-AI interactions as seamless and natural as in-person conversations, allowing for a human touch in areas that were once considered unscalable.

We accomplish this through groundbreaking research in multimodal AI, focusing on human-to-human communication modeling (encompassing language, audio, and video) and the development of audio-visual avatar behaviors. Our innovative models drive applications ranging from text-to-video AI avatars to real-time conversational video experiences across sectors such as healthcare, recruitment, sales, and education.

By empowering AI to perceive, listen, and engage with an authentic human-like presence, we are laying the groundwork for the next generation of AI workers, assistants, and companions.

As a Series B company, we are supported by renowned investors, including Sequoia, Y Combinator, and Scale VC. Join us as we shape the future of human-AI interaction.

The Role

We are seeking an accomplished Research Scientist/Engineer with expertise in model optimization to be a vital part of our core AI team.

The ideal candidate thrives in dynamic startup environments, is adept at setting priorities independently, and is open to making calculated decisions. We are moving swiftly and need individuals who can help navigate our path forward.

Your Mission

Transform state-of-the-art research models into fast, efficient, and production-ready systems through techniques such as sparsification, distillation, and quantization.
Oversee the optimization lifecycle for critical models: establish metrics, conduct experiments, and evaluate trade-offs among latency, cost, and quality.
Collaborate closely with researchers and engineers to convert innovative concepts into deployable solutions.

Requirements

Extensive experience in deep learning with PyTorch.
Practical experience in model optimization and compression, including knowledge distillation, pruning/sparsification, quantization, and mixed precision.
Familiarity with efficient architectures such as low-rank adapters.
Strong grasp of inference performance and GPU/accelerator fundamentals.
Proficient in Python coding and adherence to best practices in research engineering.
Experience with large models and datasets in cloud environments.
Capability to read ML literature, reproduce results, and modify ideas accordingly.

Apr 3, 2026

Technical Advisor Specialist Internship - Part-Time

Scale AI

Internship|$50/hr|Remote|San Francisco, CA

About the Internship Program

The Scale AI Technical Advisor Specialist (Part-Time Internship) is an exciting summer semester opportunity tailored for university students, particularly those with a background in competitive coding, mathematics, and STEM-related fields. As a Specialist, you will engage in innovative projects that redefine the landscape of generative AI, providing you with invaluable exposure to advanced research and technology. You'll enjoy the flexibility to tailor your work hours around your academic commitments, access to renowned AI professionals, and a supportive team dedicated to fostering your professional development.

Your Responsibilities

Engage in Pioneering AI Projects: Contribute to initiatives aimed at enhancing AI capabilities, including training models for complex reasoning tasks and identifying potential model failure modes. You’ll work with cutting-edge technologies, significantly boosting your technical skills.
Participate in Knowledge-Sharing Sessions: Join bi-weekly focus groups to share insights, explore new AI methodologies, and learn from seasoned researchers. This collaborative environment promotes in-depth discussions that sharpen your critical thinking and communication skills.
Collaborate on Team Projects: Work closely with fellow specialists on creative challenges, ranging from writing engaging content to devising innovative AI solutions. This teamwork will enhance your leadership and problem-solving skills.
Enjoy Flexible Work Hours: Set your own schedule to balance your academic life with meaningful contributions. Our structure supports your commitments without compromising the quality of your work.

Mar 26, 2026
