Clicking Apply Now takes you to AutoApply where you can tailor your resume and apply.
Unlock Your Potential
Generate Job-Optimized Resume
One Click And Our AI Optimizes Your Resume to Match The Job Description.
Is Your Resume Optimized For This Role?
Find Out If You're Highlighting The Right Skills And Fix What's Missing
Experience Level
Entry Level
Qualifications
What We Look ForWe are seeking candidates who have a blend of technical skills and a passion for innovation in AI and GPU technologies. Your ability to collaborate with a team and contribute to complex projects will be key to your success.
About the job
About the Role
We're excited to invite you to join wafer as a Spring Intern, where you will play a crucial role in shaping the future of AI infrastructure and GPU optimization. As part of our innovative team, you will work closely with full-time engineers to define our technical strategies and contribute to the development of the essential systems that drive our GPU optimization platform.
Your Responsibilities
Design and implement scalable infrastructure for AI model training and inference tasks.
Guide the team in making technical decisions and architectural choices.
Qualifications We Seek
Essential Technical Skills
GPU Fundamentals: A strong grasp of GPU architectures, CUDA programming, and parallel computing methodologies.
Deep Learning Frameworks: Skilled in PyTorch, TensorFlow, or JAX, especially for GPU-accelerated applications.
Knowledge of LLM/AI: Solid foundation in large language models, including training, fine-tuning, prompting, and evaluation.
Systems Engineering: Proficient in C++, Python, and potentially Rust/Go for developing tools around CUDA.
Preferred Background
Publications or contributions to open-source projects related to inference GPU computing or ML/AI are advantageous.
Hands-on experience in conducting large-scale experiments, benchmarking, and performance optimization.
About Wafer
wafer is at the forefront of AI infrastructure development, dedicated to pushing the boundaries of GPU optimization. We foster a collaborative environment where innovation thrives, and we are passionate about empowering our team to make impactful contributions.
Full-time|Remote|Global Remote / San Francisco, CA
Location: North America Remote / San Francisco · Full-TimeAbout AndromedaFounded by Nat Friedman and Daniel Gross, Andromeda Cluster provides early-stage startups with access to scaled AI infrastructure, once exclusive to hyperscalers. Our journey began with a single managed cluster that rapidly gained demand, leading us to develop a robust system, network, and orchestration layer to democratize AI infrastructure.Today, we partner with leading AI labs, data centers, and cloud providers to efficiently deliver compute resources wherever needed. Our platform expertly routes training and inference jobs across global supply chains, promoting flexibility and efficiency in one of the fastest-growing markets in the world.Our vision is to create a liquidity layer for global AI compute, and we are on the lookout for bright minds in AI infrastructure, research, and engineering to join our expanding team.The OpportunityWe are seeking a dedicated Global GPU Commodity Manager to enhance the supply and demand matching on our platform. This role is an Individual Contributor position reporting to the Head of Infrastructure. The Infrastructure team is pivotal to our operations, responsible for acquiring and facilitating compute resources across the organization while collaborating closely with compute providers, sales, and technical teams to align supply with demand.With a solid foundation established with our providers, we are now scaling to expand our network and liquidity, broaden our service offerings, and accelerate our growth trajectory.What You'll DoMatch incoming leads from the sales team to internal and external market capacity.Maximize utilization of compute resources.Source and onboard new compute suppliers globally.Identify capacity based on customer requirements and market trends.Resolve customer and supplier challenges in a fast-paced environment.Analyze technical and commercial differences between suppliers to optimize our capacity funnel.Develop a proactive compute strategy driven by market intelligence.Negotiate costs with suppliers and other vendors.Create and implement processes around capacity planning.
Full-time|Remote|San Francisco Bay Area; U.S. Remote
Join WEKA, a trailblazer in AI and accelerated compute workflows, as we redefine data infrastructures with our cutting-edge NeuralMesh™ technology. Unlike conventional storage systems that falter under increasing demands, NeuralMesh™ enhances performance and efficiency as it scales, providing a robust foundation for enterprises and AI innovations. Our technology is trusted by over 30% of Fortune 50 companies and leading industry hyperscalers. If you thrive in a customer-focused, collaborative, and innovative environment, we want you on our team!About the Role:As the Director of Commodity Management & NPI, you will spearhead WEKA's supplier strategy, overseeing cost structures and hardware readiness across critical component categories such as SSDs, memory, and networking. Your leadership will cultivate long-term supplier partnerships, align roadmaps, and develop KPIs to ensure optimal availability, quality, and competitive edge in a fast-evolving market.You will also provide strategic oversight of WEKA's NPI pipeline, collaborating with engineering, product, and operations teams to manage program timelines, mitigate risks, and execute effectively. By leveraging your extensive domain expertise and operational acumen, you will enhance WEKA's market agility and drive hardware innovation, ensuring seamless and efficient product launches.
Full-time|$150K/yr - $250K/yr|On-site|San Francisco, CA
About FluidstackFluidstack is at the forefront of building groundbreaking infrastructure designed for the future of intelligence. We collaborate with premier AI research labs, government entities, and leading enterprises like Mistral, Poolside, Black Forest Labs, and Meta to deliver compute solutions at unparalleled speeds.Our mission is to expedite the realization of Artificial General Intelligence (AGI). Our team is dedicated, passionate, and driven to create world-class infrastructure, treating our clients' success as our own. If you possess a strong sense of purpose, a dedication to excellence, and the willingness to work diligently to transform the future of intelligence, we welcome you to join us in shaping what lies ahead.About the RoleWe are seeking a Product Manager to spearhead New Product Introduction (NPI) for our GPU infrastructure. You will collaborate with our datacenter, infrastructure, and networking teams to launch new GPU SKUs and compute solutions. Your role will involve defining the frameworks through which Fluidstack assesses, qualifies, and brings new GPU generations to market—from NVIDIA Blackwell and Rubin to AMD MI300X and future accelerators. This highly cross-functional position demands strong technical acumen, adept vendor relationship management, and a clear understanding of how hardware capabilities align with customer workload requirements. By doing so, you will help ensure that Fluidstack remains a leader in providing optimal compute options tailored for training, inference, and specialized AI workloads.Key ResponsibilitiesManage the NPI roadmap for GPU SKUs, including evaluation criteria, qualification timelines, and market strategies for new hardware generations.Collaborate with datacenter teams to establish requirements for power delivery (HVDC/LVDC), cooling systems (liquid vs. air), rack architecture, and the physical infrastructure necessary for next-gen GPUs.Engage with infrastructure engineers to validate hardware performance across essential metrics: training throughput (MFU), inference latency (TTFT, TBT), memory bandwidth, and interconnect topology (NVLink, InfiniBand).Foster vendor relationships with NVIDIA, AMD, and emerging XPU providers—conducting in-depth technical discussions, negotiating supply agreements, and overseeing early access programs.Define product specifications for system configurations: single-GPU instances, multi-GPU nodes, full rack deployments, and megacluster architectures.Analyze customer workload profiles to identify the optimal GPU mix: H100 for large model training, L40S for inference, B200 for frontier research, and MI300X for cost-sensitive workloads.Develop business cases for new SKU introductions.
At Genmo, we are at the forefront of advancing artificial intelligence through innovative research in video generation. Our mission is to construct open, cutting-edge models that will ultimately contribute to the realization of Artificial General Intelligence (AGI). As part of our dynamic team, you will play a pivotal role in redefining the future of AI and expanding the horizons of video creation.We are looking for a skilled GPU Performance Engineer who can extract maximum performance from our H100 infrastructure and fine-tune our model serving stack to achieve unparalleled efficiency. If you are passionate about optimizing performance, particularly at the microsecond level, and thrive on pushing hardware to its limits, this is the perfect opportunity for you.Key ResponsibilitiesUtilize advanced profiling tools such as Nsight Systems and nvprof to analyze and enhance GPU workloads.Develop high-performance CUDA and Triton kernels to optimize essential model functions.Reduce cold start latency from seconds to mere milliseconds in our serving infrastructure.Optimize memory access patterns, implement kernel fusion, and maximize GPU utilization.Collaborate closely with machine learning engineers to optimize model implementations.Diagnose and resolve performance issues throughout the application and hardware stack.Implement custom memory pooling and allocation strategies to enhance performance.Promote performance optimization techniques and foster a culture of excellence across teams.
ABOUT BASETENAt Baseten, we empower the world's leading AI firms—such as Cursor, Notion, and OpenEvidence—by delivering mission-critical inference solutions. Our unique blend of applied AI research, robust infrastructure, and user-friendly developer tools enables AI pioneers to effectively deploy groundbreaking models. With our recent achievement of a $300M Series E funding round supported by esteemed investors like BOND and IVP, we're on an exciting growth trajectory. Join our dynamic team and contribute to the platform that drives the next generation of AI products.THE ROLEWe are looking for an experienced Senior GPU Kernel Engineer to join our innovative team at the forefront of AI acceleration. In this role, your programming expertise will directly enhance the performance of cutting-edge machine learning models. You'll be responsible for developing highly efficient GPU kernels that optimize computational processes, allowing for transformative AI applications.You'll thrive in a fast-paced, intellectually challenging environment where your technical skills are pivotal. Your contributions will directly affect production systems that serve millions of users across various platforms. This position offers exceptional opportunities for career advancement for engineers enthusiastic about low-level optimization and impactful systems engineering.EXAMPLE INITIATIVESAs part of our Model Performance team, you will engage in projects like:Baseten Embeddings Inference: The quickest embeddings solution availableThe Baseten Inference StackEnhancing model performance optimizationRESPONSIBILITIESCore Engineering ResponsibilitiesDesign and develop high-performance GPU kernels for essential machine learning operations, including matrix multiplications and attention mechanisms.Collaborate with cross-functional teams to drive performance improvements and implement optimizations.Debug and refine kernel code to achieve maximal efficiency and reliability.Stay abreast of the latest advancements in GPU technology and machine learning frameworks.
At Sciforium, we are at the forefront of AI infrastructure, innovating next-generation multimodal AI models and a proprietary high-efficiency serving platform. With substantial funding and direct collaboration from AMD, supported by their engineers, our team is rapidly expanding to develop the complete stack that powers cutting-edge AI models and real-time applications.About the RoleWe are on the lookout for a talented GPU Kernel Engineer who is eager to explore and maximize performance on modern accelerators. In this role, you will be responsible for designing and optimizing custom GPU kernels that drive our advanced large-scale AI systems. You will navigate the hardware-software stack, engaging in low-level kernel development and integrating optimized operations into high-level machine learning frameworks for large-scale training and inference.This position is perfect for someone who excels at the intersection of GPU programming, systems engineering, and state-of-the-art AI workloads, and aims to contribute significantly to the efficiency and scalability of our machine learning platform.Key ResponsibilitiesDevelop, implement, and enhance custom GPU kernels utilizing C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas.Profile and fine-tune the end-to-end performance of machine learning operations, particularly for large-scale LLM training and inference.Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and our proprietary internal runtimes.Create performance models, pinpoint bottlenecks, and deliver kernel-level enhancements that significantly boost AI workloads.Collaborate with machine learning researchers, distributed systems engineers, and model-serving teams to optimize computational performance across the entire stack.Engage closely with hardware vendors (NVIDIA/AMD) and stay updated on the latest GPU architecture and compiler/toolchain advancements.Contribute to the development of tools, documentation, benchmarking suites, and testing frameworks ensuring correctness and performance reproducibility.Must-Haves5+ years of industry or research experience in GPU kernel development or high-performance computing.Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related discipline.Strong programming proficiency in C++, Python, and familiarity with machine learning frameworks.
About Our TeamJoin the Fleet team at OpenAI, where we empower groundbreaking research and product innovation through our advanced computing infrastructure. We manage extensive systems across data centers, GPUs, and networking, ensuring optimal performance, high availability, and efficiency. Our work is crucial in enabling OpenAI’s models to function seamlessly at scale, supporting both our internal research endeavors and external products like ChatGPT. We are committed to prioritizing safety, reliability, and the ethical deployment of AI technology.About the RoleAs a Software Engineer on the Fleet High Performance Computing (HPC) team, you will play a vital role in ensuring the reliability and uptime of OpenAI’s compute fleet. Minimizing hardware failures is essential for smooth research training progress and uninterrupted services, as even minor hardware issues can lead to significant setbacks. With the rise of large supercomputers, the stakes in maintaining efficiency and stability have never been higher.At the cutting edge of technology, we often lead the charge in troubleshooting complex, state-of-the-art systems at scale. This is a unique opportunity for you to engage with groundbreaking technologies and create innovative solutions that enhance the health and efficiency of our supercomputing infrastructure.Our team fosters a culture of autonomy and ownership, enabling skilled engineers to drive meaningful change. In this role, you will focus on comprehensive system investigations and develop automated solutions to enhance our operations. We seek individuals who dive deep into challenges, conduct thorough investigations, and create scalable automation for detection and remediation.Key Responsibilities:Develop and maintain automation systems for provisioning and managing server fleets.Create tools to monitor server health, performance metrics, and lifecycle events.Collaborate effectively with teams across clusters, networking, and infrastructure.Work closely with external operators to maintain a high level of service quality.Identify and resolve performance bottlenecks and inefficiencies in the system.Continuously enhance automation processes to minimize manual intervention.You Will Excel in This Role if You Have:Experience in managing large-scale server environments.A blend of technical skills in systems programming and infrastructure management.Strong problem-solving abilities and a methodical approach to troubleshooting.Familiarity with high-performance computing technologies and tools.
About the RoleWe're excited to invite you to join wafer as a Spring Intern, where you will play a crucial role in shaping the future of AI infrastructure and GPU optimization. As part of our innovative team, you will work closely with full-time engineers to define our technical strategies and contribute to the development of the essential systems that drive our GPU optimization platform.Your ResponsibilitiesDesign and implement scalable infrastructure for AI model training and inference tasks.Guide the team in making technical decisions and architectural choices.Qualifications We SeekEssential Technical SkillsGPU Fundamentals: A strong grasp of GPU architectures, CUDA programming, and parallel computing methodologies.Deep Learning Frameworks: Skilled in PyTorch, TensorFlow, or JAX, especially for GPU-accelerated applications.Knowledge of LLM/AI: Solid foundation in large language models, including training, fine-tuning, prompting, and evaluation.Systems Engineering: Proficient in C++, Python, and potentially Rust/Go for developing tools around CUDA.Preferred BackgroundPublications or contributions to open-source projects related to inference GPU computing or ML/AI are advantageous.Hands-on experience in conducting large-scale experiments, benchmarking, and performance optimization.
Join Prima Mente: A Leader in Biology AIAt Prima Mente, we are redefining the frontier of biology through artificial intelligence. Our mission is to generate unique datasets, develop versatile biological foundation models, and translate groundbreaking discoveries into impactful research and clinical outcomes. With a commitment to understanding the complexities of the brain, we aim to shield it from neurological diseases while enhancing overall health. Our diverse team of AI researchers, experimentalists, clinicians, and operational experts operates across London, San Francisco, and Dubai.Role Overview: GPU/CPU-Accelerated BioinformaticsWe are seeking a skilled Bioinformatics Software Engineer to architect and implement scalable production pipelines for processing multi-omics data. The successful candidate will enable rapid transitions from hypothesis to patent-ready solutions in a matter of months.Key Responsibilities:Design and implement bioinformatics pipelines optimized for GPU/CPU utilization utilizing tools like Flyte and Nextflow, capable of processing over 1,000 samples at scale.Optimize performance and cost efficiency by leveraging GPU/CPU acceleration where it provides the greatest benefit.Collaborate with experimental and machine learning teams to validate computational results and align processing with model requirements.Foster and manage collaborations with academic and industrial research partners.Growth Expectations1 Month: You will be deploying your workflows on GPU/CPU-accelerated cloud infrastructure to process multi-omic experiments, while building relationships with AI/ML and wet lab teams to understand their requirements.3 Months: Your optimized pipelines will be processing thousands of samples with substantial speed enhancements and reduced costs, yielding publication and patent-ready outcomes.6 Months: Your automated pipelines will support daily AI model training, and you will co-design experiments alongside AI/ML engineers, leading technical execution on external collaborations.Your ProfileYou are passionate about pushing the boundaries of AI and biology. As an engineer rather than an analyst, you thrive on enhancing performance and efficiency while architecting robust systems. You are comfortable making rapid technical decisions and iterate quickly.Desired QualificationsExperience in bioinformatics, computational biology, or a related field.Proficiency in software engineering, particularly in developing scalable data processing pipelines.Strong understanding of multi-omics data and methods.Familiarity with GPU/CPU acceleration techniques.Excellent communication and collaboration skills.
About Our TeamThe Inference team at OpenAI is dedicated to translating our cutting-edge research into accessible, transformative technology for consumers, enterprises, and developers. By leveraging our advanced AI models, we enable users to achieve unprecedented levels of innovation and productivity. Our primary focus lies in enhancing model inference efficiency and accelerating progress in research through optimized inference capabilities.About the RoleWe are seeking talented engineers to expand and optimize OpenAI's inference infrastructure, specifically targeting emerging GPU platforms. This role encompasses a wide range of responsibilities from low-level kernel optimization to high-level distributed execution. You will collaborate closely with our research, infrastructure, and performance teams to ensure seamless operation of our largest models on cutting-edge hardware.This position offers a unique opportunity to influence and advance OpenAI’s multi-platform inference capabilities, with a strong emphasis on optimizing performance for AMD accelerators.Your Responsibilities Include:Overseeing the deployment, accuracy, and performance of the OpenAI inference stack on AMD hardware.Integrating our internal model-serving infrastructure (e.g., vLLM, Triton) into diverse GPU-backed systems.Debugging and optimizing distributed inference workloads across memory, network, and compute layers.Validating the correctness, performance, and scalability of model execution on extensive GPU clusters.Collaborating with partner teams to design and optimize high-performance GPU kernels for accelerators utilizing HIP, Triton, or other performance-centric frameworks.Working with partner teams to develop, integrate, and fine-tune collective communication libraries (e.g., RCCL) to parallelize model execution across multiple GPUs.Ideal Candidates Will:Possess experience in writing or porting GPU kernels using HIP, CUDA, or Triton, with a strong focus on low-level performance.Be familiar with communication libraries like NCCL/RCCL, understanding their importance in high-throughput model serving.Have experience with distributed inference systems and be adept at scaling models across multiple accelerators.Enjoy tackling end-to-end performance challenges across hardware, system libraries, and orchestration layers.Be eager to join a dynamic, agile team focused on building innovative infrastructure from the ground up.
About the RoleWe invite you to join our innovative team at Wafer as a Technical Intern, where you will have the opportunity to shape the future of inference, GPU optimization, and AI infrastructure. As a full-time engineer, you will collaborate closely with our team to define our technical direction and develop the core systems that drive our GPU optimization platform.Your ResponsibilitiesDesign and implement scalable infrastructure for AI model training and inference.Make pivotal technical decisions and influence architectural choices.
Group: Impossible Cloud / Impossible Cloud Network (ICN)Focus: Integrating Enterprise Storage with Decentralized GPU OrchestrationOur MissionAt Impossible Cloud, we are transforming enterprise storage through our patented decentralized object storage technology, delivering a high-performance, cost-effective infrastructure. We aim to expand this foundation by creating a next-generation AI-first platform that integrates storage, compute, and GPU functionalities.We are looking for a dynamic and hands-on Go-to-Market Champion specializing in AI and GPU Infrastructure to accelerate Impossible Cloud's position in the market for Agentic AI infrastructure. This is an exceptional opportunity to join a rapidly growing AI infrastructure company during a critical phase, owning the GTM strategy from development to scaling a successful sales organization.In this role, you will collaborate closely with founders, Product, Marketing, and Customer Success teams to transform our viral product into a reliable, scalable revenue machine for enterprises. Our culture thrives on relentless innovation, accountability, and ownership, where each team member is dedicated to excellence and urgency in their work.Key Responsibilities- Develop and execute Impossible Cloud’s global Go-to-Market (GTM) strategy, focusing on market segmentation, value propositions, pricing, and packaging for GPU cloud and AI infrastructure tailored to enterprises, startups, and research entities.- Create scalable customer acquisition and retention strategies through direct sales, channels, and partnerships, enhancing commercial enablement and managing the customer journey (both commercial and technical).- Build and lead a high-performing global GTM team encompassing presales, direct sales, partnerships, solutions engineering, marketing, and customer success, while developing playbooks and performance metrics to instill a culture of customer focus and excellence.- Work closely with Product and Engineering to align GTM strategies with the product roadmap, integrating direct customer insights, and gathering market intelligence to anticipate trends in AI and cloud technology adoption.- Identify, negotiate, and lead strategic partnerships with AI firms, ISVs, integrators, and cloud marketplaces, while engaging with Enterprise and AI Native clients as a trusted advisor.
At Sciforium, we are at the forefront of AI infrastructure, pioneering advanced multimodal AI models and an innovative, high-efficiency serving platform. With substantial backing from AMD and a dedicated team of engineers, we are rapidly expanding our capabilities to support the next generation of frontier AI models and real-time applications.About the RoleWe are looking for a highly skilled Senior HPC & GPU Infrastructure Engineer who will be responsible for ensuring the health, reliability, and performance of our GPU compute cluster. As the primary custodian of our high-density accelerator environment, you will serve as the crucial link between hardware operations, distributed systems, and machine learning workflows. This position encompasses a range of responsibilities, from hands-on Linux systems engineering and GPU driver setup to maintaining the ML software stack (CUDA/ROCm, PyTorch, JAX, vLLM). If you are passionate about optimizing hardware performance, enjoy troubleshooting GPUs at scale, and aspire to create world-class AI infrastructure, we would love to hear from you.Your Responsibilities1. System Health & Reliability (SRE)On-Call Response: Be the primary responder for system outages, GPU failures, node crashes, and other cluster-wide incidents, ensuring rapid issue resolution to minimize downtime.Cluster Monitoring: Develop and maintain monitoring protocols for GPU health, thermal behavior, PCIe/NVLink topology issues, memory errors, and general system load.Vendor Liaison: Collaborate with data center personnel, hardware vendors, and on-site technicians for repairs, RMA processing, and physical maintenance of the cluster.2. Linux & Network AdministrationOS Management: Oversee the installation, patching, and maintenance of Linux distributions (Ubuntu / CentOS / RHEL), ensuring consistent configuration, kernel tuning, and automation for large node fleets.Security & Access Controls: Set up VPNs, iptables/firewalls, SSH hardening, and network routing to secure our computing infrastructure.Identity & Storage Management: Manage LDAP/FreeIPA/AD for user identity and administer distributed file systems like NFS, GPFS, or Lustre.3. GPU & ML Stack EngineeringDeployment & Bring-Up: Spearhead the deployment of new GPU nodes, including BIOS configuration and software integration to ensure optimal performance.
Join Baseten as a Software Engineer focusing on GPU Networking and Distributed Systems. In this pivotal role, you'll collaborate with talented engineers and researchers to develop cutting-edge solutions that leverage GPU technology for high-performance networking operations. Your contributions will be instrumental in shaping the future of distributed systems, enhancing performance, scalability, and reliability.
Join our dynamic team at Reka as a GPU Performance Engineer, where you will leverage your expertise in Python and large-scale model training to enhance our training infrastructure. You will play a pivotal role in optimizing model performance, contributing to critical technical decisions, and improving our post-training processes, including reinforcement learning and fine-tuning. Your contributions will also focus on enhancing the efficiency and scalability of our model serving infrastructure.
Baseten develops infrastructure and tools that help AI companies deploy and scale inference. Teams at organizations like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer rely on Baseten to bring advanced machine learning models into production. The company recently secured a $300M Series E from investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Role overview This Software Engineer - GPU Inference position joins the founding team for Baseten Voice AI in San Francisco. The team focuses on building production-ready Voice AI systems, bringing open-source voice models into real-world use for clients in productivity, customer service, healthcare conversations, and education. The work shapes how people interact with technology through voice, creating broad impact across industries. In this role, the engineer leads the internal inference stack that powers Voice AI models. Responsibilities include guiding the product roadmap and driving engineering execution. Collaboration is a key part of the job, working closely with Forward Deployed Engineers, Model Performance Engineers, and other technical groups to advance Voice AI capabilities. Sample projects and initiatives The world's fastest Whisper, with streaming and diarization Canopy Labs selects Baseten for Orpheus TTS inference Partnering with the Core Product team to build an orchestration framework for a multi-model voice agent Working with the Training Platform team to support continuous training of voice models Designing a developer-friendly API and SDK for self-service adoption of Baseten Voice AI products
About the PositionAt Wafer, we are on a mission to enhance the intelligence per watt by developing AI systems that can self-optimize. Our journey begins with GPU kernels, and we aim to revolutionize every aspect of ML systems and AI infrastructure. We are a compact, dynamic team of four, supported by renowned investors including Fifty Years, Y Combinator, Jeff Dean, and Woj Zaremba, co-founder of OpenAI. We are seeking passionate engineers eager to innovate at the convergence of AI agents and systems programming.In this role, you will collaborate closely with our founding team to create the systems that power our GPU optimization platform. Your projects will range from the agent framework that refines kernels to the profiling infrastructure that interfaces with NCU and ROCprofiler, as well as the compiler tools that scrutinize PTX and SASS.
At Gimlet Labs, we are pioneering the first heterogeneous neocloud tailored for AI workloads. As the demand for AI systems grows, traditional infrastructure faces significant limitations in terms of power, capacity, and cost. Our innovative platform addresses these challenges by decoupling AI workloads from the hardware, intelligently partitioning tasks, and directing each component to the most suitable hardware for optimal performance and efficiency. This method allows for the creation of heterogeneous systems that span multiple vendors and generations of hardware, including the latest cutting-edge accelerators, achieving substantial improvements in performance and cost-effectiveness.Building upon this robust foundation, Gimlet is developing a production-grade neocloud designed for agentic workloads. Our customers can effortlessly deploy and manage their workloads with stable, production-ready APIs, eliminating the complexities of hardware selection, placement, or low-level performance optimization.We collaborate with foundational labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI data centers.We are currently seeking a dedicated Member of Technical Staff specializing in kernels and GPU performance. In this role, you will work closely with accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behaviors, design and optimize kernels, and ensure consistent performance across both established and emerging hardware.This position is perfect for engineers who thrive on deep performance analysis, enjoy exploring hardware trade-offs, and are passionate about transforming theoretical peak performance into tangible real-world outcomes.
Join Our Mission to Build Open Superintelligence InfrastructureAt Prime Intellect, we are pioneering the development of an open superintelligence stack that encompasses cutting-edge agentic models and the infrastructure that empowers anyone to create, train, and deploy these advanced AI systems. Our innovative approach aggregates and orchestrates global computational resources into a cohesive control plane, complemented by a comprehensive reinforcement learning (RL) post-training toolkit that includes environments, secure sandboxes, verifiable evaluations, and our asynchronous RL trainer. We provide researchers, startups, and enterprises with the capabilities to execute end-to-end reinforcement learning at unparalleled scale, adapting models to real-world tools, workflows, and deployment scenarios.As a Solutions Architect for GPU Infrastructure, you will be the technical authority responsible for translating customer needs into robust, production-ready systems designed to train the world’s most sophisticated AI models.With a recent funding round raising $15 million (totaling $20 million) led by Founders Fund, alongside contributions from Menlo Ventures and illustrious angels such as Andrej Karpathy (Tesla, OpenAI), Tri Dao (Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Huggingface), and Emad Mostaque (Stability AI), we are poised for significant growth and innovation.Key Technical ResponsibilitiesThis role requires a blend of deep technical knowledge and hands-on implementation skills. Your contributions will be crucial in:Customer Architecture & DesignCollaborating with clients to comprehend workload specifications and architect optimal GPU cluster solutions.Drafting technical proposals and conducting capacity planning for clusters ranging from 100 to over 10,000 GPUs.Formulating deployment strategies for large language model (LLM) training, inference, and high-performance computing (HPC) tasks.Delivering architectural recommendations to both technical teams and executive stakeholders.Infrastructure Deployment & OptimizationImplementing and configuring orchestration frameworks such as SLURM and Kubernetes for distributed workloads.Establishing high-performance networking through InfiniBand, RoCE, and NVLink interconnects.Enhancing GPU utilization, memory management, and inter-node communication.Setting up parallel file systems (Lustre, BeeGFS, GPFS) to maximize I/O efficiency.Tuning system performance, from kernel parameters to CUDA configurations.Production Operations & SupportEnsuring the reliability and performance of GPU infrastructure through continuous monitoring and support.Collaborating with cross-functional teams to troubleshoot and optimize operational workflows.Documenting processes and creating training materials for team members and clients.
Join the Innovative Team at Liquid AIFounded as a spin-off from MIT’s CSAIL, Liquid AI is at the forefront of developing cutting-edge AI systems that operate seamlessly across various platforms, including data center accelerators and on-device hardware. Our technology is designed to ensure low latency, efficient memory usage, privacy, and reliability. We collaborate with leading enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services as we rapidly scale our operations. We are seeking talented individuals who are passionate about technology and innovation.Your Role in Our TeamAs a GPU Performance Engineer, your expertise will be critical in enhancing our models and workflows beyond the capabilities of standard frameworks. You will be responsible for designing and deploying custom CUDA kernels, conducting hardware-level profiling, and transforming research concepts into production code that yields tangible improvements in our pipelines (training, post-training, and inference). Our dynamic team values initiative and ownership, and we are looking for a candidate who thrives on tackling complex challenges related to memory hierarchies, tensor cores, and profiling outputs.While San Francisco and Boston are preferred, we welcome applications from other locations.
Jul 29, 2025
Sign in to browse more jobs
Create account — see all 3,562 results
Tailoring 0 resumes…
Tailoring 0 resumes…
We'll move completed jobs to Ready to Apply automatically.