Technical Intern GPU Optimization and AI Infrastructure jobs in San Francisco – Browse 5,157 openings on RoboApply Jobs
Open roles matching “Technical Intern GPU Optimization and AI Infrastructure” with location signals for San Francisco. 5,157 active listings on RoboApply Jobs.
5,157 jobs found
Technical Intern - GPU Optimization and AI Infrastructure
Experience Level
Entry Level
Qualifications
What We Seek
Technical Expertise Required
GPU Fundamentals: In-depth knowledge of GPU architectures, CUDA programming, and parallel computing methodologies.
Deep Learning Frameworks: Experience with PyTorch, TensorFlow, or JAX, particularly for GPU-accelerated applications.
LLM/AI Understanding: Strong foundation in large language models (training, fine-tuning, evaluation).
Systems Engineering: Proficiency in C++, Python, and potentially Rust/Go for developing tools around CUDA.
Preferred Background
Publications or contributions in inference, GPU computing, or ML/AI are advantageous.
Hands-on experience with large-scale experimentation, benchmarking, and performance optimization.
About the job
About the Role
We invite you to join our innovative team at Wafer as a Technical Intern, where you will have the opportunity to shape the future of inference, GPU optimization, and AI infrastructure. Working closely with our full-time engineers, you will help define our technical direction and develop the core systems that drive our GPU optimization platform.
Your Responsibilities
Design and implement scalable infrastructure for AI model training and inference.
Make pivotal technical decisions and influence architectural choices.
About Wafer
Wafer is at the forefront of AI technology, dedicated to advancing GPU optimization and inference systems. Our team is passionate about building innovative solutions that push the boundaries of artificial intelligence and machine learning.
About the Role
We're excited to invite you to join Wafer as a Spring Intern, where you will play a crucial role in shaping the future of AI infrastructure and GPU optimization. As part of our innovative team, you will work closely with full-time engineers to define our technical strategies and contribute to the development of the essential systems that drive our GPU optimization platform.
Your Responsibilities
Design and implement scalable infrastructure for AI model training and inference tasks.
Guide the team in making technical decisions and architectural choices.
Qualifications We Seek
Essential Technical Skills
GPU Fundamentals: A strong grasp of GPU architectures, CUDA programming, and parallel computing methodologies.
Deep Learning Frameworks: Skilled in PyTorch, TensorFlow, or JAX, especially for GPU-accelerated applications.
Knowledge of LLM/AI: Solid foundation in large language models, including training, fine-tuning, prompting, and evaluation.
Systems Engineering: Proficient in C++, Python, and potentially Rust/Go for developing tools around CUDA.
Preferred Background
Publications or contributions to open-source projects related to inference, GPU computing, or ML/AI are advantageous.
Hands-on experience in conducting large-scale experiments, benchmarking, and performance optimization.
About the Position
At Wafer, we are on a mission to enhance intelligence per watt by developing AI systems that can self-optimize. Our journey begins with GPU kernels, and we aim to revolutionize every aspect of ML systems and AI infrastructure. We are a compact, dynamic team of four, supported by renowned investors including Fifty Years, Y Combinator, Jeff Dean, and Woj Zaremba, co-founder of OpenAI. We are seeking passionate engineers eager to innovate at the convergence of AI agents and systems programming.
In this role, you will collaborate closely with our founding team to create the systems that power our GPU optimization platform. Your projects will range from the agent framework that refines kernels to the profiling infrastructure that interfaces with NCU and ROCprofiler, as well as the compiler tools that scrutinize PTX and SASS.
At Gimlet Labs, we are pioneering the first heterogeneous neocloud tailored for AI workloads. As the demand for AI systems grows, traditional infrastructure faces significant limitations in terms of power, capacity, and cost. Our innovative platform addresses these challenges by decoupling AI workloads from the hardware, intelligently partitioning tasks, and directing each component to the most suitable hardware for optimal performance and efficiency. This method allows for the creation of heterogeneous systems that span multiple vendors and generations of hardware, including the latest cutting-edge accelerators, achieving substantial improvements in performance and cost-effectiveness.
Building upon this robust foundation, Gimlet is developing a production-grade neocloud designed for agentic workloads. Our customers can effortlessly deploy and manage their workloads with stable, production-ready APIs, eliminating the complexities of hardware selection, placement, or low-level performance optimization.
We collaborate with foundational labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI data centers.
We are currently seeking a dedicated Member of Technical Staff specializing in kernels and GPU performance. In this role, you will work closely with accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behaviors, design and optimize kernels, and ensure consistent performance across both established and emerging hardware.
This position is perfect for engineers who thrive on deep performance analysis, enjoy exploring hardware trade-offs, and are passionate about transforming theoretical peak performance into tangible real-world outcomes.
About the Internship
Embark on an exciting journey with Wafer as a Technical Intern, where you'll play a pivotal role in shaping the future of inference, GPU optimization, and AI infrastructure. Working closely with our talented full-time engineers, you will help define our technical strategy and develop the foundational systems that drive our GPU optimization platform.
Your Responsibilities
Design and build scalable infrastructure for training and inference of AI models.
Make key architectural decisions and lead technical direction.
Group: Impossible Cloud / Impossible Cloud Network (ICN)
Focus: Integrating Enterprise Storage with Decentralized GPU Orchestration
Our Mission
At Impossible Cloud, we are transforming enterprise storage through our patented decentralized object storage technology, delivering a high-performance, cost-effective infrastructure. We aim to expand this foundation by creating a next-generation AI-first platform that integrates storage, compute, and GPU functionalities.
We are looking for a dynamic and hands-on Go-to-Market Champion specializing in AI and GPU Infrastructure to accelerate Impossible Cloud's position in the market for agentic AI infrastructure. This is an exceptional opportunity to join a rapidly growing AI infrastructure company during a critical phase, owning the GTM strategy from development to scaling a successful sales organization.
In this role, you will collaborate closely with founders, Product, Marketing, and Customer Success teams to transform our viral product into a reliable, scalable revenue machine for enterprises.
Our culture thrives on relentless innovation, accountability, and ownership, where each team member is dedicated to excellence and urgency in their work.
Key Responsibilities
- Develop and execute Impossible Cloud’s global Go-to-Market (GTM) strategy, focusing on market segmentation, value propositions, pricing, and packaging for GPU cloud and AI infrastructure tailored to enterprises, startups, and research entities.
- Create scalable customer acquisition and retention strategies through direct sales, channels, and partnerships, enhancing commercial enablement and managing the customer journey (both commercial and technical).
- Build and lead a high-performing global GTM team encompassing presales, direct sales, partnerships, solutions engineering, marketing, and customer success, while developing playbooks and performance metrics to instill a culture of customer focus and excellence.
- Work closely with Product and Engineering to align GTM strategies with the product roadmap, integrating direct customer insights and gathering market intelligence to anticipate trends in AI and cloud technology adoption.
- Identify, negotiate, and lead strategic partnerships with AI firms, ISVs, integrators, and cloud marketplaces, while engaging with Enterprise and AI Native clients as a trusted advisor.
Join the Innovative Team at Liquid AI
Founded as a spin-off from MIT’s CSAIL, Liquid AI is at the forefront of developing cutting-edge AI systems that operate seamlessly across various platforms, including data center accelerators and on-device hardware. Our technology is designed to ensure low latency, efficient memory usage, privacy, and reliability. We collaborate with leading enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services as we rapidly scale our operations. We are seeking talented individuals who are passionate about technology and innovation.
Your Role in Our Team
As a GPU Performance Engineer, your expertise will be critical in enhancing our models and workflows beyond the capabilities of standard frameworks. You will be responsible for designing and deploying custom CUDA kernels, conducting hardware-level profiling, and transforming research concepts into production code that yields tangible improvements in our pipelines (training, post-training, and inference). Our dynamic team values initiative and ownership, and we are looking for a candidate who thrives on tackling complex challenges related to memory hierarchies, tensor cores, and profiling outputs.
While San Francisco and Boston are preferred, we welcome applications from other locations.
At Sciforium, we are at the forefront of AI infrastructure, pioneering advanced multimodal AI models and an innovative, high-efficiency serving platform. With substantial backing from AMD and a dedicated team of engineers, we are rapidly expanding our capabilities to support the next generation of frontier AI models and real-time applications.
About the Role
We are looking for a highly skilled Senior HPC & GPU Infrastructure Engineer who will be responsible for ensuring the health, reliability, and performance of our GPU compute cluster. As the primary custodian of our high-density accelerator environment, you will serve as the crucial link between hardware operations, distributed systems, and machine learning workflows. This position encompasses a range of responsibilities, from hands-on Linux systems engineering and GPU driver setup to maintaining the ML software stack (CUDA/ROCm, PyTorch, JAX, vLLM). If you are passionate about optimizing hardware performance, enjoy troubleshooting GPUs at scale, and aspire to create world-class AI infrastructure, we would love to hear from you.
Your Responsibilities
1. System Health & Reliability (SRE)
On-Call Response: Be the primary responder for system outages, GPU failures, node crashes, and other cluster-wide incidents, ensuring rapid issue resolution to minimize downtime.
Cluster Monitoring: Develop and maintain monitoring protocols for GPU health, thermal behavior, PCIe/NVLink topology issues, memory errors, and general system load.
Vendor Liaison: Collaborate with data center personnel, hardware vendors, and on-site technicians for repairs, RMA processing, and physical maintenance of the cluster.
2. Linux & Network Administration
OS Management: Oversee the installation, patching, and maintenance of Linux distributions (Ubuntu / CentOS / RHEL), ensuring consistent configuration, kernel tuning, and automation for large node fleets.
Security & Access Controls: Set up VPNs, iptables/firewalls, SSH hardening, and network routing to secure our computing infrastructure.
Identity & Storage Management: Manage LDAP/FreeIPA/AD for user identity and administer distributed file systems like NFS, GPFS, or Lustre.
3. GPU & ML Stack Engineering
Deployment & Bring-Up: Spearhead the deployment of new GPU nodes, including BIOS configuration and software integration to ensure optimal performance.
Full-time|$150K/yr - $250K/yr|On-site|San Francisco, CA
About Fluidstack
Fluidstack is at the forefront of building groundbreaking infrastructure designed for the future of intelligence. We collaborate with premier AI research labs, government entities, and leading enterprises like Mistral, Poolside, Black Forest Labs, and Meta to deliver compute solutions at unparalleled speeds.
Our mission is to expedite the realization of Artificial General Intelligence (AGI). Our team is dedicated, passionate, and driven to create world-class infrastructure, treating our clients' success as our own. If you possess a strong sense of purpose, a dedication to excellence, and the willingness to work diligently to transform the future of intelligence, we welcome you to join us in shaping what lies ahead.
About the Role
We are seeking a Product Manager to spearhead New Product Introduction (NPI) for our GPU infrastructure. You will collaborate with our datacenter, infrastructure, and networking teams to launch new GPU SKUs and compute solutions. Your role will involve defining the frameworks through which Fluidstack assesses, qualifies, and brings new GPU generations to market, from NVIDIA Blackwell and Rubin to AMD MI300X and future accelerators. This highly cross-functional position demands strong technical acumen, adept vendor relationship management, and a clear understanding of how hardware capabilities align with customer workload requirements. By doing so, you will help ensure that Fluidstack remains a leader in providing optimal compute options tailored for training, inference, and specialized AI workloads.
Key Responsibilities
Manage the NPI roadmap for GPU SKUs, including evaluation criteria, qualification timelines, and market strategies for new hardware generations.
Collaborate with datacenter teams to establish requirements for power delivery (HVDC/LVDC), cooling systems (liquid vs. air), rack architecture, and the physical infrastructure necessary for next-gen GPUs.
Engage with infrastructure engineers to validate hardware performance across essential metrics: training throughput (MFU), inference latency (TTFT, TBT), memory bandwidth, and interconnect topology (NVLink, InfiniBand).
Foster vendor relationships with NVIDIA, AMD, and emerging XPU providers, conducting in-depth technical discussions, negotiating supply agreements, and overseeing early access programs.
Define product specifications for system configurations: single-GPU instances, multi-GPU nodes, full rack deployments, and megacluster architectures.
Analyze customer workload profiles to identify the optimal GPU mix: H100 for large model training, L40S for inference, B200 for frontier research, and MI300X for cost-sensitive workloads.
Develop business cases for new SKU introductions.
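The validation metrics named in this listing (MFU, TTFT, TBT) reduce to simple ratios. A minimal Python sketch of the arithmetic, with every number purely illustrative rather than taken from any listing:

```python
def mfu(tokens_per_sec: float, flops_per_token: float, n_gpus: int,
        peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization: achieved FLOP/s over theoretical peak."""
    achieved = tokens_per_sec * flops_per_token
    return achieved / (n_gpus * peak_flops_per_gpu)

def time_between_tokens(total_latency_s: float, ttft_s: float,
                        n_output_tokens: int) -> float:
    """Average inter-token latency (TBT) after the first token arrives."""
    return (total_latency_s - ttft_s) / (n_output_tokens - 1)

# Illustrative numbers: 8 GPUs at 1e15 peak FLOP/s each,
# 2e12 FLOPs/token, 2000 tokens/s aggregate throughput.
print(round(mfu(2000, 2e12, 8, 1e15), 2))           # 0.5
print(round(time_between_tokens(5.2, 0.4, 49), 2))  # 0.1
```

Real benchmarking adds many complications (activation checkpointing changes FLOPs/token, batching changes TBT), but SKU comparisons usually start from ratios like these.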
Join Our Mission to Build Open Superintelligence Infrastructure
At Prime Intellect, we are pioneering the development of an open superintelligence stack that encompasses cutting-edge agentic models and the infrastructure that empowers anyone to create, train, and deploy these advanced AI systems. Our innovative approach aggregates and orchestrates global computational resources into a cohesive control plane, complemented by a comprehensive reinforcement learning (RL) post-training toolkit that includes environments, secure sandboxes, verifiable evaluations, and our asynchronous RL trainer. We provide researchers, startups, and enterprises with the capabilities to execute end-to-end reinforcement learning at unparalleled scale, adapting models to real-world tools, workflows, and deployment scenarios.
As a Solutions Architect for GPU Infrastructure, you will be the technical authority responsible for translating customer needs into robust, production-ready systems designed to train the world’s most sophisticated AI models.
With a recent funding round raising $15 million (totaling $20 million) led by Founders Fund, alongside contributions from Menlo Ventures and illustrious angels such as Andrej Karpathy (Tesla, OpenAI), Tri Dao (Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Huggingface), and Emad Mostaque (Stability AI), we are poised for significant growth and innovation.
Key Technical Responsibilities
This role requires a blend of deep technical knowledge and hands-on implementation skills. Your contributions will be crucial in:
Customer Architecture & Design
Collaborating with clients to comprehend workload specifications and architect optimal GPU cluster solutions.
Drafting technical proposals and conducting capacity planning for clusters ranging from 100 to over 10,000 GPUs.
Formulating deployment strategies for large language model (LLM) training, inference, and high-performance computing (HPC) tasks.
Delivering architectural recommendations to both technical teams and executive stakeholders.
Infrastructure Deployment & Optimization
Implementing and configuring orchestration frameworks such as SLURM and Kubernetes for distributed workloads.
Establishing high-performance networking through InfiniBand, RoCE, and NVLink interconnects.
Enhancing GPU utilization, memory management, and inter-node communication.
Setting up parallel file systems (Lustre, BeeGFS, GPFS) to maximize I/O efficiency.
Tuning system performance, from kernel parameters to CUDA configurations.
Production Operations & Support
Ensuring the reliability and performance of GPU infrastructure through continuous monitoring and support.
Collaborating with cross-functional teams to troubleshoot and optimize operational workflows.
Documenting processes and creating training materials for team members and clients.
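Capacity planning for clusters in the 100-to-10,000-GPU range described here is, at its core, throughput arithmetic. A hedged Python sketch; the FLOP counts, peak rate, and utilization figure are invented for illustration, not benchmarks:

```python
import math

def gpus_needed(total_flops: float, peak_flops_per_gpu: float,
                utilization: float, days: float) -> int:
    """GPUs required to finish a training run of `total_flops` within `days`,
    assuming a sustained fraction `utilization` of each GPU's peak FLOP/s."""
    sustained = peak_flops_per_gpu * utilization   # realistic per-GPU rate
    return math.ceil(total_flops / (sustained * days * 86400))

# Illustrative: a 1e24-FLOP run, 1e15 peak FLOP/s per GPU, 40% MFU, 30 days.
print(gpus_needed(1e24, 1e15, 0.4, 30))  # 965
```

Actual proposals layer on failure rates, checkpoint overhead, and network topology constraints, but this back-of-envelope bound is typically where sizing discussions start.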
At Magic, our mission is to create safe AGI that propels humanity forward in addressing the world’s most critical challenges. We believe that the key to achieving safe AGI lies in automating research and code generation to enhance models and resolve alignment issues more effectively than humans alone. Our unique approach integrates frontier-scale pre-training, domain-specific reinforcement learning, ultra-long context, and inference-time computation to realize this vision.
Role Overview
As a vital member of our Supercomputing Platform & Infrastructure team, you will be instrumental in designing, constructing, and managing the extensive GPU infrastructure that underpins Magic’s model training and inference processes.
A key aspect of your role will involve leveraging Terraform-driven infrastructure-as-code methodologies to build and maintain our infrastructure, ensuring reproducibility, reliability, and operational clarity across clusters comprising thousands of GPUs.
Magic’s long-context models exert continuous demands on compute, networking, and storage systems. The infrastructure must support long-running distributed jobs, high-throughput data movement, and stringent availability requirements, necessitating designs that are automated, observable, and resilient. You will take ownership of the systems and IaC foundations that facilitate these capabilities.
This position has the potential to expand into broader responsibilities encompassing supercomputing platform architecture, influencing how Magic scales GPU clusters and enhances infrastructure reliability as model workloads expand.
Key Responsibilities
Design and manage large-scale GPU clusters for model training and inference.
Construct and sustain infrastructure utilizing Terraform across both cloud and hybrid environments.
Develop modular, scalable IaC frameworks for provisioning compute, networking, and storage resources.
Enhance deployment reproducibility, maintain environment consistency, and ensure operational safety.
Optimize networking and storage architectures for high-throughput AI workloads.
Automate fault detection and recovery mechanisms across distributed clusters.
Diagnose complex cross-layer issues involving hardware, drivers, networking, storage, operating systems, and cloud environments.
Enhance observability, monitoring, and reliability of essential platform systems.
Qualifications
Strong foundation in systems engineering principles.
Extensive hands-on experience with Terraform, including module design, state management, environment isolation, and large-scale implementations.
About Our Team
Join the Fleet team at OpenAI, where we empower groundbreaking research and product innovation through our advanced computing infrastructure. We manage extensive systems across data centers, GPUs, and networking, ensuring optimal performance, high availability, and efficiency. Our work is crucial in enabling OpenAI’s models to function seamlessly at scale, supporting both our internal research endeavors and external products like ChatGPT. We are committed to prioritizing safety, reliability, and the ethical deployment of AI technology.
About the Role
As a Software Engineer on the Fleet High Performance Computing (HPC) team, you will play a vital role in ensuring the reliability and uptime of OpenAI’s compute fleet. Minimizing hardware failures is essential for smooth research training progress and uninterrupted services, as even minor hardware issues can lead to significant setbacks. With the rise of large supercomputers, the stakes in maintaining efficiency and stability have never been higher.
At the cutting edge of technology, we often lead the charge in troubleshooting complex, state-of-the-art systems at scale. This is a unique opportunity for you to engage with groundbreaking technologies and create innovative solutions that enhance the health and efficiency of our supercomputing infrastructure.
Our team fosters a culture of autonomy and ownership, enabling skilled engineers to drive meaningful change. In this role, you will focus on comprehensive system investigations and develop automated solutions to enhance our operations. We seek individuals who dive deep into challenges, conduct thorough investigations, and create scalable automation for detection and remediation.
Key Responsibilities:
Develop and maintain automation systems for provisioning and managing server fleets.
Create tools to monitor server health, performance metrics, and lifecycle events.
Collaborate effectively with teams across clusters, networking, and infrastructure.
Work closely with external operators to maintain a high level of service quality.
Identify and resolve performance bottlenecks and inefficiencies in the system.
Continuously enhance automation processes to minimize manual intervention.
You Will Excel in This Role if You Have:
Experience in managing large-scale server environments.
A blend of technical skills in systems programming and infrastructure management.
Strong problem-solving abilities and a methodical approach to troubleshooting.
Familiarity with high-performance computing technologies and tools.
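The detection-and-remediation automation this role describes typically follows a triage loop over node health signals. A toy Python sketch of that pattern; the node fields, thresholds, and action names are all invented for illustration, not OpenAI's actual fleet rules:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    ecc_errors: int = 0    # uncorrectable GPU memory errors observed
    link_flaps: int = 0    # interconnect link resets observed
    quarantined: bool = False

def triage(node: Node) -> str:
    """Map observed health signals to a remediation action (placeholder policy)."""
    if node.ecc_errors > 0:
        node.quarantined = True   # drain workloads before hardware repair/RMA
        return "drain-and-rma"
    if node.link_flaps >= 3:
        return "reset-link"       # cheap remediation attempted first
    return "healthy"

fleet = [Node("gpu-001"), Node("gpu-002", ecc_errors=2), Node("gpu-003", link_flaps=5)]
actions = {n.name: triage(n) for n in fleet}
print(actions)  # {'gpu-001': 'healthy', 'gpu-002': 'drain-and-rma', 'gpu-003': 'reset-link'}
```

Production systems replace the hand-coded thresholds with telemetry pipelines and add rate limiting so a bad detector cannot drain an entire cluster, but the detect-classify-remediate shape is the same.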
Our Vision
At Reflection AI, we are on a mission to develop open superintelligence and democratize its access for everyone.
Our team, hailing from renowned organizations like DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic, is dedicated to creating open-weight models that cater to individuals, enterprises, and even nations.
Role Overview
Design, construct, and manage state-of-the-art GPU infrastructure for high-throughput model inference and mid-training processes.
Develop systems that facilitate synthetic data generation and reinforcement learning pipelines at scale.
Create high-performance inference platforms capable of serving and evaluating models across thousands of GPUs.
Optimize throughput, latency, and GPU utilization for large language model inference and deployment tasks.
Construct infrastructure that enhances reinforcement learning pipelines, including large-scale rollout generation, evaluation, and policy enhancement loops.
Collaborate closely with research teams to support distributed reinforcement learning workloads and extensive model evaluation infrastructure.
Enhance model execution performance through kernel-level optimization, model parallelism strategies, and GPU runtime improvements.
Develop distributed systems that enable large-scale synthetic data generation and reinforcement learning-driven training workflows.
Identify and address performance bottlenecks across inference runtimes, GPU kernels, networking, and distributed computing systems.
Full-time|Remote|Global Remote / San Francisco, CA
Location: North America Remote / San Francisco · Full-Time
About Andromeda
Founded by Nat Friedman and Daniel Gross, Andromeda Cluster provides early-stage startups with access to scaled AI infrastructure once exclusive to hyperscalers. Our journey began with a single managed cluster that rapidly gained demand, leading us to develop a robust system, network, and orchestration layer to democratize AI infrastructure.
Today, we partner with leading AI labs, data centers, and cloud providers to efficiently deliver compute resources wherever needed. Our platform expertly routes training and inference jobs across global supply chains, promoting flexibility and efficiency in one of the fastest-growing markets in the world.
Our vision is to create a liquidity layer for global AI compute, and we are on the lookout for bright minds in AI infrastructure, research, and engineering to join our expanding team.
The Opportunity
We are seeking a dedicated Global GPU Commodity Manager to enhance supply and demand matching on our platform. This role is an Individual Contributor position reporting to the Head of Infrastructure. The Infrastructure team is pivotal to our operations, responsible for acquiring and facilitating compute resources across the organization while collaborating closely with compute providers, sales, and technical teams to align supply with demand.
With a solid foundation established with our providers, we are now scaling to expand our network and liquidity, broaden our service offerings, and accelerate our growth trajectory.
What You'll Do
Match incoming leads from the sales team to internal and external market capacity.
Maximize utilization of compute resources.
Source and onboard new compute suppliers globally.
Identify capacity based on customer requirements and market trends.
Resolve customer and supplier challenges in a fast-paced environment.
Analyze technical and commercial differences between suppliers to optimize our capacity funnel.
Develop a proactive compute strategy driven by market intelligence.
Negotiate costs with suppliers and other vendors.
Create and implement processes around capacity planning.
Join Our Mission
At Hyperbolic Labs, we are dedicated to democratizing artificial intelligence by eliminating barriers to computing power through our Open-Access AI Cloud. We aggregate global computing resources to provide an innovative GPU marketplace and AI inference service, making AI affordable and accessible for everyone. As pioneers at the crossroads of AI and open-source technology, we envision a future where AI innovation is driven by imagination, not resource limitations. We invite forward-thinking individuals who share our vision of making AI universally accessible, secure, and cost-effective to join us in crafting a platform that empowers innovators to realize their groundbreaking AI projects.
As we gear up for expansion following our Series A funding, our team, led by co-founders with PhDs in AI, Mathematics, and Computer Science, is set to transform the landscape of computing.
The Role
We are on the lookout for a Senior Infrastructure Engineer to drive the development and scaling of Hyperbolic's GPU Cloud Marketplace. In this pivotal role, you will create a multi-tenancy provisioning and virtualization solution that transforms raw GPUs from diverse global suppliers into a programmable, orchestrated resource pool serving thousands of AI developers and researchers. You will work at the forefront of cloud infrastructure, building the core orchestration layer that allows our platform to deliver cost savings of up to 75% compared to traditional cloud providers.
At Genmo, we are at the forefront of advancing artificial intelligence through innovative research in video generation. Our mission is to construct open, cutting-edge models that will ultimately contribute to the realization of Artificial General Intelligence (AGI). As part of our dynamic team, you will play a pivotal role in redefining the future of AI and expanding the horizons of video creation.
We are looking for a skilled GPU Performance Engineer who can extract maximum performance from our H100 infrastructure and fine-tune our model serving stack to achieve unparalleled efficiency. If you are passionate about optimizing performance, particularly at the microsecond level, and thrive on pushing hardware to its limits, this is the perfect opportunity for you.
Key Responsibilities
Utilize advanced profiling tools such as Nsight Systems and nvprof to analyze and enhance GPU workloads.
Develop high-performance CUDA and Triton kernels to optimize essential model functions.
Reduce cold start latency from seconds to mere milliseconds in our serving infrastructure.
Optimize memory access patterns, implement kernel fusion, and maximize GPU utilization.
Collaborate closely with machine learning engineers to optimize model implementations.
Diagnose and resolve performance issues throughout the application and hardware stack.
Implement custom memory pooling and allocation strategies to enhance performance.
Promote performance optimization techniques and foster a culture of excellence across teams.
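The custom memory pooling mentioned in this listing amortizes expensive allocations (such as device memory, where each `cudaMalloc` can cost microseconds and force synchronization) by recycling buffers from a free list. A framework-free Python sketch of the idea; the class, sizes, and counters are hypothetical stand-ins:

```python
class BufferPool:
    """Minimal size-bucketed free list: reuse buffers instead of reallocating."""

    def __init__(self):
        self._free = {}       # size -> list of reusable buffers
        self.allocations = 0  # fresh allocations actually performed

    def acquire(self, size: int) -> bytearray:
        bucket = self._free.get(size, [])
        if bucket:
            return bucket.pop()   # reuse: no new allocation
        self.allocations += 1
        return bytearray(size)    # stand-in for an expensive device alloc

    def release(self, buf: bytearray) -> None:
        self._free.setdefault(len(buf), []).append(buf)

pool = BufferPool()
for _ in range(1000):             # steady-state serving loop
    buf = pool.acquire(1 << 20)   # 1 MiB scratch buffer per request
    pool.release(buf)
print(pool.allocations)           # 1  (999 requests reused the same buffer)
```

GPU frameworks apply the same pattern with extra care for stream ordering and fragmentation (PyTorch's caching allocator is a production example), but the cost model is identical: pay for allocation once, then serve from the pool.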
Join our dynamic team at Reka as a GPU Performance Engineer, where you will leverage your expertise in Python and large-scale model training to enhance our training infrastructure. You will play a pivotal role in optimizing model performance, contributing to critical technical decisions, and improving our post-training processes, including reinforcement learning and fine-tuning. Your contributions will also focus on enhancing the efficiency and scalability of our model serving infrastructure.
About Us
At Parallel, we are a pioneering web infrastructure company dedicated to empowering businesses across various sectors, including sales, marketing, insurance, and software development. Our innovative products enable organizations to create cutting-edge AI agents with robust and flexible programmatic access to the web.
Having successfully raised $130 million from esteemed investors such as Kleiner Perkins, Index Ventures, and Spark Capital, our mission is to reshape the web for AI applications. We are assembling a talented team of engineers, designers, marketers, and operational experts to help us achieve this vision.
Job Overview
As a member of our technical staff, you will play a crucial role in building, operating, and scaling our infrastructure, particularly around large language models. Your responsibilities will include ensuring system reliability and cost-efficiency as we expand, anticipating potential bottlenecks, evolving our architecture to meet growing demands, and developing the tools that enhance engineering productivity.
About You
You possess a deep understanding of distributed systems, cloud platforms, performance optimization, and scalable architecture. You are adept at balancing trade-offs between cost, reliability, and speed, and you are passionate about enabling teams to innovate rapidly and confidently while supporting products that serve millions of users seamlessly.
Full-time|$225K/yr - $315K/yr|On-site|San Francisco
About Us
At Lavendo, we are at the forefront of AI cloud infrastructure, rapidly expanding with a significant global presence that includes R&D centers in North America, Europe, and Israel. Our exceptional team of engineers and AI researchers is dedicated to creating innovative solutions that provide the essential infrastructure for the next wave of AI-driven enterprises.

We empower organizations, from Fortune 500 companies to pioneering AI startups and research institutions, allowing them to address complex AI challenges without incurring heavy infrastructure costs or the need to develop extensive in-house AI/ML teams.

Our Mission
We aim to democratize access to top-tier AI infrastructure, enabling organizations of all sizes to transform ambitious AI goals into tangible outcomes. Our culture fosters creativity, embraces challenges, and thrives on teamwork.

Your Role
As a Cloud Solutions Architect (Pre-Sales), you will serve as a vital technical partner to some of the most forward-thinking AI teams globally. You will engage directly with cutting-edge GPU infrastructure, including the latest NVIDIA technology, to assist clients in designing, deploying, and optimizing AI workloads at scale.
This high-profile position lies at the intersection of deep technical expertise and strategic customer interaction, significantly shaping customer experiences and platform adoption.

Key Responsibilities
Act as a trusted technical advisor to customers throughout the entire pre-sales and onboarding process.
Lead proof-of-concept initiatives, architectural workshops, presentations, and training on GPU cloud technologies and industry best practices.
Work closely with customers to understand their business needs and translate them into scalable solution architectures.
Develop and document Infrastructure as Code solutions, reference architectures, and technical guides in collaboration with support engineers and technical writers.
Assist clients in optimizing machine learning pipeline performance and resource efficiency.
Serve as a cross-functional technical expert, connecting product, technical support, and marketing teams with customer requirements.
Represent Lavendo at external events, including hackathons, conferences, and industry showcases.
About Sygaldry Technologies
Sygaldry Technologies develops quantum-accelerated AI servers in San Francisco, focusing on faster AI training and inference. By combining quantum technology with artificial intelligence, the team addresses challenges in computing costs and energy efficiency. Their AI servers integrate multiple qubit types within a fault-tolerant system, aiming for a balance of cost, scalability, and speed. The company values optimism, rigor, and a drive to solve complex problems in physics, engineering, and AI.

Role Overview: ML Infrastructure Engineer
The ML Infrastructure Engineer joins the AI & Algorithms team, which includes research scientists, applied mathematicians, and quantum algorithm specialists. This role centers on building and maintaining the compute infrastructure that powers advanced research. The systems you build will support reliable GPU access, reproducible experiments, and scalable workloads, so researchers can focus on their core work without needing deep cloud expertise. Expect to design and manage compute platforms for a range of tasks, including quantum circuit simulation, large-scale numerical optimization, model training, tensor network contractions, and high-throughput data generation. These workloads span multiple cloud providers and on-premises GPU servers.

Key Responsibilities
Develop compute abstractions for diverse workloads, such as GPU-accelerated simulations, distributed training, high-throughput CPU jobs, and interactive analyses using frameworks like PyTorch and JAX.
Set up infrastructure to support experiment tracking and reproducibility.
Create developer tools that make cloud computing feel local, streamlining environment setup, job submission, monitoring, and artifact management.
Scale experiments from single-GPU prototypes to large, multi-node production runs.
Multi-Cloud GPU Orchestration
Design orchestration strategies for workloads across multiple cloud providers, optimizing job routing for cost, availability, and capability.
Monitor and improve cloud spending, keeping track of credit balances, burn rates, and expiration dates.
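The cost/availability routing described above can be sketched as a one-function scheduler: among providers that have the required GPU type and enough free capacity, pick the cheapest. Provider names, prices, and field names are hypothetical:

```python
def route_job(job, providers):
    """Pick the cheapest provider that can place the job's GPU request.
    Returns None when no provider has matching capacity."""
    candidates = [
        p for p in providers
        if p["gpu"] == job["gpu"] and p["free_gpus"] >= job["gpus_needed"]
    ]
    return min(candidates, key=lambda p: p["usd_per_gpu_hour"], default=None)

# Hypothetical capacity snapshot for illustration only.
providers = [
    {"name": "cloud-a", "gpu": "H100", "free_gpus": 64, "usd_per_gpu_hour": 3.10},
    {"name": "cloud-b", "gpu": "H100", "free_gpus": 8,  "usd_per_gpu_hour": 2.40},
    {"name": "onprem",  "gpu": "A100", "free_gpus": 32, "usd_per_gpu_hour": 1.10},
]

job = {"gpu": "H100", "gpus_needed": 16}
best = route_job(job, providers)
print(best["name"])   # cloud-a: cloud-b is cheaper but lacks 16 free H100s
```

A production router would add capability scoring (interconnect, region, credits expiring soonest), but the core trade-off between price and available capacity looks like this.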
About the Role
We're excited to invite you to join Wafer as a Spring Intern, where you will play a crucial role in shaping the future of AI infrastructure and GPU optimization. As part of our innovative team, you will work closely with full-time engineers to define our technical strategies and contribute to the development of the essential systems that drive our GPU optimization platform.

Your Responsibilities
Design and implement scalable infrastructure for AI model training and inference tasks.
Guide the team in making technical decisions and architectural choices.

Qualifications We Seek
Essential Technical Skills
GPU Fundamentals: A strong grasp of GPU architectures, CUDA programming, and parallel computing methodologies.
Deep Learning Frameworks: Skilled in PyTorch, TensorFlow, or JAX, especially for GPU-accelerated applications.
Knowledge of LLM/AI: Solid foundation in large language models, including training, fine-tuning, prompting, and evaluation.
Systems Engineering: Proficient in C++, Python, and potentially Rust/Go for developing tools around CUDA.

Preferred Background
Publications or contributions to open-source projects related to inference GPU computing or ML/AI are advantageous.
Hands-on experience in conducting large-scale experiments, benchmarking, and performance optimization.
About the Position
At Wafer, we are on a mission to enhance intelligence per watt by developing AI systems that can self-optimize. Our journey begins with GPU kernels, and we aim to revolutionize every aspect of ML systems and AI infrastructure. We are a compact, dynamic team of four, supported by renowned investors including Fifty Years, Y Combinator, Jeff Dean, and Woj Zaremba, co-founder of OpenAI. We are seeking passionate engineers eager to innovate at the convergence of AI agents and systems programming.

In this role, you will collaborate closely with our founding team to create the systems that power our GPU optimization platform. Your projects will range from the agent framework that refines kernels to the profiling infrastructure that interfaces with NCU and ROCprofiler, as well as the compiler tools that scrutinize PTX and SASS.
At Gimlet Labs, we are pioneering the first heterogeneous neocloud tailored for AI workloads. As the demand for AI systems grows, traditional infrastructure faces significant limitations in terms of power, capacity, and cost. Our innovative platform addresses these challenges by decoupling AI workloads from the hardware, intelligently partitioning tasks, and directing each component to the most suitable hardware for optimal performance and efficiency. This method allows for the creation of heterogeneous systems that span multiple vendors and generations of hardware, including the latest cutting-edge accelerators, achieving substantial improvements in performance and cost-effectiveness.

Building upon this robust foundation, Gimlet is developing a production-grade neocloud designed for agentic workloads. Our customers can effortlessly deploy and manage their workloads with stable, production-ready APIs, eliminating the complexities of hardware selection, placement, or low-level performance optimization. We collaborate with foundational labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI data centers.

We are currently seeking a dedicated Member of Technical Staff specializing in kernels and GPU performance. In this role, you will work closely with accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behaviors, design and optimize kernels, and ensure consistent performance across both established and emerging hardware.

This position is perfect for engineers who thrive on deep performance analysis, enjoy exploring hardware trade-offs, and are passionate about transforming theoretical peak performance into tangible real-world outcomes.
About the Internship
Embark on an exciting journey with Wafer as a Technical Intern, where you'll play a pivotal role in shaping the future of inference, GPU optimization, and AI infrastructure. You will collaborate closely with our talented full-time engineers to define our technical strategy and develop the foundational systems that drive our GPU optimization platform.

Your Responsibilities
Design and build scalable infrastructure for training and inference of AI models.
Make key architectural decisions and lead technical direction.
Group: Impossible Cloud / Impossible Cloud Network (ICN)
Focus: Integrating Enterprise Storage with Decentralized GPU Orchestration

Our Mission
At Impossible Cloud, we are transforming enterprise storage through our patented decentralized object storage technology, delivering a high-performance, cost-effective infrastructure. We aim to expand this foundation by creating a next-generation AI-first platform that integrates storage, compute, and GPU functionalities.

We are looking for a dynamic and hands-on Go-to-Market Champion specializing in AI and GPU Infrastructure to accelerate Impossible Cloud's position in the market for Agentic AI infrastructure. This is an exceptional opportunity to join a rapidly growing AI infrastructure company during a critical phase, owning the GTM strategy from development to scaling a successful sales organization.

In this role, you will collaborate closely with founders, Product, Marketing, and Customer Success teams to transform our viral product into a reliable, scalable revenue machine for enterprises.
Our culture thrives on relentless innovation, accountability, and ownership, where each team member is dedicated to excellence and urgency in their work.

Key Responsibilities
Develop and execute Impossible Cloud's global Go-to-Market (GTM) strategy, focusing on market segmentation, value propositions, pricing, and packaging for GPU cloud and AI infrastructure tailored to enterprises, startups, and research entities.
Create scalable customer acquisition and retention strategies through direct sales, channels, and partnerships, enhancing commercial enablement and managing the customer journey (both commercial and technical).
Build and lead a high-performing global GTM team encompassing presales, direct sales, partnerships, solutions engineering, marketing, and customer success, while developing playbooks and performance metrics to instill a culture of customer focus and excellence.
Work closely with Product and Engineering to align GTM strategies with the product roadmap, integrating direct customer insights and gathering market intelligence to anticipate trends in AI and cloud technology adoption.
Identify, negotiate, and lead strategic partnerships with AI firms, ISVs, integrators, and cloud marketplaces, while engaging with Enterprise and AI Native clients as a trusted advisor.
Join the Innovative Team at Liquid AI
Founded as a spin-off from MIT's CSAIL, Liquid AI is at the forefront of developing cutting-edge AI systems that operate seamlessly across various platforms, including data center accelerators and on-device hardware. Our technology is designed to ensure low latency, efficient memory usage, privacy, and reliability. We collaborate with leading enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services as we rapidly scale our operations. We are seeking talented individuals who are passionate about technology and innovation.

Your Role in Our Team
As a GPU Performance Engineer, your expertise will be critical in enhancing our models and workflows beyond the capabilities of standard frameworks. You will be responsible for designing and deploying custom CUDA kernels, conducting hardware-level profiling, and transforming research concepts into production code that yields tangible improvements in our pipelines (training, post-training, and inference). Our dynamic team values initiative and ownership, and we are looking for a candidate who thrives on tackling complex challenges related to memory hierarchies, tensor cores, and profiling outputs.

While San Francisco and Boston are preferred, we welcome applications from other locations.
At Sciforium, we are at the forefront of AI infrastructure, pioneering advanced multimodal AI models and an innovative, high-efficiency serving platform. With substantial backing from AMD and a dedicated team of engineers, we are rapidly expanding our capabilities to support the next generation of frontier AI models and real-time applications.

About the Role
We are looking for a highly skilled Senior HPC & GPU Infrastructure Engineer who will be responsible for ensuring the health, reliability, and performance of our GPU compute cluster. As the primary custodian of our high-density accelerator environment, you will serve as the crucial link between hardware operations, distributed systems, and machine learning workflows. This position encompasses a range of responsibilities, from hands-on Linux systems engineering and GPU driver setup to maintaining the ML software stack (CUDA/ROCm, PyTorch, JAX, vLLM). If you are passionate about optimizing hardware performance, enjoy troubleshooting GPUs at scale, and aspire to create world-class AI infrastructure, we would love to hear from you.

Your Responsibilities
1. System Health & Reliability (SRE)
On-Call Response: Be the primary responder for system outages, GPU failures, node crashes, and other cluster-wide incidents, ensuring rapid issue resolution to minimize downtime.
Cluster Monitoring: Develop and maintain monitoring protocols for GPU health, thermal behavior, PCIe/NVLink topology issues, memory errors, and general system load.
Vendor Liaison: Collaborate with data center personnel, hardware vendors, and on-site technicians for repairs, RMA processing, and physical maintenance of the cluster.
2. Linux & Network Administration
OS Management: Oversee the installation, patching, and maintenance of Linux distributions (Ubuntu / CentOS / RHEL), ensuring consistent configuration, kernel tuning, and automation for large node fleets.
Security & Access Controls: Set up VPNs, iptables/firewalls, SSH hardening, and network routing to secure our computing infrastructure.
Identity & Storage Management: Manage LDAP/FreeIPA/AD for user identity and administer distributed file systems like NFS, GPFS, or Lustre.
3. GPU & ML Stack Engineering
Deployment & Bring-Up: Spearhead the deployment of new GPU nodes, including BIOS configuration and software integration to ensure optimal performance.
Full-time|$150K/yr - $250K/yr|On-site|San Francisco, CA
About Fluidstack
Fluidstack is at the forefront of building groundbreaking infrastructure designed for the future of intelligence. We collaborate with premier AI research labs, government entities, and leading enterprises like Mistral, Poolside, Black Forest Labs, and Meta to deliver compute solutions at unparalleled speeds.

Our mission is to expedite the realization of Artificial General Intelligence (AGI). Our team is dedicated, passionate, and driven to create world-class infrastructure, treating our clients' success as our own. If you possess a strong sense of purpose, a dedication to excellence, and the willingness to work diligently to transform the future of intelligence, we welcome you to join us in shaping what lies ahead.

About the Role
We are seeking a Product Manager to spearhead New Product Introduction (NPI) for our GPU infrastructure. You will collaborate with our datacenter, infrastructure, and networking teams to launch new GPU SKUs and compute solutions. Your role will involve defining the frameworks through which Fluidstack assesses, qualifies, and brings new GPU generations to market, from NVIDIA Blackwell and Rubin to AMD MI300X and future accelerators. This highly cross-functional position demands strong technical acumen, adept vendor relationship management, and a clear understanding of how hardware capabilities align with customer workload requirements. By doing so, you will help ensure that Fluidstack remains a leader in providing optimal compute options tailored for training, inference, and specialized AI workloads.

Key Responsibilities
Manage the NPI roadmap for GPU SKUs, including evaluation criteria, qualification timelines, and market strategies for new hardware generations.
Collaborate with datacenter teams to establish requirements for power delivery (HVDC/LVDC), cooling systems (liquid vs. air), rack architecture, and the physical infrastructure necessary for next-gen GPUs.
Engage with infrastructure engineers to validate hardware performance across essential metrics: training throughput (MFU), inference latency (TTFT, TBT), memory bandwidth, and interconnect topology (NVLink, InfiniBand).
Foster vendor relationships with NVIDIA, AMD, and emerging XPU providers, conducting in-depth technical discussions, negotiating supply agreements, and overseeing early access programs.
Define product specifications for system configurations: single-GPU instances, multi-GPU nodes, full rack deployments, and megacluster architectures.
Analyze customer workload profiles to identify the optimal GPU mix: H100 for large model training, L40S for inference, B200 for frontier research, and MI300X for cost-sensitive workloads.
Develop business cases for new SKU introductions.
Join Our Mission to Build Open Superintelligence Infrastructure
At Prime Intellect, we are pioneering the development of an open superintelligence stack that encompasses cutting-edge agentic models and the infrastructure that empowers anyone to create, train, and deploy these advanced AI systems. Our innovative approach aggregates and orchestrates global computational resources into a cohesive control plane, complemented by a comprehensive reinforcement learning (RL) post-training toolkit that includes environments, secure sandboxes, verifiable evaluations, and our asynchronous RL trainer. We provide researchers, startups, and enterprises with the capabilities to execute end-to-end reinforcement learning at unparalleled scale, adapting models to real-world tools, workflows, and deployment scenarios.

As a Solutions Architect for GPU Infrastructure, you will be the technical authority responsible for translating customer needs into robust, production-ready systems designed to train the world's most sophisticated AI models.

With a recent funding round raising $15 million (totaling $20 million) led by Founders Fund, alongside contributions from Menlo Ventures and angel investors such as Andrej Karpathy (Tesla, OpenAI), Tri Dao (Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), and Emad Mostaque (Stability AI), we are poised for significant growth and innovation.

Key Technical Responsibilities
This role requires a blend of deep technical knowledge and hands-on implementation skills. Your contributions will be crucial in:

Customer Architecture & Design
Collaborating with clients to comprehend workload specifications and architect optimal GPU cluster solutions.
Drafting technical proposals and conducting capacity planning for clusters ranging from 100 to over 10,000 GPUs.
Formulating deployment strategies for large language model (LLM) training, inference, and high-performance computing (HPC) tasks.
Delivering architectural recommendations to both technical teams and executive stakeholders.

Infrastructure Deployment & Optimization
Implementing and configuring orchestration frameworks such as SLURM and Kubernetes for distributed workloads.
Establishing high-performance networking through InfiniBand, RoCE, and NVLink interconnects.
Enhancing GPU utilization, memory management, and inter-node communication.
Setting up parallel file systems (Lustre, BeeGFS, GPFS) to maximize I/O efficiency.
Tuning system performance, from kernel parameters to CUDA configurations.

Production Operations & Support
Ensuring the reliability and performance of GPU infrastructure through continuous monitoring and support.
Collaborating with cross-functional teams to troubleshoot and optimize operational workflows.
Documenting processes and creating training materials for team members and clients.
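Capacity planning for clusters of this size usually starts from the standard 6·N·D back-of-envelope for dense transformer training: total FLOPs ≈ 6 × parameters × tokens, divided by sustained cluster throughput. The sketch below uses an H100 BF16 dense peak and a 40% MFU as illustrative assumptions, not the company's numbers:

```python
def training_days(params_billion, tokens_trillion, n_gpus,
                  peak_tflops=989.0, mfu=0.40):
    """Back-of-envelope cluster sizing for dense transformer training.
    Total FLOPs ~= 6 * params * tokens (the standard estimate); sustained
    throughput = n_gpus * peak * MFU. peak_tflops defaults to an H100
    BF16 dense peak; both it and mfu=0.40 are illustrative assumptions."""
    total_flops = 6 * params_billion * 1e9 * tokens_trillion * 1e12
    cluster_flops_per_sec = n_gpus * peak_tflops * 1e12 * mfu
    return total_flops / cluster_flops_per_sec / 86400   # seconds -> days

# A 70B-parameter model trained on 2T tokens across 1024 GPUs:
days = training_days(70, 2, 1024)
print(round(days, 1))   # ~24.0 days under these assumptions
```

Inverting the same formula (fix a deadline, solve for n_gpus) gives the first-cut GPU count for a proposal, before networking and storage constraints refine it.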
At Magic, our mission is to create safe AGI that propels humanity forward in addressing the world's most critical challenges. We believe that the key to achieving safe AGI lies in automating research and code generation to enhance models and resolve alignment issues more effectively than humans alone. Our unique approach integrates frontier-scale pre-training, domain-specific reinforcement learning, ultra-long context, and inference-time computation to realize this vision.

Role Overview
As a vital member of our Supercomputing Platform & Infrastructure team, you will be instrumental in designing, constructing, and managing the extensive GPU infrastructure that underpins Magic's model training and inference processes. A key aspect of your role will involve leveraging Terraform-driven infrastructure-as-code methodologies to build and maintain our infrastructure, ensuring reproducibility, reliability, and operational clarity across clusters comprising thousands of GPUs.

Magic's long-context models exert continuous demands on compute, networking, and storage systems. The infrastructure must support long-running distributed jobs, high-throughput data movement, and stringent availability requirements, necessitating designs that are automated, observable, and resilient. You will take ownership of the systems and IaC foundations that facilitate these capabilities.

This position has the potential to expand into broader responsibilities encompassing supercomputing platform architecture, influencing how Magic scales GPU clusters and enhances infrastructure reliability as model workloads expand.

Key Responsibilities
Design and manage large-scale GPU clusters for model training and inference.
Construct and sustain infrastructure utilizing Terraform across both cloud and hybrid environments.
Develop modular, scalable IaC frameworks for provisioning compute, networking, and storage resources.
Enhance deployment reproducibility, maintain environment consistency, and ensure operational safety.
Optimize networking and storage architectures for high-throughput AI workloads.
Automate fault detection and recovery mechanisms across distributed clusters.
Diagnose complex cross-layer issues involving hardware, drivers, networking, storage, operating systems, and cloud environments.
Enhance observability, monitoring, and reliability of essential platform systems.

Qualifications
Strong foundation in systems engineering principles.
Extensive hands-on experience with Terraform, including module design, state management, environment isolation, and large-scale implementations.
About Our Team
Join the Fleet team at OpenAI, where we empower groundbreaking research and product innovation through our advanced computing infrastructure. We manage extensive systems across data centers, GPUs, and networking, ensuring optimal performance, high availability, and efficiency. Our work is crucial in enabling OpenAI's models to function seamlessly at scale, supporting both our internal research endeavors and external products like ChatGPT. We are committed to prioritizing safety, reliability, and the ethical deployment of AI technology.

About the Role
As a Software Engineer on the Fleet High Performance Computing (HPC) team, you will play a vital role in ensuring the reliability and uptime of OpenAI's compute fleet. Minimizing hardware failures is essential for smooth research training progress and uninterrupted services, as even minor hardware issues can lead to significant setbacks. With the rise of large supercomputers, the stakes in maintaining efficiency and stability have never been higher.

At the cutting edge of technology, we often lead the charge in troubleshooting complex, state-of-the-art systems at scale. This is a unique opportunity for you to engage with groundbreaking technologies and create innovative solutions that enhance the health and efficiency of our supercomputing infrastructure.

Our team fosters a culture of autonomy and ownership, enabling skilled engineers to drive meaningful change. In this role, you will focus on comprehensive system investigations and develop automated solutions to enhance our operations. We seek individuals who dive deep into challenges, conduct thorough investigations, and create scalable automation for detection and remediation.

Key Responsibilities
Develop and maintain automation systems for provisioning and managing server fleets.
Create tools to monitor server health, performance metrics, and lifecycle events.
Collaborate effectively with teams across clusters, networking, and infrastructure.
Work closely with external operators to maintain a high level of service quality.
Identify and resolve performance bottlenecks and inefficiencies in the system.
Continuously enhance automation processes to minimize manual intervention.

You Will Excel in This Role if You Have
Experience in managing large-scale server environments.
A blend of technical skills in systems programming and infrastructure management.
Strong problem-solving abilities and a methodical approach to troubleshooting.
Familiarity with high-performance computing technologies and tools.
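"Scalable automation for detection and remediation" can be boiled down to a policy that classifies a node from its health metrics and picks an action. The thresholds and metric names below are invented for illustration, not OpenAI's actual policy:

```python
def triage_node(metrics):
    """Toy detection/remediation policy for GPU fleet health. Thresholds
    are invented for illustration; a real system would derive them from
    vendor specs and historical baselines, ordered by severity."""
    if metrics["ecc_uncorrectable"] > 0:
        return "drain_and_rma"          # hard memory fault: pull the node
    if metrics["gpu_temp_c"] > 90:
        return "throttle_and_alert"     # thermal issue: act before damage
    if metrics["xid_errors_24h"] >= 3:
        return "reboot_and_retest"      # repeated driver faults: cycle it
    return "healthy"

action = triage_node({"ecc_uncorrectable": 0, "gpu_temp_c": 95,
                      "xid_errors_24h": 0})
print(action)   # throttle_and_alert
```

The value of encoding the policy this way is that it runs fleet-wide without manual triage, which is exactly the "minimize manual intervention" goal in the responsibilities list.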
Our Vision
At Reflection AI, we are on a mission to develop open superintelligence and democratize its access for everyone. Our team, hailing from renowned organizations like DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic, is dedicated to creating open weight models that cater to individuals, enterprises, and even nations.

Role Overview
Design, construct, and manage state-of-the-art GPU infrastructure for high-throughput model inference and mid-training processes.
Develop systems that facilitate synthetic data generation and reinforcement learning pipelines at scale.
Create high-performance inference platforms capable of serving and evaluating models across thousands of GPUs.
Optimize throughput, latency, and GPU utilization for large language model inference and deployment tasks.
Construct infrastructure that enhances reinforcement learning pipelines, including large-scale rollout generation, evaluation, and policy enhancement loops.
Collaborate closely with research teams to support distributed reinforcement learning workloads and extensive model evaluation infrastructure.
Enhance model execution performance through kernel-level optimization, model parallelism strategies, and GPU runtime improvements.
Develop distributed systems that enable large-scale synthetic data generation and reinforcement learning-driven training workflows.
Identify and address performance bottlenecks across inference runtimes, GPU kernels, networking, and distributed computing systems.
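One first-order way to reason about the throughput/latency bullets above: decode-phase LLM inference is typically memory-bandwidth-bound, so a roofline-style upper bound on single-stream tokens per second is HBM bandwidth divided by the bytes of weights streamed per token. The figures below (a 70B-parameter model in FP16 on one H100 with roughly 3,350 GB/s of HBM bandwidth) are public ballpark numbers used purely for illustration:

```python
def max_decode_tokens_per_sec(params_billion, bytes_per_param, hbm_gb_per_sec):
    """Roofline-style upper bound for single-stream decoding: each token
    must stream all weights from HBM once, so
    tokens/s <= bandwidth / weight bytes. Ignores KV cache and overlap."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return hbm_gb_per_sec * 1e9 / weight_bytes

# 70B params in FP16 (2 bytes/param) on ~3350 GB/s of HBM bandwidth:
tps = max_decode_tokens_per_sec(70, 2, 3350)
print(round(tps, 1))   # ~23.9 tokens/s single-stream ceiling
```

This is why the techniques the role lists (batching, model parallelism, kernel-level optimization, quantization to fewer bytes per parameter) are the levers that move real throughput past or toward that single-stream ceiling.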
Full-time|Remote|Global Remote / San Francisco, CA
Location: North America Remote / San Francisco · Full-Time

About Andromeda
Founded by Nat Friedman and Daniel Gross, Andromeda Cluster provides early-stage startups with access to scaled AI infrastructure, once exclusive to hyperscalers. Our journey began with a single managed cluster that rapidly gained demand, leading us to develop a robust system, network, and orchestration layer to democratize AI infrastructure.

Today, we partner with leading AI labs, data centers, and cloud providers to efficiently deliver compute resources wherever needed. Our platform expertly routes training and inference jobs across global supply chains, promoting flexibility and efficiency in one of the fastest-growing markets in the world. Our vision is to create a liquidity layer for global AI compute, and we are on the lookout for bright minds in AI infrastructure, research, and engineering to join our expanding team.

The Opportunity
We are seeking a dedicated Global GPU Commodity Manager to enhance the supply and demand matching on our platform. This role is an Individual Contributor position reporting to the Head of Infrastructure. The Infrastructure team is pivotal to our operations, responsible for acquiring and facilitating compute resources across the organization while collaborating closely with compute providers, sales, and technical teams to align supply with demand.

With a solid foundation established with our providers, we are now scaling to expand our network and liquidity, broaden our service offerings, and accelerate our growth trajectory.

What You'll Do
Match incoming leads from the sales team to internal and external market capacity.
Maximize utilization of compute resources.
Source and onboard new compute suppliers globally.
Identify capacity based on customer requirements and market trends.
Resolve customer and supplier challenges in a fast-paced environment.
Analyze technical and commercial differences between suppliers to optimize our capacity funnel.
Develop a proactive compute strategy driven by market intelligence.
Negotiate costs with suppliers and other vendors.
Create and implement processes around capacity planning.
Join Our Mission
At Hyperbolic Labs, we are dedicated to democratizing artificial intelligence by eliminating barriers to computing power through our Open-Access AI Cloud. We aggregate global computing resources to provide an innovative GPU marketplace and AI inference service, making AI affordable and accessible for everyone. As pioneers at the crossroads of AI and open-source technology, we envision a future where AI innovation is driven by imagination, not resource limitations. We invite forward-thinking individuals who share our vision of making AI universally accessible, secure, and cost-effective to join us in crafting a platform that empowers innovators to realize their groundbreaking AI projects.

As we gear up for expansion following our Series A funding, our team, led by co-founders with PhDs in AI, Mathematics, and Computer Science, is set to transform the landscape of computing.

The Role
We are on the lookout for a Senior Infrastructure Engineer to drive the development and scaling of Hyperbolic's GPU Cloud Marketplace. In this pivotal role, you will create a multi-tenancy provisioning and virtualization solution that transforms raw GPUs from diverse global suppliers into a programmable, orchestrated resource pool serving thousands of AI developers and researchers. You will work at the forefront of cloud infrastructure, building the core orchestration layer that allows our platform to deliver cost savings of up to 75% compared to traditional cloud providers.
At Genmo, we are at the forefront of advancing artificial intelligence through innovative research in video generation. Our mission is to construct open, cutting-edge models that will ultimately contribute to the realization of Artificial General Intelligence (AGI). As part of our dynamic team, you will play a pivotal role in redefining the future of AI and expanding the horizons of video creation.We are looking for a skilled GPU Performance Engineer who can extract maximum performance from our H100 infrastructure and fine-tune our model serving stack to achieve unparalleled efficiency. If you are passionate about optimizing performance, particularly at the microsecond level, and thrive on pushing hardware to its limits, this is the perfect opportunity for you.Key ResponsibilitiesUtilize advanced profiling tools such as Nsight Systems and nvprof to analyze and enhance GPU workloads.Develop high-performance CUDA and Triton kernels to optimize essential model functions.Reduce cold start latency from seconds to mere milliseconds in our serving infrastructure.Optimize memory access patterns, implement kernel fusion, and maximize GPU utilization.Collaborate closely with machine learning engineers to optimize model implementations.Diagnose and resolve performance issues throughout the application and hardware stack.Implement custom memory pooling and allocation strategies to enhance performance.Promote performance optimization techniques and foster a culture of excellence across teams.
Join our dynamic team at Reka as a GPU Performance Engineer, where you will leverage your expertise in Python and large-scale model training to enhance our training infrastructure. You will play a pivotal role in optimizing model performance, contributing to critical technical decisions, and improving our post-training processes, including reinforcement learning and fine-tuning. Your contributions will also focus on enhancing the efficiency and scalability of our model serving infrastructure.
About Us
At Parallel, we are a pioneering web infrastructure company dedicated to empowering businesses across various sectors, including sales, marketing, insurance, and software development. Our innovative products enable organizations to create cutting-edge AI agents with robust and flexible programmatic access to the web.
Having successfully raised $130 million from esteemed investors such as Kleiner Perkins, Index Ventures, and Spark Capital, our mission is to reshape the web for AI applications. We are assembling a talented team of engineers, designers, marketers, and operational experts to help us achieve this vision.
Job Overview
As a member of our technical staff, you will play a crucial role in building, operating, and scaling our infrastructure, particularly around large language models. Your responsibilities will include ensuring system reliability and cost-efficiency as we expand, anticipating potential bottlenecks, evolving our architecture to meet growing demands, and developing the tools that enhance engineering productivity.
About You
You possess a deep understanding of distributed systems, cloud platforms, performance optimization, and scalable architecture. You are adept at balancing trade-offs between cost, reliability, and speed, and you are passionate about enabling teams to innovate rapidly and confidently while supporting products that serve millions of users seamlessly.
Full-time | $225K/yr - $315K/yr | On-site | San Francisco
About Us
At Lavendo, we are at the forefront of AI cloud infrastructure, rapidly expanding with a significant global presence that includes R&D centers in North America, Europe, and Israel. Our exceptional team of engineers and AI researchers is dedicated to creating innovative solutions that provide the essential infrastructure for the next wave of AI-driven enterprises.
We empower organizations, from Fortune 500 companies to pioneering AI startups and research institutions, allowing them to address complex AI challenges without incurring heavy infrastructure costs or the need to develop extensive in-house AI/ML teams.
Our Mission
We aim to democratize access to top-tier AI infrastructure, enabling organizations of all sizes to transform ambitious AI goals into tangible outcomes. Our culture fosters creativity, embraces challenges, and thrives on teamwork.
Your Role
As a Cloud Solutions Architect (Pre-Sales), you will serve as a vital technical partner to some of the most forward-thinking AI teams globally. You will engage directly with cutting-edge GPU infrastructure, including the latest NVIDIA technology, to assist clients in designing, deploying, and optimizing AI workloads at scale.
This high-profile position lies at the intersection of deep technical expertise and strategic customer interaction, significantly shaping customer experiences and platform adoption.
Key Responsibilities
Act as a trusted technical advisor to customers throughout the entire pre-sales and onboarding process.
Lead proof-of-concept initiatives, architectural workshops, presentations, and training on GPU cloud technologies and industry best practices.
Work closely with customers to understand their business needs and translate them into scalable solution architectures.
Develop and document Infrastructure as Code solutions, reference architectures, and technical guides in collaboration with support engineers and technical writers.
Assist clients in optimizing machine learning pipeline performance and resource efficiency.
Serve as a cross-functional technical expert, connecting product, technical support, and marketing teams with customer requirements.
Represent Lavendo at external events, including hackathons, conferences, and industry showcases.
About Sygaldry Technologies
Sygaldry Technologies develops quantum-accelerated AI servers in San Francisco, focusing on faster AI training and inference. By combining quantum technology with artificial intelligence, the team addresses challenges in computing costs and energy efficiency. Their AI servers integrate multiple qubit types within a fault-tolerant system, aiming for a balance of cost, scalability, and speed. The company values optimism, rigor, and a drive to solve complex problems in physics, engineering, and AI.
Role Overview: ML Infrastructure Engineer
The ML Infrastructure Engineer joins the AI & Algorithms team, which includes research scientists, applied mathematicians, and quantum algorithm specialists. This role centers on building and maintaining the compute infrastructure that powers advanced research. The systems you build will support reliable GPU access, reproducible experiments, and scalable workloads, so researchers can focus on their core work without needing deep cloud expertise. Expect to design and manage compute platforms for a range of tasks, including quantum circuit simulation, large-scale numerical optimization, model training, tensor network contractions, and high-throughput data generation. These workloads span multiple cloud providers and on-premises GPU servers.
Key Responsibilities
Develop compute abstractions for diverse workloads, such as GPU-accelerated simulations, distributed training, high-throughput CPU jobs, and interactive analyses using frameworks like PyTorch and JAX.
Set up infrastructure to support experiment tracking and reproducibility.
Create developer tools that make cloud computing feel local, streamlining environment setup, job submission, monitoring, and artifact management.
Scale experiments from single-GPU prototypes to large, multi-node production runs.
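One common building block of the experiment tracking and reproducibility work described above is giving every run a deterministic identity derived from its configuration, so identical configs map to the same run and can be compared or deduplicated. A minimal sketch; the function and config fields are illustrative, not from the listing:

```python
import hashlib
import json

def experiment_id(config: dict) -> str:
    """Derive a stable run ID by hashing a canonical JSON encoding of
    the config: same contents -> same ID, regardless of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

cfg = {"model": "toy-net", "lr": 3e-4, "seed": 0}
# Key order must not matter; only the contents do.
same_cfg = {"seed": 0, "lr": 3e-4, "model": "toy-net"}
assert experiment_id(cfg) == experiment_id(same_cfg)
```

Hashing the canonical form rather than the raw dict repr is the design choice that matters: it makes the ID independent of insertion order and safe to use as a storage key for artifacts and logs.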
Multi-Cloud GPU Orchestration
Design orchestration strategies for workloads across multiple cloud providers, optimizing job routing for cost, availability, and capability.
Monitor and improve cloud spending, keeping track of credit balances, burn rates, and expiration dates.
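At its simplest, the routing decision described above reduces to a filter-then-rank step: discard providers that lack the required GPU type or free capacity, then pick the cheapest of what remains. A toy sketch under that assumption; the provider records, field names, and prices are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    gpu_type: str
    free_gpus: int
    usd_per_gpu_hour: float

def route_job(providers, gpu_type, gpus_needed):
    """Filter by capability and capacity, then rank by price.
    Returns the cheapest eligible provider, or None if none qualifies."""
    eligible = [p for p in providers
                if p.gpu_type == gpu_type and p.free_gpus >= gpus_needed]
    return min(eligible, key=lambda p: p.usd_per_gpu_hour, default=None)

fleet = [
    Provider("cloud-a", "H100", 16, 3.20),
    Provider("cloud-b", "H100", 4, 2.10),   # cheaper, but too small for 8 GPUs
    Provider("cloud-c", "A100", 64, 1.40),  # wrong GPU type
]
best = route_job(fleet, "H100", 8)
assert best is not None and best.name == "cloud-a"
```

A production router would also weigh availability and credit burn rates, but the filter-then-rank shape stays the same; those factors just enter the key function.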
Apr 14, 2026