Qualifications
Required Qualifications
- Bachelor's degree or equivalent experience in computer science, engineering, or a related field.
- In-depth understanding of transformer architectures and their derivatives.
- Proficient programming skills in Python, with a strong background in PyTorch internals.
- Experience with LLM inference systems (e.g., vLLM, TensorRT-LLM, SGLang, TGI).
- Ability to interpret and implement model architectures and inference techniques as presented in academic papers.
- Proven capability to produce high-performance, maintainable code and troubleshoot complex machine learning codebases.

Preferred Qualifications
- Comprehensive knowledge of KV-cache memory management, prefix caching, and hybrid model serving (a minimal vLLM example follows these lists).
- Familiarity with reinforcement learning frameworks and algorithms for large language models.
- Experience in multimodal inference across various media types (audio, image, video, text).
- Previous contributions to open-source machine learning or systems infrastructure projects.

Additionally, bonus points if you have:
- Successfully implemented core features in vLLM or other inference engine projects.
- Contributed to vLLM integrations (e.g., verl, OpenRLHF, Unsloth, LlamaFactory).
- Authored widely-shared technical blogs or side projects focusing on vLLM or LLM inference.
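As context for these requirements, here is a minimal sketch of vLLM's offline-inference API with prefix caching enabled, the kind of surface area the role assumes familiarity with. The model name and sampling settings are illustrative choices, not specified by the posting.

# Minimal vLLM offline-inference sketch. Model name and settings are
# illustrative assumptions, not part of the job posting.
from vllm import LLM, SamplingParams

# enable_prefix_caching lets requests that share a prompt prefix reuse
# cached KV blocks, one of the techniques named under Preferred Qualifications.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_prefix_caching=True)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain paged attention in one sentence."], params)
print(outputs[0].outputs[0].text)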
About the job
At Inferact, we are on a mission to establish vLLM as the premier AI inference engine, revolutionizing AI progress by making inference both more accessible and efficient. Our founding team consists of the original creators and key maintainers of vLLM, positioning us uniquely at the nexus of cutting-edge models and advanced hardware.
Role Overview
We are seeking a passionate inference runtime engineer eager to explore and expand the frontiers of LLM and diffusion model serving. As models evolve and grow in complexity with new architectures like mixture-of-experts and multimodal designs, the demand for innovative solutions in our inference engine intensifies. This role places you at the heart of vLLM, where you will enhance model execution across a variety of hardware platforms and architectures. Your contributions will have a direct influence on the future of AI inference.
About Inferact
Inferact is dedicated to advancing the field of artificial intelligence through innovative solutions in inference technology. Our team, comprising the original architects of vLLM, is committed to shaping the future of AI by creating tools that make inference faster and more cost-effective.
Similar jobs
On-site|San Francisco, CA | New York City, NY | Seattle, WA
About Anthropic
At Anthropic, our mission is to develop AI systems that are safe, interpretable, and controllable. We believe in harnessing AI for the greater good of our users and society at large. Our dynamic team comprises dedicated researchers, engineers, policy experts, and business leaders who collaborate to create beneficial AI systems.

About the Role
The Cloud Inference team is responsible for scaling and optimizing Claude to cater to a vast array of developers and enterprise clients across platforms such as AWS, GCP, Azure, and future cloud service providers (CSPs). We manage the complete lifecycle of Claude on each cloud platform, from API integration and intelligent request routing to inference execution, capacity management, and daily operations.

Our engineers wield significant influence, driving multiple key revenue streams while optimizing one of Anthropic's most valuable resources: compute power. As we expand to additional cloud providers, the intricacies of efficiently managing inference across diverse platforms with varying hardware, networking frameworks, and operational models grow substantially. We seek engineers adept at navigating these variances, developing strong abstractions that are effective across providers (sketched below), and making informed infrastructure choices that keep us cost-effective at scale.

Your contributions will enhance the operational scale of our services, expedite our capacity to launch cutting-edge models and innovative features to clients across all platforms, and ensure our large language models (LLMs) adhere to stringent safety, performance, and security standards.
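To make the "strong abstractions across providers" point concrete, here is a deliberately small sketch of a provider-agnostic inference interface with naive capacity-based routing. All names and numbers are invented for illustration; this is not Anthropic's internal design.

from abc import ABC, abstractmethod

class InferenceBackend(ABC):
    """One model-serving backend per cloud provider (hypothetical)."""
    @abstractmethod
    def generate(self, prompt: str, max_tokens: int) -> str: ...
    @abstractmethod
    def spare_capacity(self) -> float:
        """Spare throughput in tokens/s, as reported by the provider (assumed)."""

class AwsBackend(InferenceBackend):
    def generate(self, prompt: str, max_tokens: int) -> str:
        return f"[aws] completion for {prompt!r}"
    def spare_capacity(self) -> float:
        return 1200.0  # placeholder value

class GcpBackend(InferenceBackend):
    def generate(self, prompt: str, max_tokens: int) -> str:
        return f"[gcp] completion for {prompt!r}"
    def spare_capacity(self) -> float:
        return 900.0   # placeholder value

def route(backends: list[InferenceBackend], prompt: str) -> str:
    # Naive policy: send the request wherever spare capacity is highest.
    best = max(backends, key=lambda b: b.spare_capacity())
    return best.generate(prompt, max_tokens=128)

print(route([AwsBackend(), GcpBackend()], "hello"))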
Join our innovative team at Anthropic as a Software Engineer specializing in Cloud Inference Safeguards. In this role, you will play a crucial part in developing and enhancing the systems that ensure the robustness and security of our cloud-based inference services. You will collaborate with cross-functional teams to design, implement, and maintain scalable solutions that meet our high standards for reliability and performance.
Full-time|$190.9K/yr - $232.8K/yr|On-site|San Francisco, California
P-1285

About This Role
Join Databricks as a Staff Software Engineer specializing in GenAI inference, where you will spearhead the architecture, development, and optimization of the inference engine that powers the Databricks Foundation Model API. Your role will be crucial in bridging cutting-edge research with real-world production requirements, ensuring exceptional throughput, minimal latency, and scalable solutions. You will work across the entire GenAI inference stack, including kernels, runtimes, orchestration, memory management, and integration with various frameworks and orchestration systems.

What You Will Do
- Take full ownership of the architecture, design, and implementation of the inference engine, collaborating on a model-serving stack optimized for large-scale LLM inference.
- Work closely with researchers to integrate new model architectures or features, such as sparsity, activation compression, and mixture-of-experts, into the engine.
- Lead comprehensive optimization efforts focused on latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators.
- Establish and uphold standards for building and maintaining instrumentation, profiling, and tracing tools to identify performance bottlenecks and drive optimizations.
- Design scalable solutions for routing, batching, scheduling, memory management, and dynamic loading tailored to inference workloads (a toy version appears after this listing).
- Guarantee reliability, reproducibility, and fault tolerance in inference pipelines, including capabilities for A/B testing, rollbacks, and model versioning.
- Collaborate cross-functionally to integrate with federated and distributed inference infrastructure, ensuring effective orchestration across nodes, load balancing, and minimizing communication overhead.
- Foster collaboration with cross-functional teams, including platform engineers, cloud infrastructure, and security/compliance professionals.
- Represent the team externally through benchmarks, whitepapers, and contributions to open-source projects.

What We Look For
- A BS/MS/PhD in Computer Science or a related discipline.
- A solid software engineering background with 6+ years of experience in performance-critical systems.
- A proven ability to own complex system components and influence architectural decisions from conception to execution.
- A deep understanding of ML inference internals, including attention mechanisms, MLPs, recurrent modules, quantization, and sparse operations.
- Hands-on experience with CUDA, GPU programming, and essential libraries (cuBLAS, cuDNN, NCCL, etc.).
- A strong foundation in distributed systems design, including RPC frameworks, queuing, RPC batching, sharding, and memory partitioning.
- Demonstrated proficiency in diagnosing and resolving performance bottlenecks across multiple layers (kernel, memory, networking, scheduler).
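The routing/batching/scheduling bullet above is the crux of modern LLM serving. As a flavor, here is a toy continuous-batching loop in plain Python: requests join the running batch whenever a token budget allows and leave as soon as they finish. This is a sketch of the general technique only; real engines (vLLM, TensorRT-LLM, and presumably Databricks' own stack) schedule paged KV blocks, handle prefill separately, and preempt under memory pressure.

# Toy continuous-batching scheduler. Purely illustrative: real engines
# schedule KV-cache blocks, not raw token counts.
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    prompt_len: int
    max_new_tokens: int
    generated: int = 0

TOKEN_BUDGET = 4096  # max tokens resident in the batch per step (assumed)

def step_batch(batch):
    """Pretend to run one decode step for every request in the batch."""
    finished = []
    for r in batch:
        r.generated += 1
        if r.generated >= r.max_new_tokens:
            finished.append(r)
    for r in finished:
        batch.remove(r)
    return finished

def serve(requests):
    waiting, running = deque(requests), []
    step = 0
    while waiting or running:
        used = sum(r.prompt_len + r.generated for r in running)
        # Admit waiting requests while they still fit under the budget.
        while waiting and used + waiting[0].prompt_len <= TOKEN_BUDGET:
            r = waiting.popleft()
            running.append(r)
            used += r.prompt_len
        for done in step_batch(running):
            print(f"step {step}: request {done.rid} finished")
        step += 1

serve([Request(0, 512, 3), Request(1, 2048, 2), Request(2, 3000, 2)])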
Who are we?
At Cohere, our mission is to elevate intelligence to benefit humanity. We specialize in training and deploying cutting-edge models for developers and enterprises focused on creating AI systems that deliver extraordinary experiences such as content generation, semantic search, retrieval-augmented generation, and intelligent agents. We view our work as pivotal to the broad acceptance of AI technologies.

We are passionate about our creations. Every team member plays a vital role in enhancing our models' capabilities and the value they provide to our customers. We thrive on hard work and speed, always prioritizing our clients' needs.

Cohere is a diverse team of researchers, engineers, designers, and more, all dedicated to their craft. Each individual is a leading expert in their field, and we recognize that a variety of perspectives is essential to developing exceptional products.

Join us in our mission and help shape the future of AI!

Why this role?
Are you excited about architecting high-performance, scalable, and reliable machine learning systems? Do you aspire to shape and construct the next generation of AI platforms that enhance advanced NLP applications? We are seeking talented Members of Technical Staff to join our Model Serving team at Cohere. This team is responsible for the development, deployment, and operation of our AI platform, which delivers Cohere's large language models via user-friendly API endpoints. In this role, you will collaborate with multiple teams to deploy optimized NLP models in production settings characterized by low latency, high throughput, and robust availability. Additionally, you will have the opportunity to work directly with customers to create tailored deployments that fulfill their unique requirements.
Join Anthropic as an Engineering Manager to lead our innovative Cloud Inference team utilizing AWS technologies. In this pivotal role, you will drive efforts to enhance the efficiency and scalability of our cloud systems while ensuring robust performance and reliability. Your leadership will inspire a talented team of engineers to solve complex challenges, implement best practices, and foster a culture of continuous improvement.
Join DigitalOcean as a Senior Engineer focused on Inference Optimizations, where you will play a pivotal role in enhancing our AI and machine learning capabilities. Collaborate with a talented team to develop cutting-edge solutions that optimize inference processes across various applications.
Overview
At Pulse, we are revolutionizing the way data infrastructure operates by addressing the critical challenge of accurately extracting structured information from intricate documents on a large scale. Our innovative document understanding technique merges intelligent schema mapping with advanced extraction models, outperforming traditional OCR and parsing methods.

Located in the heart of San Francisco, we are a dynamic team of engineers dedicated to empowering Fortune 100 enterprises, YC startups, public investment firms, and growth-stage companies. Backed by top-tier investors, we are rapidly expanding our footprint in the industry.

What sets our technology apart is our sophisticated multi-stage architecture, which includes:
- Specialized models for layout understanding and component detection
- Low-latency OCR models designed for precise extraction
- Advanced algorithms for reading-order in complex document structures
- Proprietary methods for table structure recognition and parsing
- Fine-tuned vision-language models for interpreting charts, tables, and figures

If you possess a strong passion for the convergence of computer vision, natural language processing, and data infrastructure, your contributions at Pulse will significantly impact our clients and help shape the future of document intelligence.
About Our Team
Join the Inference team at OpenAI, where we leverage cutting-edge research and technology to deliver exceptional AI products to consumers, enterprises, and developers. Our mission is to empower users to harness the full potential of our advanced AI models, enabling unprecedented capabilities. We prioritize efficient and high-performance model inference while accelerating research advancements.

About the Role
We are seeking a passionate Software Engineer to optimize some of the world's largest and most sophisticated AI models for deployment in high-volume, low-latency, and highly available production and research environments.

Key Responsibilities
- Collaborate with machine learning researchers, engineers, and product managers to transition our latest technologies into production.
- Work closely with researchers to enable advanced research initiatives through innovative engineering solutions.
- Implement new techniques, tools, and architectures that enhance the performance, latency, throughput, and effectiveness of our model inference stack.
- Develop tools to identify bottlenecks and instability sources, designing and implementing solutions for priority issues (a minimal profiling sketch follows this listing).
- Optimize our code and Azure VM fleet to maximize every FLOP and GB of GPU RAM available.

You Will Excel in This Role If You:
- Possess a solid understanding of modern machine learning architectures and an intuitive grasp of performance optimization strategies, especially for inference.
- Take ownership of problems end-to-end, demonstrating a willingness to acquire any necessary knowledge to achieve results.
- Bring at least 5 years of professional software engineering experience.
- Have or can quickly develop expertise in PyTorch, NVIDIA GPUs, and relevant optimization software stacks (such as NCCL, CUDA), along with HPC technologies like InfiniBand, MPI, and NVLink.
- Have experience in architecting, building, monitoring, and debugging production distributed systems, with bonus points for working on performance-critical systems.
- Have successfully rebuilt or significantly refactored production systems multiple times to accommodate rapid scaling.
- Are self-driven, enjoying the challenge of identifying and addressing the most critical problems.
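Bottleneck hunting of the kind described above typically starts with measurement. Below is a minimal torch.profiler sketch that ranks operators by device time; the model and input shapes are placeholders standing in for a real serving workload, not OpenAI's stack.

import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity

# Placeholder model and shapes -- stand-ins for a real inference workload.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).to(device).eval()
x = torch.randn(8, 128, 512, device=device)  # (batch, seq, hidden)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with torch.no_grad(), profile(activities=activities, record_shapes=True) as prof:
    for _ in range(10):  # a few steps so per-op averages are meaningful
        model(x)

sort_key = "cuda_time_total" if device == "cuda" else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))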
About Our Team
Join OpenAI's dynamic Inference team, where we empower the deployment of cutting-edge AI models, including our renowned GPT models, advanced Image Generation capabilities, and Whisper, across diverse platforms. Our mission is to ensure these models are not only high-performing and scalable but also available for real-world applications. Collaborating closely with our Research team, we're committed to bringing the next generation of AI innovations to fruition. As a compact, agile team, we prioritize delivering an exceptional developer experience while continuously pushing the frontiers of artificial intelligence.

As we expand our focus into multimodal inference, we are building the necessary infrastructure to support models that process images, audio, and other non-text modalities. This work involves tackling diverse model sizes and interactions, managing complex input/output formats, and ensuring seamless collaboration between product and research teams.

About the Role
We are seeking a passionate Software Engineer to aid in the large-scale deployment of OpenAI's multimodal models. You will join a small yet impactful team dedicated to creating robust, high-performance infrastructure for real-time audio, image, and various multimodal workloads in production environments.

This position is inherently collaborative; you will work directly with researchers who develop these models and with product teams to define novel interaction modalities. Your contributions will enable users to generate speech, interpret images, and engage with models in innovative ways that extend beyond traditional text-based interactions.

Key Responsibilities:
- Design and implement advanced inference infrastructure for large-scale multimodal models.
- Optimize systems for high-throughput and low-latency processing of image and audio inputs and outputs.
- Facilitate the transition of experimental research workflows into dependable production services.
- Engage closely with researchers, infrastructure teams, and product engineers to deploy state-of-the-art capabilities.
- Contribute to systemic enhancements, including GPU utilization, tensor parallelism, and hardware abstraction layers.

You May Excel In This Role If You:
- Have a proven track record of building and scaling inference systems for large language models or multimodal architectures.
- Possess experience with GPU-based machine learning workloads and a solid understanding of the performance dynamics associated with large models, particularly with intricate data types like images or audio.
- Thrive in a fast-paced, experimental environment and enjoy collaborating with cross-functional teams to drive impactful results.
At Lyft, we are dedicated to connecting people and creating a community where every team member feels valued and empowered to reach their full potential.

As a pivotal player in transforming how our communities move, Lyft's engineering team is rapidly expanding. We are seeking passionate Software Engineers specialized in Security to join our dynamic Security team. Together, we will enhance our ability to deliver secure services at scale.

Lyft is entrusted with the sensitive information of both drivers and passengers, and we take the responsibility of safeguarding that data seriously. Our Security team spearheads initiatives across the organization to protect our systems and uphold user trust.

Our work encompasses designing and building a robust security architecture, collaborating with various teams during the development and launch of new products, anticipating potential challenges, and managing security incidents effectively. Our impact spans the entire organization, covering all aspects of the technology stack, including infrastructure, web applications, mobile apps, IT, and even autonomous vehicles. We adopt an engineering-focused approach to security, aiming to automate and streamline our processes while ensuring frequent updates. Explore more about our innovations on our blog at https://eng.lyft.com/tagged/security.

The Cloud Security team is dedicated to enhancing Lyft's security posture by architecting a comprehensive security model tailored for our cloud infrastructure, protecting both our employees and intellectual property.

As a Senior Software Engineer, you will play a crucial role in shaping this team and driving high-impact security initiatives. Your responsibilities will include leading security reviews, implementing detection measures, addressing vulnerabilities, enforcing the principle of least privilege, and establishing secure configurations for our multi-cloud and container environments.
At Magic, we are driven by our mission to develop safe Artificial General Intelligence (AGI) that propels humanity forward in addressing the most critical challenges. We firmly believe that the future of safe AGI lies in automating research and code generation, allowing us to enhance models and tackle alignment issues more effectively than humans alone can manage. Our innovative approach combines cutting-edge pre-training, domain-specific reinforcement learning (RL), ultra-long context, and efficient inference-time computation to realize this vision.

Position Overview
As a Software Engineer within the Inference & RL Systems team, you will play a pivotal role in designing and managing the distributed systems that enable our models to function seamlessly in production, supporting extensive post-training workflows.

This position operates at the intersection of model execution and distributed infrastructure, focusing on systems that influence inference latency, throughput, stability, and the reliability of RL and post-training loops.

Our long-context models impose significant execution demands, including KV-cache scaling, managing memory constraints for lengthy sequences, batching strategies, long-horizon trajectory rollouts, and ensuring consistent throughput under real-world workloads (a back-of-envelope KV-cache sizing example follows this listing). You will be responsible for the infrastructure that ensures both production inference and large-scale RL iterations are efficient and dependable.

Key Responsibilities
- Craft and scale high-performance inference serving systems.
- Optimize KV-cache management, batching methods, and scheduling processes.
- Enhance throughput and latency for long-context tasks.
- Develop and sustain distributed RL and post-training infrastructure.
- Boost reliability across rollout, evaluation, and reward pipelines.
- Automate fault detection and recovery mechanisms for serving and RL systems.
- Analyze and eliminate performance bottlenecks across GPU, networking, and storage components.
- Collaborate with Kernel and Research teams to ensure alignment between execution systems and model architecture.

Qualifications
- Solid foundation in software engineering and distributed systems.
- Proven experience in building or managing large-scale inference or training systems.
- In-depth understanding of GPU execution constraints and memory trade-offs.
- Experience troubleshooting performance issues in production machine learning systems.
- Capability to analyze system-level trade-offs between latency, throughput, and cost.
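To see why long-context KV-cache scaling dominates the memory budget, here is a back-of-envelope sizing script. The model shape (roughly an 8B-class transformer with grouped-query attention) and the GPU memory budget are assumed values for illustration, not Magic's configuration.

# Back-of-envelope KV-cache sizing. All numbers are assumptions for
# illustration (roughly an 8B-class model with grouped-query attention).

def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    # K and V each hold n_kv_heads * head_dim values per layer per token.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

per_token = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
print(f"KV cache per token: {per_token / 1024:.0f} KiB")          # 128 KiB

seq_len = 128_000                       # one long-context request
per_seq = per_token * seq_len
print(f"KV cache per {seq_len:,}-token sequence: {per_seq / 2**30:.1f} GiB")

kv_budget = 60 * 2**30                  # assumed KV budget on an 80 GiB GPU
print(f"Max concurrent sequences at this length: {kv_budget // per_seq}")

At these assumed shapes, a single 128k-token sequence pins roughly 15.6 GiB of KV cache, so only a handful of such requests fit per GPU, which is exactly the batching-versus-memory trade-off the listing describes.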
About Our Team
The Inference team at OpenAI is dedicated to translating our cutting-edge research into accessible, transformative technology for consumers, enterprises, and developers. By leveraging our advanced AI models, we enable users to achieve unprecedented levels of innovation and productivity. Our primary focus lies in enhancing model inference efficiency and accelerating progress in research through optimized inference capabilities.

About the Role
We are seeking talented engineers to expand and optimize OpenAI's inference infrastructure, specifically targeting emerging GPU platforms. This role encompasses a wide range of responsibilities, from low-level kernel optimization to high-level distributed execution. You will collaborate closely with our research, infrastructure, and performance teams to ensure seamless operation of our largest models on cutting-edge hardware.

This position offers a unique opportunity to influence and advance OpenAI's multi-platform inference capabilities, with a strong emphasis on optimizing performance for AMD accelerators.

Your Responsibilities Include:
- Overseeing the deployment, accuracy, and performance of the OpenAI inference stack on AMD hardware.
- Integrating our internal model-serving infrastructure (e.g., vLLM, Triton) into diverse GPU-backed systems.
- Debugging and optimizing distributed inference workloads across memory, network, and compute layers.
- Validating the correctness, performance, and scalability of model execution on extensive GPU clusters.
- Collaborating with partner teams to design and optimize high-performance GPU kernels for accelerators utilizing HIP, Triton, or other performance-centric frameworks (a starter Triton kernel appears after this listing).
- Working with partner teams to develop, integrate, and fine-tune collective communication libraries (e.g., RCCL) to parallelize model execution across multiple GPUs.

Ideal Candidates Will:
- Possess experience in writing or porting GPU kernels using HIP, CUDA, or Triton, with a strong focus on low-level performance.
- Be familiar with communication libraries like NCCL/RCCL, understanding their importance in high-throughput model serving.
- Have experience with distributed inference systems and be adept at scaling models across multiple accelerators.
- Enjoy tackling end-to-end performance challenges across hardware, system libraries, and orchestration layers.
- Be eager to join a dynamic, agile team focused on building innovative infrastructure from the ground up.
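As a reference point for the kernel work named above, here is the classic element-wise Triton kernel (essentially Triton's tutorial vector-add), shown as a sketch; the block size and shapes are arbitrary. Triton compiles for both NVIDIA and AMD backends, which is part of its relevance to this role.

# Classic element-wise Triton kernel (vector add) -- the standard starting
# pattern for custom kernels. Requires a CUDA or ROCm GPU.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
assert torch.allclose(add(x, y), x + y)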
Full-time|$204K/yr - $247K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to enhance the availability of energy and intelligence. We are developing the driving force behind a future where individuals can harness the power of AI without compromising on scale, speed, or sustainability.

Join the AI revolution with sustainable technology at Crusoe. This is your chance to lead impactful innovations, contribute to meaningful projects, and collaborate with a team dedicated to pioneering responsible and transformative cloud infrastructure.

Role Overview:
As an integral member of the Crusoe Managed AI Services team, you will oversee the entire product lifecycle for our Managed Inference services. From conceptualization and strategic planning to execution and market introduction, you will be the driving force behind our inference service offerings. Your ability to translate market demands and technical details into succinct product specifications and narratives will be crucial in fostering business growth for Crusoe Cloud.

This position is a Staff-level individual contributor role that offers considerable autonomy and influence. You will act as a senior product owner for a pivotal segment of our platform, collaborating closely with engineering, infrastructure, and go-to-market teams to expand and enhance Crusoe's inference capabilities as the organization evolves.

This is a unique opportunity to shape and develop a foundational product area within a rapidly growing and innovative company.

Key Responsibilities:
- Lead the complete product lifecycle for Crusoe's Managed Inference services, encompassing roadmap creation, execution, and iterative improvements.
- Convert customer feedback, market insights, and technical limitations into clear product requirements and prioritization strategies.
- Collaborate effectively with Engineering, Infrastructure, and Platform teams to provide scalable and dependable inference services.
- Influence product decisions regarding performance, reliability, cost-effectiveness, and user experience for developers.
- Establish and monitor success metrics for inference services operating in production environments.
- Work alongside go-to-market teams to facilitate product launches, brand positioning, and customer engagement.
- Articulate product strategy and decisions clearly to cross-functional partners and leadership.
We are seeking a highly skilled Senior Engineer to join our Inference Data Plane team at DigitalOcean. In this pivotal role, you will be responsible for designing and implementing advanced data processing solutions that facilitate machine learning inference at scale. You will work collaboratively with cross-functional teams to optimize our data infrastructure and ensure reliable performance.
Full-time|On-site|San Francisco, CA | New York City, NY
Role Overview
Anthropic seeks a Technical Program Manager to support the Cloud Inference team. This position centers on steering technical projects that influence the development of cloud inference solutions. The role is located in either San Francisco, CA or New York City, NY.

What you will do
- Oversee complex initiatives that move Anthropic's cloud inference technologies forward
- Collaborate with engineers and partner teams to ensure delivery of dependable solutions
- Organize and synchronize work across different functions to achieve project objectives and deadlines
Full-time|$209K/yr - $253K/yr|On-site|San Francisco, CA - US
At Crusoe, our mission is to catalyze the proliferation of energy and intelligence. We are engineering the driving force behind a future where individuals can ambitiously create with AI without compromising on scale, speed, or sustainability.

Join us at Crusoe as we lead the charge in the AI revolution through sustainable technology. You will play a pivotal role in fostering meaningful innovation, making a significant impact, and collaborating with a team that is pioneering the development of responsible and transformative cloud infrastructure.

Position Overview:
We are in search of experienced Staff/Senior Staff Software Engineers who will be tasked with the architecture, design, and development of advanced cloud infrastructure management systems and platforms. You will be vital in delivering end-to-end use cases and workflows for our integrated AI-First Crusoe Cloud. Your contributions will be essential in constructing systems and platforms that effectively plan, monitor, deploy, and operate Crusoe Cloud, achieving key business revenue metrics.

Your expertise will be crucial in evaluating, implementing, and building platforms, tools, and frameworks that prioritize reliability, scalability, operational efficiency, and user-friendliness. You will enhance our infrastructure planning and management workflows, driving efficiency and improving the overall performance and reliability of our cloud platform as we ambitiously scale our Crusoe Cloud products and services by more than 10X.

In this role, you will also develop and refine technical designs and architectures, mentor fellow engineers, and actively contribute to the growth of the team in partnership with engineering managers.

Your Key Responsibilities:
- Engage collaboratively across teams to design, architect, and implement physical infrastructure management software systems and availability platforms that meet end-to-end customer use cases, ensuring an exceptional customer experience.
- Champion the reliability, scalability, and security of our systems and platforms, acting as the guardian of our infrastructure!
- Create workflows designed to enhance efficiency and achieve key business objectives and metrics.
- Design and implement high-performance, highly available cloud architectures, optimizing for both performance and cost-effectiveness.
- Enhance cloud deployment, configuration management, and operations by developing and maintaining effective platforms, interfaces, and automation tools.
- Actively participate in the evolution of our platform, working closely with cross-functional teams.
Full-time|On-site|CA - San Francisco; WA - Seattle
Join SoFi as a Senior Software Engineer specializing in Cloud Efficiency, where you will play a crucial role in enhancing the performance and scalability of our cloud-based applications. You will collaborate with cross-functional teams to design and implement innovative solutions that drive operational excellence and customer satisfaction. Your expertise in cloud technologies and software development will be instrumental in optimizing our infrastructure, ensuring reliability, and supporting the growth of our dynamic platform.
Full-time|$180K/yr - $250K/yr|On-site|San Francisco
Join our innovative team at fal as a Staff Software Engineer specializing in large-scale computation platforms. We are seeking a seasoned software engineer with extensive experience in developing backend systems that efficiently orchestrate workloads and manage resource constraints. Your expertise in foundational cloud infrastructure and Linux provisioning will be crucial as you work towards achieving high reliability and scalability with minimal operational overhead.
Full-time|$240K/yr - $310K/yr|On-site|San Francisco, CA - US
At Crusoe, we are dedicated to accelerating the abundance of energy and intelligence. As a pioneering AI infrastructure company, we control every aspect of our operations, from energy generation to the digital tokens that power the world's most ambitious AI workloads. Joining Crusoe means being part of a team that is shaping the future at an unprecedented pace.

We are amid a transformative industrial revolution. The endless demand for AI computing power poses significant challenges, particularly concerning energy supply. Our energy-first strategy not only enhances AI infrastructure but also contributes positively to the environment, empowering innovators in the AI sector.

We seek proactive, problem-solving team members who recognize the scale of our mission and are eager to navigate uncharted territories. If you aspire to advance your career alongside experts in energy, manufacturing, data center construction, and cloud services, we invite you to become part of our dynamic team.

If you are ready to engage in the most impactful work of your career, assist our customers and partners in elevating their AI strategies, and contribute to a high-performing, supportive team, we welcome you to build the future with us at Crusoe.

About This Role
The Cloud Storage team at Crusoe is searching for a Senior Staff Software Engineer to act as the principal architect for our storage strategy. Unlike a Staff Engineer who leads feature development, a Senior Staff Engineer will define the long-term technical roadmap essential for our AI-scale infrastructure. You will play a crucial role in establishing the architectural strategy, ensuring the integrity and global scalability of our specialized storage services. Your work will focus on the underlying physics of the stack, bridging high-performance NVMe hardware with globally distributed object storage solutions that compete with S3.

Your Responsibilities
- Architectural Vision & Strategy: Lead the development and execution of the long-term technical strategy for Crusoe's storage engine, while identifying and integrating industry trends such as CXL and NVMe-oF into a unified roadmap.
- System Programming Expertise: Utilize your extensive experience in system programming with languages such as C, C++, Go, and Rust to lay the groundwork for our V2 storage re-architecture.
- Storage Protocols: Design and implement solutions employing industry-standard storage protocols, including NFS, SMB, iSCSI, and NVMe/TCP.