Experience Level
Senior
Qualifications
Successful candidates will have extensive experience in software engineering, particularly in infrastructure and platforms. Strong proficiency in languages such as Python, Go, or Java is essential, along with a solid understanding of cloud services (AWS, Azure, Google Cloud). Experience with container technologies (Docker, Kubernetes) and orchestration frameworks is a plus. A proven track record in building scalable systems and a passion for AI technologies will distinguish you as an ideal candidate. A Bachelor's or Master's degree in Computer Science or a related field is preferred.
In the role of Senior Software Engineer, Infrastructure & Platform at AfterQuery, you will take on the exciting challenge of designing and constructing the essential infrastructure that drives our innovative data generation, evaluation, and agentic systems.
Your responsibilities will include developing shared platforms that empower our engineering and research teams to execute large-scale human-in-the-loop workflows, evaluation harnesses, and automated data pipelines essential for training cutting-edge AI models.
This position demands a high level of technical expertise and offers extensive ownership. You will be responsible for architecting and building the foundational infrastructure relied upon by numerous engineers, ensuring that systems are scalable, reliable, and capable of handling high-throughput workloads.
Collaboration with the founding team will be key as you define system architecture, establish best engineering practices, and create the infrastructure that supports the evolution of AI development.
About AfterQuery
AfterQuery is at the forefront of AI research, pushing the limits of what is possible through innovative datasets and experimentation. We cater to foundational model labs and serve the most advanced AI laboratories globally. Located in San Francisco, CA, we have secured funding from prominent investors including Y Combinator and BoxGroup, alongside industry veterans from Google DeepMind and Meta GenAI. Our founding team boasts expertise from prestigious organizations such as Jane Street, Meta, Citadel Securities, Google, and Stanford's AI labs.
Similar jobs
Who We Are
Serval is an innovative AI-driven automation platform redefining operational efficiency for enterprises. Our intelligent agents seamlessly comprehend and execute real-world workflows, replacing outdated manual processes with adaptive, self-learning software. Since our inception in early 2024, we have garnered the trust of industry leaders such as General Motors, Notion, Perplexity, Vercel, Mercor, LangChain, and Verkada, streamlining high-volume operational tasks across their organizations.
At the heart of Serval is a cutting-edge agentic AI platform that transforms natural language into actionable workflows. Our agents not only respond to queries but also reason, act across various systems, and continuously enhance their performance. What started as a solution for operational tasks has rapidly expanded into a versatile AI automation layer utilized across IT, HR, Finance, Security, Legal, and Engineering sectors.
Our mission is to eradicate repetitive, manual tasks within enterprises, empowering teams through intelligent automation. In the long run, we aim to establish a universal AI operations layer: a system of agents that integrates across business functions, maintaining the momentum of modern companies.
We are proud to be backed by renowned investors including Sequoia Capital, Redpoint Ventures, Meritech, First Round, General Catalyst, and Elad Gil, and founded by seasoned product and engineering leaders from Verkada.
Role Overview
As a Senior Software Engineer in Infrastructure at Serval, you will be pivotal in developing and scaling the core systems that empower our AI agents and workflow automation platform. A crucial aspect of this role involves enabling and supporting self-hosted deployments for enterprise clients needing on-premises or private cloud environments. We are looking for engineers with profound expertise in distributed systems, infrastructure-as-code, production operations, and customer-facing support, who aspire to influence the technical architecture of a rapidly evolving platform.
What You'll Do
Design, implement, and operate large-scale distributed systems that power Serval's AI agents, workflow orchestration, and data pipelines.
Create and maintain Terraform modules to provision and manage cloud infrastructure across AWS, GCP, or Azure environments.
Develop and sustain deployment packages, installation scripts, and infrastructure templates, enabling customers to self-host Serval in their own environments.
Provide technical support and guidance to enterprise customers during installation and deployment phases.
Join our innovative team at Astranis as a Senior Software Engineer specializing in Infrastructure. In this role, you will be responsible for designing, implementing, and maintaining robust infrastructure solutions that support our cutting-edge satellite technology. Your expertise will play a crucial role in enhancing the reliability and scalability of our systems.
Full-time|$190K/yr - $280K/yr|Hybrid|San Francisco, California
About Sentry
Sentry is dedicated to eliminating poor software experiences. Our mission is to empower developers to create high-quality software swiftly, allowing everyone to enjoy technology to its fullest.
With over $217 million raised in funding and a community of over 100,000 organizations, including giants like Disney, Microsoft, and Atlassian, we are developing state-of-the-art performance and error monitoring tools. Our solutions help our partners minimize time spent on bug fixes and maximize product development.
In our commitment to collaboration, Sentry follows a hybrid work model across our global offices. We have designated Mondays, Tuesdays, and Thursdays as in-office days to foster effective teamwork. If you are passionate about building tools that enhance the digital experience, join us in creating the next generation of software monitoring solutions.
About the Role
At Sentry.io, we offer vital services for diagnosing application health issues. Our tools are crucial for organizations aiming to respond adeptly in dynamic markets. We ensure a seamless and enjoyable experience in the development and deployment of these tools through a robust continuous integration environment and an insightful deployment pipeline.
As part of the Infrastructure Engineering team, your contributions will be instrumental in supporting Sentry's growth and enabling engineering teams to operate with agility and confidence.
Your responsibilities will include designing, developing, and maintaining internal software and platform capabilities that alleviate the cognitive load associated with infrastructure and developer tooling. You will create dependable, reusable abstractions that facilitate rapid shipping of features while incorporating durability, security, and operational excellence into service development and management.
This role demands strong engineering judgment: selecting reliable technologies, planning for scalability from the outset, and crafting solutions that serve multiple teams. Your focus will be on practical systems that enhance reliability and ownership across the organization, driving adoption through comprehensive documentation, well-designed APIs, and seamless developer experiences that integrate into daily workflows.
Ultimately, you will empower engineering teams to flourish within a culture of ownership, enabling them to deploy, manage, and evolve services confidently while minimizing operational burdens.
Key Responsibilities
Design systems that scale with company growth, ensuring a balance of reliability, performance, and cost-efficiency.
Develop platform services that enhance internal operations and developer productivity.
Compensation: Competitive base salary + substantial equity
Benefits: Health & dental insurance, gym reimbursement, daily team lunches, 401(K)
About Julius
At Julius, we're pioneering advancements in applied AI by developing cutting-edge coding agents. Our platform executes approximately 1 million lines of code every 36 hours, serving over 1 million users and generating 3 million+ visualizations. We manage all code in isolated remote containers. As a revenue-generating entity, we are backed by AI Grant and founders with remarkable backgrounds from companies like Vercel, Notion, Perplexity, Palantir, Replit, Zapier, Intercom, and Dropbox.
The Role
Join us in building and scaling the robust code-execution platform that powers Julius, across both cloud and on-prem environments. We orchestrate over 500,000 containers/month and the demand is growing rapidly. You will take ownership of reliability, performance, and security within our multi-tenant compute environment.
Your Responsibilities
Design and manage a secure, multi-tenant container infrastructure that ensures quick startup and intelligent autoscaling.
Implement on-prem/private cloud deployments using Helm and Terraform, integrating SSO, network controls, and audit logging.
Enhance observability (metrics, traces, logs) with well-defined SLOs and lead incident response initiatives.
Optimize images, scheduling, networking, and costs, while developing fair-use and rate-limiting controls.
Your Qualifications
Strong experience with production Kubernetes and container internals (Docker/containerd); solid understanding of networking principles.
Familiarity with cloud environments (AWS/GCP/Azure) and Infrastructure as Code (Terraform/Helm).
Proficiency in monitoring and logging tools (Prometheus, Grafana, OpenTelemetry, ELK/Vector).
Understanding of security best practices for containerized, multi-tenant systems.
Preferred Qualifications
Experience with gVisor, Kata, Firecracker; Cilium/eBPF; GPU scheduling; serverless autoscaling (KEDA/Knative/Karpenter).
Proven experience delivering on-prem or air-gapped enterprise software solutions.
A passion for AI, with experience building side projects involving LLMs.
Why Join Julius?
Be part of a small, senior team where your contributions will have a massive impact. Tackle challenging infrastructure problems at a meaningful scale.
Join our dynamic team at Parafin as a Senior Software Engineer specializing in Infrastructure. In this pivotal role, you will design, develop, and maintain robust infrastructure solutions that support our scalable applications. Your expertise will help us enhance system performance, reliability, and security.
We are looking for innovative thinkers who thrive in a collaborative environment. You will work closely with cross-functional teams to implement cutting-edge technologies that drive our product forward.
Full-time|$200K/yr - $200K/yr|On-site|San Francisco
Join Convex in revolutionizing application development!
At Convex, we are on a mission to redefine how software is constructed on the Internet. Our innovative platform enables developers to create swift, dependable, and dynamic applications without the need for a backend team. We offer a comprehensive full-stack application platform, meticulously designed with abstractions for databases, computing, and backend services, allowing both developers and LLMs to innovate rapidly, ensuring products that are scalable and maintain simplicity throughout their lifecycle.
About Our Team:
Our Convex team comprises engineers who have architected and built some of the largest backends globally, managing exabytes of data and millions of transactions per second. We are a friendly, collaborative group of passionate individuals who thrive on in-person collaboration in our San Francisco office.
Position Overview:
As Convex evolves, we are seeking outstanding senior or staff-level engineers to help us architect and sustain the future of our infrastructure at scale. If you have a passion for distributed systems and a robust background in designing and managing web infrastructure, we want to connect with you!
We value robust architecture, effective collaboration, and simplicity. Our team embraces high ownership and places significant emphasis on operational excellence. This role is not solely focused on operations; we seek individuals who are dedicated to designing and constructing systems in the most effective manner possible, especially in a startup environment.
Your Responsibilities:
Architect, construct, and oversee Convex’s global cloud infrastructure.
Analyze and enhance the performance and reliability of our systems.
Independently prioritize projects, collaborating closely with the engineering team and CTO.
Establish best practices and reliability standards as we expand our team and systems.
Develop sophisticated systems and database code.
Engage with feedback from leadership regarding seeking simpler and more elegant solutions.
What We Value:
A strong enthusiasm for distributed systems and backend infrastructure.
A collaborative spirit and a desire to grow with the team.
A commitment to best practices and maintaining high standards in engineering.
Databricks is looking for a Senior Software Engineer focused on Compute Infrastructure in San Francisco, California. This position centers on building and improving compute architecture to support greater performance and scalability across Databricks' platform.
What you will do
Develop and optimize compute infrastructure to handle demanding data processing and analytics workloads.
Work closely with teams from different disciplines to deliver reliable, high-quality solutions for customers.
Impact
Your contributions will help define how data processing and analytics evolve at Databricks. The work directly supports customers’ ability to scale and perform complex tasks in the cloud.
Who we’re looking for
Strong background in cloud technologies and compute systems.
Enjoys tackling complex technical challenges.
Collaborative approach to problem-solving with cross-functional teams.
At Hover, we empower individuals to conceptualize, enhance, and safeguard the spaces they cherish. Utilizing proprietary AI and over a decade's worth of real property data, we provide answers to pivotal questions such as, 'What will it look like?' and 'What will it cost?' Our platform offers homeowners, contractors, and insurance professionals accurately measured, interactive 3D models of properties, all achievable from a smartphone scan in mere minutes.
Driven by curiosity and purpose, we maintain a strong commitment to our customers, communities, and one another. We believe that diverse perspectives foster the best ideas, and we take pride in nurturing an inclusive, high-performance culture that encourages growth, accountability, and excellence. Supported by premier investors like Google Ventures and Menlo Ventures, and trusted by industry leaders such as Travelers, State Farm, and Nationwide, we are revolutionizing how individuals perceive and interact with their environments.
About the Role
As a Senior Software Engineer specializing in Infrastructure, you will delve into cloud infrastructure challenges unique to a company focused on 3D data, computer vision, and machine learning. Your enthusiasm for building internal tools and your talent for crafting elegant solutions to complex issues will be crucial in this role.
Our Infrastructure team is responsible for everything beyond the application binary, serving as a critical partner to the rest of the engineering department. Through automation, we aim to streamline processes, ensuring that the simplest path is also the fastest and most secure. We manage and optimize all cloud infrastructure components including our Kubernetes environment, databases, networks, storage, and caching systems. Collaborating with engineering peers, we establish consistent solutions to common architectural challenges, particularly those involving rich geospatial and machine learning workloads. We are well-versed in best practices for cloud architecture and CI/CD, leveraging application development as a means to implement these practices.
Your Contributions
You will play a pivotal role in developing straightforward solutions to intriguing problems, thereby enhancing the foundation upon which our engineering teams build. Collaborating closely with engineers across the organization, you will help make their applications faster, easier to manage, and more reliable in production. Your work will span frontend, backend, computer vision, data, security, and machine learning teams to scale new ideas into production effectively. Given the small and highly collaborative nature of our team, you can expect a varied and impactful workload, which may include:
Designing scalable cloud architecture
Enhancing CI/CD pipelines and developer tooling
Join the Space Exploration Journey!
As a Senior Software Engineer specializing in Space Infrastructure, you will play a pivotal role in enhancing our capabilities to manage a diverse fleet of satellites, including dedicated, rideshare, and constellation missions. Your work will involve the integration of automated satellite operations, both ground and flight software, while tackling challenges encountered in orbit.
Our team is dedicated to ensuring the dependable, efficient, and standardized performance of Loft’s space infrastructure. You will oversee the operational stability of Loft satellites, focusing on the satellite bus, the Hub, and Loft's payloads, which serve as platforms for executing customer missions.
Reliability is the cornerstone of Loft's business model and that of our clients. This role offers you the flexibility to engage with various systems, from coding for Cockpit, our mission control system, to writing software that runs onboard our satellites. Additionally, you may have the opportunity to serve as a Flight Director, overseeing the health and safety of our satellite fleet.
Full-time|$170K/yr - $220K/yr|On-site|San Francisco, CA
At Siftstack, we are revolutionizing the development, testing, and operation of modern machines. Our innovative platform provides engineers with real-time observability over high-frequency telemetry, effectively removing bottlenecks and accelerating the development process.
Emerging from our groundbreaking work at SpaceX on projects including Dragon, Falcon, Starlink, and Starship, Siftstack was founded by an exceptional team with experience from SpaceX, Google, and Palantir. We are dedicated to building mission-critical systems where precision and scalability are essential.
As a senior engineer at Siftstack, you will not just write code; you will have a significant role in shaping the architecture, guiding the product's evolution, and influencing the culture of a company that tackles real engineering challenges. If you are eager to face complex technical issues and contribute to foundational systems for innovative machines, we want to connect with you.
Join the Crew at Ivo!
Engineering Excellence
At Ivo, we are pioneers in the realm of technology. Our engineers are the architects of innovation, creating groundbreaking solutions that redefine the industry. Recent triumphs include:
An AI-powered assistant integrated into MS Word that edits documents with remarkable precision.
Revolutionizing embedding models with cutting-edge agentic RAG technology.
Pioneering legal fact extraction with large-scale LLM applications.
Developing an intelligent legal assistant capable of navigating vast contract databases while ensuring accuracy.
Implementing advanced clustering techniques for legal documents based on familial relationships.
Introducing automatic deviation analysis to uncover hidden risks across extensive contract databases.
Merging contracts with amendments to create comprehensive timelines that have left clients in tears of joy.
Your Role
As a Senior Infrastructure Engineer, you will lay the groundwork for Ivo's platform. Your responsibilities will include:
Shaping the future of our infrastructure while enjoying the flexibility to design our systems.
Managing a multitude of customer deployments, each with unique containers, databases, and VPCs.
Instrumenting the system to identify performance bottlenecks and errors.
Creating intuitive dashboards and alerts to aggregate metrics, logs, and health checks.
Leading incident responses related to infrastructure challenges.
Optimizing our CI/CD pipeline to improve deployment times significantly.
We seek someone passionate about LLMs and eager to push the boundaries of DevOps innovation. Join us and be a vital part of our engineering team!
Full-time|$172K/yr - $209K/yr|On-site|San Francisco, CA - US
About Crusoe Energy Systems
Crusoe Energy Systems manages every layer of AI infrastructure, from energy generation to advanced computational resources. The team focuses on making AI infrastructure more efficient and environmentally conscious, addressing the growing global demand for computing power. Based in San Francisco, Crusoe brings together experts in energy, manufacturing, data center construction, and cloud services.
Role Overview: Senior Software Engineer - Cloud Infrastructure
This Senior Software Engineer position centers on designing and building cloud infrastructure management systems for Crusoe Cloud, a vertically integrated, AI-focused platform. The engineer will help deliver complete solutions that support the company’s business goals, including system planning, monitoring, deployment, and operations. The role involves hands-on work developing platforms, tools, and frameworks that emphasize reliability, scalability, operational efficiency, and ease of use. As Crusoe Cloud grows, this engineer will play a key part in streamlining infrastructure planning and management processes.
What You Will Do
Work closely with cross-functional teams to design and implement infrastructure management software and availability platforms for customers using Crusoe’s AI infrastructure.
Help improve the reliability, scalability, and security of systems and platforms.
Develop workflows that support business objectives and performance targets.
Build and maintain high-performing, highly available cloud solutions to meet expanding infrastructure needs.
Who Thrives Here
Engineers who enjoy solving complex problems, move quickly, and want to work alongside a diverse, supportive team will find this role rewarding. Crusoe values collaboration and a shared drive to advance AI infrastructure.
Location
San Francisco, CA - US
Full-time|$166K/yr - $225K/yr|On-site|San Francisco, California
P-78
While candidates in the listed locations are encouraged for this role, candidates in other locations will be considered.
At Databricks, we are dedicated to empowering data teams to tackle the world's most challenging problems, from realizing the next mode of transportation to advancing medical breakthroughs. We accomplish this by creating and managing the premier data and AI infrastructure platform, enabling customers to leverage deep data insights for business enhancement. Founded by engineers and with a strong customer focus, we eagerly embrace every opportunity to address technical challenges, from crafting cutting-edge UI/UX for data interaction to scaling our services and infrastructure across millions of virtual machines. And this is just the beginning.
As a Senior Software Engineer on the Infrastructure teams, you will develop scalable systems that underpin the Databricks platform, positioning it as the go-to solution for executing Big Data and AI workloads. Your role will involve enhancing the Databricks infrastructure platform, encompassing multi-cloud systems and services designed to manage thousands of Kubernetes clusters at scale, storing petabytes of data, providing highly scalable and distributed API gateways, implementing a rate limiting framework, ensuring network security and encryption, and creating developer tools and infrastructure (we utilize Bazel), testing frameworks, and scalable CI/CD systems, among many other responsibilities.
The impact you will have:
Expand and enhance key components of the core Databricks infrastructure.
Design multi-cloud systems and abstractions to enable the Databricks product to operate across existing Cloud providers.
Enhance software development workflows to improve engineering and operational efficiency.
Utilize our own data and AI platform to analyze build and test logs and metrics, identifying areas for enhancement.
Create automated build, test, and release infrastructures.
Establish and maintain engineering process standards to support our growth and success.
Full-time|$196K/yr - $220.5K/yr|On-site|San Francisco Bay Area
At Discord, we connect over 200 million users monthly for diverse experiences, with gaming being the predominant activity. Our platform supports more than 90% of our users in enjoying games, collectively logging 1.5 billion hours each month across various titles. As we shape the future of gaming, our mission is to enhance interactions before, during, and after gaming sessions.
The Platform Infrastructure teams are pivotal in constructing and upholding the essential systems that energize Discord's core functionalities. We manage systems that process hundreds of thousands of requests per second and handle tens of billions of transactions daily, enabling seamless connections for millions of users. By developing foundational platform components, we empower internal developers to deploy new features swiftly and securely, ensuring Discord remains reliable, efficient, and scalable.
As a Senior Software Engineer on our team, you will play a crucial role in continuously refining our codebase, processes, and infrastructure, directly impacting user interactions on Discord!
At Plaid, we believe in the power of data-driven decision-making. Our data culture demands robust and scalable data systems that ensure accuracy and completeness. As a Senior Software Engineer focusing on Data Infrastructure, you will play a pivotal role in empowering teams across engineering, product, and business sectors to swiftly and securely extract valuable data insights. Your work will directly enhance our ability to serve customers effectively. You will be responsible for building and optimizing our data and machine learning infrastructure, allowing Plaid engineers to innovate and iterate on products built on consumer-permissioned financial data. Our Data Infrastructure engineers are experts in Data Warehousing, Data Lakehouse architecture, Spark, Workflow Orchestration, and Streaming technologies. You will enhance our existing data pipelines for performance and cost efficiency while creating intuitive abstractions that simplify the development process for other engineers at Plaid.
About Us
At Sierra, we are revolutionizing the way businesses engage with their customers by building a cutting-edge platform that harnesses the power of AI. Our headquarters is located in the vibrant city of San Francisco, with additional offices expanding in Atlanta, New York, London, France, Singapore, and Japan.
Our company culture is deeply rooted in our core values: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These principles guide our actions and foster an environment where innovation thrives.
Sierra was co-founded by visionary leaders Bret Taylor, who currently serves as the Board Chair of OpenAI and has a rich history with Salesforce and Facebook, and Clay Bavor, who previously led Google Labs and spearheaded initiatives like Google Lens and Project Starline.
Your Role
As a Software Engineer focusing on Infrastructure at Sierra, you will play a pivotal role in designing, constructing, and maintaining the foundational systems that empower our AI platform. Your expertise will ensure that our infrastructure is not only secure and reliable but also scalable, allowing product teams to execute their work with agility and confidence.
Guarantee the reliability, scalability, and performance of our platform and LLM inference serving in response to increasing traffic demands.
Develop and oversee cloud infrastructure using Terraform to create secure, scalable, and reproducible environments.
Establish and manage a self-service infrastructure platform to empower engineering teams in deploying and operating services independently.
Take ownership of and improve CI/CD pipelines and release management processes, facilitating rapid and reliable deployments across Sierra’s platform.
Design and manage distributed systems utilizing distributed databases, retrieval systems, and machine learning models.
Develop and sustain core data serving abstractions along with essential authentication and security features (SSO, RBAC, authentication controls).
Effectively navigate and integrate our technology stack with enterprise customer environments in a scalable and maintainable manner.
Why Join Harvey?
At Harvey, we are revolutionizing the landscape of legal and professional services, not through minor adjustments but by rethinking the process from the ground up. By harnessing cutting-edge agentic AI, a robust enterprise-level platform, and profound sector expertise, we are redefining how essential knowledge work is performed for years to come.
This is a unique opportunity to contribute to the foundation of a transformative company at a pivotal moment. With over 1000 clients spanning more than 58 countries, proven product-market alignment, and exceptional investor backing, we are growing rapidly and establishing a new category in real-time. The challenges are ambitious, the standards are high, and the potential for personal, professional, and financial growth is unparalleled.
Our team is intelligent, driven, and passionately dedicated to our mission. We operate with urgency, take true ownership of our challenges, and deliver results from initial concepts to long-term goals. We maintain close relationships with our customers, from executives to engineers, collaborating to address real-world problems with urgency and care. If you excel in uncertain environments, strive for excellence, and want to influence the future of work alongside like-minded individuals, we invite you to join us in our mission.
At Harvey, we are actively shaping the future of professional services, and we’re just getting started.
Role Overview
As a Senior Software Engineer on the Core Infrastructure team at Harvey, you will be pivotal in architecting and constructing new infrastructure systems while enhancing and fortifying our existing frameworks. Our infrastructure underpins every user interaction with Harvey, managing billions of prompt tokens and millions of daily requests across our global legal AI platform.
You will thrive in a balanced environment focused on innovation (building new systems) and operational excellence, ensuring Harvey remains resilient and efficient as we scale our products, regions, clientele, and usage. Your contributions will directly influence the reliability, scalability, and security of our platform, which serves the world’s leading law firms and professional service providers.
This position is located in San Francisco, CA. We utilize an in-person work model and provide relocation assistance for new hires.
What You Will Do
Design and develop scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud environments.
Manage and enhance our multi-cloud infrastructure (Azure, GCP), focusing on Kubernetes orchestration, networking, and container management.
Lead key technical initiatives concerning observability, incident response, and performance optimization.
At Exa, we are on a mission to create a cutting-edge search engine from the ground up, designed to cater to the diverse needs of AI applications. Our team is building a robust infrastructure that enables us to crawl the internet, train advanced embedding models for indexing, and develop high-performance vector databases using Rust. Additionally, we manage a significant $5M H200 GPU cluster that powers tens of thousands of machines.
The Infrastructure Team at Exa is responsible for developing the essential tools and infrastructure that support our entire system. We are looking for talented infrastructure engineers to help us scale our capabilities rapidly. Your work could involve orchestrating GPU clusters with Kubernetes, implementing map-reduce batch jobs on Ray, or creating top-tier observability tools that set industry standards.
Join Decagon as a Senior Software Engineer specializing in Machine Learning Infrastructure. In this pivotal role, you will be responsible for designing and optimizing systems that support machine learning models and applications. Your expertise will help drive innovation and efficiency in our ML pipelines, ensuring that our algorithms are fast, scalable, and reliable.
You'll collaborate with cross-functional teams to implement cutting-edge solutions that enhance our product offerings. If you are passionate about advancing machine learning technologies and thrive in a dynamic environment, we want to hear from you!
Mar 26, 2026