Data Infrastructure Engineer
About Zaimler
Zaimler is at the forefront of transforming AI capabilities by integrating fragmented enterprise data into a cohesive system. Our innovative approach fosters the development of intelligent agents capable of autonomous reasoning, setting the stage for the future of AI technology.
Similar jobs
Skydio, a premier drone manufacturer based in the United States, stands at the forefront of autonomous flight technology, paving the way for the future of drones and aerial mobility. Our diverse team merges profound expertise in artificial intelligence with top-tier hardware and software development, operational excellence, and a relentless focus on customer satisfaction. We empower a wide array of drone users, from utility inspectors to first responders and military personnel, to leverage our cutting-edge technology in various scenarios.
About the Team: The Skydio Cloud Infrastructure team is dedicated to ensuring the Skydio Cloud platform is consistently available to our users at critical moments, whether conducting routine inspections or supporting rescue missions during emergencies. With a global fleet of thousands of drones, we are committed to continuous improvement, emphasizing robust delivery and testing pipelines as vital components of our operations.
About the Role: As a Senior Infrastructure Engineer focused on an innovative product, you will play a pivotal role in maintaining our Kubernetes fleet and enhancing the core product software to meet evolving use cases. This position blends software engineering and infrastructure management, allowing you to address product deficiencies directly rather than solely relying on automation. We seek a professional who thrives on the autonomy to influence architecture, security, and functionality across the entire stack.
Your Impact:
Re-engineer and sustain the expanding requirements of our Kubernetes fleet and its underlying infrastructure.
Enhance and broaden the continuous delivery processes for our product.
Collaborate across teams (hardware to cloud) to introduce new capabilities to the platform.
Engage directly with security teams to refine practices and controls that safeguard our customers' data and drones.
Lead cost-saving initiatives early in the product lifecycle to ensure scalability.
Roblox Corporation
Roblox Corporation seeks a Senior Software Engineer focused on Data Infrastructure and Safety in San Mateo, CA. This position plays a key part in maintaining the reliability and performance of the Roblox platform, with a strong focus on user protection and a secure environment.
Role overview
This engineer will design and build scalable data infrastructure to support Roblox’s continued growth. The work centers on improving data quality and reliability, ensuring the platform remains robust as user numbers increase. Collaboration with teams from various disciplines is essential to identify, investigate, and resolve safety-related issues. System architecture decisions made in this role will directly influence user safety and experience.
Responsibilities
Develop and implement scalable data infrastructure solutions for the Roblox platform.
Enhance data quality and reliability across systems.
Work with cross-functional teams to address and resolve safety issues.
Contribute to architectural decisions that impact user safety and overall experience.
Requirements
Significant experience in software development, data management, and system architecture.
Proven ability to design solutions that scale with the platform’s growth.
Strong collaboration skills, especially for addressing safety concerns across teams.
This role directly influences the safety and experience of millions of Roblox users, supporting the company’s ongoing commitment to a secure and engaging platform.
Skydio builds autonomous drones for a wide range of users, from utility inspectors and first responders to military personnel in the field. Based in San Mateo, California, Skydio combines artificial intelligence expertise with advanced hardware and software development, always focused on customer needs.
About the Cloud Infrastructure Team
The Cloud Infrastructure group keeps Skydio’s platform available whenever and wherever it’s needed, whether for routine inspections or urgent disaster response. With thousands of drones deployed worldwide, the team continually improves how infrastructure is delivered and updated.
Role Overview
The Infrastructure Software Engineer manages and evolves Skydio’s Kubernetes fleet, making key software changes to support new and changing requirements. This hybrid role spans both infrastructure and software, offering the chance to shape product architecture, security, and performance. The position suits someone who enjoys working across the stack and tackling a mix of challenges.
What You’ll Do
Redesign and maintain a growing Kubernetes fleet and its supporting systems.
Improve and expand the continuous delivery pipeline for Skydio’s products.
Work with teams from hardware to cloud to introduce new platform features.
Partner with security experts to strengthen data and drone protection measures.
Introduce cost-saving strategies early in the product lifecycle to support long-term growth.
What We’re Looking For
At least 2 years of experience in infrastructure or software engineering.
Hands-on knowledge of Kubernetes and cloud platforms.
Strong analytical and problem-solving skills, with a collaborative approach.
Drive for innovation and a high standard of quality in your work.
Location: San Mateo, California, United States
At Roblox, we empower millions of users to explore, create, play, and connect through immersive 3D digital experiences, all crafted by our vibrant community of developers and creators.
Our mission is to redefine how people come together from any corner of the globe and on any device. We are committed to fostering a community that connects a billion people with optimism and civility, and we are on the lookout for exceptional talent to join us in this endeavor.
A career at Roblox means you will be at the forefront of shaping human interaction, tackling unique technical challenges at scale, and contributing to the creation of safe, civil shared experiences for all.
As a Senior Software Engineer on our Systems team, your contributions will directly impact the foundational elements of Roblox. You will develop robust software that supports our extensive community of over 2 million developers, enabling them to push the boundaries of interactive gaming. Your work will encompass various aspects of the engine, including threading, memory management, physics, and scripting. If you possess a passion for C++ and thrive in high-performance, real-time applications, you will find a welcoming environment within our Game Engine team.
Your responsibilities will include:
Crafting engine code in C++ for critical sub-systems within the Roblox game engine.
Developing systems for data synchronization between server and client, implementing game-aware data streaming, and optimizing resource usage to ensure seamless gameplay across diverse devices.
Contributing to every application developed by Roblox, from mobile clients to cloud-based servers.
Your qualifications include:
A minimum of 4 years of experience working on intricate challenges in extensive C++ codebases.
Proficiency with system-level pointers, templates, and raw memory blocks.
A strong understanding of game engine architecture or a desire to learn.
Experience in building reliable, functioning systems.
A curiosity and commitment to staying updated with technological advancements.
Verkada seeks a Senior Staff Software Engineer to join the Platform Infrastructure team in San Mateo, CA. This senior technical position focuses on shaping and improving the core infrastructure that underpins Verkada's products, with particular attention to performance and scalability.
Key Responsibilities
Lead the design, building, and ongoing support of platform infrastructure.
Partner with engineering teams to deliver software solutions that enable the platform to grow and adapt.
Spot opportunities to improve reliability and efficiency, and guide initiatives to address them.
Collaboration
Frequent collaboration with cross-functional teams is central to this role. The goal is to ensure that infrastructure evolves to meet the demands of both current and future products.
Generalist
About the Role
This position is pivotal in overseeing infrastructure across our entire tech stack. If it exists in the cloud, it falls under your purview. In the world of robotics, data is essential, and we require robust, scalable infrastructure to manage, store, and process vast amounts of this data. The APIs, services, and monitoring systems you will manage are critical to our operations.
Your Responsibilities Include:
Managing compute resources (both CPU and GPU) to efficiently process petabytes of data at high throughput.
Overseeing the infrastructure required for data processing and storage.
Ensuring the security and integrity of our infrastructure and data.
You Will Excel in This Role If You Have:
A minimum of 5 years of experience in managing large-scale cloud infrastructure using tools such as Kubernetes and Terraform, with a primary focus on Python services.
Deep understanding of AWS services (or their equivalents) and their permission models.
Strong perspectives on the effective use of coding agents within an infrastructure context.
Genesis Therapeutics
At Genesis Therapeutics, we are on a mission to revolutionize drug discovery by harnessing the power of machine learning, biophysical simulation, and computational chemistry. We are assembling a top-tier computational team and seek a passionate Infrastructure Engineer to contribute to the development of innovative medicines while playing a pivotal role in enhancing our AI platform.
Your Responsibilities
Collaborate with the infrastructure team to sustain and expand our multi-cloud compute infrastructure that underpins ML model training, computational chemistry research, and ongoing drug discovery initiatives.
Develop configuration and procedures for monitoring, resource allocation, and deployment automation to scale our autoscaling compute clusters for larger workloads.
Enhance our orchestration scheduling framework to boost execution throughput, reliability, and compute utilization across diverse pipelines.
Your Qualifications
A minimum of 5 years of experience in building and maintaining large-scale cloud infrastructure, preferably in AWS or GCP.
Strong proficiency in Python, Bash, Terraform, Ray, and Kubernetes.
Experience in constructing and maintaining compute clusters for distributed ML training jobs utilizing 1,000+ GPUs is highly desirable.
Hands-on experience with physical hardware and datacenter management is a plus.
What We Offer
An opportunity to work on impactful infrastructure that accelerates the discovery of new medicines.
Join a world-class, close-knit team of dedicated professionals across software, machine learning, computational chemistry, medicinal chemistry, and biology.
Competitive salary and equity, along with comprehensive medical, dental, and vision insurance, and a 401(k) program.
Zaimler
About Zaimler
In a world where AI agents struggle to reason over fragmented data, Zaimler emerges as the solution. Our mission is to unify disparate enterprise data across countless systems, providing a shared context, meaning, and structure. This transformation is essential as we transition from traditional copilots to fully autonomous agents, necessitating a new infrastructure layer that we are dedicated to building.
At Zaimler, we are pioneering context infrastructure for the agentic era: a platform that autonomously discovers domain knowledge, maps intricate relationships, and equips AI agents with the semantic understanding required for precise and scalable operations. Envision knowledge graphs that facilitate real-time inference, tailored for systems that need to reason rather than merely retrieve data.
Founded by industry veterans Biswajit Das (former VP Engineering at Truera and Chief Architect at Visa) and Sofus Macskassy (ex-Director of Engineering at LinkedIn), who notably built one of the largest knowledge graphs in production, Zaimler is a small, senior team at the seed stage, collaborating with major enterprises in sectors like insurance, travel, and technology. If you are passionate about creating the infrastructure that will support the next decade of AI advancements, we are eager to connect with you.
The Role
We are in search of a talented Data Infrastructure Engineer to establish the foundational distributed data layer that will power our semantic platform. In this role, you will be responsible for designing, building, and scaling systems that enable high-throughput data ingestion, transformation, and real-time processing.
Join Our Team at Maxima
At Maxima, we are pioneering an innovative AI platform designed to automate enterprise accounting processes. Our solution effectively manages vast amounts of financial data, intricate accounting workflows, and guarantees robust execution that is both precise and dependable.
We are addressing some of the most challenging issues in enterprise automation, and we have attracted a stellar engineering team comprised of industry experts from renowned companies like Robinhood, Glean, Google, Netflix, and Meta. Backed by prestigious investors such as Kleiner Perkins and Redpoint Ventures, we are proud to serve leading clients like Scale AI and Rippling.
Your Role
As a Software Engineer specializing in Data Infrastructure, you will be instrumental in developing and scaling systems that efficiently ingest and process large financial datasets. This foundational platform supports all accounting workflows and the agentic system. Your responsibilities will include designing our data lakes, constructing multi-tenant relational databases, and exploring optimal search engine and vector database configurations to enhance various workflows.
Genesis Therapeutics
Join Our Innovative Team
At Genesis Therapeutics, we are a dynamic and passionate group of drug discovery experts, deep learning researchers, and software engineers dedicated to revolutionizing biochemistry through AI. Our mission is clear: to uncover and develop transformative therapies for patients with severe medical conditions.
Our AI team is at the forefront of creating foundational models for small molecule drug discovery. We conduct cutting-edge research that bridges machine learning, physics, and computational chemistry, while building resilient software systems capable of executing large-scale simulations and training advanced generative and predictive AI models utilizing our powerful cluster of thousands of GPUs and tens of thousands of CPUs.
Your Role
We are on the lookout for skilled ML infrastructure engineers to propel our machine learning research initiatives, particularly in generative modeling of molecular systems, which is vital to our overarching goals.
In this position, you will spearhead the rapid advancement of our AI platform and infrastructure, enhancing performance, efficiency, and scalability to unprecedented levels. You will construct expansive distributed training and inference pipelines, essential MLOps tools and frameworks, and fine-tune GPU operations to accelerate ML model performance.
Genesis fosters a collaborative and interdisciplinary environment, allowing you to work closely with our talented engineers, researchers, and scientists.
Your Responsibilities
Drive engineering initiatives aimed at the continuous enhancement of our AI platform, focusing on the rapid development of scalable and robust distributed infrastructures for ML training, inference, and evaluation.
Facilitate model training and deployment across various clusters and cloud environments, optimizing for throughput and cost-effectiveness.
Maximize the efficiency of ML models and other workloads in terms of latency, throughput, and memory usage, particularly through GPU performance engineering, pushing the boundaries of current hardware capabilities.
Contribute to the long-term strategic vision for Genesis’ infrastructure platform.
Your Profile
A strong engineering background with a focus on machine learning infrastructure.
About Us
At Verkada, we are revolutionizing the way organizations safeguard their people and property through an integrated, AI-powered platform. As a frontrunner in cloud physical security, Verkada empowers more than 30,000 organizations globally, including over 100 Fortune 500 companies, to enhance their safety and operational efficiency via a unified software platform that offers solutions for video surveillance, access management, air quality monitoring, alarms, intercoms, and visitor management.
Founded in 2016, Verkada has experienced rapid growth, boasting 15 offices and a dedicated team of over 2,200 employees.
The Role
Join our innovative cloud infrastructure team, where you will play a crucial role in designing, building, and maintaining highly scalable, reliable systems that power Verkada’s services. You will have the chance to work on exciting projects such as scaling microservice clusters, automating serverless deployments, adopting a full service mesh, and enhancing system observability. Take charge of a subdomain and lead collaborative efforts across teams.
This position requires your presence at our headquarters located in San Mateo, CA, as we are dedicated to fostering a vibrant in-office culture.
Fireworks AI
About Us:
At Fireworks AI, we are at the forefront of creating next-generation generative AI infrastructure. Our cutting-edge platform is recognized for delivering the highest-quality models with unparalleled speed and scalability in inference. Independently benchmarked as a leader in LLM inference speed, we drive significant advancements through innovative projects, including our proprietary function calling and multimodal models. As a Series C company valued at $4 billion and backed by leading investors such as Benchmark, Sequoia, Lightspeed, Index, and Evantic, we are a dynamic team of builders, comprised of veterans from Meta PyTorch and Google Vertex AI.
The Role:
We are seeking a talented Software Engineer to join our AI Infrastructure team. In this pivotal role, you will contribute to designing and developing the foundational systems that power Fireworks AI’s generative AI platform. Your focus will be on building robust infrastructure and tools that guarantee the reliability, performance, quality, and availability of our AI systems.
Our mission is to establish Fireworks AI as the most dependable and user-friendly generative AI platform globally. You will collaborate closely with our cloud infrastructure, product, and performance teams to create infrastructure solutions that connect our customers with the high-performance proprietary Fireworks inference engine.
Key Responsibilities:
Design and develop scalable backend infrastructure supporting distributed training, inference, and data pipelines.
Build and maintain essential backend services, including LLM CI/CD pipelines, control planes, and model serving systems.
Enhance performance optimization, cost efficiency, and reliability across compute, storage, and networking layers.
Create frameworks and safeguards to ensure Fireworks AI maintains the highest model quality in the industry.
Work alongside performance, training, and product teams to translate research and product requirements into effective infrastructure solutions.
Engage in code reviews, technical discussions, and continuous integration and deployment processes.
Genesis Therapeutics
At Genesis Therapeutics, we are at the forefront of revolutionizing drug discovery by harnessing the power of machine learning, biophysical simulations, and computational chemistry. We are actively seeking a passionate Infrastructure Engineer to join our elite computational team. In this role, you will contribute to the development of groundbreaking medicines and play a pivotal part in the expansion of our advanced AI platform.
Your Role:
Collaborate with our infrastructure team to enhance and maintain our extensive multi-cloud compute infrastructure, which is vital for ML model training, computational chemistry research, and drug discovery initiatives.
Develop and implement configurations and procedures for monitoring, resource allocation, and deployment automation to adapt to the growing demands of our autoscaling compute clusters.
Contribute to the orchestration scheduling framework to boost execution throughput, increase reliability, and optimize compute utilization across diverse pipelines.
Your Qualifications:
Minimum of 5 years of experience in building and maintaining scalable cloud infrastructure, particularly with AWS or GCP.
Proficient in Python, Bash, Terraform, Ray, and Kubernetes.
Experience with distributed ML training jobs on compute clusters with over 1,000 GPUs is highly desirable.
Hands-on experience with physical hardware and data center management is a plus.
What We Offer:
An opportunity to engage with impactful infrastructure that accelerates the discovery of new medicines.
Be part of a world-class, close-knit team of dedicated professionals across software, machine learning, computational chemistry, medicinal chemistry, and biology.
Competitive salary and equity options, alongside comprehensive medical, dental, and vision coverage, plus a 401(k) retirement plan.
About the Role
At Generalist, we are at the forefront of training expansive robot foundation models, leveraging cutting-edge GPU hardware, primarily from Nvidia, to execute distributed training tasks and experimental research. Our operations demand exceptional storage solutions and optimized data loading processes, necessitating the full utilization of cloud infrastructure alongside custom-built solutions.
In this role, you will take charge of our inference infrastructure. Our robotic systems rely on a dedicated fleet of on-premises GPUs designed for demanding real-time computations and latency-sensitive applications within resource-constrained environments.
Your Responsibilities:
Manage and optimize our GPU compute fleets.
Facilitate user-friendly access to GPUs for researchers, ensuring optimal utilization.
Enhance ML data loading, transport, and storage systems in extensively utilized distributed environments.
Oversee the orchestration of our robot inference fleets.
You May Excel in This Position If You:
Have experience managing large GPU fleets for large-scale, distributed training or inference.
Possess significant expertise in using Slurm or Kubernetes for ML workload orchestration.
Have developed high-scale ML data loaders and preparation systems.
Understand the intricacies of ML hardware, storage, and networking systems.
Are familiar with the Nvidia GPU ecosystem.
Join Notable as a Staff Software Engineer specializing in Cloud Infrastructure and Applications. In this pivotal role, you will lead the design, development, and implementation of scalable cloud solutions that drive our innovative projects forward. Collaborate with cross-functional teams to optimize application performance and enhance user experience.
Roblox Corporation
Join Roblox as a Senior Software Engineer, specializing in Machine Learning Infrastructure. In this role, you will be a key player in designing and developing robust ML systems to enhance our gaming platform. You will collaborate with cross-functional teams to implement scalable solutions that elevate user experiences. Your expertise will directly influence the future of gaming at Roblox.
Roblox Corporation
Join Roblox as a Senior Software Engineer - Engine Reliability and be a key player in ensuring the robustness and performance of our gaming engine. In this role, you will collaborate with talented engineers to develop innovative solutions that enhance the reliability of our systems, allowing millions of users to enjoy seamless gaming experiences.
About Us
At Verkada, we are revolutionizing the way organizations safeguard their personnel and properties through an integrated, AI-driven platform. As a frontrunner in cloud-based physical security, we empower over 30,000 organizations worldwide, including more than 100 Fortune 500 companies, to enhance safety and operational efficiency via a single, connected software solution. Our offerings encompass video security, access control, air quality sensors, alarms, intercoms, and visitor management. Established in 2016, Verkada has experienced remarkable growth, now boasting 15 offices and a dedicated workforce of over 2,200 employees.
Role Overview
The Verkada Security Team, primarily composed of software engineers, is dedicated to establishing optimal software security practices. In this pivotal role, you will enhance Verkada’s security throughout the software development lifecycle (SDLC) by utilizing automation, libraries, tools, and frameworks. Your responsibilities will span various technology stacks and involve collaborating with engineering teams across Verkada’s Command platform.
Notable
Join Notable as a Senior Fullstack Software Engineer and play a pivotal role in revolutionizing healthcare through cutting-edge AI technology. We seek an experienced engineer to design, develop, and maintain innovative web applications that enhance patient care and streamline healthcare delivery. Collaborate with cross-functional teams, including product managers and UX/UI designers, to build high-performance solutions that integrate seamlessly with various healthcare systems. Uphold best practices in code quality and deployment, and work alongside our data science team to bring machine learning and natural language processing algorithms to life in production environments. Become part of a mission-driven culture committed to transforming healthcare for millions.
Skydio is a pioneering force in the drone industry, recognized as the leading autonomous flight company in the United States and globally. Our team merges advanced expertise in artificial intelligence with cutting-edge hardware and software development, operational excellence, and an unwavering commitment to customer satisfaction. We empower a diverse range of drone users, from utility inspectors and first responders to soldiers in complex battlefield situations.
About the Role: We are in search of a skilled Software Engineer to spearhead the development of innovative tools that enhance the autonomy lifecycle. In this role, you will be responsible for creating and refining essential internal platforms that enable engineers to test new concepts, analyze system behaviors, and comprehend intricate interactions between software and the physical environment.
Your work will encompass both backend systems and front-end visualization, necessitating a strong foundation in software design, developer experience, and a genuine passion for empowering others through effective tooling. The systems you develop will be utilized daily by autonomy developers, test engineers, and various stakeholders throughout the organization.
Areas of Responsibility:
Design and construct robust replay and analysis systems that allow engineers to inspect and replicate recorded autonomy behaviors with detailed control over system states, perception outputs, and decision-making processes across the entire stack.
Develop scalable infrastructure for automated testing and failure triage, enhancing our simulation and log-driven test coverage while expediting root-cause analysis through automated log processing and diagnostics.
Collaborate at the intersection of autonomy software and core robotics middleware, establishing clear APIs, data contracts, and performance benchmarks for messaging, state propagation, and inter-subsystem coordination, while partnering closely with downstream teams to facilitate their implementation and integration.
Create and implement high-quality developer infrastructure and tools that emphasize reliability, performance, and usability, fostering rapid iteration, safe experimentation, and sustained productivity across the autonomy division.
What You’ll Do:
Lead the design and execution of scalable tools utilized throughout autonomy development and testing workflows.
Engage with autonomy, QA, and infrastructure teams to gather requirements, prioritize tasks, and deliver impactful solutions.

