About the job
Join Fluidstack as a Network Engineer!
At Fluidstack, we are at the forefront of building cutting-edge infrastructure designed for abundant intelligence. Collaborating with leading AI labs, government entities, and major enterprises such as Mistral, Poolside, and Meta, we strive to deliver compute capabilities at unprecedented speeds.
Our mission is to accelerate the realization of Artificial General Intelligence (AGI). We are urgently seeking passionate individuals who are committed to delivering exceptional infrastructure. At Fluidstack, we take pride in our work, treating our customer outcomes as our own. If you are driven by purpose and excellence, and ready to put in the effort necessary to shape the future of intelligence, we invite you to join us!
Position Overview
Fluidstack is on the lookout for a Network Engineer specializing in Reliability & Observability. In this pivotal role, you will act as a reliability engineer, leading the charge in developing processes, collecting data, and establishing reliability metrics aimed at enhancing the quality and dependability of AI networks throughout all operational phases.
Your primary focus will be on creating systems, tools, and data pipelines that improve network quality, while also automating round-the-clock (24/7) metrics reporting and generating periodic reliability assessments for both internal teams and customers.
This position is ideal for seasoned network operators who have a deep passion for reliability and experience designing and implementing full-lifecycle software, including conducting Quality Assurance audits and analyzing failure rates. A strong interest in hardware (both electronics and optics) and in software development is essential, alongside a commitment to using data to inform deployment and operations decisions.
We encourage experienced Site Reliability Engineers (SREs) with a strong networking background to apply!
Key Responsibilities
Quality Assurance Ownership: Design and implement QA processes tailored to both network hardware and the networks themselves.
Data Pipelines: Develop and deploy both serverless and manually triggered workflows that give our clients observability into network quality and reliability.
Deployment and Operations Assistance: Collaborate with various teams to support full-lifecycle data collection, analysis, and process enhancements aimed at meeting service level agreements (SLAs) and service level objectives (SLOs).
Process Engineering: Innovate and implement process improvements to streamline deployment and operational workflows.

