About the job
Join Fluidstack: Pioneering the Future of Intelligence
At Fluidstack, we're transforming the landscape of artificial intelligence infrastructure. We work with leading AI research labs, government entities, and major corporations, including Mistral, Poolside, Black Forest Labs, and Meta, to deliver compute at unprecedented speed. Our mission is to accelerate the arrival of Artificial General Intelligence (AGI), and we are seeking passionate individuals who are driven by purpose and excellence.
We take immense pride in the systems we develop and the trust we build with our clients. If you are ready to roll up your sleeves and contribute to shaping the future of intelligence, we invite you to join our innovative team.
Position Overview
Fluidstack, a prominent player in cloud services, is seeking a Software Engineer specializing in Infrastructure Platform Development. In this role, you will build the foundational platforms that support our global infrastructure and data center operations. You will develop robust internal tools for the Configuration Management Database (CMDB), asset management, Data Center Infrastructure Management (DCIM), monitoring, observability, security, and operational automation. Working with cross-functional teams, you will build scalable, user-friendly solutions that improve our ability to deliver top-tier infrastructure services.
Key Responsibilities
Infrastructure Platform Development
Design and implement a next-generation CMDB system to serve as the definitive source of truth for infrastructure assets, network architecture, and configuration data.
Develop DCIM platforms for managing rack operations, server/GPU deployments, operating system installations, quality assurance, and white-screen activities.
Create comprehensive asset lifecycle management systems encompassing receiving, racking, inventory, break-fix, and decommissioning workflows.
Build monitoring and observability platforms that integrate telemetry from Building Management Systems (BMS), Electrical Power Monitoring Systems (EPMS), and IT devices, featuring intelligent alerting and incident management capabilities.
Develop self-service portals and automation tools for new region initialization, post-deployment operations, and fleet-scale management.
Operational Excellence & Automation
Minimize manual tasks through workflow automation and self-service tools that empower our operations and engineering teams.
Create workflow orchestration systems to streamline complex multi-step processes spanning incident, problem, and change management.

