Key Responsibilities
Design, build, and maintain the cloud infrastructure for our distributed build acceleration platform.
Automate everything: develop deployment pipelines and streamline monitoring and recovery processes.
Manage scalability and reliability for high-throughput, low-latency systems.
Implement and maintain observability through logging, metrics, tracing, and alerting.
Collaborate with product and engineering teams to integrate reliability into every feature.
Quickly diagnose and resolve production incidents and provide feedback to enhance system design.
Optimize cost, performance, and resilience across multi-cloud environments.
Qualifications
Minimum of 4 years of experience in SRE, DevOps, or Production Engineering roles.
Proven experience managing Kubernetes in a production environment.
Solid background in cloud infrastructure (preferably GCP or AWS) and Infrastructure as Code (Terraform preferred).
Strong understanding of networking and security principles and practices.
Experience with monitoring and logging systems such as Prometheus, Grafana, or ELK stack.
Excellent problem-solving skills and ability to work in a fast-paced environment.
About the job
Join Our Team at EngFlow
EngFlow is revolutionizing the software development process by enabling developers to save valuable time in their build and test cycles. Our innovative cloud-based distributed service optimizes workflows through advanced remote execution and caching, significantly enhancing efficiency, productivity, and product quality.
Supported by esteemed investors, EngFlow is at the forefront of transforming how organizations develop software and deliver thoroughly tested products. Our solutions can accelerate builds by tenfold or more, and our observability platform provides crucial insights for ongoing optimization. Founded by leading contributors to Bazel, we create tools that empower engineering teams, from startups to Fortune 500 companies, to boost developer velocity and build performance.
Discover more about our mission, culture, and team: EngFlow | Watch Our Video
We are seeking a talented and experienced Site Reliability Engineer to join our dynamic engineering team. In this pivotal role, you will bridge the gap between software engineering and systems operations, ensuring our distributed infrastructure is highly available, performant, and scalable, thereby allowing our engineers to work swiftly and with confidence.
About EngFlow
EngFlow is a cutting-edge technology company dedicated to improving the software development lifecycle. Our focus on cloud-based solutions allows engineering teams to enhance their workflows and deliver high-quality software faster. We pride ourselves on our innovative approach and commitment to excellence, fostering a culture that values collaboration and continuous improvement.