Senior Cloud Infrastructure Engineer
About the Role
Join Hatch’s dynamic engineering team as a Senior Cloud Infrastructure Engineer, where you will play a pivotal role in architecting resilient, secure, and scalable cloud infrastructure that supports our primary platform and cutting-edge AI products. Collaborating with engineers, machine learning experts, and product leaders, you will ensure that our systems can grow rapidly and effectively to meet our ambitious goals.
About Hatch
Hatch is an innovative team dedicated to solving real-world challenges through artificial intelligence. We embrace speed, accountability, and a strong commitment to delivering impactful results. Our engineering culture emphasizes operational excellence, clean architectural practices, and rapid execution while maintaining reliability. If you thrive on scaling infrastructure that drives AI workflows end to end, this opportunity is tailored for you.
What You’ll Do
Infrastructure at Scale
• Enhance our cloud infrastructure (AWS & GCP) using infrastructure-as-code tools such as Terraform or Ansible.
• Create systems that cater to the compute and storage demands of machine learning and data processing workflows.
• Oversee scalable, secure, and cost-effective environments across development, staging, and production.
• Participate in the on-call rotation.
ML Platform Support
• Collaborate with ML engineers to operationalize models and manage workflows throughout training, testing, and deployment.
• Establish infrastructure for versioning, orchestrating, and monitoring ML models in production using tools like Kubeflow, SageMaker, Vertex AI, or custom pipelines.
• Optimize data pipelines and model serving infrastructure for low latency and high throughput.
Reliability & Observability
• Formulate strategies for observability, logging, and alerting across distributed systems.
• Lead incident response, root cause analyses, and follow-up system improvements that sustain resilience.
• Implement infrastructure security best practices, container hardening, and robust network architecture.
Platform Enablement
• Collaborate with engineering teams to integrate DevOps best practices throughout the development lifecycle.
• Develop tools and automation that enhance developer efficiency, release stability, and system visibility.
What We’re Looking For
• 5+ years of experience in DevOps, SRE, or platform engineering roles within fast-paced environments.
