About the job
Location: London, Waterloo (hybrid, 4 days in-office; Wednesday is our designated work-from-home day, though you are welcome to join us in the office on Wednesdays if you prefer)
At GetGround, we are revolutionizing one of the world's most significant asset classes: property. With over £2 billion in assets on our platform and a community of more than 30,000 users across 70 countries, we are shaping the future of asset ownership and tackling wealth inequality.
Our innovative product streamlines property investing from start to finish, making real estate investment accessible to everyone.
Your Key Responsibilities:
Collaborating within cross-functional product teams to transition infrastructure and reliability initiatives from concept to live deployment.
Thriving in a dynamic environment where autonomy and ownership are fundamental to our operations.
Developing and sustaining a robust, scalable infrastructure within our GCP cloud ecosystem, using Kubernetes, Terraform, Cloudflare, and cutting-edge observability tools to ensure seamless platform functionality.
Working closely with engineering teams to build CI/CD pipelines, improve deployment practices, and advocate for reliability as a core engineering principle.
Contributing to the establishment of SRE practices for a rapidly growing fintech platform, and mentoring fellow engineers as our teams grow.
Your Day-to-Day Activities:
Designing, implementing, and maintaining cloud infrastructure on Google Cloud Platform (GCP), ensuring it meets scalability, reliability, and security standards.
Taking ownership of our Kubernetes clusters and containerization strategy, including Docker image optimization, cluster management, and deployment orchestration.
Creating and optimizing Infrastructure as Code using Terraform, producing modular, testable, and well-documented configurations that adapt to our rapid growth.
Managing and enhancing our Cloudflare infrastructure, including Workers for edge computing, DNS, CDN, security policies, and performance optimization.
Implementing AI-powered product features in isolated and secure serverless environments.
Establishing comprehensive monitoring and observability with Prometheus and Grafana, defining SLIs/SLOs, and proactively identifying potential issues before they affect users.
Designing and maintaining CI/CD pipelines with appropriate quality gates, testing strategies, and deployment methodologies (blue-green, canary) to facilitate rapid deployments.

