About the job
Site Reliability Engineer
Overview:
Join Weedmaps as a Site Reliability Engineer and work with application, infrastructure, and quality teams to improve the performance, reliability, resilience, and scalability of the web services behind Weedmaps.com. As a cloud-native organization, we run 100% of our services in Docker containers on Kubernetes in the AWS public cloud. We rely on observability, monitoring, CI/CD automation, and custom tooling to ship multiple production releases every day.
Day to day, you will apply your engineering expertise to improve system monitoring, reduce developer toil, configure CI workflows, and optimize our deployment pipelines. You will be a go-to resource for development teams, helping them use consistent tools for metrics, logging, building, and deployment. Working closely with both development and infrastructure teams, you will identify the service-specific metrics that need monitoring and help application teams build libraries that make service instrumentation straightforward.
The impact you'll make:
- Collaborate with stakeholders to establish and promote best practices for monitoring and CI/CD pipelines.
- Troubleshoot issues related to deployment within our CI pipeline.
- Actively promote DevOps culture at Weedmaps.
- Identify opportunities for automation and advocate for the codification of processes.
- Promote best practices regarding collaboration, reliability, security, and performance across all partner teams.
- Take ownership of application configuration and scaling for specified services, ensuring adherence to organizational practices.
- Develop and optimize synthetic monitoring flows.
What you've accomplished:
- A minimum of 2 years of development experience in startup or mid-sized environments.
- Proficiency in languages and runtimes such as Python, Go, Node.js, Ruby, or Elixir.
- Knowledge of containerization technologies, particularly Docker (Kubernetes experience is a plus).
- Strong communication skills, a positive demeanor, and the ability to provide and receive constructive feedback.
- Professional experience with cloud-native observability standards including OpenMetrics, OpenTracing, and OpenCensus.
- Expertise in using and configuring modern CI/CD workflows.
- Deep understanding of SLIs, SLOs, and SLAs at both service and business levels.
- Familiarity with golden signals and their significance in monitoring.

