About the job
Lightspeed Commerce Inc. is on the lookout for a Senior Site Reliability Engineer to become a vital part of our Retail organization.
Our Site Reliability Engineering (SRE) team is integral to designing, constructing, and managing Lightspeed's infrastructure. We are dedicated to creating a platform that empowers our diverse product teams. This platform encompasses the complete software delivery lifecycle, from CI/CD pipelines to highly available and scalable production environments.
Your Responsibilities
- Lead initiatives to continuously improve our software delivery processes, and develop tools that speed up product development across geographically distributed teams.
- Establish and maintain resilient, self-service platforms for building, deploying, and operating services, with an emphasis on automation, scalability, and cost-effectiveness.
- Collaborate with cross-functional teams to improve cost efficiency and product reliability.
- Use your engineering knowledge to guide strategic decisions and promote best practices within your unit, including Infrastructure as Code, monitoring, high availability, disaster recovery, and security measures.
- Oversee the creation and upkeep of comprehensive technical documentation for the team.
- Direct incident response and resolution for critical production issues, and facilitate blameless post-mortems.
- Proactively assess system performance and capacity, establish and uphold Service Level Objectives (SLOs), and eliminate repetitive tasks through effective automation and tooling.

