Infrastructure Reliability Engineer

Tecsys Inc.
Remote, Montreal, Quebec, Canada
Full-time

Qualifications

We are looking for candidates who possess a strong background in reliability engineering, cloud infrastructure, and automation. Ideally, you should have:

  • Experience with AWS and Kubernetes environments.
  • A solid understanding of monitoring tools, particularly Datadog.
  • Skills in scripting and automation to enhance operational efficiency.
  • Strong analytical and problem-solving capabilities.
  • Excellent collaboration and communication skills.

About the job

At Tecsys, we recognize the transformative power of remote work on employee well-being and the environment. Our commitment to remote work fosters enhanced employee morale, productivity, and reduced commuting times. We are proud to be a remote-first organization, supported by cutting-edge technologies and programs that create a fantastic foundation for our team. Our flexible remote environment, complemented by well-located offices and collaborative workspaces, empowers our staff to work in ways that maximize their productivity.

About Tecsys

Tecsys is a rapidly growing innovator in supply chain solutions for leading healthcare systems, hospitals, pharmacies, distributors, retailers, and 3PLs. We collaborate with industry leaders to transform their supply chains through technology. If you thrive on tackling challenges and seek continuous learning opportunities, we invite you to join our dynamic team!

Position Overview

We are in search of an Infrastructure Reliability Engineer to join our Network Operations and Security Center (NOC) team, which is pivotal to the reliability of our critical SaaS platforms. In this role, you will contribute to the maintenance, optimization, and assurance of the reliability and performance of the systems that drive our cloud infrastructure on AWS and Kubernetes. A strong focus will be placed on automation, observability, and continuous improvement.

This position combines reliability engineering with incident management, giving you a key role in availability, performance, and innovation. You will be part of a highly skilled team that values creative problem-solving, operational excellence, and the continuous enhancement of resilience through automation and engineering.

Your Responsibilities

  • Collaborate with engineering teams to support services prior to their launch through activities such as systems design consultation, platform and software framework development, capacity planning, and launch reviews.
  • Continuously innovate by identifying weaknesses, proposing creative solutions, and driving initiatives that simplify, scale, and strengthen the platform.
  • Maintain services post-launch by measuring and monitoring availability, latency, and overall system health.
  • Ensure optimized observability: enhance and expand monitoring and alerting using Datadog; define SLOs/SLIs and create actionable dashboards that yield reliability outcomes.
  • Develop and enhance...

About Tecsys Inc.

Tecsys is a trailblazer in supply chain management, delivering innovative solutions to healthcare providers and various industries. Our focus on technology-driven transformation positions us at the forefront of supply chain innovation.
