Senior Database Reliability Engineer (SQL) - Remote
Experience Level
Senior
About PointClickCare
PointClickCare is a leading health tech company focused on empowering providers to deliver top-notch healthcare. With a commitment to innovation and employee growth, we are dedicated to transforming the healthcare landscape.
Similar jobs
PointClickCare
At PointClickCare, our mission is clear: to empower healthcare providers to deliver exceptional care. This commitment begins with our people. As a prominent health tech organization, privately held and founded by visionary leaders, we enable our team members to innovate, challenge the status quo, and contribute to the evolution of healthcare.
With the largest dataset in long-term and post-acute care and a network of over 400 integrated partners, our platform supports more than 30,000 provider organizations, significantly impacting millions of lives. We prioritize innovation, reinvesting a considerable portion of our revenue into research and development to provide our employees with the tools necessary for meaningful contributions. Recognized by Forbes as a leading private cloud company and celebrated as one of Canada’s Most Admired Corporate Cultures, we foster an environment of flexibility, growth, and purpose.
At PointClickCare, we believe in a future of healthcare that puts people first, accelerated by AI to drive significant and lasting change. Our employees leverage AI as a catalyst for creativity, efficiency, and informed decision-making. By incorporating AI tools into our workflows, we enhance collaboration and improve outcomes, ensuring that every team member has the capability to maximize their impact. Our hiring practices focus on identifying AI expertise that aligns with our mission, and we continually invest in training and development to nurture innovation throughout the employee journey.
Join us in transforming healthcare, not just to survive, but to thrive. To discover more about what it's like to work at PointClickCare, visit Life at PointClickCare and connect with us on Glassdoor and LinkedIn.
Wavelo is a Software as a Service (SaaS) company dedicated to simplifying telecom operations. Our innovative software solutions modernize the way communication service providers (CSPs) operate, allowing them to enhance customer experiences, maximize value, and scale their businesses efficiently. As a proud member of Tucows (NASDAQ:TCX, TSX:TC)—one of the largest internet services firms worldwide—Wavelo is supported by exceptional resources and talent. We prioritize a people-centric work culture grounded in respect, trust, and flexibility. We believe that accommodating our employees leads to our overall success. Most of our positions are remote-first, enabling you to work from anywhere with an internet connection! The work we do has a meaningful impact on lives. If you're excited about this opportunity, we look forward to hearing from you!
info-ways
We are seeking a skilled SQL Database Administrator to join our dynamic team at info-ways. The ideal candidate will have a strong background in managing and optimizing SQL databases, ensuring high levels of performance, availability, and security.
As a SQL DBA, you will be responsible for installing, configuring, and maintaining SQL Server instances, performing troubleshooting and recovery, and implementing backup and restore strategies. Your expertise will help us to enhance our database solutions and support our business objectives effectively.
Join info-ways as a talented AXIOM SQL Consultant in a fully remote role where you'll leverage your SQL expertise to shape data-driven solutions. We are looking for a professional who can work collaboratively with our team to optimize database performance and ensure seamless data integration. If you are passionate about SQL and thrive in a remote work environment, this is an excellent opportunity for you!
Instacart
Transforming the Grocery Industry
At Instacart, we believe in sharing love through food, ensuring everyone has access to their favorite groceries and quality time with loved ones. We don’t just see grocery delivery as a necessity; we recognize the exciting complexities and opportunities it presents to meet the diverse needs of our community. We provide an essential service that customers depend on for groceries and household goods, while also offering safe and flexible earning opportunities to our Personal Shoppers.
Instacart has become a vital resource for millions, and we’re assembling a dynamic team to propel our shopping cart forward. If you are ready to deliver your best work, we invite you to join our team.
Flex First Work Environment
We embrace a flexible approach in how we perform our best work. Our team members can choose their work location, whether from home, an office, or their favorite coffee shop, while fostering connections and community through regular in-person events. Discover more about our flexible work approach.
Overview
About the Role
As a Senior Site Reliability Engineer II, you will be instrumental in ensuring the stability and performance of our platform. You will tackle challenges head-on, ensuring optimal performance and fostering a culture that emphasizes reliable and effective practices. We are seeking a proactive individual who is adept at solving complex problems and is enthusiastic about exploring innovative solutions to support our teams and services.
About the Team
The Site Reliability Engineering (SRE) team merges software and systems engineering to design and maintain large-scale, distributed, and fault-tolerant systems. Our mission is to guarantee high reliability, optimal performance, and continuous improvement for Instacart’s critical internal services and customer-facing systems.
The SRE team focuses on enhancing existing systems, constructing robust infrastructure, and automating processes to reduce manual efforts. Joining the SRE team means facing unique scaling challenges while applying your expertise in coding, algorithms, complexity analysis, and large-scale system design.
info-ways
Join our dynamic team at info-ways as a Database Developer. In this pivotal role, you will be responsible for designing, implementing, and maintaining our database systems to ensure optimal performance and data integrity. Your expertise will play a critical role in supporting various applications and business processes.
About ClickHouse
Recognized on the 2025 Forbes Cloud 100 list, ClickHouse stands out as a leading innovator in the realm of private cloud technology. With a rapidly expanding customer base exceeding 3,000 and an astounding annual recurring revenue (ARR) growth of over 250% year-on-year, ClickHouse is at the forefront of real-time analytics, data warehousing, observability, and AI workloads.
Our recent $400M Series D financing round validates our sustained momentum. Notable clients such as Capital One, Lovable, Decagon, Polymarket, and Airwallex have recently adopted or expanded their use of our platform, joining a prestigious roster of AI pioneers and global brands including Meta, Cursor, Sony, and Tesla.
Join us in our mission to revolutionize the way companies leverage data!
About the Role
As we enhance our commitment to delivering dependable and secure services, we are expanding our Site Reliability Engineering team. In this role, you will spearhead initiatives to maintain and improve the reliability, availability, scalability, and performance of our cloud infrastructure. Collaborate across various teams, including Control Plane, Data Plane, Core, Security, Support, and Operations, to design and implement robust, secure, and highly available distributed systems. You will take charge of incident management and response processes, conducting blameless postmortems and driving continuous improvements in our Cloud services. Your software engineering expertise will be vital in developing tools and platforms to enhance operational and engineering efficiencies within ClickHouse Cloud. This is a unique opportunity to make a substantial impact on our high-performance, elastic ClickHouse Cloud.
Your Responsibilities
Collaborate with diverse engineering teams at ClickHouse to architect and implement scalable, secure, and high-availability systems.
Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
Ensure all infrastructure components within ClickHouse Cloud, including Data Plane, Control Plane, and ClickHouse Core, have effective monitoring and alerting systems in place for timely incident detection and resolution.
Refine incident response processes and post-mortem analyses for outages in ClickHouse Cloud, including communication with impacted customers through the support team.
Continuously enhance the reliability and performance of ClickHouse services.
Fivetran is seeking a Senior Database Software Engineer to join the team in Toronto, Ontario. This role centers on developing and refining advanced database systems that form the backbone of Fivetran’s products and services.
Key responsibilities
Design and architect scalable database solutions to enable both new and existing product features
Collaborate with engineering, product, and other teams to deliver reliable and efficient database systems
Use in-depth knowledge of database technologies to address complex technical issues and enhance system performance
Location
This position is based in Toronto, Ontario, Canada.
MongoDB, Inc.
The Team
At MongoDB, our Platform Engineering division within Site Reliability Engineering (SRE) is tasked with managing essential infrastructure and operational functions that empower our engineering teams. This includes our robust, multi-cloud Kubernetes infrastructure, deployment systems, and advanced observability and alerting mechanisms.
The Fabric team is at the forefront of enabling secure communication across systems and from the public internet. Our responsibilities involve designing network architecture, implementing service mesh solutions, and optimizing edge load balancing to ensure the safety of customer data in transit. This team is vital in developing and maintaining a dependable and globally connected multi-cloud network that underpins MongoDB products.
This position can be based in our Toronto or Vancouver offices, or you can work completely remotely from anywhere in North America. We provide flexible hybrid work arrangements for those in our offices.
Unity Technologies
Unity Technologies is seeking a Senior Production Reliability Engineer based in Montreal, Canada. This role centers on keeping production systems stable and responsive, directly supporting the products and services that power Unity's platform.
Key responsibilities
Diagnose and resolve production issues quickly to reduce downtime and service disruptions
Work closely with engineering, operations, and product teams to improve system reliability
Develop and refine processes that help maintain a stable, high-performing infrastructure
Support Unity's growth by contributing to scaling strategies as services and user numbers increase
Role impact
This position plays a vital part in Unity's ability to deliver reliable products. Efforts in this role help ensure systems can handle both current operations and future expansion, keeping performance strong as demand grows.
About the Position
Join the innovative Cloud FinOps team at Hopper as a Senior Site Reliability Engineer. In this role, you'll oversee a vast infrastructure on Google Cloud, supporting hundreds of engineers and delivering exceptional experiences to millions of users globally. Your enthusiasm for automation and system optimization will be crucial as you work to create scalable, reliable, and secure infrastructure. You will tackle problems pragmatically, developing solutions that are not only effective but also user-friendly and economical.
Daily Responsibilities
Drive cost efficiency projects, including:
  Minimizing network egress costs by eliminating unnecessary headers.
  Optimizing data storage by ensuring efficient use of warehouse data, such as utilizing cold storage for infrequently accessed buckets.
  Enhancing autoscaling for both databases and compute services.
Enhance cost attribution processes to provide all teams with transparent cost visibility.
Participate in incident support and share on-call responsibilities for platform incidents, collaborating with a geographically diverse engineering team.
Contribute to a dynamic and efficient team of SREs.
jobgether
As a Senior Site Reliability Engineer at jobgether, the focus is on maintaining and improving the reliability and performance of cloud infrastructure and services. This position is based in Canada and works closely with multiple teams across the company.
Role overview
The Senior Site Reliability Engineer monitors systems, implements improvements, and automates key processes. The goal is to support a platform that scales smoothly as demands grow.
What you will do
Ensure the ongoing reliability and performance of cloud-based systems
Collaborate with other teams to address infrastructure needs and challenges
Automate operational processes to reduce manual work and improve efficiency
Identify and implement ways to improve scalability across the platform
Bounteous
About the Role
Bounteous is looking for a Database Engineer in Montreal, QC. This role focuses on designing, building, and maintaining database systems that support client projects. The work directly impacts how teams deliver reliable and secure data solutions.
Main Responsibilities
Design and implement database architectures for new and existing projects
Maintain database systems to ensure reliability and performance
Collaborate with cross-functional teams to meet project requirements
Monitor data integrity and enforce security compliance standards
Apply current technologies and methodologies to improve database solutions
Location
This position is based in Montreal, QC.
System Canada Technologies
Join our dynamic team at System Canada Technologies as an SQL Tester. In this role, you will be responsible for ensuring the quality and performance of our database systems. You will work closely with developers to design test plans and execute various testing strategies to ensure robust SQL database functionality.
StackAdapt
Join StackAdapt as a Senior Database Administrator and help us elevate our data management systems! In this pivotal role, you will be responsible for ensuring the integrity, availability, and security of our database systems. Your expertise will drive improvements to our database performance, enabling our teams to leverage data effectively.
You will work closely with cross-functional teams to design, implement, and maintain robust database solutions that support our business goals.
At Relay, we empower self-made business owners with a digital banking platform that transforms financial management into a source of clarity, confidence, and control. Our mission is to replace financial uncertainty with genuine visibility, enabling entrepreneurs to convert their hard work into enduring success. By alleviating the stress of cash flow management, we provide the tools necessary for owners to operate robust and resilient businesses.
As Relay continues its growth trajectory, the reliability, performance, and resilience of our platform have become integral to both our customer experience and overall business success.
This senior leadership position is crucial in steering a team of Site Reliability Engineers while shaping how reliability strategies influence engineering and product decisions throughout the organization. You will determine the future direction of the SRE function, promote operational excellence, and assist the company in anticipating and managing scale challenges before they pose risks.
If you thrive on tackling complex systems, leading organizations, and building resilient platforms that customers depend on daily, we are eager to connect with you!
Key Responsibilities
Lead and enhance Relay’s Site Reliability Engineering function, establishing strategic direction as the company scales.
Define and implement a long-term reliability roadmap, making informed trade-offs under real business and capacity constraints.
Act as the senior reliability voice in discussions involving engineering and product leadership.
Influence the integration of reliability considerations into product planning, architectural decisions, and delivery processes.
Serve as a senior escalation point during critical production incidents, ensuring effective communication and thorough follow-up actions.
Enhance Relay’s observability, performance, and operational maturity practices across teams.
Establish and uphold standards concerning SLOs, operational readiness, incident management, and continuous improvement.
Collaborate with stakeholders in Engineering, Product, Data, and Finance to balance velocity, risk, performance, and cost.
Build and nurture a high-performing SRE organization capable of supporting future growth.
Join StackAdapt, a premier technology firm dedicated to empowering marketers with advanced tools to effectively engage and convert audiences. Our AI-driven StackAdapt Marketing Platform executes an astounding 465 billion optimizations per second, bridging brand and performance marketing for tangible results throughout the customer journey. Leading marketers trust StackAdapt to execute dynamic campaigns across programmatic advertising and various marketing channels.
The Core Services Site Reliability Engineering (Core Services SRE) team establishes the essential principles of reliable engineering within StackAdapt's core service teams.
This team underpins StackAdapt's software backend, where all data processing takes place. The Core Services SRE team is committed to ensuring our software is robust and reliable by constructing self-healing systems that automatically deploy and manage the scaling of core services software, while promoting the principles of reliable software and Site Reliability Engineering throughout the organization.
As a Remote First company, we welcome applicants from anywhere in the United States or Canada!
About Fable
Fable collaborates with global enterprises to enhance accessibility for over one billion individuals with disabilities. Our esteemed clients include industry leaders such as Walmart, Slack, and Shopify. Recognized on the Forbes Accessibility 100 list in 2025, we have also been honored as one of Fast Company’s Most Innovative Companies in Design, receiving accolades from prestigious organizations including the World Summit Awards and the UN-endorsed Zero Project.
About the Role
As a Senior Site Reliability Engineer at Fable, you will be instrumental in ensuring the reliability, scalability, and efficiency of our platform during our growth phase. Our products empower organizations to create more accessible digital experiences, and the robustness of our infrastructure is key to achieving this mission. You will engage with various platform and product systems to ensure stability, performance, and cost-effectiveness, enabling teams to operate swiftly and securely.
With the integration of AI capabilities in contemporary product experiences, you will also help prepare Fable’s infrastructure to handle AI workloads, balancing reliability, performance, and cost while enabling teams to innovate and scale new features safely.
Reporting to the Director of Technical Operations, you will collaborate closely with teams across Engineering and Product. This role is perfect for those who thrive on hands-on technical work, take pride in system health, tooling, and operational excellence, and are eager to influence Fable’s infrastructure and reliability strategy moving forward.
Key Responsibilities
Reliability, Infrastructure & Platform
Design, build, and maintain reliable, scalable, and secure infrastructure for Fable’s product services.
Enhance system observability, monitoring, and alerting to ensure high availability and rapid incident response.
Contribute to and refine SRE practices, including SLIs/SLOs, incident management, and postmortems.
Support and optimize CI/CD pipelines and deployment processes.
Identify and minimize operational complexity across systems and tooling.
Collaborate across infrastructure and application layers to diagnose and resolve reliability and performance issues, making targeted improvements to application code when necessary.
Support infrastructure and platform capabilities required for AI/ML-powered features, including considerations for scaling, performance, and reliability.
Cost Efficiency & Performance
Monitor and optimize infrastructure costs across cloud environments.
Tecsys Inc.
At Tecsys, we recognize the transformative power of remote work on employee well-being and the environment. Our commitment to remote work fosters enhanced employee morale, productivity, and reduced commuting times. We are proud to be a remote-first organization, supported by cutting-edge technologies and programs that create a fantastic foundation for our team. Our flexible remote environment, complemented by well-located offices and collaborative workspaces, empowers our staff to work in ways that maximize their productivity.
About Tecsys
Tecsys is a rapidly growing innovator in supply chain solutions for leading healthcare systems, hospitals, pharmacies, distributors, retailers, and 3PLs. We collaborate with industry leaders to transform their supply chains through technology. If you thrive on tackling challenges and seek continuous learning opportunities, we invite you to join our dynamic team!
Position Overview
We are in search of an Infrastructure Reliability Engineer to join our Network Operations and Security Center (NOC) team, which is pivotal to the reliability of our critical SaaS platforms. In this role, you will contribute to the maintenance, optimization, and assurance of the reliability and performance of the systems that drive our cloud infrastructure on AWS and Kubernetes. A strong focus will be placed on automation, observability, and continuous improvement.
This position amalgamates reliability engineering with incident management, placing you in a key role responsible for availability, performance, and innovation. You will be part of a highly skilled team that values creative problem-solving, operational excellence, and the continuous enhancement of resilience through automation and engineering.
Your Responsibilities
Collaborate with engineering teams to support services prior to their launch through activities such as systems design consultation, platform and software framework development, capacity planning, and launch reviews.
Continuously innovate by identifying weaknesses, proposing creative solutions, and driving initiatives that simplify, scale, and strengthen the platform.
Maintain services post-launch by measuring and monitoring availability, latency, and overall system health.
Ensure optimized observability: enhance and expand monitoring and alerting using Datadog; define SLOs/SLIs and create actionable dashboards that yield reliability outcomes.
Develop and enhance...
About Syndio
Syndio is a Series C technology company based in Calgary, Alberta, focused on helping organizations create smarter, fairer compensation strategies. Our platform uses advanced technology and ethical AI to support decision-making, simplify compliance, and provide insights that help companies maintain equitable pay practices worldwide. Syndio analyzes compensation data for more than 10 million employees across many countries, working with leading enterprises to ensure fair and defensible pay.
Role Overview: Senior Site Reliability Engineer
The Senior Site Reliability Engineer (SRE) will help design, implement, maintain, and evolve solutions that improve the reliability and availability of Syndio’s applications and systems. This role blends software engineering with systems engineering, focusing on eliminating single points of failure, maximizing observability, and responding quickly to incidents. The SRE will work closely with other engineers and teams, sharing ownership and promoting a culture of collaboration and continuous learning.
What You Will Do
Design and maintain systems that support high availability and reliability for Syndio’s cloud-based applications.
Apply software engineering principles to infrastructure and operations challenges.
Identify and resolve single points of failure in the stack.
Maximize observability and monitoring across platforms.
Respond to and resolve failures efficiently to minimize downtime.
Explore and implement new tools and techniques to improve reliability and performance.
Work across platform, data, security, and software engineering as needed.
Manage Kubernetes applications and infrastructure, primarily using Kubernetes and Terraform in a fully cloud-based environment.
What We’re Looking For
Experience managing Kubernetes applications in an SRE or similar capacity.
Comfort working with Terraform and cloud-native environments.
Interest in SRE practices and methodologies, with a drive to learn and adapt.
Ability to work in a startup environment and handle tasks that may extend beyond traditional SRE responsibilities.
Collaborative mindset and willingness to share ownership of systems and solutions.
Why Join Syndio as an SRE?
Play a key role in a growing engineering organization.
Work on meaningful challenges that impact fair pay for millions of employees worldwide.
Grow your skills across platform, data, security, and software engineering.
Be part of a team that values learning, innovation, and ethical technology.
Location: Calgary, Alberta, Canada
Sign in to browse more jobs
Create account — see all 6,554 results
Search for Senior Database Reliability Engineer (SQL) - Remote
6,554 results
PointClickCare
At PointClickCare, our mission is clear: to empower healthcare providers to deliver exceptional care. This commitment begins with our people. As a prominent health tech organization, privately held and founded by visionary leaders, we enable our team members to innovate, challenge the status quo, and contribute to the evolution of healthcare.With the largest dataset in long-term and post-acute care and a network of over 400 integrated partners, our platform supports more than 30,000 provider organizations, significantly impacting millions of lives. We prioritize innovation, reinvesting a considerable portion of our revenue into research and development to provide our employees with the tools necessary for meaningful contributions. Recognized by Forbes as a leading private cloud company and celebrated as one of Canada’s Most Admired Corporate Cultures, we foster an environment of flexibility, growth, and purpose.At PointClickCare, we believe in a future of healthcare that puts people first, accelerated by AI to drive significant and lasting change. Our employees leverage AI as a catalyst for creativity, efficiency, and informed decision-making. By incorporating AI tools into our workflows, we enhance collaboration and improve outcomes, ensuring that every team member has the capability to maximize their impact. Our hiring practices focus on identifying AI expertise that aligns with our mission, and we continually invest in training and development to nurture innovation throughout the employee journey.Join us in transforming healthcare — not just to survive, but to thrive. To discover more about what it's like to work at PointClickCare, visit Life at PointClickCare and connect with us on Glassdoor and LinkedIn.
Wavelo is a Software as a Service (SaaS) company dedicated to simplifying telecom operations. Our innovative software solutions modernize the way communication service providers (CSPs) operate, allowing them to enhance customer experiences, maximize value, and scale their businesses efficiently. As a proud member of Tucows (NASDAQ:TCX, TSX:TC)—one of the largest internet services firms worldwide—Wavelo is supported by exceptional resources and talent. We prioritize a people-centric work culture grounded in respect, trust, and flexibility. We believe that accommodating our employees leads to our overall success. Most of our positions are remote-first, enabling you to work from anywhere with an internet connection! The work we do has a meaningful impact on lives. If you're excited about this opportunity, we look forward to hearing from you!
info-ways
We are seeking a skilled SQL Database Administrator to join our dynamic team at info-ways. The ideal candidate will have a strong background in managing and optimizing SQL databases, ensuring high levels of performance, availability, and security.As a SQL DBA, you will be responsible for installing, configuring, and maintaining SQL Server instances, performing troubleshooting and recovery, and implementing backup and restore strategies. Your expertise will help us to enhance our database solutions and support our business objectives effectively.
Join info-ways as a talented AXIOM SQL Consultant in a fully remote role where you'll leverage your SQL expertise to shape data-driven solutions. We are looking for a professional who can work collaboratively with our team to optimize database performance and ensure seamless data integration. If you are passionate about SQL and thrive in a remote work environment, this is an excellent opportunity for you!
Instacart
Transforming the Grocery IndustryAt Instacart, we believe in sharing love through food, ensuring everyone has access to their favorite groceries and quality time with loved ones. We don’t just see grocery delivery as a necessity; we recognize the exciting complexities and opportunities it presents to meet the diverse needs of our community. We provide an essential service that customers depend on for groceries and household goods, while also offering safe and flexible earning opportunities to our Personal Shoppers.Instacart has become a vital resource for millions, and we’re assembling a dynamic team to propel our shopping cart forward. If you are ready to deliver your best work, we invite you to join our team.Flex First Work EnvironmentWe embrace a flexible approach in how we perform our best work. Our team members can choose their work location—whether from home, an office, or their favorite coffee shop—while fostering connections and community through regular in-person events. Discover more about our flexible work approach.OverviewAbout the RoleAs a Senior Site Reliability Engineer II, you will be instrumental in ensuring the stability and performance of our platform. You will tackle challenges head-on, ensuring optimal performance and fostering a culture that emphasizes reliable and effective practices. We are seeking a proactive individual who is adept at solving complex problems and is enthusiastic about exploring innovative solutions to support our teams and services.About the TeamThe Site Reliability Engineering (SRE) team merges software and systems engineering to design and maintain large-scale, distributed, and fault-tolerant systems. Our mission is to guarantee high reliability, optimal performance, and continuous improvement for Instacart’s critical internal services and customer-facing systems.The SRE team focuses on enhancing existing systems, constructing robust infrastructure, and automating processes to reduce manual efforts. 
Joining the SRE team means facing unique scaling challenges while applying your expertise in coding, algorithms, complexity analysis, and large-scale system design.
info-ways
Join our dynamic team at info-ways as a Database Developer. In this pivotal role, you will be responsible for designing, implementing, and maintaining our database systems to ensure optimal performance and data integrity. Your expertise will play a critical role in supporting various applications and business processes.
About ClickHouse
Recognized on the 2025 Forbes Cloud 100 list, ClickHouse stands out as a leading innovator in the realm of private cloud technology. With a rapidly expanding customer base exceeding 3,000 and an astounding annual recurring revenue (ARR) growth of over 250% year-on-year, ClickHouse is at the forefront of real-time analytics, data warehousing, observability, and AI workloads.
Our recent $400M Series D financing round validates our sustained momentum. Notable clients such as Capital One, Lovable, Decagon, Polymarket, and Airwallex have recently adopted or expanded their use of our platform, joining a prestigious roster of AI pioneers and global brands including Meta, Cursor, Sony, and Tesla.
Join us in our mission to revolutionize the way companies leverage data!
About the Role
As we enhance our commitment to delivering dependable and secure services, we are expanding our Site Reliability Engineering team. In this role, you will spearhead initiatives to maintain and improve the reliability, availability, scalability, and performance of our cloud infrastructure. Collaborate across various teams, including Control Plane, Data Plane, Core, Security, Support, and Operations, to design and implement robust, secure, and highly available distributed systems. You will take charge of incident management and response processes, conducting blameless postmortems and driving continuous improvements in our Cloud services. Your software engineering expertise will be vital in developing tools and platforms to enhance operational and engineering efficiencies within ClickHouse Cloud. This is a unique opportunity to make a substantial impact on our high-performance, elastic ClickHouse Cloud.
Your Responsibilities
Collaborate with diverse engineering teams at ClickHouse to architect and implement scalable, secure, and high-availability systems.
Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
Ensure all infrastructure components within ClickHouse Cloud, including Data Plane, Control Plane, and ClickHouse Core, have effective monitoring and alerting systems in place for timely incident detection and resolution.
Refine incident response processes and post-mortem analyses for outages in ClickHouse Cloud, including communication with impacted customers through the support team.
Continuously enhance the reliability and performance of ClickHouse services.
Fivetran is seeking a Senior Database Software Engineer to join the team in Toronto, Ontario. This role centers on developing and refining advanced database systems that form the backbone of Fivetran’s products and services.
Key responsibilities
Design and architect scalable database solutions to enable both new and existing product features.
Collaborate with engineering, product, and other teams to deliver reliable and efficient database systems.
Use in-depth knowledge of database technologies to address complex technical issues and enhance system performance.
Location
This position is based in Toronto, Ontario, Canada.
MongoDB, Inc.
The Team
At MongoDB, our Platform Engineering division within Site Reliability Engineering (SRE) is tasked with managing essential infrastructure and operational functions that empower our engineering teams. This includes our robust, multi-cloud Kubernetes infrastructure, deployment systems, and advanced observability and alerting mechanisms.
The Fabric team is at the forefront of enabling secure communication across systems and from the public internet. Our responsibilities involve designing network architecture, implementing service mesh solutions, and optimizing edge load balancing to ensure the safety of customer data in transit. This team is vital in developing and maintaining a dependable and globally connected multi-cloud network that underpins MongoDB products.
This position can be based in our Toronto or Vancouver offices, or you can work completely remotely from anywhere in North America. We provide flexible hybrid work arrangements for those in our offices.
Unity Technologies
Unity Technologies is seeking a Senior Production Reliability Engineer based in Montreal, Canada. This role centers on keeping production systems stable and responsive, directly supporting the products and services that power Unity's platform.
Key responsibilities
Diagnose and resolve production issues quickly to reduce downtime and service disruptions.
Work closely with engineering, operations, and product teams to improve system reliability.
Develop and refine processes that help maintain a stable, high-performing infrastructure.
Support Unity's growth by contributing to scaling strategies as services and user numbers increase.
Role impact
This position plays a vital part in Unity's ability to deliver reliable products. Efforts in this role help ensure systems can handle both current operations and future expansion, keeping performance strong as demand grows.
About the Position
Join the innovative Cloud FinOps team at Hopper as a Senior Site Reliability Engineer. In this role, you'll oversee a vast infrastructure on Google Cloud, supporting hundreds of engineers and delivering exceptional experiences to millions of users globally.
Your enthusiasm for automation and system optimization will be crucial as you work to create scalable, reliable, and secure infrastructure.
You will tackle problems pragmatically, developing solutions that are not only effective but also user-friendly and economical.
Daily Responsibilities
Drive cost efficiency projects, including: minimizing network egress costs by eliminating unnecessary headers; optimizing data storage by ensuring efficient use of warehouse data, such as utilizing cold storage for infrequently accessed buckets; and enhancing autoscaling for both databases and compute services.
Enhance cost attribution processes to provide all teams with transparent cost visibility.
Participate in incident support and share on-call responsibilities for platform incidents, collaborating with a geographically diverse engineering team.
Contribute to a dynamic and efficient team of SREs.
jobgether
As a Senior Site Reliability Engineer at jobgether, the focus is on maintaining and improving the reliability and performance of cloud infrastructure and services. This position is based in Canada and works closely with multiple teams across the company.
Role overview
The Senior Site Reliability Engineer monitors systems, implements improvements, and automates key processes. The goal is to support a platform that scales smoothly as demands grow.
What you will do
Ensure the ongoing reliability and performance of cloud-based systems.
Collaborate with other teams to address infrastructure needs and challenges.
Automate operational processes to reduce manual work and improve efficiency.
Identify and implement ways to improve scalability across the platform.
Bounteous
About the Role
Bounteous is looking for a Database Engineer in Montreal, QC. This role focuses on designing, building, and maintaining database systems that support client projects. The work directly impacts how teams deliver reliable and secure data solutions.
Main Responsibilities
Design and implement database architectures for new and existing projects.
Maintain database systems to ensure reliability and performance.
Collaborate with cross-functional teams to meet project requirements.
Monitor data integrity and enforce security compliance standards.
Apply current technologies and methodologies to improve database solutions.
Location
This position is based in Montreal, QC.
System Canada Technologies
Join our dynamic team at System Canada Technologies as an SQL Tester. In this role, you will be responsible for ensuring the quality and performance of our database systems. You will work closely with developers to design test plans and execute various testing strategies to ensure robust SQL database functionality.
StackAdapt
Join StackAdapt as a Senior Database Administrator and help us elevate our data management systems! In this pivotal role, you will be responsible for ensuring the integrity, availability, and security of our database systems. Your expertise will drive improvements to our database performance, enabling our teams to leverage data effectively.
You will work closely with cross-functional teams to design, implement, and maintain robust database solutions that support our business goals.
At Relay, we empower self-made business owners with a digital banking platform that transforms financial management into a source of clarity, confidence, and control. Our mission is to replace financial uncertainty with genuine visibility, enabling entrepreneurs to convert their hard work into enduring success. By alleviating the stress of cash flow management, we provide the tools necessary for owners to operate robust and resilient businesses.
As Relay continues its growth trajectory, the reliability, performance, and resilience of our platform have become integral to both our customer experience and overall business success.
This senior leadership position is crucial in steering a team of Site Reliability Engineers while shaping how reliability strategies influence engineering and product decisions throughout the organization. You will determine the future direction of the SRE function, promote operational excellence, and assist the company in anticipating and managing scale challenges before they pose risks.
If you thrive on tackling complex systems, leading organizations, and building resilient platforms that customers depend on daily, we are eager to connect with you!
Key Responsibilities
Lead and enhance Relay’s Site Reliability Engineering function, establishing strategic direction as the company scales.
Define and implement a long-term reliability roadmap, making informed trade-offs under real business and capacity constraints.
Act as the senior reliability voice in discussions involving engineering and product leadership.
Influence the integration of reliability considerations into product planning, architectural decisions, and delivery processes.
Serve as a senior escalation point during critical production incidents, ensuring effective communication and thorough follow-up actions.
Enhance Relay’s observability, performance, and operational maturity practices across teams.
Establish and uphold standards concerning SLOs, operational readiness, incident management, and continuous improvement.
Collaborate with stakeholders in Engineering, Product, Data, and Finance to balance velocity, risk, performance, and cost.
Build and nurture a high-performing SRE organization capable of supporting future growth.
Join StackAdapt, a premier technology firm dedicated to empowering marketers with advanced tools to effectively engage and convert audiences. Our AI-driven StackAdapt Marketing Platform executes an astounding 465 billion optimizations per second, bridging brand and performance marketing for tangible results throughout the customer journey. Leading marketers trust StackAdapt to execute dynamic campaigns across programmatic advertising and various marketing channels.
The Core Services Site Reliability Engineering (Core Services SRE) team establishes the essential principles of reliable engineering within StackAdapt's core service teams.
This team underpins StackAdapt's software backend, where all data processing takes place. The Core Services SRE team is committed to ensuring our software is robust and reliable by constructing self-healing systems that automatically deploy and manage the scaling of core services software, while promoting the principles of reliable software and Site Reliability Engineering throughout the organization.
As a Remote First company, we welcome applicants from anywhere in the United States or Canada!
About Fable
Fable collaborates with global enterprises to enhance accessibility for over one billion individuals with disabilities. Our esteemed clients include industry leaders such as Walmart, Slack, and Shopify. Recognized on the Forbes Accessibility 100 list in 2025, we have also been honored as one of Fast Company’s Most Innovative Companies in Design, receiving accolades from prestigious organizations including the World Summit Awards and the UN-endorsed Zero Project.
About the Role
As a Senior Site Reliability Engineer at Fable, you will be instrumental in ensuring the reliability, scalability, and efficiency of our platform during our growth phase. Our products empower organizations to create more accessible digital experiences, and the robustness of our infrastructure is key to achieving this mission. You will engage with various platform and product systems to ensure stability, performance, and cost-effectiveness, enabling teams to operate swiftly and securely.
With the integration of AI capabilities in contemporary product experiences, you will also help prepare Fable’s infrastructure to handle AI workloads, balancing reliability, performance, and cost while enabling teams to innovate and scale new features safely.
Reporting to the Director of Technical Operations, you will collaborate closely with teams across Engineering and Product. This role is perfect for those who thrive on hands-on technical work, take pride in system health, tooling, and operational excellence, and are eager to influence Fable’s infrastructure and reliability strategy moving forward.
Key Responsibilities
Reliability, Infrastructure & Platform
Design, build, and maintain reliable, scalable, and secure infrastructure for Fable’s product services.
Enhance system observability, monitoring, and alerting to ensure high availability and rapid incident response.
Contribute to and refine SRE practices, including SLIs/SLOs, incident management, and postmortems.
Support and optimize CI/CD pipelines and deployment processes.
Identify and minimize operational complexity across systems and tooling.
Collaborate across infrastructure and application layers to diagnose and resolve reliability and performance issues, making targeted improvements to application code when necessary.
Support infrastructure and platform capabilities required for AI/ML-powered features, including considerations for scaling, performance, and reliability.
Cost Efficiency & Performance
Monitor and optimize infrastructure costs across cloud environments.
Tecsys Inc.
At Tecsys, we recognize the transformative power of remote work on employee well-being and the environment. Our commitment to remote work fosters enhanced employee morale, productivity, and reduced commuting times. We are proud to be a remote-first organization, supported by cutting-edge technologies and programs that create a fantastic foundation for our team. Our flexible remote environment, complemented by well-located offices and collaborative workspaces, empowers our staff to work in ways that maximize their productivity.
About Tecsys
Tecsys is a rapidly growing innovator in supply chain solutions for leading healthcare systems, hospitals, pharmacies, distributors, retailers, and 3PLs. We collaborate with industry leaders to transform their supply chains through technology. If you thrive on tackling challenges and seek continuous learning opportunities, we invite you to join our dynamic team!
Position Overview
We are in search of an Infrastructure Reliability Engineer to join our Network Operations and Security Center (NOC) team, which is pivotal to the reliability of our critical SaaS platforms. In this role, you will contribute to the maintenance, optimization, and assurance of the reliability and performance of the systems that drive our cloud infrastructure on AWS and Kubernetes. A strong focus will be placed on automation, observability, and continuous improvement.
This position amalgamates reliability engineering with incident management, placing you in a key role responsible for availability, performance, and innovation. You will be part of a highly skilled team that values creative problem-solving, operational excellence, and the continuous enhancement of resilience through automation and engineering.
Your Responsibilities
Collaborate with engineering teams to support services prior to their launch through activities such as systems design consultation, platform and software framework development, capacity planning, and launch reviews.
Continuously innovate by identifying weaknesses, proposing creative solutions, and driving initiatives that simplify, scale, and strengthen the platform.
Maintain services post-launch by measuring and monitoring availability, latency, and overall system health.
Ensure optimized observability: enhance and expand monitoring and alerting using Datadog; define SLOs/SLIs and create actionable dashboards that yield reliability outcomes.
Develop and enhance...
About Syndio
Syndio is a Series C technology company based in Calgary, Alberta, focused on helping organizations create smarter, fairer compensation strategies. Our platform uses advanced technology and ethical AI to support decision-making, simplify compliance, and provide insights that help companies maintain equitable pay practices worldwide. Syndio analyzes compensation data for more than 10 million employees across many countries, working with leading enterprises to ensure fair and defensible pay.
Role Overview: Senior Site Reliability Engineer
The Senior Site Reliability Engineer (SRE) will help design, implement, maintain, and evolve solutions that improve the reliability and availability of Syndio’s applications and systems. This role blends software engineering with systems engineering, focusing on eliminating single points of failure, maximizing observability, and responding quickly to incidents. The SRE will work closely with other engineers and teams, sharing ownership and promoting a culture of collaboration and continuous learning.
What You Will Do
Design and maintain systems that support high availability and reliability for Syndio’s cloud-based applications.
Apply software engineering principles to infrastructure and operations challenges.
Identify and resolve single points of failure in the stack.
Maximize observability and monitoring across platforms.
Respond to and resolve failures efficiently to minimize downtime.
Explore and implement new tools and techniques to improve reliability and performance.
Work across platform, data, security, and software engineering as needed.
Manage Kubernetes applications and infrastructure, primarily using Kubernetes and Terraform in a fully cloud-based environment.
What We’re Looking For
Experience managing Kubernetes applications in an SRE or similar capacity.
Comfort working with Terraform and cloud-native environments.
Interest in SRE practices and methodologies, with a drive to learn and adapt.
Ability to work in a startup environment and handle tasks that may extend beyond traditional SRE responsibilities.
Collaborative mindset and willingness to share ownership of systems and solutions.
Why Join Syndio as an SRE?
Play a key role in a growing engineering organization.
Work on meaningful challenges that impact fair pay for millions of employees worldwide.
Grow your skills across platform, data, security, and software engineering.
Be part of a team that values learning, innovation, and ethical technology.
Location: Calgary, Alberta, Canada

