Site Reliability Engineer at TextNow | Canada
Experience Level
Entry Level
Qualifications
About TextNow Inc.
TextNow is committed to making communication accessible to everyone. We are at the forefront of transforming phone service into a free and democratic resource for all. Our team is composed of innovative thinkers dedicated to creating a more connected world.
Similar jobs
TextNow Inc.
At TextNow, we believe that communication should be accessible to everyone. Our mission is to democratize phone service, and we are reshaping the way the world connects. As the largest provider of free phone service in Canada, we are just getting started. Join our team and help us break down communication barriers, enabling conversations to flow freely for i…
TextNow Inc.
At TextNow, we believe that communication is a fundamental right. Our mission is to make phone service accessible to everyone. As the largest provider of free phone service in Canada, we are driven by a team of innovative thinkers who approach challenges with both curiosity and optimism. We invite you to join us in our pursuit to eliminate barriers to communication and foster meaningful conversations across the globe. We are seeking a skilled Data Developer who has substantial experience in designing and implementing data platforms. In this role, you will be responsible for the architecture, development, and ongoing management of TextNow's data ecosystem, empowering our team to make informed, data-driven decisions. Your contributions will support various cross-functional projects aimed at building reliable and scalable frameworks for all of TextNow's business and data initiatives.
This position offers an opportunity to create a significant impact at scale. You will have a hand in shaping the way TextNow constructs and operates its systems in an AI-first setting, where leveraging intelligent tools is part of our daily engineering processes. Utilizing AI is not just encouraged; it is essential. From system design and architecture to implementation, testing, debugging, documentation, and operational analysis, you will harness AI tools to enhance development speed, elevate code quality, and support superior technical decision-making. We provide an extensive suite of AI-driven development tools and methods, and we expect you to continuously innovate in your use of these resources to enhance efficiency, clarity, and overall product excellence across our organization.
Join Our Team as a Site Reliability Engineer
Seize the opportunity to create a robust reliability practice from the ground up at SiteTracker. You will be instrumental in establishing engineering standards, including Service Level Objectives (SLOs), error budgets, and observability, designed to safeguard our platform as we scale for enterprise clients and enhance our AI capabilities. With full autonomy to define strategy and the trust to implement it, your role will ensure our AI workloads (Evaluations, RAG, and LLM processing) consistently meet premier reliability benchmarks. If you thrive on solving challenges proactively and view toil as an engineering problem, this position promises to be a pivotal step in your career.
Your Responsibilities
As a Staff or Senior Staff Site Reliability Engineer, you will collaborate with existing engineers to shift our organization from a reactive approach to a proactive, methodical reliability practice. You will spearhead the intentional advancement of our infrastructure, recognizing the right moments to adopt new tools and transitioning from manual scripts and templates only when necessary. Whether architecting incident response frameworks or tackling unique reliability issues for AI agents, your contributions will amplify the effectiveness of the entire engineering team.
By approaching every challenge with a consultative perspective, you will inform technical decisions grounded in data rather than instinct, ensuring our multi-region or service mesh adoption roadmap is future-ready. You will not merely receive tasks; rather, you will take ownership of strategies for production-readiness and deployment safety, fostering the organizational trust essential for making reliability a key differentiator for our product.
Required Skills and Qualifications
Extensive SRE Expertise
Define SLIs and SLOs for critical user journeys to drive proactive engineering choices.
Lead live production incident response as an Incident Commander and conduct blameless postmortems that inspire actionable outcomes.
Develop observability tools that narrate a system's behavior, creating intuitive dashboards and actionable alerts.
Transform an organization from reactive incident management to a structured reliability practice, significantly reducing paging volume.
Establish error-budget policies to inform data-driven decisions between feature deployment and reliability maintenance.
Advanced Technical Proficiency in AWS
Competently design and operate AWS services: VPC, IAM, compute (ECS/EC2/Lambda), managed data services, and load balancing.
Effectively manage our existing CloudFormation and bash scripts through GitHub Actions without automatically resorting to Terraform.
At TextNow, we believe that communication should be accessible to everyone. We are pioneering a new way for individuals to connect, merging simplicity, intelligence, and affordability. Our team is composed of innovative builders who leverage technology to empower millions of users with cost-effective communication solutions every day.
As a Backend Software Developer, your role will extend beyond mere service creation; you will be instrumental in shaping the systems, architecture, and tooling that underpin our services. Our Technical Staff members are expected to blend leadership with hands-on coding to maximize opportunities for impact. The ability to think strategically while also engaging with intricate details is essential. You will take technical ownership of pivotal backend domains and collaborate across mobile, web, and data to develop systems that are faster, smarter, and more reliable.
AI and automation are integral to our development processes. You'll utilize these technologies to expedite development, enhance performance, swiftly identify and resolve issues, and continuously elevate the standard for backend development excellence.
We are seeking Technical Staff members at various levels (intermediate/senior/staff+). Whether you're a seasoned developer ready to lead complex systems or an early-career developer eager to advance, we will tailor your title and responsibilities according to your experience and impact.
This position promises to create significant impact at scale. You will influence how TextNow constructs and manages its backend systems, employing AI and automation to streamline development, enhance decision-making, and provide seamless experiences for millions of users globally.
At TextNow, we champion the belief that communication is a fundamental right for all. As a pioneer in redefining connectivity, we blend simplicity, intelligence, and accessibility to empower millions of users every day. Our innovative team of builders leverages technology to make communication more affordable and impactful.
As an iOS Software Developer, you'll do more than just develop applications; you'll significantly influence the systems, architecture, and tools that enable their creation. Our Technical Staff members blend leadership with hands-on coding, driving high-impact opportunities. The ability to think strategically while also engaging with intricate details is essential.
You will take technical ownership of critical mobile domains, collaborating across mobile, backend, and data teams to deliver faster, smarter, and more reliable products. Embracing AI and automation is central to our development process. Your role will involve utilizing these technologies to accelerate development, enhance performance, rapidly identify and solve issues, and continually elevate the standards of mobile development excellence.
We are looking for Technical Staff members at various levels (intermediate/senior/staff+). Whether you are a seasoned developer prepared to lead complex systems or an emerging developer eager to grow, we will tailor your title and scope to align with your experience and impact.
This position is about making a significant impact at scale. You will influence how TextNow constructs and operates its systems in an AI-first environment, where intelligent tools are integrated into the daily engineering workflow. Leveraging AI is not merely encouraged; it is an expectation. From design and architecture to implementation, testing, debugging, documentation, and operational analysis, you will actively employ AI tools to enhance velocity, improve code quality, and make informed technical decisions. We offer a comprehensive suite of AI-powered development tools and workflows to support you in continually evolving your approach to foster efficiency, clarity, and product excellence across the organization.
Join our innovative team at Newton as a Site Reliability Engineer, where you'll play a crucial role in ensuring the reliability and performance of our systems. In this fully remote position, you will collaborate with engineering and operations teams to develop solutions that enhance system uptime and efficiency.
Your expertise will help us transition and maintain our infrastructure, ensuring our services are resilient and scalable. This is an exciting opportunity to contribute to a company that values innovation and teamwork.
jobgether
As a Senior Site Reliability Engineer at jobgether, the focus is on maintaining and improving the reliability and performance of cloud infrastructure and services. This position is based in Canada and works closely with multiple teams across the company.
Role overview
The Senior Site Reliability Engineer monitors systems, implements improvements, and automates key processes. The goal is to support a platform that scales smoothly as demands grow.
What you will do
Ensure the ongoing reliability and performance of cloud-based systems
Collaborate with other teams to address infrastructure needs and challenges
Automate operational processes to reduce manual work and improve efficiency
Identify and implement ways to improve scalability across the platform
Instacart
Transforming the Grocery Industry
At Instacart, we believe in sharing love through food, ensuring everyone has access to their favorite groceries and quality time with loved ones. We don’t just see grocery delivery as a necessity; we recognize the exciting complexities and opportunities it presents to meet the diverse needs of our community. We provide an essential service that customers depend on for groceries and household goods, while also offering safe and flexible earning opportunities to our Personal Shoppers.
Instacart has become a vital resource for millions, and we’re assembling a dynamic team to propel our shopping cart forward. If you are ready to deliver your best work, we invite you to join our team.
Flex First Work Environment
We embrace a flexible approach in how we perform our best work. Our team members can choose their work location, whether from home, an office, or their favorite coffee shop, while fostering connections and community through regular in-person events. Discover more about our flexible work approach.
Overview
About the Role
As a Senior Site Reliability Engineer II, you will be instrumental in ensuring the stability and performance of our platform. You will tackle challenges head-on, ensuring optimal performance and fostering a culture that emphasizes reliable and effective practices. We are seeking a proactive individual who is adept at solving complex problems and is enthusiastic about exploring innovative solutions to support our teams and services.
About the Team
The Site Reliability Engineering (SRE) team merges software and systems engineering to design and maintain large-scale, distributed, and fault-tolerant systems. Our mission is to guarantee high reliability, optimal performance, and continuous improvement for Instacart’s critical internal services and customer-facing systems.
The SRE team focuses on enhancing existing systems, constructing robust infrastructure, and automating processes to reduce manual efforts. Joining the SRE team means facing unique scaling challenges while applying your expertise in coding, algorithms, complexity analysis, and large-scale system design.
About Syndio
Syndio is a Series C technology company based in Calgary, Alberta, focused on helping organizations create smarter, fairer compensation strategies. Our platform uses advanced technology and ethical AI to support decision-making, simplify compliance, and provide insights that help companies maintain equitable pay practices worldwide. Syndio analyzes compensation data for more than 10 million employees across many countries, working with leading enterprises to ensure fair and defensible pay.
Role Overview: Senior Site Reliability Engineer
The Senior Site Reliability Engineer (SRE) will help design, implement, maintain, and evolve solutions that improve the reliability and availability of Syndio’s applications and systems. This role blends software engineering with systems engineering, focusing on eliminating single points of failure, maximizing observability, and responding quickly to incidents. The SRE will work closely with other engineers and teams, sharing ownership and promoting a culture of collaboration and continuous learning.
What You Will Do
Design and maintain systems that support high availability and reliability for Syndio’s cloud-based applications.
Apply software engineering principles to infrastructure and operations challenges.
Identify and resolve single points of failure in the stack.
Maximize observability and monitoring across platforms.
Respond to and resolve failures efficiently to minimize downtime.
Explore and implement new tools and techniques to improve reliability and performance.
Work across platform, data, security, and software engineering as needed.
Manage Kubernetes applications and infrastructure, primarily using Kubernetes and Terraform in a fully cloud-based environment.
What We’re Looking For
Experience managing Kubernetes applications in an SRE or similar capacity.
Comfort working with Terraform and cloud-native environments.
Interest in SRE practices and methodologies, with a drive to learn and adapt.
Ability to work in a startup environment and handle tasks that may extend beyond traditional SRE responsibilities.
Collaborative mindset and willingness to share ownership of systems and solutions.
Why Join Syndio as an SRE?
Play a key role in a growing engineering organization.
Work on meaningful challenges that impact fair pay for millions of employees worldwide.
Grow your skills across platform, data, security, and software engineering.
Be part of a team that values learning, innovation, and ethical technology.
Location: Calgary, Alberta, Canada
At TextNow, we envision a world where communication is accessible to everyone. Our mission is to revolutionize connectivity by blending simplicity, innovation, and affordability. Our team comprises passionate creators harnessing technology to empower millions of users with efficient communication tools every day.
As an Android Software Developer, your role will extend beyond app development; you will influence the systems, architecture, and tools that bring our applications to life. At TextNow, our Technical Staff members embody a blend of leadership and hands-on coding, unlocking high-leverage opportunities. A strategic mindset, coupled with an attention to intricate details, is essential. You will take technical ownership of pivotal mobile areas and collaborate across mobile, backend, and data teams to deliver faster, smarter, and more reliable products.
Embracing AI and automation is central to our development approach. You will leverage these technologies to enhance development speed, boost performance, quickly identify and resolve issues, and consistently elevate mobile development standards.
We are actively seeking Technical Staff at various levels (intermediate/senior/staff+). Whether you are a seasoned developer eager to lead complex systems or a budding developer excited to expand your skills, we will tailor your title and responsibilities based on your experience and impact.
This position revolves around creating significant impact at scale. You will influence how TextNow constructs and operates its systems within an AI-first environment, where intelligent tools are integrated into daily engineering practices. Using AI is not just encouraged; it is expected. From design and architecture to implementation, testing, debugging, documentation, and operational analysis, you will actively utilize AI tools to enhance productivity, improve code quality, and make informed technical decisions. We offer a comprehensive suite of AI-powered development tools and workflows to assist you, and we anticipate that you will continuously refine your usage of these tools to elevate efficiency, clarity, and product excellence throughout the organization.
Coalition
Join Coalition as a Staff Site Reliability Engineer, where you will play a pivotal role in enhancing the reliability and performance of our systems. We are seeking a passionate engineer who thrives in a collaborative environment and is dedicated to ensuring seamless operations across our platforms.
Axon Enterprise, Inc.
Become a Force for Good with Axon.
At Axon, our mission is to Protect Life. We tackle society's most pressing safety and justice challenges through our innovative ecosystem of devices and cloud software. We believe collaboration is key; we connect with honesty and empathy, valuing diverse perspectives from our customers, communities, and one another.
Life at Axon is dynamic, demanding, and deeply rewarding. Here, you'll take the initiative and drive meaningful change while continuously growing in a mission-driven environment.
Your Impact
As a senior member of the APX Site Reliability Engineering (SRE) team, your passion for delivering solutions to real-time challenges faced by our mission-critical cloud-native services will shine through. You will ensure the high standards of quality, reliability, and security that our customers expect. Your contributions will not only be pivotal within the APX SRE team but will also empower the entire engineering organization to deliver cutting-edge features consistently.
In this role, you will significantly influence Identity and Security by assisting teams in building and managing systems that safeguard user identity, enhance authentication and authorization processes, and comply with regulatory standards. You will collaborate closely with engineering, security, and identity stakeholders to elevate secure-by-default reliability practices across the organization.
MongoDB, Inc.
The Team
At MongoDB, our Platform Engineering division within Site Reliability Engineering (SRE) is tasked with managing essential infrastructure and operational functions that empower our engineering teams. This includes our robust, multi-cloud Kubernetes infrastructure, deployment systems, and advanced observability and alerting mechanisms.
The Fabric team is at the forefront of enabling secure communication across systems and from the public internet. Our responsibilities involve designing network architecture, implementing service mesh solutions, and optimizing edge load balancing to ensure the safety of customer data in transit. This team is vital in developing and maintaining a dependable and globally connected multi-cloud network that underpins MongoDB products.
This position can be based in our Toronto or Vancouver offices, or you can work completely remotely from anywhere in North America. We provide flexible hybrid work arrangements for those in our offices.
Tyk Technologies
About Tyk Technologies
Tyk Technologies is at the forefront of API Management, paving the way for a connected world and enabling innovative products and services. Our platform transforms how organizations connect their systems and services, whether they are internal, external, public, or highly encrypted. We empower businesses in various sectors including retail, finance, telecommunications, healthcare, and media.
Founded in 2015, Tyk has expanded globally with offices in London, Ontario, Atlanta, and Singapore, serving thousands of users worldwide. Our platform is trusted by renowned brands such as Lotte, Bell, T-Mobile, RBS, Capital One, and Vinci, with a diverse user base that spans every continent.
Our Vision
At Tyk, we are committed to connecting every system in the world through our comprehensive API Management platform.
Work Culture: Flexibility and Responsibility
We believe in providing unlimited paid holidays and the flexibility to work from anywhere. Our remote-first philosophy is built on the principles of autonomy and flexibility, enabling our employees to perform at their best and fostering a diverse team without barriers to location or working hours.
The Role: Site Reliability Engineer
We are seeking a proactive Site Reliability Engineer to oversee, enhance, and provide support for our platform. Your curiosity and problem-solving skills will drive improvements, as you will be responsible for identifying reliability issues and collaborating with your team to address them. As the first line of incident management for our clients, you will define our response strategies.
This position offers a unique opportunity to collaborate with a leading distributed team and shape the future of Tyk as we continue to expand our Cloud platform.
About Fable
Fable collaborates with global enterprises to enhance accessibility for over one billion individuals with disabilities. Our esteemed clients include industry leaders such as Walmart, Slack, and Shopify. Recognized on the Forbes Accessibility 100 list in 2025, we have also been honored as one of Fast Company’s Most Innovative Companies in Design, receiving accolades from prestigious organizations including the World Summit Awards and the UN-endorsed Zero Project.
About the Role
As a Senior Site Reliability Engineer at Fable, you will be instrumental in ensuring the reliability, scalability, and efficiency of our platform during our growth phase. Our products empower organizations to create more accessible digital experiences, and the robustness of our infrastructure is key to achieving this mission. You will engage with various platform and product systems to ensure stability, performance, and cost-effectiveness, enabling teams to operate swiftly and securely.
With the integration of AI capabilities in contemporary product experiences, you will also help prepare Fable’s infrastructure to handle AI workloads, balancing reliability, performance, and cost while enabling teams to innovate and scale new features safely.
Reporting to the Director of Technical Operations, you will collaborate closely with teams across Engineering and Product. This role is perfect for those who thrive on hands-on technical work, take pride in system health, tooling, and operational excellence, and are eager to influence Fable’s infrastructure and reliability strategy moving forward.
Key Responsibilities
Reliability, Infrastructure & Platform
Design, build, and maintain reliable, scalable, and secure infrastructure for Fable’s product services.
Enhance system observability, monitoring, and alerting to ensure high availability and rapid incident response.
Contribute to and refine SRE practices, including SLIs/SLOs, incident management, and postmortems.
Support and optimize CI/CD pipelines and deployment processes.
Identify and minimize operational complexity across systems and tooling.
Collaborate across infrastructure and application layers to diagnose and resolve reliability and performance issues, making targeted improvements to application code when necessary.
Support infrastructure and platform capabilities required for AI/ML-powered features, including considerations for scaling, performance, and reliability.
Cost Efficiency & Performance
Monitor and optimize infrastructure costs across cloud environments.
Veeva Systems Inc.
Veeva Systems is a mission-driven leader in industry cloud technology, dedicated to accelerating the delivery of therapies to patients in the life sciences sector. As one of the fastest-growing SaaS companies ever, we surpassed $2 billion in revenue last fiscal year with significant growth prospects ahead.
Central to Veeva's mission are our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. Notably, we made history in 2021 by becoming a public benefit corporation (PBC), which legally commits us to balance the interests of our customers, employees, society, and investors.
As a Work Anywhere company, we empower you to choose your work environment, whether it's from home or in our office, enabling you to excel in your preferred setting.
Be part of our journey in transforming the life sciences industry and making a positive impact on our customers, employees, and communities.
The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be instrumental in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge of Java and modern open-source technologies to create a meaningful impact on our production systems.
The ideal candidate will possess substantial experience with Java applications and cutting-edge open-source technologies, particularly within the context of enterprise software development or a high-growth tech environment. As a Senior SRE, you should have a natural curiosity and a strong aptitude for problem-solving. Your unique engineering perspective will be critical as you understand how systems integrate in production to function efficiently on a global scale, supporting hundreds of customers across North America, Europe, and Asia.
Veeva Systems Inc.
Veeva Systems Inc. is a pioneering mission-driven company in the industry cloud, dedicated to accelerating the delivery of therapies to patients within the life sciences sector. As one of the fastest-growing SaaS companies ever, we achieved over $2 billion in revenue last fiscal year and maintain significant growth potential.
Our core values drive us: Do the Right Thing, Customer Success, Employee Success, and Speed. Not just a public company, we made history in 2021 by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors.
As a Work Anywhere organization, we embrace flexible work arrangements, allowing you to excel in an environment that suits you best, whether that’s from home or in the office.
Join us in transforming the life sciences industry, and make a meaningful impact on our customers, employees, and communities.
Axon Enterprise, Inc.
Become a Force for Good with Axon.
At Axon, we are driven by our mission to Protect Life. We are innovators, tackling society's most pressing safety and justice challenges through our advanced ecosystem of devices and cloud-based software. Just like our products, we thrive on collaboration, embracing diverse perspectives from our customers, communities, and each other.
Working at Axon is dynamic, rewarding, and impactful. You will take initiative and drive substantial change, growing continually as you contribute to a mission that truly matters at a company where your contributions are valued.
Your Impact
As a vital member of the Site Reliability Engineering (SRE) team, you are dedicated to providing solutions to the real-time challenges faced by our mission-critical cloud-native services. You are committed to ensuring the high quality and reliability that our customers expect. Beyond collaborating closely within the SRE team, you will make technical contributions that empower the entire engineering organization, enabling product teams to consistently deliver cutting-edge features.
Location: Remote in Canada
Your Responsibilities
Develop robust, user-friendly foundational platforms and tools that allow engineering teams to provision services quickly, consistently, securely, and cost-effectively.
Implement best practices in cloud-native site reliability.
Write clean, maintainable, and efficient code.
Utilize strong problem-solving abilities to debug issues in cloud-native distributed systems.
Guide and educate the engineering organization in adopting innovative architectural patterns.
Create thorough documentation to facilitate self-service for engineers.
Embrace calculated risks, advocate for new ideas, and enhance your craft.
Pinterest, Inc.
Pinterest is hiring a Senior Site Reliability Engineer in Toronto, ON, Canada. The focus of this role is to ensure that Pinterest’s services remain reliable, scalable, and perform well as the platform grows. Working closely with software engineers, this position involves designing and implementing solutions that strengthen system reliability and efficiency.
Key responsibilities
Partner with engineering teams to maintain and enhance the reliability of Pinterest’s services
Design and implement improvements to support scalability and performance
Troubleshoot and resolve service issues to reduce downtime
Requirements
Extensive experience in site reliability engineering or a closely related field
Strong technical background with proven problem-solving abilities
Comfort working alongside software engineers to improve systems
This position is located in Toronto, ON, Canada.
At Veeva Systems, we are driven by a mission to revolutionize the life sciences industry, empowering companies to bring therapies to patients at an accelerated pace. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue last fiscal year and possess immense growth potential.
Our core values - Do the Right Thing, Customer Success, Employee Success, and Speed - define who we are. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors.
As a Work Anywhere organization, we offer the flexibility for you to work remotely or from our office, allowing you to thrive in your preferred environment.
Join us in transforming the life sciences sector and making a positive impact on our customers, employees, and communities.
Sign in to browse more jobs
Create account — see all 5,519 results
Search for Site Reliability Engineer at TextNow | Canada
5,519 results
TextNow Inc.
At TextNow, we believe that communication should be accessible to everyone. Our mission is to democratize phone service, and we are reshaping the way the world connects. As the largest provider of free phone service in Canada, we are just getting started. Join our team and help us break down communication barriers, enabling conversations to flow freely for i…
TextNow Inc.
At TextNow, we believe that communication is a fundamental right. Our mission is to make phone service accessible to everyone. As the largest provider of free phone service in Canada, we are driven by a team of innovative thinkers who approach challenges with both curiosity and optimism. We invite you to join us in our pursuit to eliminate barriers to communication and foster meaningful conversations across the globe. We are seeking a skilled Data Developer who has substantial experience in designing and implementing data platforms. In this role, you will be responsible for the architecture, development, and ongoing management of TextNow's data ecosystem, empowering our team to make informed, data-driven decisions. Your contributions will support various cross-functional projects aimed at building reliable and scalable frameworks for all of TextNow's business and data initiatives.This position offers an opportunity to create a significant impact at scale. You will have a hand in shaping the way TextNow constructs and operates its systems in an AI-first setting, where leveraging intelligent tools is part of our daily engineering processes. Utilizing AI is not just encouraged; it is essential. From system design and architecture to implementation, testing, debugging, documentation, and operational analysis, you will harness AI tools to enhance development speed, elevate code quality, and support superior technical decision-making. We provide an extensive suite of AI-driven development tools and methods, and we expect you to continuously innovate in your use of these resources to enhance efficiency, clarity, and overall product excellence across our organization.
Join Our Team as a Site Reliability Engineer
Seize the opportunity to create a robust reliability practice from the ground up at SiteTracker. You will be instrumental in establishing engineering standards, including Service Level Objectives (SLOs), error budgets, and observability, designed to safeguard our platform as we scale for enterprise clients and enhance our AI capabilities. With full autonomy to define strategy and the trust to implement it, your role will ensure our AI workloads (Evaluations, RAG, and LLM processing) consistently meet premier reliability benchmarks. If you thrive on solving challenges proactively and view toil as an engineering problem, this position promises to be a pivotal step in your career.
Your Responsibilities
As a Staff or Senior Staff Site Reliability Engineer, you will collaborate with existing engineers to shift our organization from a reactive approach to a proactive, methodical reliability practice. You will spearhead the intentional advancement of our infrastructure, recognizing the right moments to adopt new tools and transitioning from manual scripts and templates only when necessary. Whether architecting incident response frameworks or tackling unique reliability issues for AI agents, your contributions will amplify the effectiveness of the entire engineering team. By approaching every challenge with a consultative perspective, you will inform technical decisions grounded in data rather than instinct, ensuring our multi-region or service mesh adoption roadmap is future-ready. You will not merely receive tasks; rather, you will take ownership of strategies for production-readiness and deployment safety, fostering the organizational trust essential for making reliability a key differentiator for our product.
Required Skills and Qualifications
Extensive SRE Expertise
Define SLIs and SLOs for critical user journeys to drive proactive engineering choices.
Lead live production incident response as an Incident Commander and conduct blameless postmortems that inspire actionable outcomes.
Develop observability tools that narrate a system's behavior, creating intuitive dashboards and actionable alerts.
Transform an organization from reactive incident management to a structured reliability practice, significantly improving paging volume.
Establish error-budget policies to inform data-driven decisions between feature deployment and reliability maintenance.
Advanced Technical Proficiency in AWS
Competently design and operate AWS services: VPC, IAM, compute (ECS/EC2/Lambda), managed data services, and load balancing.
Effectively manage our existing CloudFormation and bash scripts through GitHub Actions without automatically resorting to Terraform.
At TextNow, we believe that communication should be accessible to everyone. We are pioneering a new way for individuals to connect, merging simplicity, intelligence, and affordability. Our team is composed of innovative builders who leverage technology to empower millions of users with cost-effective communication solutions every day. As a Backend Software Developer, your role will extend beyond mere service creation; you will be instrumental in shaping the systems, architecture, and tooling that underpin our services. Our Technical Staff members are expected to blend leadership with hands-on coding to maximize opportunities for impact. The ability to think strategically while also engaging with intricate details is essential. You will take technical ownership of pivotal backend domains and collaborate across mobile, web, and data to develop systems that are faster, smarter, and more reliable. AI and automation are integral to our development processes. You'll utilize these technologies to expedite development, enhance performance, swiftly identify and resolve issues, and continuously elevate the standard for backend development excellence. We are seeking Technical Staff members at various levels (intermediate/senior/staff+). Whether you're a seasoned developer ready to lead complex systems or an early-career developer eager to advance, we will tailor your title and responsibilities according to your experience and impact. This position promises to create significant impact at scale. You will influence how TextNow constructs and manages its backend systems, employing AI and automation to streamline development, enhance decision-making, and provide seamless experiences for millions of users globally.
At TextNow, we champion the belief that communication is a fundamental right for all. As a pioneer in redefining connectivity, we blend simplicity, intelligence, and accessibility to empower millions of users every day. Our innovative team of builders leverages technology to make communication more affordable and impactful. As an iOS Software Developer, you'll do more than just develop applications; you'll significantly influence the systems, architecture, and tools that enable their creation. Our Technical Staff members blend leadership with hands-on coding, driving high-impact opportunities. The ability to think strategically while also engaging with intricate details is essential. You will take technical ownership of critical mobile domains, collaborating across mobile, backend, and data teams to deliver faster, smarter, and more reliable products. Embracing AI and automation is central to our development process. Your role will involve utilizing these technologies to accelerate development, enhance performance, rapidly identify and solve issues, and continually elevate the standards of mobile development excellence. We are looking for Technical Staff members at various levels (intermediate/senior/staff+). Whether you are a seasoned developer prepared to lead complex systems or an emerging developer eager to grow, we will tailor your title and scope to align with your experience and impact. This position is about making a significant impact at scale. You will influence how TextNow constructs and operates its systems in an AI-first environment, where intelligent tools are integrated into the daily engineering workflow. Leveraging AI is not merely encouraged; it is an expectation. From design and architecture to implementation, testing, debugging, documentation, and operational analysis, you will actively employ AI tools to enhance velocity, improve code quality, and make informed technical decisions.
We offer a comprehensive suite of AI-powered development tools and workflows to support you in continually evolving your approach to foster efficiency, clarity, and product excellence across the organization.
Join our innovative team at Newton as a Site Reliability Engineer, where you'll play a crucial role in ensuring the reliability and performance of our systems. In this fully remote position, you will collaborate with engineering and operations teams to develop solutions that enhance system uptime and efficiency. Your expertise will help us transition and maintain our infrastructure, ensuring our services are resilient and scalable. This is an exciting opportunity to contribute to a company that values innovation and teamwork.
jobgether
As a Senior Site Reliability Engineer at jobgether, the focus is on maintaining and improving the reliability and performance of cloud infrastructure and services. This position is based in Canada and works closely with multiple teams across the company.
Role overview
The Senior Site Reliability Engineer monitors systems, implements improvements, and automates key processes. The goal is to support a platform that scales smoothly as demands grow.
What you will do
Ensure the ongoing reliability and performance of cloud-based systems
Collaborate with other teams to address infrastructure needs and challenges
Automate operational processes to reduce manual work and improve efficiency
Identify and implement ways to improve scalability across the platform
Instacart
Transforming the Grocery Industry
At Instacart, we believe in sharing love through food, ensuring everyone has access to their favorite groceries and quality time with loved ones. We don’t just see grocery delivery as a necessity; we recognize the exciting complexities and opportunities it presents to meet the diverse needs of our community. We provide an essential service that customers depend on for groceries and household goods, while also offering safe and flexible earning opportunities to our Personal Shoppers. Instacart has become a vital resource for millions, and we’re assembling a dynamic team to propel our shopping cart forward. If you are ready to deliver your best work, we invite you to join our team.
Flex First Work Environment
We embrace a flexible approach in how we perform our best work. Our team members can choose their work location, whether from home, an office, or their favorite coffee shop, while fostering connections and community through regular in-person events. Discover more about our flexible work approach.
Overview
About the Role
As a Senior Site Reliability Engineer II, you will be instrumental in ensuring the stability and performance of our platform. You will tackle challenges head-on, ensuring optimal performance and fostering a culture that emphasizes reliable and effective practices. We are seeking a proactive individual who is adept at solving complex problems and is enthusiastic about exploring innovative solutions to support our teams and services.
About the Team
The Site Reliability Engineering (SRE) team merges software and systems engineering to design and maintain large-scale, distributed, and fault-tolerant systems. Our mission is to guarantee high reliability, optimal performance, and continuous improvement for Instacart’s critical internal services and customer-facing systems. The SRE team focuses on enhancing existing systems, constructing robust infrastructure, and automating processes to reduce manual efforts.
Joining the SRE team means facing unique scaling challenges while applying your expertise in coding, algorithms, complexity analysis, and large-scale system design.
About Syndio
Syndio is a Series C technology company based in Calgary, Alberta, focused on helping organizations create smarter, fairer compensation strategies. Our platform uses advanced technology and ethical AI to support decision-making, simplify compliance, and provide insights that help companies maintain equitable pay practices worldwide. Syndio analyzes compensation data for more than 10 million employees across many countries, working with leading enterprises to ensure fair and defensible pay.
Role Overview: Senior Site Reliability Engineer
The Senior Site Reliability Engineer (SRE) will help design, implement, maintain, and evolve solutions that improve the reliability and availability of Syndio’s applications and systems. This role blends software engineering with systems engineering, focusing on eliminating single points of failure, maximizing observability, and responding quickly to incidents. The SRE will work closely with other engineers and teams, sharing ownership and promoting a culture of collaboration and continuous learning.
What You Will Do
Design and maintain systems that support high availability and reliability for Syndio’s cloud-based applications.
Apply software engineering principles to infrastructure and operations challenges.
Identify and resolve single points of failure in the stack.
Maximize observability and monitoring across platforms.
Respond to and resolve failures efficiently to minimize downtime.
Explore and implement new tools and techniques to improve reliability and performance.
Work across platform, data, security, and software engineering as needed.
Manage Kubernetes applications and infrastructure, primarily using Kubernetes and Terraform in a fully cloud-based environment.
What We’re Looking For
Experience managing Kubernetes applications in an SRE or similar capacity.
Comfort working with Terraform and cloud-native environments.
Interest in SRE practices and methodologies, with a drive to learn and adapt.
Ability to work in a startup environment and handle tasks that may extend beyond traditional SRE responsibilities.
Collaborative mindset and willingness to share ownership of systems and solutions.
Why Join Syndio as an SRE?
Play a key role in a growing engineering organization.
Work on meaningful challenges that impact fair pay for millions of employees worldwide.
Grow your skills across platform, data, security, and software engineering.
Be part of a team that values learning, innovation, and ethical technology.
Location: Calgary, Alberta, Canada
At TextNow, we envision a world where communication is accessible to everyone. Our mission is to revolutionize connectivity by blending simplicity, innovation, and affordability. Our team comprises passionate creators harnessing technology to empower millions of users with efficient communication tools every day. As an Android Software Developer, your role will extend beyond app development; you will influence the systems, architecture, and tools that bring our applications to life. At TextNow, our Technical Staff members embody a blend of leadership and hands-on coding, unlocking high-leverage opportunities. A strategic mindset, coupled with an attention to intricate details, is essential. You will take technical ownership of pivotal mobile areas and collaborate across mobile, backend, and data teams to deliver faster, smarter, and more reliable products. Embracing AI and automation is central to our development approach. You will leverage these technologies to enhance development speed, boost performance, quickly identify and resolve issues, and consistently elevate mobile development standards. We are actively seeking Technical Staff at various levels (intermediate/senior/staff+). Whether you are a seasoned developer eager to lead complex systems or a budding developer excited to expand your skills, we will tailor your title and responsibilities based on your experience and impact. This position revolves around creating significant impact at scale. You will influence how TextNow constructs and operates its systems within an AI-first environment, where intelligent tools are integrated into daily engineering practices. Using AI is not just encouraged; it is expected. From design and architecture to implementation, testing, debugging, documentation, and operational analysis, you will actively utilize AI tools to enhance productivity, improve code quality, and make informed technical decisions.
We offer a comprehensive suite of AI-powered development tools and workflows to assist you, and we anticipate that you will continuously refine your usage of these tools to elevate efficiency, clarity, and product excellence throughout the organization.
Coalition
Join Coalition as a Staff Site Reliability Engineer, where you will play a pivotal role in enhancing the reliability and performance of our systems. We are seeking a passionate engineer who thrives in a collaborative environment and is dedicated to ensuring seamless operations across our platforms.
Axon Enterprise, Inc.
Become a Force for Good with Axon.
At Axon, our mission is to Protect Life. We tackle society's most pressing safety and justice challenges through our innovative ecosystem of devices and cloud software. We believe collaboration is key; we connect with honesty and empathy, valuing diverse perspectives from our customers, communities, and one another. Life at Axon is dynamic, demanding, and deeply rewarding. Here, you'll take the initiative and drive meaningful change while continuously growing in a mission-driven environment.
Your Impact
As a senior member of the APX Site Reliability Engineering (SRE) team, your passion for delivering solutions to real-time challenges faced by our mission-critical cloud-native services will shine through. You will ensure the high standards of quality, reliability, and security that our customers expect. Your contributions will not only be pivotal within the APX SRE team but will also empower the entire engineering organization to deliver cutting-edge features consistently. In this role, you will significantly influence Identity and Security by assisting teams in building and managing systems that safeguard user identity, enhance authentication and authorization processes, and comply with regulatory standards. You will collaborate closely with engineering, security, and identity stakeholders to elevate secure-by-default reliability practices across the organization.
MongoDB, Inc.
The Team
At MongoDB, our Platform Engineering division within Site Reliability Engineering (SRE) is tasked with managing essential infrastructure and operational functions that empower our engineering teams. This includes our robust, multi-cloud Kubernetes infrastructure, deployment systems, and advanced observability and alerting mechanisms. The Fabric team is at the forefront of enabling secure communication across systems and from the public internet. Our responsibilities involve designing network architecture, implementing service mesh solutions, and optimizing edge load balancing to ensure the safety of customer data in transit. This team is vital in developing and maintaining a dependable and globally connected multi-cloud network that underpins MongoDB products. This position can be based in our Toronto or Vancouver offices, or you can work completely remotely from anywhere in North America. We provide flexible hybrid work arrangements for those in our offices.
Tyk Technologies
About Tyk Technologies
Tyk Technologies is at the forefront of API Management, paving the way for a connected world and enabling innovative products and services. Our platform transforms how organizations connect their systems and services, whether they are internal, external, public, or highly encrypted. We empower businesses in various sectors including retail, finance, telecommunications, healthcare, and media. Founded in 2015, Tyk has expanded globally with offices in London, Ontario, Atlanta, and Singapore, serving thousands of users worldwide. Our platform is trusted by renowned brands such as Lotte, Bell, T-Mobile, RBS, Capital One, and Vinci, with a diverse user base that spans every continent.
Our Vision
At Tyk, we are committed to connecting every system in the world through our comprehensive API Management platform.
Work Culture: Flexibility and Responsibility
We believe in providing unlimited paid holidays and the flexibility to work from anywhere. Our remote-first philosophy is built on the principles of autonomy and flexibility, enabling our employees to perform at their best and fostering a diverse team without barriers to location or working hours.
The Role: Site Reliability Engineer
We are seeking a proactive Site Reliability Engineer to oversee, enhance, and provide support for our platform. Your curiosity and problem-solving skills will drive improvements, as you will be responsible for identifying reliability issues and collaborating with your team to address them. As the first line of incident management for our clients, you will define our response strategies. This position offers a unique opportunity to collaborate with a leading distributed team and shape the future of Tyk as we continue to expand our Cloud platform.
About Fable
Fable collaborates with global enterprises to enhance accessibility for over one billion individuals with disabilities. Our esteemed clients include industry leaders such as Walmart, Slack, and Shopify. Recognized on the Forbes Accessibility 100 list in 2025, we have also been honored as one of Fast Company’s Most Innovative Companies in Design, receiving accolades from prestigious organizations including the World Summit Awards and the UN-endorsed Zero Project.
About the Role
As a Senior Site Reliability Engineer at Fable, you will be instrumental in ensuring the reliability, scalability, and efficiency of our platform during our growth phase. Our products empower organizations to create more accessible digital experiences, and the robustness of our infrastructure is key to achieving this mission. You will engage with various platform and product systems to ensure stability, performance, and cost-effectiveness, enabling teams to operate swiftly and securely. With the integration of AI capabilities in contemporary product experiences, you will also help prepare Fable’s infrastructure to handle AI workloads, balancing reliability, performance, and cost while enabling teams to innovate and scale new features safely. Reporting to the Director of Technical Operations, you will collaborate closely with teams across Engineering and Product. This role is perfect for those who thrive on hands-on technical work, take pride in system health, tooling, and operational excellence, and are eager to influence Fable’s infrastructure and reliability strategy moving forward.
Key Responsibilities
Reliability, Infrastructure & Platform
Design, build, and maintain reliable, scalable, and secure infrastructure for Fable’s product services.
Enhance system observability, monitoring, and alerting to ensure high availability and rapid incident response.
Contribute to and refine SRE practices, including SLIs/SLOs, incident management, and postmortems.
Support and optimize CI/CD pipelines and deployment processes.
Identify and minimize operational complexity across systems and tooling.
Collaborate across infrastructure and application layers to diagnose and resolve reliability and performance issues, making targeted improvements to application code when necessary.
Support infrastructure and platform capabilities required for AI/ML-powered features, including considerations for scaling, performance, and reliability.
Cost Efficiency & Performance
Monitor and optimize infrastructure costs across cloud environments.
Veeva Systems Inc.
Veeva Systems is a mission-driven leader in industry cloud technology, dedicated to accelerating the delivery of therapies to patients in the life sciences sector. As one of the fastest-growing SaaS companies ever, we surpassed $2 billion in revenue last fiscal year with significant growth prospects ahead. Central to Veeva's mission are our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. Notably, we made history in 2021 by becoming a public benefit corporation (PBC), which legally commits us to balance the interests of our customers, employees, society, and investors. As a Work Anywhere company, we empower you to choose your work environment, whether it's from home or in our office, enabling you to excel in your preferred setting. Be part of our journey in transforming the life sciences industry and making a positive impact on our customers, employees, and communities.
The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be instrumental in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge of Java and modern open-source technologies to create a meaningful impact on our production systems. The ideal candidate will possess substantial experience with Java applications and cutting-edge open-source technologies, particularly within the context of enterprise software development or a high-growth tech environment. As a Senior SRE, you should have a natural curiosity and a strong aptitude for problem-solving. Your unique engineering perspective will be critical as you understand how systems integrate in production to function efficiently on a global scale, supporting hundreds of customers across North America, Europe, and Asia.
Veeva Systems Inc.
Veeva Systems Inc. is a pioneering mission-driven company in the industry cloud, dedicated to accelerating the delivery of therapies to patients within the life sciences sector. As one of the fastest-growing SaaS companies ever, we achieved over $2 billion in revenue last fiscal year and maintain significant growth potential. Our core values drive us: Do the Right Thing, Customer Success, Employee Success, and Speed. Not just a public company, we made history in 2021 by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors. As a Work Anywhere organization, we embrace flexible work arrangements, allowing you to excel in an environment that suits you best, whether that’s from home or in the office. Join us in transforming the life sciences industry, and make a meaningful impact on our customers, employees, and communities.
Axon Enterprise, Inc.
Become a Force for Good with Axon.
At Axon, we are driven by our mission to Protect Life. We are innovators, tackling society's most pressing safety and justice challenges through our advanced ecosystem of devices and cloud-based software. Just like our products, we thrive on collaboration, embracing diverse perspectives from our customers, communities, and each other. Working at Axon is dynamic, rewarding, and impactful. You will take initiative and drive substantial change, growing continually as you contribute to a mission that truly matters at a company where your contributions are valued.
Your Impact
As a vital member of the Site Reliability Engineering (SRE) team, you are dedicated to providing solutions to the real-time challenges faced by our mission-critical cloud-native services. You are committed to ensuring the high quality and reliability that our customers expect. Collaborating closely not only within the SRE team, your technical contributions will empower the entire engineering organization, enabling product teams to consistently deliver cutting-edge features.
Location: Remote in Canada
Your Responsibilities
Develop robust, user-friendly foundational platforms and tools that allow engineering teams to provision services quickly, consistently, securely, and cost-effectively.
Implement best practices in cloud-native site reliability.
Write clean, maintainable, and efficient code.
Utilize strong problem-solving abilities to debug issues in cloud-native distributed systems.
Guide and educate the engineering organization in adopting innovative architectural patterns.
Create thorough documentation to facilitate self-service for engineers.
Embrace calculated risks, advocate for new ideas, and enhance your craft.
Pinterest, Inc.
Pinterest is hiring a Senior Site Reliability Engineer in Toronto, ON, Canada. The focus of this role is to ensure that Pinterest’s services remain reliable, scalable, and perform well as the platform grows. Working closely with software engineers, this position involves designing and implementing solutions that strengthen system reliability and efficiency.
Key responsibilities
Partner with engineering teams to maintain and enhance the reliability of Pinterest’s services
Design and implement improvements to support scalability and performance
Troubleshoot and resolve service issues to reduce downtime
Requirements
Extensive experience in site reliability engineering or a closely related field
Strong technical background with proven problem-solving abilities
Comfort working alongside software engineers to improve systems
This position is located in Toronto, ON, Canada.
At Veeva Systems, we are driven by a mission to revolutionize the life sciences industry, empowering companies to bring therapies to patients at an accelerated pace. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue last fiscal year and possess immense growth potential. Our core values - Do the Right Thing, Customer Success, Employee Success, and Speed - define who we are. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors. As a Work Anywhere organization, we offer the flexibility for you to work remotely or from our office, allowing you to thrive in your preferred environment. Join us in transforming the life sciences sector and making a positive impact on our customers, employees, and communities.
