Senior Site Reliability Engineer
Qualifications
About Anduril Industries
Anduril Industries is redefining defense technology with a mission to enhance military capabilities through innovative solutions. Leveraging advanced technologies like AI, autonomy, and sensor fusion, we are changing the landscape of defense systems to be more responsive and effective. Our cutting-edge Lattice OS powers a real-time command and control center, ensuring our clients are equipped for the challenges of modern strategic competition. Join us in delivering unparalleled technological advancements to the defense sector.
Similar jobs
Future Secure AI
Future Secure AI creates AI Co-Workers that help enterprises automate operational tasks. The team builds production systems designed to handle real-world scale and reliability. Daily work emphasizes disciplined engineering, resilience, and a culture that encourages both collaboration and individual growth. Leadership maintains an entrepreneurial approach and remains accessible to support employees.
Role overview
The Site Reliability Engineer, based in Sydney, will design, build, and maintain the infrastructure that powers AI Co-Workers. This position works closely with product, AI, and engineering teams. The role involves taking responsibility for system reliability throughout the entire lifecycle.
Key responsibilities
Design, build, and manage reliable production infrastructure for AI Co-Workers.
Oversee Kubernetes-based platforms for deploying and running AI workloads.
Create and maintain infrastructure as code using Terraform.
Implement and manage Helm-based deployment workflows.
Define, measure, and improve system reliability using SLIs, SLOs, and SLAs.
Participate in on-call rotations, handle incident response, conduct root cause analysis, and contribute to post-mortem reviews.
Reduce operational toil through automation and engineering improvements.
Develop and enhance observability, including monitoring, logging, and alerting.
Work with engineers to keep systems resilient, scalable, and secure.
Manage tasks across build, deploy, and operate phases of the software lifecycle.
Future Secure AI
About Future Secure AI
Future Secure AI creates AI Co-Workers that automate key operational tasks for enterprises. Our systems run at scale in production, where reliability and disciplined engineering matter every day. The company values courage, rigor, and curiosity, and maintains an entrepreneurial, approachable leadership style. Professional growth is a priority here, and team members are supported to do their best work.
Role Overview: Site Reliability Engineer (Sydney)
This Site Reliability Engineer role focuses on building and supporting the platforms behind our AI Co-Workers. The position suits someone who takes responsibility for reliability from start to finish and enjoys collaborating with product, AI, and engineering teams.
What You Will Do
Design, build, and maintain production infrastructure for AI Co-Workers.
Manage Kubernetes-based platforms for deploying and running AI workloads.
Use Terraform to create and maintain infrastructure as code.
Implement and manage deployment workflows using Helm.
Define, measure, and improve system reliability with SLIs, SLOs, and SLAs.
Participate in on-call rotation, incident response, root cause analysis, and post-mortem reviews.
Reduce manual operational work by automating processes and making engineering improvements.
Improve observability through better monitoring, logging, and alerting.
Work closely with engineering teams to build resilient, scalable, and secure systems.
Contribute across all phases of the software lifecycle: build, deploy, and operate.
Location
Sydney
The Trade Desk
Join Our Team: The Trade Desk is a pioneering technology firm dedicated to enhancing the internet experience through ethical and intelligent advertising solutions. With an impressive capacity of handling over 1 trillion queries daily, our platform operates on a scale that is unmatched in the industry. We are proud of our award-winning culture, which emphasizes trust, ownership, empathy, and collaboration. We celebrate the diverse experiences and perspectives of our team members and strive to create inclusive environments where everyone can show their true selves every day. Are you passionate about tackling complex challenges on a large scale? Do you want to be part of a vibrant, globally connected team where your efforts will significantly impact the media landscape? Discover why Fortune magazine consistently ranks The Trade Desk among the top small to medium-sized workplaces worldwide. Your Future Team: The Trade Desk Network Team is responsible for managing comprehensive networking across one of the industry’s most challenging infrastructures, which includes extensive bare-metal datacenters and major public cloud platforms. We operate at the intersection of network engineering and software development, collaborating closely with application, datacenter, and Site Reliability Engineering (SRE) teams to design and maintain networks that facilitate a global, high-performance advertising technology platform. Our approach is software-first, and we seamlessly integrate modern AI-assisted development tools like Cursor and Claude into our workflows. You will be instrumental in shaping the future of network automation rather than merely maintaining existing systems. What We Seek: We are in search of a Senior Software Engineer who excels at the intersection of advanced networking knowledge and software development. 
You will closely collaborate with SRE and infrastructure teams to define strategies and create the next generation of network automation, rooted in industry best practices and a focus on scalable, maintainable solutions. You possess a profound dedication to keeping networks healthy, efficient, and resilient. Your Responsibilities: Design, develop, and expand a global network platform that encompasses physical datacenters and multi-cloud environments, including AWS, Azure, and Alibaba Cloud. Support thousands of hosts across the globe, engineering reliable and efficient solutions to manage petabyte-scale data challenges. Take ownership of troubleshooting and resolving intricate network issues, ensuring high availability and performance across the entire infrastructure.
Join IMC as a Graduate Site Reliability Engineer, where you will play a crucial role in ensuring the efficiency and reliability of our cutting-edge, low-latency Linux trading environment. Balancing speed and quality is essential in our operations, and your contributions will be pivotal in automating various aspects of our platform.
In this exciting position, you will embrace DevOps principles and implement SRE techniques to enhance IMC's operational capabilities. You will collaborate with a dynamic team, tackling the daily challenges of managing high-volume data flows in a complex distributed environment. Your role will involve addressing technical and trading challenges while ensuring the high availability, stability, and performance of our end-to-end systems.
AlayaCare
About the Role
AlayaCare is hiring a Senior Site Reliability Engineer in Sydney, New South Wales. This role focuses on maintaining and improving the reliability and performance of AlayaCare’s software products. The position calls for strong experience in cloud infrastructure, automation, and monitoring.
What You Will Do
Support and enhance the reliability of software systems serving AlayaCare’s clients
Apply expertise in cloud infrastructure to strengthen system stability
Automate processes to streamline operations and reduce manual intervention
Monitor system health and performance to identify and resolve issues quickly
Location
This position is based in Sydney, New South Wales, Australia.
Freelancer Ltd.
Join our dynamic Systems Engineering team as a Senior DevOps Engineer / Site Reliability Engineer, where your expertise will play a vital role in designing and delivering mission-critical services and systems. Collaborate closely with software engineers to manage infrastructure and services at scale, utilizing an array of cutting-edge technologies to support the high-traffic Freelancer.com marketplace and various other business products deployed on Amazon Web Services. Our tech stack includes Nginx, MySQL, Redis, ElasticSearch, RabbitMQ, Consul, Docker, and Kubernetes, all aimed at building highly resilient, dynamically scalable, self-healing systems through automation and monitoring using Terraform, Puppet, Prometheus, Grafana, Kibana, and Jenkins.
About Us
At coreflow, we are transforming the entertainment industry through the power of AI. As one of the fastest-growing startups on a global scale, we proudly serve 20 million users in our first year. Our team thrives in a collaborative, in-person environment based in Sydney, Australia.
We adhere to core principles that guide our work and influence every decision we make:
User-First: Our focus is on creating products that resonate with our users. We dedicate time to understand their needs and prioritize delivering value.
High Agency, High Ownership: We take full responsibility for our work, from start to finish. We learn from our mistakes and are committed to finding solutions without placing blame.
Urgency: This is a unique opportunity in a fast-paced environment. We prioritize effectively, seek leverage, and maintain an inspiring pace of work.
Your Role
As our first dedicated Site Reliability Engineer, you will be pivotal in ensuring reliability and making core platform decisions as we scale to support hundreds of millions of users.
Key Projects
Enhance uptime and minimize RTO across essential services.
Manage and strengthen GPU clusters that facilitate millions of AI generations daily.
Establish platform-wide observability (metrics, tracing, alerting) and uphold SLOs.
Refine AWS infrastructure to optimize costs while ensuring top-tier performance.
Qualifications
5+ years of experience in operating production systems at scale.
Proficient in AWS (infrastructure as code, high-scale computing, K8s/ECS or similar).
Strong background in observability and incident response.
Expertise in CI/CD and deployment pipelines.
Familiarity with our technology stack: TypeScript, Next.js, React, TailwindCSS, tRPC, Postgres, Temporal, AWS.
A problem-solver who addresses root causes rather than just symptoms.
A relentless drive to succeed; this role will challenge you.
What We Offer
Competitive salary with significant growth potential.
The Platform Engineering division at IMC Trading is dedicated to enhancing the productivity of technology teams by offering self-service tools, comprehensive documentation, and robust support. This team is tasked with the design, construction, and upkeep of the foundational runtime platforms essential for IMC's software applications. Our mission is to optimize development workflows, create a consistent technical framework globally, and provide teams with the resources they need to innovate effectively.
As a global entity, Platform Engineering serves as a crucial link between the technical demands of application development and the operational challenges of deploying and maintaining these applications in live environments. Our goal is to reduce friction and ensure that engineering teams can function seamlessly, driving our initiatives forward.
We are currently seeking a dynamic and dedicated Site Reliability Engineer who will be pivotal in enhancing and optimizing our developer services infrastructure. You will become part of a highly skilled team that supports a range of critical systems, including source control, continuous integration pipelines, and observability tools, all integral to the stability and performance of our trading platforms.
Join Our Team
Arista Networks is on the lookout for a talented Site Reliability Engineer (SRE) to enrich our Engineering Productivity (EngProd) team. You will play a pivotal role in maintaining and enhancing our growing infrastructure tailored for our internal user base. The ideal candidate will be adaptable, proactive, and eager to embrace new technologies. As part of our software engineering team, you will collaborate with fellow engineers to design, construct, and manage secure, scalable, and fault-tolerant tools within a hybrid cloud environment.
In the EngProd group, you will work closely with engineers to architect, build, scale, and manage systems utilized by Arista’s product development teams. These systems incorporate industry-standard technologies such as Ansible, Artifactory, Gerrit, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, ElasticSearch, Google Cloud, Varnish, and Perforce, along with bespoke internal systems designed to automate CI/CD, testing, analysis, and visualization.
Your Responsibilities
Safely build, deploy, and operate critical production systems with an emphasis on scalability, reliability, observability, performance, and security.
Monitor and enhance the developer experience across various services.
Automate processes to minimize toil and streamline production operations.
Proactively monitor, respond to, and improve alerts; establish automated alert handling.
Draft and maintain incident response documentation.
Triage platform and infrastructure issues, assisting Arista software engineers in their troubleshooting efforts while engaging with third-party vendor support.
Compose postmortem reports and devise solutions to prevent recurrence of incidents.
Plan and communicate maintenance schedules for production systems.
Collaborate with product development teams to identify and resolve infrastructural bottlenecks affecting their workflows.
Research and implement best practices for maintaining secure, scalable, and fault-tolerant systems.
Analyze the design and implementation details of open-source systems to improve triage and resolution processes.
About Us
At Heidi Health, we believe that healthcare deserves a more harmonious approach, one that ensures continuous and deeply human care. Our mission is to develop an AI Care Partner that collaborates with clinicians to achieve this goal.
Our diverse team comprises doctors, engineers, designers, researchers, and creatives dedicated to creating tools that empower clinicians to concentrate on what really counts: their patients.
In just 18 months, we've reclaimed over 18 million hours for healthcare professionals, facilitating 73 million patient visits across 116 countries. Currently, over two million patient visits weekly are powered by Heidi around the globe.
Supported by nearly $100 million in funding, we are expanding into the US, UK, Canada, and Europe. We collaborate with premier health systems, including the NHS, Beth Israel Lahey Health, and Monash Health.
The Position
The Senior Site Reliability Engineer will join our core Platform/SRE team responsible for production. You will directly engage in incident response, on-call duties, system reliability, and the daily operations of Heidi’s platform.
We welcome strong mid-level SRE candidates eager to take on more responsibility, as well as seasoned SREs who thrive in hands-on operational roles. This position is purposefully operations-focused, with an emphasis on maintaining the health of real systems in production.
Your Responsibilities
Engage in on-call and incident response: Address production incidents, assist in service restoration, and ensure clear communication during incidents, gradually taking on more leadership in managing incidents.
Enhance operational reliability: Identify recurring issues and reliability risks, driving improvements through better alerting, automation, system adjustments, or process enhancements.
Oversee components of the production environment: Manage and enhance Kubernetes clusters, cloud infrastructure, and core platform services, with increasing ownership as you gain experience.
Bolster observability: Improve dashboards, alerts, logs, and traces to ensure quicker detection and diagnosis of issues, focusing on actionable insights.
Minimize operational toil: Automate repetitive tasks, streamline runbooks, and enhance tooling to make on-call and daily operations more efficient and secure.
Join our dynamic global team of over 400 professionals dedicated to advancing open-source technology and multi-cloud solutions. Our mission is to empower developers, builders, and creators to realize their ideas swiftly and effortlessly by offering a cloud data platform that simplifies access to open-source databases, search, streaming, and application infrastructure.
The Role:
We are looking for an experienced Senior Site Reliability Engineering Manager to spearhead and nurture a talented team of Site Reliability Engineers based in the ANZ region. This team will ensure the reliability of Aiven's services for our customers by diligently monitoring the health of the Aiven platform throughout our 24/7/365 global operations. The team will also manage incident responses and collaborate with various units to address issues. The focus will be on software development and automation.
This position will be located in Auckland, New Zealand, and we provide a relocation package for suitable candidates.
What You'll Do:
Collaborate with EMEA and NA-based SRE Managers to coordinate and enhance global operations.
Lead and inspire a team of Site Reliability Engineers located in ANZ.
Empower your team to drive significant software and process initiatives for the Aiven platform, including managing on-call rotations to ensure continuous team availability.
Maintain an efficient operation where decisions are driven by metrics and data, prioritizing tasks accordingly.
Anduril Industries
Anduril Industries is a pioneering defense technology firm dedicated to enhancing U.S. and allied military capabilities through innovative technology. By integrating the expertise, technology, and business models of the 21st century's most forward-thinking companies into the defense sector, Anduril is revolutionizing the design, construction, and sales of military systems. Our advanced suite of systems is powered by Lattice OS, an AI-driven operating system that transforms vast data streams into a real-time, 3D command and control center. As we navigate a new era of strategic competition, Anduril is committed to delivering state-of-the-art autonomy, AI, computer vision, sensor fusion, and networking technology to the military in record time.
At Anduril Australia, we are at the forefront of developing unmanned systems for subsurface and other domains, leveraging unsupervised autonomy to provide long-endurance, multi-mission capabilities to our clients. Our projects, including the Extra Large Autonomous Undersea Vehicle (XL-AUV) Program, require seasoned Senior Site Reliability Engineers with extensive experience in modern software development, delivery, and sustainment toolchains. The Senior Site Reliability Engineer will utilize their expertise to swiftly architect, design, deliver, support, and evolve the toolchains necessary for the next generation of autonomous systems.
Canva is looking for a Senior Software Engineer to join the Reliability Foundations team in Sydney. This position centers on strengthening the reliability and performance of Canva’s platform.
Role overview
This role involves designing, building, and refining systems to support resilient and high-performing services. The Senior Software Engineer will work closely with others to improve software delivery and support operational goals.
What you will do
Create and implement solutions that boost the reliability of Canva’s core platform.
Optimize existing systems to maintain high performance and stability.
Collaborate with cross-functional teams to advance software delivery processes and operational standards.
Requirements
Experience designing and maintaining reliable, scalable systems.
Strong background in performance optimization and system resilience.
Ability to work effectively with engineers and stakeholders across different teams.
Constantinople
Join Constantinople, a pioneering all-in-one software and operational platform designed specifically for banks. Our cloud-native infrastructure empowers banking clients to efficiently manage their customer operations. With a rich suite of banking products and extensive operational services, we simplify the complexities of banking.
Our cutting-edge platform eliminates the burden of costly infrastructure, automating manual operational and compliance processes at scale. By minimizing operational complexities, we enable our clients to concentrate on their customers and enhance their banking services.
The Role
As a Release Engineer within our Site Reliability Engineering (SRE) team, you will take ownership of our platform release operations, particularly during the APAC timezone. You will collaborate closely with the Release Manager to execute and manage bi-weekly platform releases while enhancing and automating the release process. Your role will focus on boosting reliability across backend, web, and mobile deployments, as well as scaling release governance for multiple tenants.
We have existing automation in place; your mission will be to refine these processes by eliminating manual gaps, mitigating risks, and increasing repeatability as we scale. You will closely collaborate with the SRE Squad and other teams to ensure safe and predictable releases across all environments.
Anduril Industries
At Anduril Industries, we are redefining defense technology with a commitment to enhance military capabilities through innovation. Our mission is to integrate advanced technologies into the U.S. and allied military frameworks, ensuring they are equipped to tackle contemporary challenges. With our proprietary Lattice OS, an AI-driven operating system, we are transforming data into actionable insights, creating a real-time, three-dimensional command and control environment. As we navigate a new era of strategic competition, we are dedicated to delivering pioneering solutions in autonomy, artificial intelligence, computer vision, and sensor fusion within accelerated timelines.
Based in Sydney, Anduril Australia is at the forefront of developing unmanned systems, focusing on subsurface capabilities that utilize unsupervised autonomy for extended multi-mission operations. Our ongoing projects, including the Extra Large Autonomous Undersea Vehicle (XL-AUV) Program, necessitate the expertise of Site Reliability Engineers who possess substantial experience in modern software development, delivery, and sustainment toolchains. As a Senior Site Reliability Engineer, you will play a crucial role in architecting, designing, and evolving the toolchains essential for the deployment of next-generation autonomous systems.
Anduril Industries
Join Anduril Industries, a pioneering defense technology company, as a Senior Site Reliability Engineer. In this role, you will be at the forefront of transforming military capabilities through state-of-the-art software delivery toolchains, driving efficiency and security for next-generation autonomous systems. You will lead the design and implementation of cutting-edge solutions that enhance our unmanned systems, such as the Extra Large Autonomous Undersea Vehicle (XL-AUV) Program. Collaborate with a dynamic team of professionals, mentor fellow engineers, and advocate for best practices in security and performance.
At Unispace, we are dedicated to revolutionizing workplaces to cater to the varied requirements of today’s dynamic global brands. Our agile, end-to-end approach is executed seamlessly across borders, driven by robust client partnerships and backed by insightful workplace intelligence and cutting-edge technology.
We believe that everyone deserves a work environment that boosts productivity, encourages peak performance, and supports overall well-being. Across our 48 global studios, we recognize that positive work environments foster open, collaborative, and supportive cultures, enriched by the contributions of independent and motivated individuals from diverse backgrounds.
While our core business focuses on strategizing, designing, and constructing workplaces for some of the world’s leading brands, we understand that our people are our most valuable asset. We prioritize their growth and well-being through a comprehensive Employee Value Proposition program that encompasses rewards and recognition, career development opportunities, and a strong emphasis on mental and physical health.
To further support our team, we embrace flexible working arrangements, including hybrid options that empower our employees to balance in-office collaboration with remote work flexibility. This approach not only enhances work-life balance but also enables us to harness the full potential of our diverse and talented workforce.
Role Profile
As a Site Manager, you will ensure the successful execution of high-quality construction projects from inception to completion. As a key member of our team, you will oversee on-site operations, ensuring that subcontractors, stakeholders, and the Unispace team are coordinated, and that projects are delivered on time, safely, and to the highest standards. Your leadership will be instrumental in bringing our innovative designs to fruition while upholding our dedication to safety, quality, and client satisfaction. By championing Unispace’s values and methodologies, you will play a vital role in creating exceptional spaces that surpass client expectations.
Responsibilities include but are not limited to:
On-Site Leadership: Oversee all on-site activities, ensuring that subcontractors and teams align with the project scope, drawings, and delivery strategy.
Stakeholder Management: Serve as the primary on-site contact for clients, landlords, tenants, and agents, fostering transparent and professional communication.
Quality & Program Control: Manage the construction program to meet deadlines, oversee quality assurance, and ensure compliance with safety regulations.
AECOM is hiring an Engineering Geologist / Geotechnical Engineer I in Sydney. This position plays a key part in delivering infrastructure and environmental sustainability projects throughout the region.
Main responsibilities
Assess site conditions by applying geological and geotechnical methods
Carry out field investigations and interpret collected data
Prepare recommendations to guide project planning and execution
Collaborate with colleagues from various disciplines to achieve project objectives
Teamwork and contribution
Working within a multidisciplinary team, this role directly supports the success of engineering projects and the ongoing development of Sydney’s infrastructure. The work performed helps inform decisions that impact both project outcomes and the broader community.
At Neara, we are redefining the future of energy management. Imagine the ability to simulate and analyze the resilience of an entire power grid against extreme weather events before they happen. We are at the forefront of this innovation, employing cutting-edge machine learning technology to create engineering-grade, physics-enabled digital twins of electricity grids across four continents. This empowers asset owners to address their most pressing challenges and implement the most effective solutions across vast networks of infrastructure.
Through our advanced simulations, we enable some of the world’s largest utilities to identify risks, optimize investments, and contribute to a more robust global energy future. Our team comprises passionate individuals dedicated to making a real-world impact by leveraging AI and machine learning to enhance everything from data classification to intricate scenario analyses. We have cultivated a vibrant culture of innovation where each team member takes ownership of our mission, and we are on the lookout for intelligent, creative individuals to help us expand our influence globally.
As the Lead Engineer for Engineering Productivity, you will be pivotal in enhancing the performance and reliability of our development processes, ultimately improving the satisfaction and efficiency of our Software and Machine Learning engineers. Your focus will be on modernizing our systems to facilitate an AI-driven, resilient, and scalable platform.
In this role, you will work collaboratively to design and implement state-of-the-art CI/CD pipelines, improve developer experience, shorten feedback loops, and integrate security and reliability into the development lifecycle. Your success will be defined by the happiness of our engineers, accelerated iterations, safer deployments, and an engineering organization that scales smoothly.
You will also play a vital role in educating and empowering our engineering team with new tools and best practices, enabling them to build upon your initiatives for further scalability. Mentorship will be a key aspect of your role, fostering a collaborative and innovative environment that encourages continuous learning and personal growth.
Join us at Neara, where your ideas are valued, your development is prioritized, and your successes are celebrated. If you have a results-driven mindset and are motivated to make a difference, we want to hear from you.
Join Canva as a Senior Engineering Manager in our Business Solutions Engineering team! In this pivotal role, you will lead a talented group of engineers to develop innovative solutions that empower businesses to thrive using our platform. Your expertise will guide the team in delivering high-quality software products while fostering a culture of collaboration, creativity, and continuous improvement.

