Network Reliability Engineer
About csit
At csit, we are dedicated to fostering innovation and excellence in network infrastructure. Our team is passionate about leveraging cutting-edge technologies to deliver reliable and efficient networking solutions for our clients.
Similar jobs
Squarepoint Capital
Position Overview:
Join our Risk team as a Reliability Software Engineer, where you will be instrumental in maintaining the performance, stability, and availability of our Risk software systems. The Risk platform at Squarepoint is essential for position management, profit/loss computation, inventory management, and internal order routing. These vital systems must handle high volumes of trading data efficiently and reliably, necessitating strong software development capabilities and analytical skills.
Your primary focus will be on developing firm-wide platforms aimed at enhancing Squarepoint's observability, preventing functional and performance regressions, and automating operational processes. You will implement domain-specific logic tailored for various Risk sub-teams using these platforms. Examples of our projects include:
- Observability: Our health check platform simplifies the implementation of health checks across teams at Squarepoint. It supports generic health checks set up through configuration, as well as a 'plug-n-play' architecture for custom health checks.
- Preventing functional/performance regressions: We are creating a platform to automate benchmarking by managing job scheduling, hardware resources, metric collection, reporting results, and integrating with GitLab.
- Automation: We are developing a self-service automation platform allowing users to request system configuration changes via a Jira portal, which automatically schedules jobs to apply approved changes.
Operational continuity is vital; therefore, our responsibilities include:
- Level-2 support: Each team member participates in a daily support rotation, prioritizing incident response during business hours over project work.
Airwallex
About Airwallex
Airwallex stands as the premier unified payments and financial platform catering to global enterprises. Our unique blend of proprietary infrastructure and cutting-edge software empowers over 200,000 businesses worldwide, including notable names like Brex, Rippling, Navan, Qantas, and SHEIN, with comprehensive solutions for managing business accounts, payments, spend management, treasury, and embedded finance on a global scale.
Founded in Melbourne, we boast a vibrant team of over 2,000 innovative tech professionals across 26 global offices. Valued at US$8 billion, our growth is supported by top-tier investors such as T. Rowe Price, Visa, Mastercard, Robinhood Ventures, Sequoia, Salesforce Ventures, DST Global, and Lone Pine Capital. Join us to embark on the most ambitious journey of your career.
Attributes We Value
We seek dynamic builders with an entrepreneurial spirit who crave real impact, accelerated learning, and authentic ownership. You should possess robust expertise in your role, sharp analytical skills, and a passion for our mission and operating principles. You thrive in fast-paced environments, utilizing sound judgment, curiosity, and a first-principles approach to decision-making.
Collaboration and humility define you as you transform innovative ideas into tangible products, ensuring that you “get stuff done” from start to finish. By leveraging AI, you work smarter and solve challenges swiftly. Here, you will engage with complex, high-stakes problems alongside exceptional teammates and advance your career as we shape the future of global banking. If this resonates with you, let's create what comes next.
About the Team
The Client Risk Team at Airwallex is dedicated to proactively identifying and managing client-level financial crime and credit risk across all products. Our objective is to provide a cohesive, data-driven perspective of each client by integrating internal and external insights. We are committed to utilizing advanced machine learning and large language models (LLMs) to enhance decision intelligence, automate operational processes, and facilitate responsible global business growth.
What You’ll Do
As a Software Engineer in Client Risk, you will architect and develop scalable risk monitoring systems leveraging AI, ML, and LLM technologies. Your role will involve automating review processes, consolidating risk insights, and creating unified client management tools. Additionally, you will define and implement evaluation frameworks to enhance our systems.
Join Airwallex as a Senior Software Engineer in Client Risk
At Airwallex, we are redefining the global financial landscape with our innovative payments and financial platform tailored for businesses worldwide. We harness a unique blend of proprietary technology and software to support over 200,000 enterprises, empowering them to integrate solutions for business accounts, payments, spend management, treasury operations, and embedded finance on a global scale.
Founded in Melbourne and now with a diverse team of over 2,000 talented professionals across 26 global offices, we are backed by top-tier investors such as T. Rowe Price, Visa, and Mastercard. Valued at $8 billion, we are at the forefront of building the future of global banking. If you are eager to tackle ambitious challenges and make a significant impact, we invite you to join our team.
Attributes We Value
We seek builders with an entrepreneurial spirit who are eager to learn, take ownership, and make a difference. You possess deep expertise in your field, are driven by our mission, and align with our operating principles. You act with purpose, make informed decisions, and maintain a balance between speed and thoroughness.
Collaboration and humility are key; you turn visionary ideas into tangible products and drive projects from inception to completion. By utilizing AI, you will enhance productivity and address challenges efficiently. Join us in solving complex problems alongside exceptional colleagues while advancing your career within a dynamic environment.
About the Client Risk Team
The Client Risk Team is crucial for proactively identifying and managing client-specific financial crimes and credit risks associated with Airwallex's offerings. We aim to create a comprehensive, data-informed profile for each client by integrating insights from various sources. Our team is committed to leveraging cutting-edge machine learning and large language models (LLMs) to improve decision-making processes, automate workflows, and facilitate responsible global business growth.
Role Responsibilities
In this role, you will engineer scalable risk monitoring systems utilizing AI, ML, and LLM technologies. Your contributions will include automating review processes, consolidating risk insights, and developing cohesive client management tools. You will also play a pivotal role in defining and implementing innovative solutions that enhance our risk management capabilities.
About Airwallex
Airwallex stands as the leading unified payments and financial platform designed for global enterprises. Leveraging our unique blend of proprietary technology and software, we empower over 200,000 businesses worldwide, including industry leaders like Brex, Rippling, Navan, Qantas, and SHEIN. Our comprehensive solutions encompass business accounts, payments, spend management, treasury services, and embedded finance on a global scale.
Founded in Melbourne, we boast a team of over 2,000 talented and innovative tech professionals across 26 global offices. With a valuation of US$8 billion and support from world-class investors such as T. Rowe Price, Visa, Mastercard, Robinhood Ventures, Sequoia, Salesforce Ventures, DST Global, and Lone Pine Capital, Airwallex is at the forefront of revolutionizing the global payments and financial landscape. If you are prepared to embark on the most ambitious journey of your career, we invite you to join us.
Attributes We Value
We seek exceptional builders with an entrepreneurial mindset who are eager to make a significant impact, accelerate their learning, and take true ownership. You should possess strong expertise relevant to your role, analytical thinking, and a passion for our mission and operating principles. You should be able to act swiftly with sound judgment, explore problems in depth with curiosity, and make decisions from foundational principles while balancing speed and thoroughness.
We value humility and collaboration: you transform innovative ideas into tangible products and deliver results from start to finish. Utilizing AI to enhance productivity and streamline problem-solving is encouraged. Here, you will confront complex, high-visibility challenges alongside exceptional teammates and grow your career as we shape the future of global banking. If this resonates with you, let's create what comes next together.
About the Team
The Risk Platform team at Airwallex is charged with overseeing risk management across all Airwallex products, including GTPN, PA, Issuing, Onboarding, and Account Takeover. The risk environment is continuously evolving, with fraudsters employing increasingly sophisticated tactics. We are at the cutting edge of innovation in risk management.
Our mission is to safeguard Airwallex's products and services, establishing Airwallex as a trusted partner for businesses globally. We utilize advanced technologies such as graph databases, machine learning, and large language models to refine and advance our strategies.
Collaboration with other teams and our customers worldwide is integral to ensuring a comprehensive approach to risk management.
About Airwallex
Airwallex stands out as the premier unified payments and financial platform designed specifically for global enterprises. Our innovative blend of proprietary infrastructure and advanced software empowers over 200,000 businesses, including renowned names like Brex, Rippling, Navan, Qantas, and SHEIN, to seamlessly manage their business accounts, payments, spending, treasury, and embedded finance solutions on a global scale.
Founded in the vibrant city of Melbourne, our team comprises over 2,000 exceptional and forward-thinking professionals across 26 offices worldwide. With a valuation of US$8 billion and support from top-tier investors such as T. Rowe Price, Visa, Mastercard, Robinhood Ventures, Sequoia, Salesforce Ventures, DST Global, and Lone Pine Capital, Airwallex is at the forefront of revolutionizing the future of global payments and finance. If you're ready to embark on the most ambitious journey of your career, we invite you to join us.
Control Risks
Role Purpose
At Control Risks, our Digital Risks team leverages our extensive global presence and local expertise across 42 offices worldwide to empower our clients in navigating the dynamic threat landscape. We assist organizations in safeguarding their vital assets, effectively managing crises, and optimizing technology adoption for strategic advantage.
The Principal will spearhead business development and consulting initiatives, enhancing our capabilities to assist senior stakeholders in addressing complex digital risk challenges. This role focuses on delivering impactful consulting results and supporting the growth of high-performing teams within the Digital Risks practice.
Key Responsibilities
Strategic Advisory & Client Leadership
• Act as a trusted advisor to senior executives and boards, offering strategic insights on digital risk, resilience, regulatory requirements, and emerging technology threats.
• Simplify complex technical and risk concepts into actionable recommendations for non-technical audiences.
• Lead the creation and presentation of executive briefings, strategic assessments, and transformation roadmaps aligned with clients’ long-term risk and security strategies.
Digital Risk, Cybersecurity & Technology Advisory Delivery
• Manage and guide intricate consulting projects focused on digital risk, cybersecurity, technology governance, and organizational resilience.
• Oversee maturity assessments, control reviews, and risk analyses using recognized industry frameworks and regulatory models.
• Deliver actionable, practical, and business-aligned recommendations to enhance clients’ risk management and resilience profiles.
• Ensure high-quality standards and clarity in all deliverables through rigorous quality assurance and oversight.
Regulatory, Governance & Compliance Advisory
• Counsel clients on regional and global regulatory requirements concerning cybersecurity, data governance, technology operations, and third-party risk.
• Collaborate with external specialists to integrate their expertise into comprehensive advisory solutions.
Risk, Crisis & Incident Advisory Support
• Provide senior-level support during digital incidents, investigations, or technology-related crises.
• Employ crisis management principles, stakeholder communication strategies, and risk assessment methodologies to assist clients in navigating disruptive events.
• Facilitate scenario planning, simulation exercises, and initiatives aimed at building resilience.
Leadership of Consulting Engagements
• Supervise multiple concurrent projects, ensuring consistent quality, methodological rigor, and high client satisfaction.
• Lead project teams effectively to achieve strategic objectives.
Control Risks
Join a prestigious global technology firm as an Intelligence Analyst, where you will play a critical role in providing first-line support for employee travel and events. You will engage in real-time monitoring, assist travelers, perform horizon scanning, and offer technical intelligence support. This position is pivotal in ensuring effective triage, escalation, and communication during evolving situations.
As part of a dedicated global team, you will provide 24/7 coverage for our clients, which may require working on weekends and holidays based on the schedule established by the team.
Key Responsibilities:
- Monitor travel-related developments affecting employees, including executives.
- Deliver first-line assistance to travelers and facilitate issue resolution.
- Coordinate secure transportation and vetted third-party vendors.
- Support pre-trip advisories and briefings, and connect with travelers.
- Assist with travel-related incidents and provide support for distressed travelers.
- Review upcoming events and prepare situational summaries, including geofencing, tactical intelligence, and weather assessments.
- Conduct targeted monitoring during event periods and escalate any concerns.
- Establish and manage mass-notification groups for events.
- Monitor real-time alerts, evaluate their relevance, and escalate according to established workflows.
- Draft concise alerts, situation updates, and rapid assessments.
- Provide on-demand support to internal intelligence teams.
Airwallex
About Airwallex
Airwallex is a revolutionary financial technology company, providing a unified payments and financial platform tailored for global enterprises. With our unique blend of proprietary infrastructure and cutting-edge software, we empower over 200,000 businesses worldwide, including industry leaders like Brex, Rippling, Navan, Qantas, and SHEIN, with fully integrated solutions for managing business accounts, payments, spend management, and treasury, as well as embedded finance on a global scale.
Founded in Melbourne, our dynamic team of over 2,000 innovative tech professionals operates across 26 global offices. Valued at US$8 billion and supported by leading investors such as T. Rowe Price, Visa, Mastercard, Robinhood Ventures, Sequoia, Salesforce Ventures, DST Global, and Lone Pine Capital, Airwallex is at the forefront of transforming the future of global payments and finance. If you're eager to embark on the most ambitious journey of your career, we invite you to join us.
Attributes We Value
We seek builders with an entrepreneurial spirit who are driven to make a significant impact, accelerate their learning, and embrace true ownership. You should possess strong expertise in your field, exhibit sharp analytical skills, and be motivated by our mission and operating principles. You thrive in fast-paced environments, demonstrate sound judgment, and adopt a curious mindset, making decisions based on foundational principles while balancing speed and thoroughness.
Humility and collaboration are key traits; you are adept at transforming innovative ideas into tangible products and possess a strong bias for action. Leveraging AI, you work smarter and resolve challenges more efficiently. Here, you'll confront complex, high-visibility issues alongside exceptional colleagues and advance your career as we redefine the future of global banking. If this resonates with you, let's create what's next together.
About the Team
The Risk Platform team at Airwallex is tasked with managing risk across all our products, including GTPN, PA, Issuing, Onboarding, and Account Takeover. As the risk landscape continually evolves and fraud becomes increasingly sophisticated, we are pioneers in risk management innovation.
Our mission is to ensure the safety and security of Airwallex's products and services, establishing trust with businesses globally. We utilize advanced technologies, such as graph databases, machine learning, and large language models, to enhance our strategic approach.
Collaboration is essential as we work with other teams and customers worldwide to implement a comprehensive risk management strategy.
Join our innovative team at csit as a Network Reliability Engineer, where you'll be instrumental in constructing robust and resilient network infrastructures utilizing state-of-the-art technologies, including cloud-based solutions and software-defined networking such as SD-WAN, ACI, and NSX. A solid understanding of IT infrastructure systems and familiarity with the latest advancements in networking technologies and platforms are essential. We seek a collaborative team player eager to embrace new challenges and stay updated with the rapidly changing technology landscape.
About k-ID
k-ID stands at the forefront of privacy-first compliance and age verification infrastructure, recognized as one of TIME’s Best Inventions of 2025 and a Tech Pioneer by the World Economic Forum. As a recipient of Fast Company’s accolade for the Next Big Things in Tech, we are creating the Age Layer for the internet, a vital framework that empowers digital platforms to seamlessly verify age and manage global compliance.
Our flagship platform, driven by the Compliance Development Kit (CDK) and AgeKit, serves as the trusted engine for the world’s foremost game publishers and digital ecosystems. We transform fragmented, manual compliance processes into a unified API that efficiently handles age verification, parental consent, and regulatory logic across over 200 markets. Supported by leading venture capital firms such as a16z and Lightspeed, k-ID is entering an exciting growth phase to set the benchmark for global digital safety.
About the Role
We are seeking a dynamic Senior Site Reliability Engineer to ensure k-ID's reliability at scale. This pivotal role resides within our production backbone, where you will take ownership of and enhance the systems that maintain the availability, observability, security, and resilience of our platform as we experience traffic growth and an expanding client base.
You will engage in infrastructure, tooling, deployment workflows, incident response, and systems design to ensure our scalability without compromise. This position is not about closing tickets; we seek a proactive individual who can assess systems, pinpoint vulnerabilities, and fortify them. You should possess a keen understanding of failure modes, blast radius, deployment safety, recovery time, cost efficiency, and the realities of managing production systems under pressure. Comfort with coding, automating processes, and collaborating closely with engineers to enhance reliability through improved architecture and operational practices is essential.
Join our innovative team at csit as a Network Reliability Staff Engineer, where you will play a crucial role in developing robust network infrastructure. You will leverage advanced technologies, including cloud-based solutions and software-defined networking, such as SD-WAN, ACI, and NSX. A solid understanding of IT infrastructure systems and familiarity with the latest networking technologies is essential. As a technical expert within our team, you will be encouraged to embrace new challenges and stay updated with the rapidly changing technology landscape.
As a Site Reliability Engineer (SRE) and Environment Engineer in the Banking sector, you will play a pivotal role in enhancing application reliability and operational efficiency.
This contract position based in Singapore involves:
- Managing the software deployment lifecycle, from development to production, ensuring systematic release schedules.
- Overseeing multiple test environments on the Bank’s core platform, ensuring proper configurations and connectivity of satellite applications.
- Supporting IT project executions through comprehensive test executions and regressions.
- Conducting regular health checks to ensure system connectivity, consistency, and data integrity across all testing environments.
- Coordinating deployment processes for both production and test environments and provisioning environments for various testing phases.
- Maintaining proactive communication with stakeholders about environment statuses, managing expectations, and highlighting risks and issues.
- Collaborating with global teams to support environment-related changes.
- Reviewing and executing deployment instructions accurately for both production and test environments.
- Working closely with the change manager to coordinate all releases.
As a Site Reliability Engineer (SRE) focused on Globalization, you will play a pivotal role in ensuring the robustness and availability of our next-generation international infrastructure. As our client, a fast-growing global consumer internet platform, scales its operations across international markets, you'll be instrumental in building a resilient architecture that supports millions of users worldwide. This role involves working on multi-region architecture, global traffic routing, and large-scale distributed systems, directly influencing the reliability and scalability of our evolving platform.
Key Responsibilities:
- Global Architecture & Disaster Recovery: Collaborate in designing and implementing a global infrastructure architecture. Own cross-region architecture, disaster recovery (DR), and high availability (HA) capabilities. Enable critical systems for multi-region deployment, disaster recovery failover, and fault isolation.
- Overseas Infrastructure Platform Deployment & Operations: Build, deploy, operate, and optimize core infrastructure platforms in overseas regions, ensuring consistency and reliability between international and domestic environments.
- Reliability Engineering & Incident Response: Develop a comprehensive reliability engineering framework for international systems, including observability systems, incident response mechanisms, and root cause analysis processes.
- Internationalization Infrastructure Enablement: Understand overseas business requirements and architectural constraints to drive the implementation of infrastructure capabilities in global environments.
- Cross-Team Collaboration & System Alignment: Work closely with domestic infrastructure, product engineering, and platform teams to ensure alignment with internal architecture standards and best practices.
About k-ID
k-ID is a pioneer in privacy-first compliance and age verification infrastructure, setting the standard for digital safety. We were celebrated as one of TIME’s Best Inventions of 2025, recognized as a Tech Pioneer by the World Economic Forum, and featured in Fast Company’s Next Big Things in Tech. Our mission is to create the Age Layer for the internet, a crucial infrastructure enabling digital platforms to verify age and manage compliance seamlessly across global markets.
Powered by our Compliance Development Kit (CDK) and AgeKit, our core platform is the trusted backbone for the world’s leading game publishers and digital ecosystems, streamlining fragmented compliance through a unified API that efficiently handles age verification, parental consent, and regulatory requirements in over 200 markets. Supported by esteemed venture capital firms, including a16z and Lightspeed, k-ID is poised for significant growth.
About The Role
We are seeking a Lead Site Reliability Engineer and NOC Lead to spearhead production reliability and operational excellence across our platform.
In this senior position, you will be accountable for the reliability, availability, observability, and operational maturity of k-ID’s systems while leading the Network Operations Center (NOC) function. Your role extends beyond merely responding to incidents; you will build systems, processes, tools, and team standards that minimize incident frequency and severity, ensuring rapid resolution when they occur.
This role sits above our senior NOC hires: we need someone capable of establishing the operational model for the NOC, enhancing technical standards for incident management, collaborating closely with engineering leadership, and driving the long-term reliability roadmap for the business. You should be adept at transitioning between hands-on technical tasks, operational leadership, incident command, and team development.
Join AvePoint as a Senior Splunk Engineer focused on Automation and Reliability Engineering Projects!
Project Overview
Contribute to Automation and Reliability Engineering efforts and operations.
Key Responsibilities:
- Oversee Observability Engineering and Governance initiatives.
- Design and maintain enterprise SIEM solutions compliant with operational resilience frameworks (e.g., MAS TRM, DORA, APRA CPS 230).
- Lead the deployment, configuration, and optimization of Splunk for comprehensive visibility across infrastructure, applications, networks, and user experiences.
- Establish and uphold telemetry data governance standards (covering metrics, logs, and traces) to ensure consistency, compliance, and security.
- Integrate Splunk with incident management, ITSM, and AIOps systems for predictive alerting and anomaly detection.
- Serve as the SIEM/Splunk subject matter expert (SME) for architecture reviews, upgrades, and performance enhancements.
Reliability Engineering and Automation:
- Implement and advocate for Site Reliability Engineering (SRE) frameworks and reliability practices for critical systems.
- Design and automate runbooks, alerts, and self-healing workflows using Python, Ansible, and Terraform.
- Collaborate with Application, Infrastructure, and Cyber teams to incorporate reliability principles into the delivery lifecycle.
- Conduct resilience, chaos, and capacity testing in accordance with business continuity and disaster recovery standards.
- Define and monitor error budgets, reliability scorecards, and service health indicators for production workloads.
Cloud & Platform Integration:
- Engineer SIEM solutions for cloud-native workloads in AWS and Azure, ensuring visibility across compute, storage, and network layers.
- Integrate Splunk and cloud observability tools into CI/CD pipelines and landing zones for continuous compliance.
- Implement infrastructure-as-code (IaC) models using Terraform and Ansible for consistent and auditable provisioning.
- Work alongside Cloud, DevOps, and Security teams to ensure telemetry aligns with audit, compliance, and operational risk requirements.
Operational Excellence and Collaboration:
- Drive reductions in incident recurrence, Mean Time to Recovery (MTTR), and manual intervention through observability-led automation.
- Partner with Service Delivery, Cyber, and Application teams to facilitate predictive incident prevention and root cause transparency.
- Develop and maintain executive dashboards and reports highlighting availability, reliability KPIs, and operational risk indicators.
Intrinsic Robotics
Intrinsic Robotics, a pioneering venture under Alphabet, is on a mission to redefine the capabilities of industrial robotics. Our team is committed to harnessing advancements in AI, perception, and simulation to revolutionize the field, placing software and data at its core.
Our goal is to make intelligent robotics accessible and usable for countless businesses, entrepreneurs, and developers. We are a vibrant group of engineers, roboticists, designers, and technologists dedicated to unlocking the creative and economic potential of industrial robotics.
Role
As the Senior Software Engineering Lead, you will spearhead an engineering team focused on ensuring the resilience and reliability of our robotics software platform. You will architect systems that proactively identify and mitigate vulnerabilities across our APIs, SDKs, web interfaces, and cloud-to-edge communication layers. By simulating diverse scenarios, including AI model inference timeouts, high network latencies, data pipeline congestion, and malformed inputs, you will help ensure that our platform remains robust and predictable, even under adverse conditions. You will lead a specialized team in developing automated frameworks that replicate real-world disruptions, thereby providing a solid infrastructure for developers crafting the next generation of AI-driven robotics. Your responsibilities will also encompass the creation of essential monitoring tools to gain in-depth insights into overall system health. As a pivotal technical leader, you will collaborate with elite engineering teams in Mountain View and Munich to align resilience strategies and establish global benchmarks for software reliability.
How Your Work Propels Our Mission
- Design and implement automated resilience tests focusing on service boundaries and hybrid environments (both on-prem and Cloud).
- Enhance the robustness of AI integrations by applying failure injection within data pipelines.
- Utilize fuzzing and property-based testing techniques across the platform to ensure graceful degradation.
- Strengthen the stability of developer tools and frontend systems against latency and service interruptions.
- Foster a culture of reliability through mentorship in defensive programming and by leading global "Game Day" exercises.
- Develop observability tools to monitor and analyze comprehensive system health.
We are seeking a dedicated Site Reliability Engineer (SRE) to enhance and maintain the availability of our trading binary systems. This role requires you to be on duty during the European team's off-hours, ensuring uninterrupted operations.
Your Key Responsibilities:
- Overseeing operational management of trading activities, with a focus on proactive monitoring.
- Managing incidents, including rapid escalation and mitigation strategies.
- Participating in on-call duty to address critical issues.
- Performing debugging tasks using C++ and Python, along with classifying issues effectively.
- Developing observability metrics and trading analytics to support our trading systems.
- Keeping abreast of financial and technical news by reading relevant materials and monitoring exchange newsletters.
Our Ideal Candidate Will Have:
- A Bachelor’s degree in a quantitative field such as Computer Science, Engineering, Physics, or Mathematics.
- At least 5 years of experience in a Site Reliability Engineering role.
- Programming proficiency in Python or Go is preferred.
- Strong knowledge of Unix systems.
- Experience deploying, configuring, and managing Linux-based servers, including Docker, Kubernetes, and Grafana.
- Ability to identify opportunities for platform improvements within a complex technical landscape.
- Exceptional communication skills, capable of engaging with both internal teams and external clients.
- Proficiency in English at B2/Upper-Intermediate level or higher.
- A proactive approach and willingness to learn about new domains.
Airwallex
About Airwallex
Airwallex is a global payments and financial platform serving businesses worldwide. Over 200,000 companies, including Brex, Rippling, Navan, Qantas, and SHEIN, use Airwallex to manage business accounts, payments, spend management, treasury, and embedded finance solutions across borders. Founded in Melbourne, Airwallex has grown to more than 2,000 employees in 26 offices. The company is valued at US$8 billion and backed by investors such as T. Rowe Price, Visa, Mastercard, and Sequoia.
What We Look For
Airwallex values builders with an entrepreneurial mindset. The team seeks individuals with deep expertise who use analytical thinking to tackle challenges. Quick, thoughtful decision-making and curiosity are important here. Collaboration and humility matter, as does the ability to turn ideas into real solutions. Experience with AI to improve efficiency is a plus. At Airwallex, engineers work on complex, high-impact problems with skilled colleagues, helping shape the future of global finance.
About the Risk Platform Team
The Risk Platform team protects Airwallex products such as GTPN, PA, Issuing, Onboarding, and Account Takeover. As threats become more sophisticated, this group develops new strategies for risk management. The team’s mission is to keep Airwallex’s offerings secure, building trust with businesses around the world. Technologies in use include graph analytics, machine learning, and large language models. The team works closely with other groups and clients globally to stay ahead of emerging risks.
About ClickHouse
Ranked among the 2025 Forbes Cloud 100, ClickHouse stands as a leading innovator in the private cloud sector. With a customer base exceeding 3,000 and an annual recurring revenue (ARR) growth of over 250% year-on-year, we excel in real-time analytics, data warehousing, observability, and AI workloads.
Our recent $400 million Series D funding round underscores our rapid growth and momentum. In just three months, renowned clients like Capital One, Lovable, Decagon, Polymarket, and Airwallex have adopted or expanded their use of our platform. They join industry giants such as Meta, Cursor, Sony, and Tesla who rely on our technology.
We invite you to join us on our mission to revolutionize the way organizations harness their data!
About the Role
As we aim to provide our customers with dependable and secure services, we are expanding our Site Reliability Engineering team. In this role, you will lead initiatives to guarantee the reliability, availability, scalability, and performance of our cloud infrastructure. Collaborating with teams across Control Plane, Data Plane, Core, Security, Support, and Operations, you will guide the design and implementation of scalable, secure, and resilient distributed systems. You will also oversee incident management, conduct post-mortem analyses, and drive continuous improvements in our Cloud services. Utilizing your software engineering skills, you will develop platforms and tools to enhance operational and engineering efficiencies in ClickHouse Cloud.
This position offers a unique chance to significantly contribute to the high-performance, elastic, and limitless scale of ClickHouse Cloud.
What Will You Do?
• Work collaboratively with various engineering teams at ClickHouse to design and implement scalable, secure, and highly available systems.
• Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
• Ensure comprehensive monitoring and alerting for all infrastructure components in ClickHouse Cloud, enabling timely incident detection and resolution.
• Refine incident response processes and conduct post-mortem analyses for outages, partnering with the support team to communicate effectively with affected customers.
• Continuously enhance the reliability and performance of our ClickHouse services.
• Plan and lead Chaos Engineering initiatives to identify potential vulnerabilities.
Key Responsibilities:
• Develop and implement comprehensive test plans and test cases for our infrastructure platforms.
• Create and manage automated testing suites for diverse infrastructure components.
• Conduct both manual and automated testing to ensure the quality and reliability of our systems.
• Analyze testing results and report defects with detailed reproduction steps.
• Collaborate with development and operations teams to enhance testing processes and continuously refine testing methodologies and tools.
• Document testing procedures meticulously and maintain up-to-date test documentation.
• Track and report on test coverage alongside quality metrics.
• Employ Chaos Engineering practices to uncover system vulnerabilities.
• Contribute to the formulation of Service Level Objectives (SLOs) and error budgets.

