Infrastructure Security Engineer
About csit
csit is a forward-thinking technology company specializing in cutting-edge infrastructure solutions. We pride ourselves on our dynamic team environment and commitment to innovation, providing our clients with secure and reliable services tailored to meet their evolving needs.
Similar jobs
As a Site Reliability Engineer (SRE) and Environment Engineer in the Banking sector, you will play a pivotal role in enhancing application reliability and operational efficiency. This contract position based in Singapore involves:
- Managing the software deployment lifecycle, from development to production, ensuring systematic release schedules.
- Overseeing multiple test environments on the Bank’s core platform, ensuring proper configurations and connectivity of satellite applications.
- Supporting IT project executions through comprehensive test executions and regressions.
- Conducting regular health checks to ensure system connectivity, consistency, and data integrity across all testing environments.
- Coordinating deployment processes for both production and test environments and provisioning environments for various testing phases.
- Maintaining proactive communication with stakeholders about environment statuses, managing expectations, and highlighting risks and issues.
- Collaborating with global teams to support environment-related changes.
- Reviewing and executing deployment instructions accurately for both production and test environments.
- Working closely with the change manager to coordinate all releases.
The Role
Join Pave Bank, where we are pioneering the future of programmable banking by merging traditional banking services with digital assets on a single, regulated platform. We are seeking a dynamic Site Reliability Engineer (SRE) to play a critical role in ensuring our core systems are consistently available, scalable, and high-performing as we expand.
As a Site Reliability Engineer at Pave Bank, you will collaborate closely with our Engineering, Product, Security, and Operations teams to develop robust infrastructure, automate operational tasks, and uphold reliability across all services. Your contributions will significantly influence the safety, performance, and scalability of our banking platform, enabling customers to place their trust in Pave Bank for their financial needs.
Key Responsibilities
- Oversee, maintain, and enhance the reliability, availability, and performance of our production systems and services.
- Design and sustain infrastructure as code (IaC), deployment pipelines, and automation processes to facilitate continuous delivery, scalability, and disaster recovery.
- Address incidents, conduct root-cause analyses, and lead postmortems to ensure that lessons learned are effectively implemented.
- Establish and uphold operational best practices including observability, logging, metrics, alerting, capacity planning, failover strategies, and backups.
- Collaborate with Engineering, Product, Compliance, and Operations teams to ensure that our infrastructure aligns with reliability, compliance, and security standards.
- Assist in service scaling, database operations, cloud infrastructure (preferably GCP), networking, and microservices orchestration.
- Document operational runbooks, on-call procedures, and system architecture to support maintenance, knowledge sharing, and compliance.
Qualifications
Technical Skills and Experience
- Proficient in programming or scripting languages such as Go, Python, Bash, or similar for automation and tooling.
- Hands-on experience with cloud infrastructure, preferably Google Cloud Platform (GCP).
- Familiar with containerization and orchestration technologies (Docker, Kubernetes, etc.).
- Experience with infrastructure-as-code tools (Terraform, Cloud Deployment Manager, etc.).
About k-ID
k-ID stands at the forefront of privacy-first compliance and age verification infrastructure, recognized as one of TIME’s Best Inventions of 2025 and a Tech Pioneer by the World Economic Forum. As a recipient of Fast Company’s accolade for the Next Big Things in Tech, we are creating the Age Layer for the internet, a vital framework that empowers digital platforms to seamlessly verify age and manage global compliance.
Our flagship platform, driven by the Compliance Development Kit (CDK) and AgeKit, serves as the trusted engine for the world’s foremost game publishers and digital ecosystems. We transform fragmented, manual compliance processes into a unified API that efficiently handles age verification, parental consent, and regulatory logic across over 200 markets. Supported by leading venture capital firms such as a16z and Lightspeed, k-ID is entering an exciting growth phase to set the benchmark for global digital safety.
About the Role
We are seeking a dynamic Senior Site Reliability Engineer to ensure k-ID's reliability at scale. This pivotal role resides within our production backbone, where you will take ownership of and enhance the systems that maintain the availability, observability, security, and resilience of our platform as we experience traffic growth and an expanding client base.
You will engage in infrastructure, tooling, deployment workflows, incident response, and systems design to ensure our scalability without compromise. This position is not about closing tickets; we seek a proactive individual who can assess systems, pinpoint vulnerabilities, and fortify them. You should possess a keen understanding of failure modes, blast radius, deployment safety, recovery time, cost efficiency, and the realities of managing production systems under pressure. Comfort with coding, automating processes, and collaborating closely with engineers to enhance reliability through improved architecture and operational practices is essential.
As a Site Reliability Engineer (SRE) focused on Globalization, you will play a pivotal role in ensuring the robustness and availability of our next-generation international infrastructure. As our client, a fast-growing global consumer internet platform, scales its operations across international markets, you'll be instrumental in building a resilient architecture that supports millions of users worldwide. This role involves working on multi-region architecture, global traffic routing, and large-scale distributed systems, directly influencing the reliability and scalability of our evolving platform.
Key Responsibilities:
- Global Architecture & Disaster Recovery: Collaborate in designing and implementing a global infrastructure architecture. Own cross-region architecture, disaster recovery (DR), and high availability (HA) capabilities. Enable critical systems for multi-region deployment, disaster recovery failover, and fault isolation.
- Overseas Infrastructure Platform Deployment & Operations: Build, deploy, operate, and optimize core infrastructure platforms in overseas regions, ensuring consistency and reliability between international and domestic environments.
- Reliability Engineering & Incident Response: Develop a comprehensive reliability engineering framework for international systems, including observability systems, incident response mechanisms, and root cause analysis processes.
- Internationalization Infrastructure Enablement: Understand overseas business requirements and architectural constraints to drive the implementation of infrastructure capabilities in global environments.
- Cross-Team Collaboration & System Alignment: Work closely with domestic infrastructure, product engineering, and platform teams to ensure alignment with internal architecture standards and best practices.
About k-ID
k-ID is a pioneer in privacy-first compliance and age verification infrastructure, setting the standard for digital safety. We were celebrated as one of TIME’s Best Inventions of 2025, recognized as a Tech Pioneer by the World Economic Forum, and featured in Fast Company’s Next Big Things in Tech. Our mission is to create the Age Layer for the internet, a crucial infrastructure enabling digital platforms to verify age and manage compliance seamlessly across global markets.
Powered by our Compliance Development Kit (CDK) and AgeKit, our core platform is the trusted backbone for the world’s leading game publishers and digital ecosystems, streamlining fragmented compliance through a unified API that efficiently handles age verification, parental consent, and regulatory requirements in over 200 markets. Supported by esteemed venture capital firms, including a16z and Lightspeed, k-ID is poised for significant growth.
About The Role
We are seeking a Lead Site Reliability Engineer and NOC Lead to spearhead production reliability and operational excellence across our platform.
In this senior position, you will be accountable for the reliability, availability, observability, and operational maturity of k-ID’s systems while leading the Network Operations Center (NOC) function. Your role extends beyond merely responding to incidents; you will build systems, processes, tools, and team standards that minimize incident frequency and severity, ensuring rapid resolution when they occur.
This role surpasses our senior NOC hires, as we need someone capable of establishing the operational model for the NOC, enhancing technical standards for incident management, collaborating closely with engineering leadership, and driving the long-term reliability roadmap for the business. You should be adept at transitioning between hands-on technical tasks, operational leadership, incident command, and team development.
We are seeking a dedicated Site Reliability Engineer (SRE) to enhance and maintain the availability of our trading binary systems. This role requires you to be on duty during the European team's off-hours, ensuring uninterrupted operations.
Your Key Responsibilities:
- Overseeing operational management of trading activities, with a focus on proactive monitoring.
- Managing incidents, including rapid escalation and mitigation strategies.
- Participating in on-call duty to address critical issues.
- Performing debugging tasks using C++ and Python, along with classifying issues effectively.
- Developing observability metrics and trading analytics to support our trading systems.
- Keeping abreast of financial and technical news by reading relevant materials and monitoring exchange newsletters.
Our Ideal Candidate Will Have:
- A Bachelor’s degree in a quantitative field such as Computer Science, Engineering, Physics, or Mathematics.
- At least 5 years of experience in a Site Reliability Engineering role.
- Programming proficiency in Python or Go is preferred.
- Strong knowledge of Unix systems.
- Experience deploying, configuring, and managing Linux-based servers, including Docker, Kubernetes, and Grafana.
- Ability to identify opportunities for platform improvements within a complex technical landscape.
- Exceptional communication skills, capable of engaging with both internal teams and external clients.
- Proficiency in English at B2/Upper-Intermediate level or higher.
- A proactive approach and willingness to learn about new domains.
Join our innovative team as an Infrastructure Platform Engineer, where you will play a pivotal role in researching, designing, constructing, and optimizing distributed systems and platforms that empower our internal developers and products. A solid grasp of IT architecture, systems design, application and systems integration, and site reliability engineering (SRE) is essential for success in this role.
Become part of our vision to transform the employee technology experience. As a Workplace Platform Engineer, you will be integral to a dynamic team responsible for designing and managing platforms that support thousands of employee devices. Your focus will be on ensuring these platforms are efficient, secure, and create an exceptional digital experience for users.
This position transcends conventional IT support. You will be tasked with architecting scalable systems, streamlining processes through automation, and implementing Site Reliability Engineering (SRE) principles to foster innovation in workplace technology. Collaborating with cross-functional teams encompassing security, infrastructure, and software development will be key to delivering a frictionless, zero-touch experience for all employees.
If you are enthusiastic about leveraging the latest technologies and driving impactful engineering projects, we invite you to connect with us.
Key Responsibilities:
• Develop and implement comprehensive test plans and test cases for our infrastructure platforms.
• Create and manage automated testing suites for diverse infrastructure components.
• Conduct both manual and automated testing to ensure the quality and reliability of our systems.
• Analyze testing results and report defects with detailed reproduction steps.
• Collaborate with development and operations teams to enhance testing processes and continuously refine testing methodologies and tools.
• Document testing procedures meticulously and maintain up-to-date test documentation.
• Track and report on test coverage alongside quality metrics.
• Employ Chaos Engineering practices to uncover system vulnerabilities.
• Contribute to the formulation of Service Level Objectives (SLOs) and error budgets.
About ClickHouse
Ranked among the 2025 Forbes Cloud 100, ClickHouse stands as a leading innovator in the private cloud sector. With a customer base exceeding 3,000 and an annual recurring revenue (ARR) growth of over 250% year-on-year, we excel in real-time analytics, data warehousing, observability, and AI workloads.
Our recent $400 million Series D funding round underscores our rapid growth and momentum. In just three months, renowned clients like Capital One, Lovable, Decagon, Polymarket, and Airwallex have adopted or expanded their use of our platform. They join industry giants such as Meta, Cursor, Sony, and Tesla who rely on our technology.
We invite you to join us on our mission to revolutionize the way organizations harness their data!
About the Role
As we aim to provide our customers with dependable and secure services, we are expanding our Site Reliability Engineering team. In this role, you will lead initiatives to guarantee the reliability, availability, scalability, and performance of our cloud infrastructure. Collaborating with teams across Control Plane, Data Plane, Core, Security, Support, and Operations, you will guide the design and implementation of scalable, secure, and resilient distributed systems. You will also oversee incident management, conduct post-mortem analyses, and drive continuous improvements in our Cloud services. Utilizing your software engineering skills, you will develop platforms and tools to enhance operational and engineering efficiencies in ClickHouse Cloud. This position offers a unique chance to significantly contribute to the high-performance, elastic, and limitless scale of ClickHouse Cloud.
What Will You Do?
- Work collaboratively with various engineering teams at ClickHouse to design and implement scalable, secure, and highly available systems.
- Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
- Ensure comprehensive monitoring and alerting for all infrastructure components in ClickHouse Cloud, enabling timely incident detection and resolution.
- Refine incident response processes and conduct post-mortem analyses for outages, partnering with the support team to communicate effectively with affected customers.
- Continuously enhance the reliability and performance of our ClickHouse services.
- Plan and lead Chaos Engineering initiatives to identify potential vulnerabilities.
Join our innovative team at Pinely, where we are building a top-tier Site Reliability Engineering (SRE) group from the ground up! As the SRE Lead for TradeOps, you will be instrumental in shaping our infrastructure and processes, beginning as a hands-on individual contributor from day one. As we grow, you will have the opportunity to lead a talented team dedicated to enhancing our trading operations.
Key Responsibilities:
- Oversee the operational management of trading activities, ensuring robust monitoring and availability.
- Lead incident management efforts, including rapid escalation and effective mitigation strategies.
- Participate in on-call rotations to provide immediate support when needed.
- Debug and classify issues within our trading systems, utilizing your expertise in C++ and Python.
- Develop observability metrics and analytics to enhance trading performance and reliability.
- Stay updated with financial and technical news, actively engaging with exchange newsletters.
About Airwallex
Airwallex is a pioneering global payments and financial platform, uniquely designed to streamline operations for businesses around the world. With our exceptional blend of proprietary technology and software, we empower over 200,000 companies globally, including industry leaders such as Brex, Rippling, Navan, Qantas, and SHEIN, with integrated solutions that cover everything from business accounts and payment processing to spend management and treasury operations, all tailored for a global audience.
Founded in Melbourne, Airwallex boasts a diverse team of over 2,000 innovative professionals across 26 offices worldwide. With a valuation of US$8 billion and support from top-tier investors including T. Rowe Price, Visa, Mastercard, Robinhood Ventures, Sequoia Capital, Salesforce Ventures, DST Global, and Lone Pine Capital, we are at the forefront of transforming the future of global finance. If you're ready to take on the most ambitious challenges of your career, we invite you to join us.
Attributes We Value
We seek builders with entrepreneurial spirit eager to make a significant impact, accelerate their learning, and take true ownership of their work. You should possess strong expertise in your field, complemented by analytical thinking and a passion for our mission and operating principles. You are quick to act with sound judgment, driven by curiosity to explore deeply, and you make informed decisions based on foundational principles, balancing speed with thoroughness.
Collaboration and humility are vital traits; you can transform initial ideas into fully realized products and ensure tasks are completed efficiently. You leverage AI to enhance productivity and solve challenges swiftly. In this role, you will tackle intricate, high-profile challenges alongside exceptional colleagues, advancing your career while we build the future of global banking. If this resonates with you, let’s create the future together.
About the Team
The Engineering team at Airwallex comprises a vibrant mix of innovators, builders, and problem solvers committed to empowering businesses to operate without constraints. We thrive in a collaborative, fast-paced environment, relentlessly pushing the boundaries of what’s achievable in the fintech sector. Our focus is on technical excellence, continual learning, and a profound sense of ownership, all while creating scalable, reliable, and secure products that enable businesses to expand globally.
Our Site Reliability Engineering (SRE) team is paving the way for innovative engineering solutions, addressing a variety of challenges and setting a benchmark for other teams to emulate. This team is accountable for the availability, performance, and reliability of our systems, ensuring seamless operations across our platform.
Assurity Trusted Solutions
Join our Digital Resiliency Engineering (DRE) team, where we fuse software and systems engineering to create and manage large-scale, distributed systems built for the Singapore Government. Our mission is to ensure that Government services are dependable, performant, and tailored to meet user needs.
We are seeking talented individuals with a robust background in DevOps, Infrastructure Engineering, or Site Reliability Engineering (SRE) who have experience managing critical production technology infrastructures at scale. If you are eager to collaborate with a team of skilled practitioners and industry leaders, we invite you to apply.
As a Platform Engineer, you will develop essential services for the observability and automation of infrastructure services. You will participate in an on-call rotation with fellow engineers, providing swift responses to significant incidents affecting critical Government services. Your role will involve offering technical leadership to the team while closely collaborating with technical leads to maintain highly available solutions. You will also mentor team members on managing the availability and performance of mission-critical services, developing automation, and establishing monitoring solutions to prevent recurring issues.
In this capacity, you will oversee the execution of project priorities, timelines, and deliverables. You will lead the design of key components, systems, and features aimed at enhancing the availability, scalability, latency, and efficiency of services designed and implemented by the Government.
Key Responsibilities:
- Establish Service Level Indicators (SLIs), Service Level Objectives (SLOs), Error Budgets, and post-mortem incident processes.
- Participate in an on-call roster to ensure the reliability and performance of critical Government services, providing operational support for large-scale distributed systems to effectively resolve incidents.
- Analyze metrics and logs from operating systems and applications for capacity planning, performance tuning, and fault isolation.
- Develop automation to manage services, infrastructure, and applications.
- Enhance the reliability and quality of services through proactive monitoring.
- Continuously measure and optimize system performance, advancing SRE practices.
- Create an SRE playbook for government-wide reference.
- Identify and evaluate emerging technologies that can foster innovation for the Government.
- Collaborate within a cross-functional service team comprising software engineers, infrastructure engineers, DevOps, and other specialists.
Join AvePoint as a Site Reliability Engineer (SRE) and play a crucial role in the development and management of a Whole-of-Government (WoG) runtime platform. We are looking for a dedicated engineer who is passionate about enhancing infrastructure and ensuring optimal performance.
In this role, you will design and manage robust infrastructure utilizing GitLab, AWS, and Kubernetes solutions, focusing on the stability, scalability, and performance of our platform.
Key Responsibilities:
- Toil Reduction & Automation: Identify repetitive tasks and implement automation through CI/CD pipelines to minimize manual processes and enhance operational efficiency.
- Observability & System Health: Develop comprehensive observability solutions (logs, metrics, traces, alerts) focusing on the four Golden Signals: latency, traffic, errors, and saturation. Build automation for proactive system health evaluations and self-remediation.
- Production Support & Incident Management: Engage in on-call rotations, respond swiftly to incidents to reduce MTTR, and conduct thorough post-incident analyses to bolster system resilience.
- Security & Compliance: Collaborate with security teams to design and implement secure and compliant solutions, perform regular audits, and integrate advanced vulnerability scanning tools.
- Maintenance, Optimization & Performance: Identify and rectify performance bottlenecks, define and track KPIs (e.g., MTTR, system uptime, cost efficiency), and drive continuous optimization efforts.
- Strategic Customer Engagement: Serve as a technical advisor for tenants, guiding them on containerization and best practices for cloud-native deployments while participating in strategic initiatives to enhance scalability and performance.
- Knowledge Sharing & Documentation: Create and maintain detailed playbooks, runbooks, and documentation to promote team-wide knowledge sharing and streamline incident responses.
- Continuous Learning & Innovation: Stay abreast of industry trends and innovations to enhance our operational practices and technologies.
Join Our Team at Plaud Inc.
Plaud is at the forefront of developing the world’s most reliable AI work companion, designed to enhance productivity through innovative note-taking solutions. Since our inception in 2023, we have gained the trust of over 1.5 million users globally. Our mission is to amplify human intelligence by constructing advanced interfaces and infrastructures that capture, extract, and utilize information from various forms of communication.
Headquartered in San Francisco and incorporated in Delaware, Plaud Inc. is pioneering the integration of human and AI intelligence through an innovative hardware-software blend. We adhere to the highest standards of data security and privacy protection, maintaining ISO 27001, ISO 27701, GDPR, SOC 2, HIPAA, and EN 18031 compliance.
Why Join Plaud?
- Experience working with a bootstrapped, rapidly growing company, achieving a remarkable $250 million revenue run rate in just three years.
- Help define the future of human-AI interaction.
- Engage with cutting-edge AI technologies and play a pivotal role in our global expansion efforts.
- Collaborate with a passionate team that values innovation and customer success.
- Advance your career in a culture that promotes continuous learning and development.
Centre for Strategic Infocomm Technologies (CSIT)
Join the Centre for Strategic Infocomm Technologies (CSIT), where we are dedicated to enhancing the nation's security through innovative technology solutions. As a Platform & Site Reliability Engineer focusing on our Database and Search Platform, you will be at the forefront of crucial operations such as counter-terrorism and computer network defense. Collaborate with a talented team and leverage cutting-edge technologies to safeguard Singapore's future.
Accion Labs
Join Accion Labs as a Build and Test Environment Support Engineer in Singapore. In this role, you will be responsible for providing essential support for build and test environments, ensuring smooth operation and maintenance of crucial testing frameworks. Collaborate with cross-functional teams to troubleshoot issues, implement solutions, and maintain high-quality standards in our development processes.
Join our innovative team at csit as a Senior Platform Engineer, where you will spearhead the development of advanced testing platform services designed to help organizations stay ahead of evolving cyber threats. In this pivotal role, you will architect and implement a flexible grid of both physical and virtual devices, granting developers and testers on-demand, scalable access to a diverse array of devices and environments for thorough testing. Your expertise in designing complex systems will be essential, as you will ensure compatibility across various device types, architectures, and communication protocols, all while prioritizing high availability, scalability, and robust security measures.
We are looking for a skilled Platform Engineer specializing in JBoss EAP to join our dynamic team at fuku. This role is pivotal in managing and enhancing our JBoss EAP-based Java application platform. Ideal candidates will prioritize reliability, performance, security, and scalability, ensuring high-quality application delivery in a fast-paced Banking/Fintech environment.
Key Responsibilities:
- Architect, implement, and maintain robust platform infrastructure for Java applications utilizing JBoss EAP and OpenJDK.
- Develop, improve, and oversee CI/CD pipelines to automate build, test, and deployment processes with tools like GitLab and Octopus Deploy.
- Collaborate with software engineers, architects, and DevOps teams to outline platform requirements and deliver resilient, production-ready solutions.
- Monitor and refine platform performance and availability using monitoring tools such as Dynatrace, Prometheus, Grafana, and ELK.
- Implement Infrastructure as Code (IaC) using frameworks like Terraform or Ansible.
- Uphold platform security, stability, and compliance with both internal policies and external standards.
- Troubleshoot and resolve platform-related issues across development, testing, staging, and production environments.
Join our innovative team as an Infrastructure Security Engineer, where your expertise will be invaluable in designing, building, and maintaining robust infrastructure security services. You will play a vital role in ensuring the reliability, availability, and security of our infrastructure platform by implementing advanced automation, effective monitoring, and rapid incident response strategies. A strong understanding of IT architecture, cybersecurity practices, and site reliability engineering (SRE) is essential, along with analytical skills to troubleshoot and resolve security incidents efficiently.

