Platform Engineer (Cloud Site Reliability Engineer Operations)

Assurity Trusted Solutions
Singapore
On-site, Full-time

Experience Level

Mid to Senior

Qualifications

To be successful in this role, you should possess:

  • 6+ years of experience in technology operations as an Infrastructure Engineer or Site Reliability Engineer, with a background in managing large-scale, mission-critical production systems.
  • Strong expertise in DevOps methodologies and practices.
  • Proficiency in infrastructure automation and monitoring solutions.
  • Experience in incident management and response.
  • Ability to work in a collaborative environment with cross-functional teams.

About the job

Join our Digital Resiliency Engineering (DRE) team, where we fuse software and systems engineering to create and manage large-scale, distributed systems built for the Singapore Government. Our mission is to ensure that Government services are dependable, performant, and tailored to meet user needs.

We are seeking talented individuals with a robust background in DevOps, Infrastructure Engineering, or Site Reliability Engineering (SRE) who have experience managing critical production technology infrastructures at scale. If you are eager to collaborate with a team of skilled practitioners and industry leaders, we invite you to apply.

As a Platform Engineer, you will develop essential services for the observability and automation of infrastructure services. You will participate in an on-call rotation with fellow engineers, responding swiftly to significant incidents affecting critical Government services. Your role will involve offering technical leadership to the team while collaborating closely with technical leads to maintain highly available solutions. You will also mentor team members on managing the availability and performance of mission-critical services, developing automation, and establishing monitoring solutions to prevent recurring issues.

In this capacity, you will oversee the execution of project priorities, timelines, and deliverables. You will lead the design of key components, systems, and features aimed at enhancing the availability, scalability, latency, and efficiency of services designed and implemented by the Government.

Key Responsibilities:

  • Establish Service Level Indicators (SLIs), Service Level Objectives (SLOs), Error Budgets, and post-mortem incident processes.
  • Participate in an on-call roster to ensure the reliability and performance of critical Government services, providing operational support for large-scale distributed systems to effectively resolve incidents.
  • Analyze metrics and logs from operating systems and applications for capacity planning, performance tuning, and fault isolation.
  • Develop automation to manage services, infrastructure, and applications.
  • Enhance the reliability and quality of services through proactive monitoring.
  • Continuously measure and optimize system performance, advancing SRE practices.
  • Create an SRE playbook for government-wide reference.
  • Identify and evaluate emerging technologies that can foster innovation for the Government.
  • Collaborate within a cross-functional service team comprising software engineers, infrastructure engineers, DevOps, and other specialists.
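
For candidates less familiar with the terms in the first responsibility, the relationship between an SLI, an SLO, and an error budget can be sketched in a few lines of Python. This is an illustrative sketch only, not the team's actual tooling: the `ErrorBudget` class, its field names, and the choice of a request-based availability SLI are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical request-based error budget for one measurement window."""

    slo_target: float     # assumed SLO, e.g. 0.999 means 99.9% of requests succeed
    total_requests: int   # requests observed in the window
    failed_requests: int  # requests that violated the SLI in the window

    @property
    def sli(self) -> float:
        """Observed availability: the fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0  # no traffic, so nothing has failed
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Number of failures the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget left; negative when overspent."""
        allowed = self.allowed_failures
        if allowed == 0:
            # A 100% target (or an empty window) leaves no room for failures.
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / allowed
```

As a usage example, `ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)` tolerates about 1,000 failures, so roughly three quarters of the budget would remain. A team might use a figure like this to decide whether to prioritize reliability work over new releases.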

About Assurity Trusted Solutions

Assurity Trusted Solutions is dedicated to providing innovative technology solutions designed to enhance the resilience and efficiency of Government operations in Singapore. Our team is committed to leveraging advanced engineering practices to ensure the highest standards of service delivery.
