About the job
As a Site Reliability Engineer (SRE) focused on Globalization, you will play a pivotal role in ensuring the robustness and availability of our client's next-generation international infrastructure. As the client, a fast-growing global consumer internet platform, scales its operations across international markets, you will be instrumental in building a resilient architecture that supports millions of users worldwide. This role involves working on multi-region architecture, global traffic routing, and large-scale distributed systems, directly influencing the reliability and scalability of the evolving platform.
Key Responsibilities:
- Global Architecture & Disaster Recovery: Collaborate on the design and implementation of a global infrastructure architecture. Own cross-region architecture, disaster recovery (DR), and high availability (HA) capabilities. Enable critical systems for multi-region deployment, DR failover, and fault isolation.
- Overseas Infrastructure Platform Deployment & Operations: Build, deploy, operate, and optimize core infrastructure platforms in overseas regions, ensuring consistency and reliability between international and domestic environments.
- Reliability Engineering & Incident Response: Develop a comprehensive reliability engineering framework for international systems, including observability systems, incident response mechanisms, and root cause analysis processes.
- Internationalization Infrastructure Enablement: Understand overseas business requirements and architectural constraints to drive the implementation of infrastructure capabilities in global environments.
- Cross-Team Collaboration & System Alignment: Work closely with domestic infrastructure, product engineering, and platform teams to ensure alignment with internal architecture standards and best practices.

