About the job
We are seeking a highly skilled Senior Platform / DevOps Engineer with expertise in real-time media, WebRTC, edge computing, and cloud infrastructure. Join our team in Bengaluru as we build a robust, LiveKit-like real-time communications platform designed to scale to millions of simultaneous calls. This role offers the opportunity to work with cutting-edge technologies while delivering ultra-low latency and high cloud reliability.
About the Role:
In this hands-on role, you will own production systems, with a focus on performance and resilience. We are particularly interested in candidates who have scaled real-time and streaming infrastructure.
Key Responsibilities:
- Own the reliability and performance of signaling, SFU/media nodes, TURN servers, and routing, including failover and capacity planning.
- Build and manage multi-region Kubernetes platforms with secure networking and zero-downtime deployments.
- Design edge and cloud architecture including PoPs, global routing, failover, autoscaling, and disaster recovery.
- Define SLOs/SLIs, lead incident response and postmortems, and drive operational excellence.
- Build strong observability practices covering metrics, logs, and tracing, plus real-time QoE and latency monitoring.
Preferred Qualifications:
- Proven experience with Kubernetes at scale (multi-cluster/multi-region).
- Strong foundation in Linux and networking fundamentals (UDP/TCP, NAT, conntrack, DNS, load balancing).
Nice to Have:
- Experience in WebRTC/RTC operations (ICE, STUN/TURN, SFU scaling, packet loss/jitter tuning).
- Knowledge of Edge/PoP and traffic management (global routing, Anycast/DNS strategies).
- Familiarity with cost optimization for bandwidth-heavy workloads.
- Previous experience operating real-time/streaming systems at high concurrency levels.
Success Criteria:
- Ability to keep a real-time system stable through traffic spikes, packet loss, ISP variability, and region failures.
- Understanding of latency budgets, concurrency, bandwidth, and packet throughput, not just pods and nodes.
- Ability to build platforms that are observable, automatable, and easy to operate.

