About the Role:
Join Wrike's Backend Reliability (BRE) team, a crucial part of our backend infrastructure and the guardian of our uptime. We aim to achieve and maintain 99.99% availability while developing the tools, components, and safety nets that our entire engineering organization relies on. As a Senior / Staff Backend Engineer on this team, you will not just work through tickets; you will architect essential reliability solutions that shape how Wrike scales, operates, and recovers from failures.
Your Impact:
- Design, build, and maintain vital reliability components including HTTP rate limiters, internal DB schema migration tools, circuit breakers, and distributed Redis-based caching.
- Troubleshoot intricate production issues, optimize PostgreSQL usage, and ensure our distributed systems remain robust and stable under high load.
- Lead initial investigations during significant production incidents to identify probable root causes, assess impact, and suggest mitigation strategies. The responsible teams then implement long-term solutions based on your findings.
- Develop scalable, reusable tools and frameworks that help other engineering teams build more resilient services.
- Utilize AI-driven tools and coding agents to expedite development, scrutinize architectures, and automate repetitive or error-prone tasks.
- Promote reliability best practices across engineering through knowledge sharing, design reviews, and establishing high technical standards.
Your Qualifications:
- Proficient in Java/JVM, with experience building scalable, high-performance backend systems; willing to adopt other languages as necessary.
- Strong grasp of distributed systems concepts, including high availability, the CAP theorem, and fault tolerance.
- Extensive experience with relational databases (PostgreSQL) and non-relational storage solutions.

