About the job
About Air Apps
Air Apps builds tools to help people plan and manage their resources more effectively. Founded in 2018 in Lisbon, Portugal, the company is developing an AI-powered Personal & Entrepreneurial Resource Planner (PRP). With a strong family-oriented culture and a focus on innovation, Air Apps has reached over 100 million downloads worldwide. The team is committed to advancing AI solutions that make a real difference in daily life.
Site Reliability Engineer (SRE) – Madrid
This onsite role is based in Madrid. Air Apps offers relocation support for candidates moving to join the team.
What You Will Do
- Design and build systems that are scalable, reliable, and fault-tolerant across cloud platforms.
- Develop and manage observability tooling, such as Prometheus, Grafana, Datadog, or the ELK stack, for monitoring, logging, and alerting.
- Automate infrastructure provisioning, deployments, and incident response, using Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
- Improve system performance, scalability, and incident response processes to maximize uptime.
- Work closely with development and DevOps teams to strengthen system reliability.
- Lead root cause analysis (RCA) and put preventive measures in place to reduce failures.
- Maintain high availability by designing and supporting load balancing, failover, and disaster recovery strategies.

