About the job
Join Rhapsody’s global Cloud Operations team as a Senior Cloud Engineer (Windows), where you will play a critical role in maintaining the stability, reliability, and operational excellence of our Windows-based workloads hosted on AWS. This position combines hands-on Windows systems engineering with cloud infrastructure automation, deep troubleshooting, deployment support, and customer collaboration. You will work closely with Engineering, SRE, Security, and Global Operations teams to ensure high availability, swift incident response, and continuous improvement across Windows services and related cloud components.
Key Responsibilities
- Deploy, manage, and secure Windows Server workloads on AWS (including EC2, Auto Scaling groups, launch templates, EBS, and AMIs).
- Oversee Windows services such as Active Directory/AD DS, AWS Directory Service, Group Policy, DNS, IIS, SMB file services, certificates/PKI, and RDP access.
- Implement infrastructure as code using Terraform, Windows scripting, and the AWS CLI; ensure consistent golden images and configuration baselines.
- Use AWS Systems Manager to manage and standardize Windows fleets.
- Develop and maintain PowerShell scripts/modules for provisioning, configuration, maintenance, and diagnostics.
- Collaborate on shared Terraform modules and CI/CD deployments to minimize manual operations.
- Monitor Windows server performance and cloud metrics leveraging CloudWatch, Datadog, Event Logs, and performance counters.
- Conduct in-depth troubleshooting across the Windows OS, IIS, Active Directory, Group Policy, Kerberos/NTLM authentication, and network/firewall issues.
- Diagnose hybrid connectivity challenges involving VPNs, load balancers, and security configurations.
- Establish secure configuration baselines and enforce monthly patch/update cycles for Windows environments.
- Partner with the Security team on detections, log analysis, and endpoint protection.
- Maintain comprehensive documentation (SOPs, runbooks, standards, diagrams) and adhere to change and incident management processes.
- Engage directly with internal and external customers to resolve application issues and environment configuration problems.
- Support onboarding, migrations, deployments, and post-incident reviews.
- Work with SRE/Engineering teams on observability, tuning, resiliency, and cost optimization.

