Senior Splunk Engineer for Automation and Reliability Engineering

AvePoint, Singapore
On-site, Full-time

Experience Level

Senior

Qualifications

  • Proven experience with Splunk, SIEM, and observability tools.
  • Strong understanding of cloud environments, particularly AWS and Azure.
  • Demonstrated skills in Python, Ansible, and Terraform.
  • Experience in implementing SRE frameworks and reliability engineering practices.
  • Excellent problem-solving skills and ability to work collaboratively in a team setting.
  • Strong communication skills to explain complex technical concepts clearly.

About the job

Join AvePoint as a Senior Splunk Engineer focused on Automation and Reliability Engineering Projects!

Project Overview

  • Contribute to Automation and Reliability Engineering efforts and operations.

Key Responsibilities

  • Oversee Observability Engineering and Governance initiatives.
  • Design and maintain enterprise SIEM solutions compliant with operational resilience frameworks (e.g., MAS TRM, DORA, APRA CPS 230).
  • Lead the deployment, configuration, and optimization of Splunk for comprehensive visibility across infrastructure, applications, networks, and user experiences.
  • Establish and uphold telemetry data governance standards (covering metrics, logs, and traces) to ensure consistency, compliance, and security.
  • Integrate Splunk with incident management, ITSM, and AIOps systems for predictive alerting and anomaly detection.
  • Serve as the SIEM/Splunk subject matter expert (SME) for architecture reviews, upgrades, and performance enhancements.

Reliability Engineering and Automation:

  • Implement and advocate for Site Reliability Engineering (SRE) frameworks and reliability practices for critical systems.
  • Design and automate runbooks, alerts, and self-healing workflows using Python, Ansible, and Terraform.
  • Collaborate with Application, Infrastructure, and Cyber teams to incorporate reliability principles into the delivery lifecycle.
  • Conduct resilience, chaos, and capacity testing in accordance with business continuity and disaster recovery standards.
  • Define and monitor error budgets, reliability scorecards, and service health indicators for production workloads.

Cloud & Platform Integration:

  • Engineer SIEM solutions for cloud-native workloads in AWS and Azure, ensuring visibility across compute, storage, and network layers.
  • Integrate Splunk and cloud observability tools into CI/CD pipelines and landing zones for continuous compliance.
  • Implement infrastructure-as-code (IaC) models using Terraform and Ansible for consistent and auditable provisioning.
  • Work alongside Cloud, DevOps, and Security teams to ensure telemetry aligns with audit, compliance, and operational risk requirements.

Operational Excellence and Collaboration:

  • Drive reductions in incident recurrence, Mean Time to Recovery (MTTR), and manual intervention through observability-led automation.
  • Partner with Service Delivery, Cyber, and Application teams to facilitate predictive incident prevention and root cause transparency.
  • Develop and maintain executive dashboards and reports highlighting availability, reliability KPIs, and operational risk indicators.

About AvePoint

AvePoint is a leading provider of SaaS solutions that empower organizations to manage, protect, and optimize their Microsoft 365 environments. We are committed to fostering innovation and ensuring operational resilience across all our projects.
