About the job
Join our dynamic team at Accellor as a Senior MLOps Engineer, where you will play a crucial role in architecting, building, and maintaining the infrastructure and pipelines needed to operationalize AI and Machine Learning systems at scale. This position is the link between model development and production deployment, ensuring that ML and GenAI workloads are dependable, observable, cost-effective, and continuously optimized across enterprise environments.
Key Responsibilities
- Design and implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, evaluation, and deployment.
- Develop and oversee CI/CD pipelines for ML models, incorporating automated testing, validation, and rollback strategies.
- Architect and maintain model serving infrastructure for real-time and batch inference workloads, including LLM and agentic AI deployments.
- Implement systems for model monitoring, drift detection, and alerting to maintain production model reliability and health.
- Oversee experiment tracking, model versioning, and artifact registries to facilitate reproducibility and governance.
- Optimize compute costs and inference latency for both GPU and CPU workloads on cloud platforms (AWS, Azure, or GCP).
- Utilize Docker and Kubernetes for the containerization and orchestration of ML workloads.
- Automate data pipeline workflows and manage feature stores for training and inference processes.
- Collaborate with AI Engineers, Data Scientists, and Platform teams to streamline the transition from prototype to production.
- Establish and uphold MLOps best practices, standards, and documentation across the engineering team.
