About the job
Location: Remote within EMEA time zones (including Ukraine)
Start date: ASAP
Languages: Fluent English required
Industry: Cloud Computing, AI, European Deep-Tech SaaS
Role Overview
Pragmatike is hiring an ML Ops Engineer to help build the core infrastructure at a distributed cloud startup. This well-funded company focuses on AI-native cloud services, offering GPU-powered compute for machine learning workloads, secure storage, and high-speed data transfer. The platform relies on a decentralized architecture designed to reduce environmental impact compared to traditional cloud providers.
This position centers on designing and operating scalable ML inference platforms for real-time AI applications. The role involves close collaboration with infrastructure, platform, and applied AI teams to deliver high availability, low latency, and cost-efficient model serving. A production mindset and hands-on experience with distributed GPU systems are essential.
What You Will Do
- Build and maintain production-ready model serving infrastructure using frameworks such as vLLM, TGI, Triton, or similar tools.
- Design and implement deployment pipelines with blue/green and canary rollout strategies for machine learning models.
- Develop and support auto-scaling systems, multi-model serving solutions, and smart request routing layers.
- Optimize GPU utilization, memory usage, network throughput, and model artifact storage performance.
- Set up observability systems to monitor inference latency, throughput, GPU consumption, cost, and system health.
- Manage model registries and CI/CD pipelines to automate and standardize model deployments.
- Oversee the full ML systems lifecycle, from development through production operations, including on-call support.
- Shape engineering best practices and contribute to platform scalability as the company grows.
Requirements
- Proven experience in ML Ops and production model serving.
- Hands-on background with GPU systems and distributed computing frameworks.
- Proficiency in deployment strategies and CI/CD pipeline management.
- Strong problem-solving abilities, especially in performance tuning and optimization.
- Ability to work collaboratively in a team-oriented, fast-moving setting.

