About the job
Lead and execute software engineering initiatives that support end-to-end machine learning operations, with an emphasis on scaling workflows across large datasets and distributed systems.
Architect, develop, and maintain MLOps infrastructure, including microservices, queuing systems, APIs, and orchestration workflows, using Python, Kubernetes, Kafka, and advanced database technologies.
Deploy observability tools such as Prometheus and Grafana to ensure the reliability, performance, and visibility of machine learning systems in production.
Collaborate with data scientists and machine learning engineers to optimize workflows for developing and deploying large language models, generative AI, and intelligent agents.
Evaluate architecture and implementation strategies for scalability, alignment with organizational objectives, and adherence to best practices.
Guide and mentor junior and mid-level engineers, promoting a culture of collaboration, innovation, and operational excellence.

