About the job
Join Anthelion Capital
Anthelion Capital is an investment firm building an AI and data platform that streamlines the entire investment lifecycle, from underwriting to portfolio management. Our platform combines structured and unstructured data, advanced analytics, and automated workflows to achieve exceptional risk-adjusted returns in private credit and structured finance.
As a team of engineers and investors, we are committed to transforming how institutional investment decisions are made, so that they are faster, smarter, and more transparent.
Your Role
We are seeking a talented Machine Learning Infrastructure/Platform Engineer to develop the foundational systems that support our data science and AI platform.
You will work on the underlying infrastructure that drives our ML and AI workflows, including data pipelines, orchestration, compute provisioning, model serving, and observability. You will also be instrumental in operationalizing our agentic AI platform, ensuring agents are efficiently hosted, monitored, and integrated into production systems.
Key Responsibilities
Data Pipelines & Orchestration
- Design, implement, and manage robust production data pipelines that process and deliver structured and unstructured data to downstream ML workflows.
- Take ownership of our Prefect-based orchestration framework, including flow scheduling, error handling, retry mechanisms, and human-in-the-loop (HITL) workflows.
- Build and oversee feature stores, data contracts, and promotion workflows to guarantee data quality and traceability from raw ingestion to model consumption.
- Collaborate with data scientists to convert experimental workflows into reliable and repeatable pipelines.
ML/AI Infrastructure & Deployment
- Establish and maintain scalable infrastructure for model training, retraining, and inference (both batch and real-time), including GPU compute provisioning and container orchestration.
- Implement and oversee model serving infrastructure, such as containerized endpoints, API gateways, and self-service deployment frameworks for the data science team.
- Deploy and manage monitoring systems to track model performance, data drift, prediction usage, and pipeline reliability.
- Ensure all deployed systems are highly available, resilient, and well-documented, with clear data lineage and operational runbooks.
Agentic AI Platform & Tooling
- Assist in the development and operationalization of agentic AI workflows, including agent hosting, lifecycle management, and integration with Model Content.

