About the job
Contribute to a Safer Future.
TRM Labs is at the forefront of blockchain analytics and AI technology, empowering law enforcement, financial institutions, and cryptocurrency enterprises to identify and combat cryptocurrency-related fraud and financial crime. Our innovative blockchain intelligence and AI tools are designed to trace fund flows, pinpoint illicit activities, build comprehensive cases, and provide actionable insights into potential threats. Trusted by prominent agencies and organizations globally, TRM is committed to fostering a safer and more secure environment for everyone.
Join our dynamic AI Engineering Team, dedicated to pioneering next-generation AI applications, with a particular emphasis on Large Language Models (LLMs) and agent-based systems. Our objective is to create efficient pipelines, high-caliber infrastructure, and operational tools that facilitate the rapid, safe, and scalable deployment of AI systems.
We oversee petabyte-scale data pipelines, serve models with millisecond latency, and provide the observability and governance needed to make AI production-ready. Our team actively evaluates and integrates cutting-edge technologies in the LLM and agent domains, using open-source stacks, vector databases, evaluation frameworks, and orchestration tools that enhance TRM’s agility and capacity to innovate.
As a Senior or Staff AI Infrastructure Engineer, you will play a pivotal role in building and scaling the technical foundation for AI and ML systems. Your responsibilities will include:
Developing reusable CI/CD workflows for model training, evaluation, and deployment, integrating tools like Langfuse, GitHub Actions, and experiment tracking systems.
Automating model versioning, approval workflows, and compliance checks across various environments.
Building a modular and scalable AI infrastructure stack, encompassing vector databases, feature stores, model registries, and observability tools.
Collaborating with engineering and data science teams to embed AI models and agents into real-time applications and workflows.
Continuously assessing and integrating state-of-the-art AI tools (e.g., LangChain, LlamaIndex, vLLM, MLflow, BentoML).
Driving AI reliability and governance, facilitating experimentation while ensuring compliance, security, and uptime.
Enhancing the performance of AI and ML models.
Ensuring data accuracy, consistency, and reliability for improved model training and inference.
Deploying infrastructure to support both offline and online evaluations of LLMs and agents.

