About the job
At Critical Manufacturing, we are committed to transforming Industry 4.0 into a tangible reality by delivering the most innovative, comprehensive, and modular Manufacturing Execution System (MES) software. We are based in Maia, Porto District, Portugal, and our global presence allows us to develop cutting-edge solutions for sectors including Semiconductor, Electronics, and Medical Devices.
For the third consecutive year, we have been recognized as a Leader by Gartner, and we proudly operate as a part of ASMPT, the premier provider of leading equipment and technological process support in the electronics and semiconductor industries.
Your Role:
As a member of our dedicated AI engineering team, you will play a crucial role in building robust AI infrastructure tailored for manufacturing systems. This hands-on position involves developing Model Context Protocol (MCP) servers and creating essential tools for model observability, telemetry, and retraining pipelines. The focus is on solid execution within a collaborative environment; the role carries no leadership responsibilities.
This role is based at our headquarters in Porto, Portugal, where collaboration, experimentation, and adherence to rigorous engineering standards are vital. You will take part in technical design reviews and architecture discussions, and collaborate with teams across Product, Data, and Platform Engineering. This is an opportunity for those who are passionate about building AI systems that are not only intelligent, but also observable, debuggable, and continuously evolving.
Your Responsibilities:
Develop MCP Servers
- Implement and maintain Model Context Protocol (MCP) servers that facilitate connections between language models and manufacturing domain tools and data sources.
- Optimize server performance and establish clear interfaces for tool integration, so that models can access business logic safely and reliably.
- Work alongside team leads to translate complex manufacturing workflows into structured tools and prompts.
Build Model Observability and Telemetry Infrastructure
- Design and implement comprehensive telemetry systems to monitor model behavior, token usage, latency, and cost in production.
- Develop dashboards and alert systems that provide real-time insights into model performance and anomalies.
- Instrument models to capture structured traces, including prompts, system context, tool invocations, inputs/outputs, intermediate artifacts, and decision metadata.
- Contribute to establishing standards for logging, tracing, and distributed observability across all AI systems.
Develop Retraining and Continuous Improvement Pipelines
- Create data collection pipelines that gather production interactions, model failures, and edge cases for retraining purposes.
