About the job
At d-Matrix, we are dedicated to harnessing the power of generative AI to revolutionize technology. Positioned at the cutting edge of software and hardware innovation, we continuously push the limits of possibility. Our workplace culture is built on respect and collaboration.
We champion humility and advocate for open communication. Our inclusive team thrives on diverse perspectives to foster superior solutions. We are on the lookout for passionate individuals eager to tackle challenges and driven by results. Are you ready to discover your creative playground? Together, we can explore the limitless potential of AI.
Location:
This position offers flexible work arrangements: a hybrid model with onsite attendance at our Santa Clara, CA headquarters 3-5 days a week, or the option to work fully remote.
The Role: Compiler Architect
As a hands-on Software Compiler Architect specializing in cloud-based AI inference, you will lead the design and implementation of a scalable MLIR-based compiler framework tailored for deploying large-scale NLP and transformer models in cloud settings. You will architect the complete software pipeline that converts high-level AI models into efficient, low-latency executables on a distributed, multi-chiplet hardware platform, integrating heterogeneous compute elements such as in-memory tensor processors, vector engines, and hierarchical memory.
Your compiler designs will facilitate dynamic partitioning, scheduling, and deployment of inference workloads across a cloud-scale infrastructure, accommodating both statically compiled and runtime-optimized execution paths. You will focus on compiler strategies that reduce inference latency, increase throughput, and make efficient use of compute and memory resources within data centers.
Collaboration will be key as you work cross-functionally with systems architects, ML framework teams, runtime developers, performance engineers, and cloud orchestration groups to ensure seamless integration and efficient inference delivery at scale.
What You Will Do:
Architect the MLIR-based compiler for cloud inference workloads, emphasizing the efficient mapping of large-scale AI models (e.g., LLMs and transformers, ingested via Torch-MLIR) onto distributed compute and memory hierarchies.
Lead the development of compiler passes for model partitioning, operator fusion, tensor layout optimization, memory tiling, and latency-aware scheduling.
Design support for hybrid offline/online compilation and deployment flows with runtime-aware mapping, enabling adaptable execution strategies.