
Cloud Compiler Architect at d-Matrix | Santa Clara

Hybrid · Full-time


Qualifications

  • Proven experience in compiler design and development, particularly with MLIR and cloud-based AI inference.

  • Strong understanding of distributed computing architectures and heterogeneous computing elements.

  • Proficiency in programming languages relevant to compiler construction.

  • Excellent problem-solving and analytical skills.

  • Ability to work effectively in a collaborative, cross-functional team environment.

About the job

At d-Matrix, we are dedicated to harnessing the power of generative AI to revolutionize technology. Positioned at the cutting edge of software and hardware innovation, we continuously push the limits of possibility. Our workplace culture is built on respect and collaboration.

We champion humility and advocate for open communication. Our inclusive team thrives on diverse perspectives to foster superior solutions. We are on the lookout for passionate individuals eager to tackle challenges and driven by results. Are you ready to discover your creative playground? Together, we can explore the limitless potential of AI.

Location:

This position offers a hybrid work model, with onsite attendance at our Santa Clara, CA headquarters 3-5 days a week, or the option to work fully remote.

The Role: Compiler Architect

As a hands-on Software Compiler Architect specializing in cloud-based AI inference, you will lead the design and implementation of a scalable MLIR-based compiler framework tailored for deploying large-scale NLP and transformer models in cloud settings. You will architect the complete software pipeline that converts high-level AI models into efficient, low-latency executables on a distributed, multi-chiplet hardware platform, integrating heterogeneous compute elements such as in-memory tensor processors, vector engines, and hierarchical memory.

Your compiler designs will enable dynamic partitioning, scheduling, and deployment of inference workloads across cloud-scale infrastructure, accommodating both statically compiled and runtime-optimized execution paths. You will focus on compiler strategies that reduce inference latency, improve throughput, and optimize the use of compute and memory resources within data centers.

Collaboration will be key as you work cross-functionally with systems architects, ML framework teams, runtime developers, performance engineers, and cloud orchestration groups to ensure seamless integration and efficient inference delivery at scale.

What You Will Do:

  • Architect the MLIR-based compiler for cloud inference workloads, emphasizing the efficient mapping of large-scale AI models (e.g., LLMs, Transformers, Torch-MLIR) onto distributed compute and memory hierarchies.

  • Lead the development of compiler passes for model partitioning, operator fusion, tensor layout optimization, memory tiling, and latency-aware scheduling.

  • Design support for hybrid offline/online compilation and deployment flows with runtime-aware mapping, enabling adaptable execution strategies.

About d-Matrix

d-Matrix is at the forefront of generative AI technology, driving transformative innovations in both software and hardware. Our culture emphasizes respect, collaboration, and a commitment to fostering diverse perspectives, enabling us to develop solutions that reshape the technological landscape.
