About the job
AI Architecture Intern - Inference
Location: San Jose, CA
Team: Architecture
About Etched
At Etched, we are pioneering the world’s first AI inference system designed specifically for transformers, delivering over 10x the performance of traditional systems at significantly lower cost and latency. Our ASICs enable groundbreaking products, such as real-time video generation and advanced reasoning agents, that are unattainable with conventional GPUs. Backed by substantial investments from leading venture capital firms and staffed by top-tier engineering talent, Etched is at the forefront of transforming the infrastructure of the fastest-growing industry.
The Role
We are seeking a motivated Architecture Intern to join our team and contribute to the design and optimization of next-generation AI accelerators. In this role, you will help develop and tune compute architectures that deliver outstanding performance and efficiency for transformer workloads. Throughout your internship, you will tackle cutting-edge architectural challenges and engage in performance modeling.
Key Responsibilities
Assist in adapting state-of-the-art models to our architecture and develop programming abstractions and testing capabilities for rapid model iteration.
Help enhance and scale Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
Contribute to the optimization of routing and communication layers utilizing Sohu’s collectives.
Employ performance profiling and debugging tools to pinpoint bottlenecks and correctness issues.
Gain a deep understanding of Sohu to collaboratively design hardware instructions and model architecture operations to maximize performance.
Implement high-performance software components for the Model Toolkit.
Qualifications
Currently pursuing a Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Applied Mathematics, or a related discipline.
Strong programming skills in Python and C++.
Familiarity with performance-sensitive or complex systems software, such as Linux internals, accelerator architectures (e.g., GPUs, TPUs), compilers, or distributed systems.