About the job
You will serve as a proactive leader, combining extensive technical knowledge in ML systems and performance optimization with robust leadership and team management capabilities. Your responsibilities will include recruiting, mentoring, and fostering a high-performing engineering team while cultivating an environment that promotes innovation, collaboration, and ongoing improvement.
Key Responsibilities:
- Team Leadership & Management:
  - Build, lead, and manage a high-performing team of ML and infrastructure engineers focused on acceleration.
  - Offer technical guidance, mentorship, and career development opportunities to team members.
  - Encourage a collaborative and inclusive team culture.
  - Establish team objectives, priorities, and roadmaps in alignment with organizational goals.
- Technical Strategy & Execution:
  - Define the overarching technical vision and strategy for ML acceleration within the organization.
  - Identify and assess advanced technologies and methodologies for accelerating ML training, including data pipeline optimization, large-scale distributed training, data loader optimization, hardware acceleration, and model optimization techniques.
  - Design, develop, and implement scalable and efficient solutions for ML acceleration.
- Cross-functional Collaboration:
  - Work closely with ML research, ML training platform, and product teams to understand their requirements and integrate acceleration solutions into their workflows.
  - Communicate complex technical concepts and strategies clearly to both technical and non-technical stakeholders.
  - Serve as a technical expert and advocate for ML acceleration initiatives across the organization.
- Impact & Measurement:
  - Regularly measure and report on the effectiveness of acceleration efforts.
  - Continuously explore and implement opportunities for further optimization and innovation.

