Machine Learning Infrastructure Engineer

Causal Labs, San Francisco
On-site, Full-time

Experience Level

Entry Level

Qualifications

We seek individuals with a relentless problem-solving mindset, the ability to execute quickly, and a strong capacity to learn rapidly in new domains. Candidates should have a solid understanding of the latest techniques for optimizing training and inference workloads, along with proven experience using distributed training frameworks (e.g., FSDP, DeepSpeed) to train large foundation models. We look for experience designing, deploying, and maintaining large distributed ML training and inference clusters, and for the ability to build efficient, scalable end-to-end pipelines that manage massive datasets and support model training throughout the ML lifecycle. Candidates should be able to research and evaluate training methodologies, including parallelization techniques and numerical precision trade-offs across model sizes, and to analyze, profile, and debug low-level GPU operations for performance optimization. We also value a commitment to staying current with research trends and integrating new ideas into your work.

About the job

At Causal Labs, we are on a groundbreaking mission to develop general causal intelligence, harnessing AI to (1) forecast future events and (2) pinpoint optimal actions to influence that future.

To realize this vision, we are building a Large Physics foundation Model (LPM). Domains governed by physics inherently feature cause-and-effect relationships, which sets them apart from visual or textual data.

Weather serves as the perfect training environment for our LPM, being the most extensively observed physical system and providing rapid, objective ground truth feedback from sensory data at an unprecedented scale, far exceeding what is utilized for current large language models (LLMs).

Our team comprises elite researchers and engineers with backgrounds in self-driving technology, drug discovery, and robotics, including talents from Google DeepMind, Cruise, Waymo, Meta, Nabla Bio, and Apple. We believe that achieving general causal intelligence will be a pivotal technological advancement for humanity.

We are searching for infrastructure engineers who are eager to tackle formidable challenges and contribute to our mission.

Your expertise in distributed training clusters and performance optimization for large models will be crucial as we address our training and inference challenges. If you possess experience in developing large-scale ML infrastructure within fields like language models, vision systems, robotics, or biology, we invite you to join us.

About Causal Labs

Causal Labs is at the forefront of AI research, focusing on the development of general causal intelligence. Our team of experts, drawn from top tech and research organizations, is dedicated to creating groundbreaking solutions that will revolutionize the understanding of causal relationships in various domains.
