Thinking Machines Lab

Research Engineer, Infrastructure & Inference

On-site Full-time $350K/yr - $475K/yr



Experience Level

Mid to Senior

Qualifications

The ideal candidate will possess a strong background in systems engineering, computer science, or a related field. Proficiency in programming languages such as Python and experience with AI frameworks are essential. Familiarity with cloud platforms, distributed systems, and performance optimization techniques will be advantageous.

About the job

At Thinking Machines Lab, we are dedicated to empowering humanity by advancing collaborative general intelligence. Our vision is to create a future where everyone can leverage AI to meet their unique needs and aspirations.

Our talented team comprises scientists, engineers, and innovators who have developed some of the most widely recognized AI products, including ChatGPT and Character.ai, alongside open-weight models like Mistral and popular open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Position

We are seeking a motivated Infrastructure Research Engineer to design, enhance, and scale the systems that underpin large AI models. Your work will directly improve inference speed, cost-effectiveness, reliability, and reproducibility, allowing our teams to concentrate on advancing model capabilities rather than fighting bottlenecks.

Our mission centers on delivering high-performance and efficient model inference to support real-world applications and accelerate research efforts. In this role, you will be responsible for the infrastructure that guarantees smooth operation for every experiment, evaluation, and deployment at scale.

Note: This is an evergreen role, kept open continuously so candidates can express interest. We receive a high volume of applications and may not always have an immediate opening that matches your skills and experience, but we encourage you to apply anyway. We review applications regularly and reach out to candidates as new opportunities arise. Feel free to reapply as you gain more experience, though we ask that you apply no more than once every six months. You may also see postings for specific roles tied to particular projects or teams; you are welcome to apply to those directly in addition to this evergreen role.

What You Will Do

  • Collaborate with researchers and engineers to transition cutting-edge AI models into production.
  • Partner with research teams to ensure high-performance inference for innovative architectures.
  • Design and implement new techniques, tools, and architectures that enhance performance, latency, throughput, and efficiency.
  • Optimize our codebase and computing resources (e.g., GPUs) to maximize hardware FLOP utilization, bandwidth, and memory efficiency.
  • Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
  • Establish standards for reliability, observability, and reproducibility throughout the inference stack.
  • Publish and share insights through internal documentation, open-source libraries, or technical reports that further the field of scalable AI infrastructure.

About Thinking Machines Lab

Thinking Machines Lab is at the forefront of AI innovation, committed to developing solutions that empower individuals and organizations to harness the full potential of artificial intelligence. Our team is passionate about building tools and technologies that open up new possibilities for everyone.
