Technical Staff Member - ML Infrastructure & Performance

embedding-vc, San Mateo, CA
On-site, Full-time

Qualifications

Candidates should possess a strong background in machine learning infrastructure, performance optimization, and practical experience with the technologies outlined in the scope of work. A passion for pushing the boundaries of AI technology and a collaborative mindset are essential.

About the job

Join the innovative team at Moonlake, where we harness the power of AI to create real-time interactive content.

Mission: Improve throughput, latency, and cost, serving our models 2–10 times faster and more cheaply without compromising quality.

Scope of Work:

  • GPU Performance: Expertise in CUDA/Triton kernels, the FlashAttention family, paged attention, and CUDA Graphs.
  • Serving Stack: Proficiency with TensorRT-LLM, Triton Inference Server, vLLM, and TGI; continuous batching; on-GPU KV-cache reuse; speculative decoding (including Medusa); and mixture-of-agents routing.
  • Parallelism: Experience with FSDP/ZeRO; tensor (TP), pipeline (PP), and expert parallelism; and NCCL tuning.
  • Quantization/PEFT: Familiarity with AWQ, GPTQ, and FP8 quantization, and with serving LoRA/DoRA adapters.
  • Systems: Knowledge of Ray, Kubernetes, and Argo; observability tools (Prometheus, Grafana, OpenTelemetry); autoscaling; A/B testing infrastructure; and canary deployments with rollback.

Tech Signals:

Ideal candidates will have previous experience at infrastructure-heavy startups such as Databricks or Roblox.

We are dedicated to maintaining an on-site, in-person team based in San Mateo.

About embedding-vc

Moonlake is at the forefront of AI-driven innovation, specializing in real-time interactive content. Our mission is to deliver cutting-edge technology solutions that enhance user experiences and streamline operations.
