Software Engineer - Performance Optimization

Fireworks AI, San Mateo, CA
On-site Full-time $175K/yr - $220K/yr

Experience Level

Entry Level

Qualifications

Key Responsibilities:

Optimize system and GPU performance for high-throughput AI workloads across training and inference.
Analyze and enhance latency, throughput, memory usage, and compute efficiency.
Profile system performance to identify and resolve GPU- and kernel-level bottlenecks.
Implement low-level optimizations utilizing CUDA, Triton, and other performance tools.
Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models).
Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency.
Enhance support for mixed precision, quantization, and model graph optimization.
Build and maintain performance benchmarking and monitoring infrastructure.
Scale inference and training systems across multi-GPU, multi-node environments.
Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes.

About the job

About Us:

At Fireworks AI, we are pioneering the future of generative AI infrastructure. Our cutting-edge platform provides unparalleled model quality with the fastest and most scalable inference capabilities in the industry. We have been independently recognized as the leader in LLM inference speed and are at the forefront of innovative projects like our proprietary function calling and multimodal models. As a Series C company valued at $4 billion, we are backed by premier investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. Our dynamic team of builders includes veterans from Meta PyTorch and Google Vertex AI, and we are driven by ambition and collaboration.

The Role: 

We are seeking a talented Software Engineer specializing in Performance Optimization to enhance the speed and efficiency of our AI infrastructure. This role demands ownership of performance optimization across all system layers, from low-level GPU kernels to large-scale distributed systems. Your primary focus will be on maximizing the efficiency of our most demanding workloads, which encompass large language models (LLMs), vision-language models (VLMs), and next-generation video models.

You will collaborate closely with teams in research, infrastructure, and systems to identify performance bottlenecks, implement innovative optimizations, and scale our AI systems to meet the real-world demands of production use cases. Your contributions will significantly influence the speed, scalability, and cost-effectiveness of some of the most advanced generative AI models globally.

About Fireworks AI

Fireworks AI is at the forefront of generative AI infrastructure, delivering industry-leading performance and scalability. With significant backing from top-tier investors and a team of experts from leading tech companies, we strive to innovate and push the boundaries of AI technology.
