About Us:
At Fireworks AI, we are pioneering the future of generative AI infrastructure. Our cutting-edge platform provides unparalleled model quality with the fastest and most scalable inference capabilities in the industry. We have been independently recognized as the leader in LLM inference speed and are at the forefront of projects such as our proprietary function-calling and multimodal models. As a Series C company valued at $4 billion, we are backed by premier investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. Our team of builders includes veterans from Meta PyTorch and Google Vertex AI, and we are driven by ambition and collaboration.
The Role:
We are seeking a talented Software Engineer specializing in Performance Optimization to improve the speed and efficiency of our AI infrastructure. You will own performance optimization across every layer of the system, from low-level GPU kernels to large-scale distributed systems, with a primary focus on our most demanding workloads: large language models (LLMs), vision-language models (VLMs), and next-generation video models.
You will collaborate closely with the research, infrastructure, and systems teams to identify performance bottlenecks, implement innovative optimizations, and scale our AI systems to meet the demands of production use cases. Your contributions will significantly influence the speed, scalability, and cost-effectiveness of some of the most advanced generative AI models in the world.

