Research Engineer - CUDA Kernel Development

Voltai Technologies, Palo Alto Office
On-site Full-time

Experience Level

Mid to Senior

Qualifications

  • Proven experience in CUDA kernel programming and optimization.

  • Strong understanding of GPU architecture and performance profiling tools.

  • Familiarity with AI frameworks and their integration for enhanced performance.

  • Hands-on experience with NVIDIA's latest technology.

  • Ability to work collaboratively in a fast-paced research environment.

About the job

About Voltai
At Voltai, we are pioneering the development of world models and agents capable of learning, evaluating, planning, experimenting, and interacting with the physical realm. Our journey begins with a focus on hardware, specifically in electronics systems and semiconductors, where we harness AI to design and innovate beyond human cognitive capabilities.

About the Team

Our team boasts extraordinary talent, including esteemed former Stanford professors, SAIL researchers, and medalists from prestigious competitions like IPhO and IOI. We are supported by top-tier investors from Silicon Valley and industry leaders, including CEOs and Presidents from Google, AMD, Broadcom, and Marvell.

About the Role

As a Research Engineer specializing in CUDA kernel engineering, you will design, integrate, and optimize cutting-edge CUDA kernels that drive AI models, facilitating rapid advancements in semiconductor design and verification. Your contributions will power large-scale model training, inference, and reinforcement learning systems capable of reasoning about circuit layouts, generating and validating RTL, and optimizing chip architectures, all while efficiently utilizing thousands of GPUs.
You will create tools, performance benchmarks, and integration layers that maximize GPU utilization for compute-intensive workloads in AI-driven hardware design. Collaborating closely with fellow researchers and engineers, you will help position Voltai as the foremost organization in AI and semiconductor research. Furthermore, your kernels and tools will be released as valuable contributions to the open-source AI and HPC ecosystems.

You might excel in this position if you possess experience in:

  • Writing and optimizing CUDA kernels for large-scale AI applications (e.g., attention mechanisms, routing, graph-based operations, and physics-inspired operators).

  • Profiling and improving GPU performance for specialized compute-bound or memory-bound workloads.

  • Integrating custom kernels into state-of-the-art training and inference frameworks (including PyTorch, Megatron, vLLM, and TorchTitan).

  • Engaging with the latest NVIDIA hardware and software frameworks (Hopper, Blackwell, NVLink, NCCL, Triton).

  • Creating GPU-accelerated primitives for graph reasoning, symbolic computation, or hardware simulation tasks.

About Voltai Technologies

Voltai is at the forefront of AI and semiconductor innovation, developing advanced models and agents that redefine hardware capabilities. Our mission is to push the boundaries of technology, transforming how we understand and interact with the world through cutting-edge research and development.
