Experience Level
Entry Level
Qualifications
What We Look For
We are seeking candidates who have a blend of technical skills and a passion for innovation in AI and GPU technologies. Your ability to collaborate with a team and contribute to complex projects will be key to your success.
About the job
About the Role
We're excited to invite you to join wafer as a Spring Intern, where you will play a crucial role in shaping the future of AI infrastructure and GPU optimization. As part of our innovative team, you will work closely with full-time engineers to define our technical strategies and contribute to the development of the essential systems that drive our GPU optimization platform.
Your Responsibilities
Design and implement scalable infrastructure for AI model training and inference tasks.
Guide the team in making technical decisions and architectural choices.
Qualifications We Seek
Essential Technical Skills
GPU Fundamentals: A strong grasp of GPU architectures, CUDA programming, and parallel computing methodologies.
Deep Learning Frameworks: Skilled in PyTorch, TensorFlow, or JAX, especially for GPU-accelerated applications.
Knowledge of LLM/AI: Solid foundation in large language models, including training, fine-tuning, prompting, and evaluation.
Systems Engineering: Proficient in C++, Python, and potentially Rust/Go for developing tools around CUDA.
Preferred Background
Publications or contributions to open-source projects related to inference, GPU computing, or ML/AI are advantageous.
Hands-on experience in conducting large-scale experiments, benchmarking, and performance optimization.
About Wafer
wafer is at the forefront of AI infrastructure development, dedicated to pushing the boundaries of GPU optimization. We foster a collaborative environment where innovation thrives, and we are passionate about empowering our team to make impactful contributions.
Similar jobs
About the Position
At Wafer, we are on a mission to enhance intelligence per watt by developing AI systems that can self-optimize. Our journey begins with GPU kernels, and we aim to revolutionize every aspect of ML systems and AI infrastructure. We are a compact, dynamic team of four, supported by renowned investors including Fifty Years, Y Combinator, Jeff Dean, and Woj Zaremba, co-founder of OpenAI. We are seeking passionate engineers eager to innovate at the convergence of AI agents and systems programming.
In this role, you will collaborate closely with our founding team to create the systems that power our GPU optimization platform. Your projects will range from the agent framework that refines kernels to the profiling infrastructure that interfaces with NCU and ROCprofiler, as well as the compiler tools that scrutinize PTX and SASS.
At Gimlet Labs, we are pioneering the first heterogeneous neocloud tailored for AI workloads. As the demand for AI systems grows, traditional infrastructure faces significant limitations in terms of power, capacity, and cost. Our platform addresses these challenges by decoupling AI workloads from the hardware, intelligently partitioning tasks, and directing each component to the most suitable hardware for optimal performance and efficiency. This method allows for the creation of heterogeneous systems that span multiple vendors and generations of hardware, including the latest cutting-edge accelerators, achieving substantial improvements in performance and cost-effectiveness.
Building upon this foundation, Gimlet is developing a production-grade neocloud designed for agentic workloads. Our customers can effortlessly deploy and manage their workloads with stable, production-ready APIs, eliminating the complexities of hardware selection, placement, or low-level performance optimization.
We collaborate with foundational labs, hyperscalers, and AI-native companies to drive real production workloads capable of scaling to gigawatt-class AI data centers.
We are currently seeking a dedicated Member of Technical Staff specializing in kernels and GPU performance. In this role, you will work closely with accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behaviors, design and optimize kernels, and ensure consistent performance across both established and emerging hardware.
This position is perfect for engineers who thrive on deep performance analysis, enjoy exploring hardware trade-offs, and are passionate about transforming theoretical peak performance into tangible real-world outcomes.
Join the Innovative Team at Liquid AI
Founded as a spin-off from MIT’s CSAIL, Liquid AI is at the forefront of developing cutting-edge AI systems that operate seamlessly across various platforms, including data center accelerators and on-device hardware. Our technology is designed to ensure low latency, efficient memory usage, privacy, and reliability. We collaborate with leading enterprises in sectors such as consumer electronics, automotive, life sciences, and financial services as we rapidly scale our operations. We are seeking talented individuals who are passionate about technology and innovation.
Your Role in Our Team
As a GPU Performance Engineer, your expertise will be critical in enhancing our models and workflows beyond the capabilities of standard frameworks. You will be responsible for designing and deploying custom CUDA kernels, conducting hardware-level profiling, and transforming research concepts into production code that yields tangible improvements in our pipelines (training, post-training, and inference). Our dynamic team values initiative and ownership, and we are looking for a candidate who thrives on tackling complex challenges related to memory hierarchies, tensor cores, and profiling outputs.
While San Francisco and Boston are preferred, we welcome applications from other locations.
Baseten develops infrastructure and tools that help AI companies deploy and scale inference. Teams at organizations like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer rely on Baseten to bring advanced machine learning models into production. The company recently secured a $300M Series E from investors including BOND, IVP, Spark Capital, Greylock, and Conviction.
Role overview
This Software Engineer - GPU Inference position joins the founding team for Baseten Voice AI in San Francisco. The team focuses on building production-ready Voice AI systems, bringing open-source voice models into real-world use for clients in productivity, customer service, healthcare conversations, and education. The work shapes how people interact with technology through voice, creating broad impact across industries.
In this role, the engineer leads the internal inference stack that powers Voice AI models. Responsibilities include guiding the product roadmap and driving engineering execution. Collaboration is a key part of the job, working closely with Forward Deployed Engineers, Model Performance Engineers, and other technical groups to advance Voice AI capabilities.
Sample projects and initiatives
The world's fastest Whisper, with streaming and diarization
Canopy Labs selects Baseten for Orpheus TTS inference
Partnering with the Core Product team to build an orchestration framework for a multi-model voice agent
Working with the Training Platform team to support continuous training of voice models
Designing a developer-friendly API and SDK for self-service adoption of Baseten Voice AI products
Technical Staff Member
At humans&, we are dedicated to pioneering a human-centric approach to artificial intelligence. Our mission is to redefine AI by placing individuals and their interpersonal connections at the heart of our innovations.
We invite talented researchers and engineers who have made significant contributions to the cutting edge of AI to join our dynamic team. If you excel in your field and are driven to innovate, we want to hear from you!
Full-time|$100K/yr - $300K/yr|On-site|San Francisco, CA
About Cogent Security
Cogent Security is an innovative Applied AI Lab pioneering the future of AI agents in cybersecurity. In a world where cyber threats evolve at unprecedented speeds, our 'AI Taskforce' analyzes vast amounts of enterprise data to proactively address vulnerabilities and avert critical breaches.
We remain at the forefront of technology by merging cutting-edge research with practical applications. Our dedicated Cogent Research team fuels our mission, ensuring we develop truly effective security workflows powered by AI.
Since our inception, Cogent has rapidly grown, collaborating with Fortune 500 companies to safeguard the most intricate production environments globally.
Supported by Greylock, our team comprises some of the brightest minds in applied AI, including experts from:
Renowned universities such as Stanford, Berkeley, Penn, Duke, Carnegie Mellon, and Waterloo.
High-growth unicorn companies like Scale AI, Databricks, Stripe, Tesla, and Coinbase.
Leading cybersecurity specialists from Wiz, Abnormal AI, and Zscaler.
Prestigious research institutions including DeepMind and SAIL.
About the Role
As we build a suite of backend services and integrations with our design partners, we seek passionate and skilled Backend Engineers at both Senior and Staff levels, eager to thrive in the Applied AI domain.
Responsibilities
Design and implement critical backend subsystems and integration platforms
Comprehend business objectives and customer requirements to engineer backend subsystems that align with our technology strategies.
Adapt systems to meet evolving needs of design partners and clients.
Incorporate non-functional requirements such as compliance and security into system design.
Establish scalable infrastructure foundations
Prepare for future growth in customer base, headcount, and data management by collaborating with your team to enhance infrastructure.
Join Lambda, a pioneer in AI cloud infrastructure, where we empower countless customers ranging from AI researchers to large enterprises and hyperscalers. Our mission is to democratize access to superintelligence, making compute as essential as electricity. If you're passionate about revolutionizing the AI cloud landscape, we invite you to be part of our journey.
Note: This position requires in-office presence at our San Jose or San Francisco locations four days a week; our designated remote workday is Tuesday.
Our Operations team is integral to delivering a seamless AI Infrastructure-as-a-Service (AI-IaaS) experience. This team is responsible for sourcing infrastructure and components, overseeing daily data center operations to ensure optimal performance, and facilitating cross-company coordination through product management to align operational capabilities with strategic objectives. By managing the entire lifecycle from procurement to deployment, our Operations team guarantees that our AI-driven infrastructure is robust, scalable, and aligned with our business priorities.
Key Responsibilities
Oversee large-scale GPU cluster deployments across data center colocation sites nationwide.
Collaborate closely with Data Center engineering and operations to ensure the accurate deployment of infrastructure requirements.
Advance multiple concurrent projects, evaluating risks and monitoring progress.
Proactively manage dependencies and foresee and resolve execution challenges.
Engage with a diverse range of cross-functional stakeholders, including Product, Engineering, and Operations, to ensure effective project execution and adherence to SLAs.
Communicate progress and status updates with internal stakeholders and customer groups across various locations and time zones.
Interact with stakeholders at all levels to resolve technical and scheduling issues.
Cultivate strong partnerships across Lambda while leveraging your credibility and technical expertise to foster process improvements and efficiencies.
Contribute to the development and expansion of new business opportunities.
About tierzero
tierzero helps engineering teams build and deploy code with greater speed and operational clarity in an AI-driven world. The company focuses on improving incident response, operational visibility, and knowledge sharing for engineers. Backed by $7 million in funding from investors like Accel and SV Angel, tierzero supports large-scale systems for clients such as Discord, Drata, and Framer.
Role Overview: Founding Member of Technical Staff
This role is based at tierzero's San Francisco headquarters. In-person work is required three days a week. As a founding member of the technical team, you will help design and build core products and systems from the ground up. Collaboration is central: expect to work closely with the CEO, CTO, and customers. Projects span a wide range of technical challenges and product areas.
What You Will Do
Design and implement intelligent AI systems that process and reason over large volumes of unstructured data.
Develop full-stack features, incorporating direct feedback from users.
Improve the product experience so intelligent agents are practical and reliable for engineers.
Create systems that automatically evaluate LLM outputs and refine agent reasoning using self-play and feedback loops.
Build machine learning pipelines covering data ingestion, feature generation, embedding stores, RAG pipelines, vector search, and graph databases.
Prototype and experiment with open-source and advanced LLMs to weigh different approaches.
Set up scalable infrastructure for long-running, multi-step agents, including memory management, state handling, and asynchronous workflows.
What We Look For
At least 5 years of professional or open-source experience in a relevant technical field.
Comfort working in a setting that changes and evolves quickly.
Strong product focus and an understanding of customer needs.
Interest in LLMs, MCPs, cloud infrastructure, and observability tools.
Ability to learn from and collaborate with engineers who have delivered over $10 billion in value.
Commitment to working onsite in San Francisco three days per week.
Startup experience is a plus.
Full-time|$230K/yr|On-site|San Francisco
About daydream
At daydream, we are revolutionizing the SEO landscape by developing the world's most advanced SEO agent, complemented by a team of top 1% human experts. Our goal is to manage organic search end to end, freeing our clients from the burdens of hiring specialized teams, evaluating multiple tools, or managing external agencies.
In today's competitive environment, excelling in search demands the production of expert-level content at scale, alongside constant adaptation to shifts in ranking systems and user behavior. Traditional tools merely provide metrics without actionable insights, while agencies rely on cumbersome manual workflows that fail to generate compounding growth. Even well-resourced teams often find it challenging to translate sustained efforts into consistent results.
daydream simplifies this landscape by consolidating fragmented tools and external vendors into a single coherent system that continuously plans and executes SEO strategies. Our SEO agent takes charge of analysis, planning, and content generation, while seasoned Growth Leads translate this work into actionable outputs (shipped pages, technical adjustments, and updated priorities) and hold accountability for performance over time.
We have raised over $20M from notable investors including First Round Capital, Basis Set Ventures, WndrCo, and SOMA Capital, and our growth is supported by industry leaders from Notion, Airtable, Webflow, and Dropbox, such as Lenny Rachitsky and Eli Schwartz, author of Product-Led SEO.
The Team
You will collaborate closely with a dedicated, senior team across engineering, design, and growth:
Shravan (CTO), who previously served as an Engineering Lead at Flixed and a Software Engineer at Facebook, focusing on large-scale web scraping for Facebook Jobs and Meta Reality Labs.
Vishruth (Founding Applied AI Engineer), a former Data Scientist at Klaviyo specializing in NLP, with a robust research background from Cornell.
Daniel (Founding Applied AI Engineer), who previously worked as a Data Engineer on Tesla’s Autopilot team, specializing in large-scale data systems and applied machine learning, and was formerly the Lead Data Engineer at Replit.
Tom (Founding Designer), who led product design at Blockless and plays a significant role in UI engineering.
You will also interact directly with Growth Leads, senior SEO practitioners who manage the system daily and provide invaluable feedback on AI performance in real-world applications.
Your Impact
As a Staff Engineer at daydream, you will provide vital technical leadership across the product while remaining deeply engaged in hands-on execution. You will collaborate closely with the CTO to define system architecture, guide project execution, and elevate quality standards throughout the engineering organization.
About Our Team
Join the Fleet team at OpenAI, where we empower groundbreaking research and product innovation through our advanced computing infrastructure. We manage extensive systems across data centers, GPUs, and networking, ensuring optimal performance, high availability, and efficiency. Our work is crucial in enabling OpenAI’s models to function seamlessly at scale, supporting both our internal research endeavors and external products like ChatGPT. We are committed to prioritizing safety, reliability, and the ethical deployment of AI technology.
About the Role
As a Software Engineer on the Fleet High Performance Computing (HPC) team, you will play a vital role in ensuring the reliability and uptime of OpenAI’s compute fleet. Minimizing hardware failures is essential for smooth research training progress and uninterrupted services, as even minor hardware issues can lead to significant setbacks. With the rise of large supercomputers, the stakes in maintaining efficiency and stability have never been higher.
At the cutting edge of technology, we often lead the charge in troubleshooting complex, state-of-the-art systems at scale. This is a unique opportunity for you to engage with groundbreaking technologies and create innovative solutions that enhance the health and efficiency of our supercomputing infrastructure.
Our team fosters a culture of autonomy and ownership, enabling skilled engineers to drive meaningful change. In this role, you will focus on comprehensive system investigations and develop automated solutions to enhance our operations. We seek individuals who dive deep into challenges, conduct thorough investigations, and create scalable automation for detection and remediation.
Key Responsibilities:
Develop and maintain automation systems for provisioning and managing server fleets.
Create tools to monitor server health, performance metrics, and lifecycle events.
Collaborate effectively with teams across clusters, networking, and infrastructure.
Work closely with external operators to maintain a high level of service quality.
Identify and resolve performance bottlenecks and inefficiencies in the system.
Continuously enhance automation processes to minimize manual intervention.
You Will Excel in This Role if You Have:
Experience in managing large-scale server environments.
A blend of technical skills in systems programming and infrastructure management.
Strong problem-solving abilities and a methodical approach to troubleshooting.
Familiarity with high-performance computing technologies and tools.
Full-time|$180K/yr - $250K/yr|Hybrid|San Francisco
At Tanagram, we are on a mission to revolutionize agentic coding by transforming the way engineering lessons are harvested from codebases, code reviews, incident post-mortems, and Slack communications. Our approach provides real-time guardrails that identify and rectify risky coding patterns, empowering teams to accelerate their shipping process while minimizing potential pitfalls.
We are assembling a small, diverse group of outstanding engineers who are passionate about the future of agentic coding. Our ideal candidates are detail-oriented, possess a high level of agency, and are adept at identifying critical problems to solve, building exceptional products, and continuously iterating towards superior solutions.
In this role, you will engage with our entire technology stack, from our TypeScript + React frontend to our Python/Go backend and LLM infrastructure. You should possess a keen intuition for selecting the right tools and configuring them to achieve optimal user results.
You are proactive and prefer to take immediate action rather than postponing tasks. You communicate your intentions clearly and follow through promptly.
We operate in a hybrid work environment, primarily based in San Francisco (Mission Bay) and NYC, but we welcome strong remote candidates. For more insight into our work culture, refer to our guidelines.
At Sciforium, we are at the forefront of AI infrastructure, innovating next-generation multimodal AI models and a proprietary high-efficiency serving platform. With substantial funding and direct collaboration with AMD, supported by their engineers, our team is rapidly expanding to develop the complete stack that powers cutting-edge AI models and real-time applications.
About the Role
We are looking for a talented GPU Kernel Engineer who is eager to explore and maximize performance on modern accelerators. In this role, you will be responsible for designing and optimizing custom GPU kernels that drive our advanced large-scale AI systems. You will navigate the hardware-software stack, engaging in low-level kernel development and integrating optimized operations into high-level machine learning frameworks for large-scale training and inference.
This position is perfect for someone who excels at the intersection of GPU programming, systems engineering, and state-of-the-art AI workloads, and aims to contribute significantly to the efficiency and scalability of our machine learning platform.
Key Responsibilities
Develop, implement, and enhance custom GPU kernels utilizing C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas.
Profile and fine-tune the end-to-end performance of machine learning operations, particularly for large-scale LLM training and inference.
Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and our proprietary internal runtimes.
Create performance models, pinpoint bottlenecks, and deliver kernel-level enhancements that significantly boost AI workloads.
Collaborate with machine learning researchers, distributed systems engineers, and model-serving teams to optimize computational performance across the entire stack.
Engage closely with hardware vendors (NVIDIA/AMD) and stay updated on the latest GPU architecture and compiler/toolchain advancements.
Contribute to the development of tools, documentation, benchmarking suites, and testing frameworks that ensure correctness and performance reproducibility.
Must-Haves
5+ years of industry or research experience in GPU kernel development or high-performance computing.
Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related discipline.
Strong programming proficiency in C++ and Python, and familiarity with machine learning frameworks.
About MetaDAO
MetaDAO is at the forefront of revolutionizing decision-making through innovative decision markets, empowering individuals and organizations to shape the most critical choices in the world.
Our flagship offering is a fundraising platform that leverages decision markets to minimize the risks of fraud and misallocation for investors. Since its monetization in October 2025, the platform has achieved a run-rate revenue of approximately $9 million annually.
With $12 million raised from leading investors, we are dedicated to assembling a world-class team. If you share our belief in the transformative power of markets, we invite you to join us in this exciting journey.
ABOUT BASETEN
At Baseten, we empower the world's leading AI firms, such as Cursor, Notion, and OpenEvidence, by delivering mission-critical inference solutions. Our blend of applied AI research, robust infrastructure, and user-friendly developer tools enables AI pioneers to effectively deploy groundbreaking models. With our recent $300M Series E funding round supported by investors like BOND and IVP, we're on an exciting growth trajectory. Join our dynamic team and contribute to the platform that drives the next generation of AI products.
THE ROLE
We are looking for an experienced Senior GPU Kernel Engineer to join our team at the forefront of AI acceleration. In this role, your programming expertise will directly enhance the performance of cutting-edge machine learning models. You'll be responsible for developing highly efficient GPU kernels that optimize computational processes, enabling transformative AI applications.
You'll thrive in a fast-paced, intellectually challenging environment where your technical skills are pivotal. Your contributions will directly affect production systems that serve millions of users across various platforms. This position offers exceptional opportunities for career advancement for engineers enthusiastic about low-level optimization and impactful systems engineering.
EXAMPLE INITIATIVES
As part of our Model Performance team, you will engage in projects like:
Baseten Embeddings Inference: the fastest embeddings solution available
The Baseten Inference Stack
Enhancing model performance optimization
RESPONSIBILITIES
Core Engineering Responsibilities
Design and develop high-performance GPU kernels for essential machine learning operations, including matrix multiplications and attention mechanisms.
Collaborate with cross-functional teams to drive performance improvements and implement optimizations.
Debug and refine kernel code to achieve maximal efficiency and reliability.
Stay abreast of the latest advancements in GPU technology and machine learning frameworks.
Jul 17, 2025