About the job
About Our Team
The Future of Computing Research team is a dynamic applied research unit within the Consumer Devices group at OpenAI. We are dedicated to pioneering methods, models, and evaluation frameworks that propel our vision for the future of computing. Our focus lies at the cutting edge of multimodal AI: transforming emerging model capabilities into product experiences that are functional and enjoyable, and that foster long-term trust.
Our research delves into a new generation of AI systems capable of learning and evolving over time, adapting to individual needs, and enhancing daily life. This includes exploring long-term memory, user modeling, and personalized systems aligned with broader human goals, values, and overall well-being.
We collaborate closely across multiple disciplines—research, engineering, design, product management, and safety—to define what it means to build AI systems that recognize and respond to user needs in a contextually aware and respectful manner, ensuring demonstrable benefits.
About the Position
We are seeking a passionate Research Engineer/Scientist to join our Future of Computing Research team, focusing on Reinforcement Learning from Human Feedback (RLHF) and post-training techniques for personalized multimodal AI systems.
In this role, you will be instrumental in establishing the learning and evaluation foundations that models need to become increasingly context-aware, adaptive, and useful over time. You will tackle challenges such as reward modeling, preference learning, long-horizon evaluation, and policy improvement for systems that must make high-quality behavioral decisions in real-world settings. Our success is measured not just by improved benchmark performance but by better model behavior in actual use.
The ideal candidate is enthusiastic about advancing beyond simple single-turn assistant interactions toward systems that learn and grow through feedback, drawing on richer signals and training against meaningful notions of user value. This requires a thoughtful approach to reward design, feedback mechanisms, and evaluation frameworks that assess the long-term benefits of interventions.
This position is based in San Francisco, CA, with a hybrid work model of four days in the office each week. We also provide relocation assistance for new hires.
Key Responsibilities:
Develop RLHF and post-training strategies for multimodal models.
Create reward models and preference-learning pipelines to foster adaptive, personalized model behavior.
Engage in long-term evaluation and policy refinement to enhance user interactions.