About the job
sensmore builds automation systems for heavy machinery, applying intelligent robotics to help equipment such as wheel loaders adapt to changing tasks and environments. Its Physical AI platform connects robotics with real-world industrial needs, aiming to boost productivity and safety across sectors like mining and construction.
This PhD Research Internship centers on advancing industrial automation, blending research and engineering in a practical setting. The position is based in Berlin or Potsdam and focuses on Vision-Language Models (VLM) and Vision-Language-Action (VLA) systems for robotics.
Role overview
The internship targets general-purpose AI, with an emphasis on developing scalable VLA systems that enable robots to perceive, reason, and act in complex industrial environments. The work combines multi-modal perception, including video, radar, and lidar, with practical robotics. Interns will contribute to embodied AI research for heavy industry, working at the intersection of method development and hands-on engineering. There are opportunities to publish research and influence the direction of industrial autonomy at sensmore.
Key responsibilities
- Research and method development:
  - Design and implement new approaches for Vision-Language-Action systems in industrial contexts.
  - Investigate scalable architectures for multi-modal reasoning and action generation.
  - Advance methods in embodied AI and robotic autonomy.
- Multi-modal learning and data systems:
  - Lead the design and evaluation of large-scale multi-modal datasets covering video, radar, and lidar, including data for sensor fusion.
  - Develop self-supervised or weakly supervised pipelines for generating VLA datasets.
  - Explore data-centric strategies to improve robustness and generalization.
- Model development and optimization:
  - Build, adapt, and extend advanced models to achieve project objectives.

