About the job
About Reality Defender
Reality Defender is a pioneering cybersecurity company that helps enterprises and government agencies identify deepfakes and AI-generated media. Our patented multi-model approach provides resilience against the latest generative platforms for video, audio, image, and text content. Our API-first deepfake detection platform enables teams and developers to detect fraud, disinformation campaigns, and harmful deepfakes in real time.
Supported by an elite group of investors, including DCVC, Illuminate Financial, Y Combinator, Booz Allen Hamilton, IBM, Accenture, Rackhouse, and Argon VC, we collaborate with leading enterprises, financial institutions, and governments to ensure that AI-generated media is not exploited for malicious intents.
Watch our victory at RSA, where we were named Most Innovative Startup.
About the Multimodal AI Internship
This four-month internship is designed for current PhD students and candidates, giving you the opportunity to collaborate with Reality Defender's AI team on novel research and the publication of peer-reviewed papers. You will work closely with Surya Koppisetti and Yi Zhu, who will mentor you in multimodal deepfake detection. The internship is fully remote, though you are welcome to work from our New York City headquarters if you prefer.
Your Responsibilities
Explore and propose innovative methods for detecting generative multimodal content across the audio and visual modalities.
Conduct research on multimodal deepfake detection and reasoning tasks.
Collaborate actively with team researchers.
Document research findings for internal reports and prepare submissions for academic journals and workshops.
Independently implement and evaluate ideas using modern deep learning tooling, including Python and PyTorch, and cloud computing platforms such as AWS or GCP.
Candidate Profile
PhD student in a relevant technical field, ideally with three or more years of study completed.
Experience in multimodal learning, particularly in audio-visual classification and audio-language reasoning.
Strong proficiency in Python and experience in developing deep learning models.