Published 19 June 2026

PhD Position F/M Toward Grounded, Consistent, and Temporally Faithful Video Reasoning

Inria

Paris, Île-de-France 75000, France · Permanent contract (CDI)

Assignment

Recent video multimodal large language models (Video-MLLMs) have achieved
strong results on standard benchmarks, yet they remain systematically unreliable on tasks
that require temporally consistent, spatially grounded reasoning. Video-LLMs reach
near-chance consistency (≈50%) in temporal grounding even after task-specific fine-tuning;
they hallucinate actions, temporal sequences, and scene transitions at high rates;
and they perform close to random on 4D spatiotemporal tasks (GPT-4o: 57.5% vs. 98.8%
for humans) and on multi-object dynamic spatial reasoning.
These failures are structural: current systems compress the entire perceptual history into a
flat token sequence and ask the language model to act simultaneously as the archive
of what happened and as the reasoner about what it means. These are architecturally
distinct operations, and conflating them in a single attention pass makes temporal
inconsistency, hallucination, and spatial failures unavoidable by design. This
PhD will address the design of an explicit memory and state space to improve reasoning
over long videos.

Main activities

Analyse related work and implement existing approaches.
Design novel solutions.
Write progress reports and papers.
Present work at conferences.

Skills

Technical skills and level required: programming skills are required.

Languages: English, and possibly French.

Interpersonal skills: good communication skills.

Benefits

Subsidized meals
Partial reimbursement of public transport costs
Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
Possibility of teleworking and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer equipment, etc.)
Social, cultural and sports events and activities
Access to vocational training
Social security coverage
