Published June 20, 2026

PhD thesis: Multi-Agent Reinforcement Learning for Dialogue Grounding, Reasoning and Planning (F/M)

Orange SA

Lannion, Bretagne 22300, France, CDI (permanent contract)

Publication date: Apr 08, 2026, 12:15AM

Since their breakthrough in 2022, Large Language Models (LLMs) have been transforming our daily lives. However, they still struggle with reliable reasoning and planning, and they often neglect grounding, the process by which interlocutors ensure mutual understanding. These limitations cannot be fully addressed by prompting techniques alone (e.g., chain-of-thought, ReAct). They are even more pronounced in small Language Models (LMs), whose limited parameter count restricts their generalization.

Reinforcement Learning from Human Feedback (RLHF) has proven effective in reducing hallucinations and improving reasoning in LLMs (notably with DeepSeek), but it is less efficient for small LMs. This has led to increased interest in Multi-Agent Reinforcement Learning (MRL) as a promising alternative.

This thesis proposes to study MRL by decomposing complex conversational tasks into three sub-tasks: grounding, reasoning, and planning, focusing on small LMs. The objectives are:

Adjusting the weights of specialized LM agents working collaboratively in a multi-agent environment, going beyond traditional prompting, RAG, or fine-tuning.
Applying MRL to public benchmarks and Orange's use cases (e.g., resolving network or product issues).
Key challenges include identifying the optimal task decomposition, designing effective reward functions, and evaluating the performance of the resulting agents. By cooperating, specialized agents can overcome their individual limitations to solve complex tasks.

Skills (technical and scientific) and soft skills

You have experience in Artificial Intelligence and Machine Learning, particularly in deep learning.
You have a strong background in mathematics (numerical optimization, statistics, probability, etc.).
You are proficient in software development.
You are proficient in written and spoken English.
You are curious, attracted by new technologies, and ready to keep up with their evolution.
You enjoy working in a team, within multidisciplinary projects, and contributing to a common goal, while being autonomous in your activities.
You have good analytical and synthesis skills.
Proficiency in one of the following deep learning tools is desired: Torch, PyTorch, TensorFlow, MXNet.
You like to communicate the results of your work through written reports and oral presentations, preferably in English.

Required training (master's degree, engineering degree, PhD, scientific and technical field, etc.)

Engineering degree and/or Research Master's degree, with knowledge in machine learning and in at least one of the fields listed above.

Desired experience (internships, etc.)

Initial experience implementing deep learning algorithms (for example, during an internship) is desirable.

You will join a team specialized in dialogue, where you will work with researchers, data scientists, architects, developers, PhD students and interns.

The proposed gross salary ranges between 37 K€ and 40 K€ and is paid over 12 months.

The ambition of the Innovation Division is to take Orange's innovation further and strengthen its technological leadership, by mobilizing our research capabilities to foster responsible innovation that serves people, inform the Group's long-term strategic choices, and influence the global digital ecosystem.
We train the experts in the technologies of today and tomorrow, and ensure continuous improvement in the performance of our services and in our efficiency. Worldwide, the Innovation Division brings together 6000 employees dedicated to research and innovation, including 740 researchers. With a global vision and a wide variety of profiles (researchers, engineers, designers, developers, data scientists, sociologists, graphic designers, marketers, cybersecurity experts...), the women and men of Innovation listen to and serve countries, regions and business units to make Orange a trusted multi-service operator.

Within Innovation, you will join the Data & AI department. Its main mission is to make Orange a "data-driven" company that defines the Group's standards for data and artificial intelligence and facilitates the development of use cases and of data products and services. This department is called upon to support the entire Orange group.

At Orange, only your skills matter.

Whatever your age, gender, origin, background, religion, sexual orientation, disability, neurodivergence, or appearance, we actively encourage diversity within our teams, because it is a strength for the group and a driver of innovation.
Orange is a disability-friendly company: do not hesitate to let us know about your specific needs.
