Published 25 June 2026
PhD Position F/M Optimization of Pipeline and Tensor Parallelism for large Neural Networks
Inria
Talence, Nouvelle-Aquitaine 33400, France
Permanent contract (CDI)
About the centre or functional department
The Inria centre at the University of Bordeaux is one of the nine Inria centres in France and has about twenty research teams. It is a major and recognized player in the field of digital sciences, at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, technological research institutes, etc.
Context and advantages of the position
This PhD work will take place in the Inria TOPAL team in Bordeaux, within the framework of the
CamelIA program between Inria and CEA.
The CamelIA project aims to develop an AI-focused hardware accelerator that can be used for
both training and inference on large neural networks. A complete software stack will be developed to
provide a compiler for the specific kernels, effectively representing the computation of the neural
network as a coarse-grain graph of tasks.
In recent years, the TOPAL team has developed strong expertise in efficient algorithms for deep
learning networks, with a focus on rematerialization, offloading and pipeline parallelism for
memory-efficient training, and on query selection and grouping for efficient inference in LLMs.
With the rapid adoption and progress of large neural networks and particularly of Large Language
Models, the resource usage associated with both training and inference on these networks has grown
tremendously. The main performance indicators are throughput for training and latency for inference
(both in terms of time-to-first-token and time-between-tokens); in both cases, memory usage is
often the limiting factor on the size of the network that can be used. Pipeline parallelism is
routinely used to distribute the weights of the network across several resources, and multiple
approaches have been proposed to optimize the execution of the pipelines by avoiding idle time.
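To give a sense of the idle time mentioned above, here is a minimal sketch under strong simplifying assumptions: a synchronous forward-only pipeline in which every stage takes one time unit per micro-batch. The function name `pipeline_idle_fraction` and the equal-cost model are illustrative assumptions, not part of the CamelIA software stack.

```python
def pipeline_idle_fraction(num_stages: int, num_microbatches: int) -> float:
    """Fraction of stage time slots left idle in a simple synchronous pipeline.

    Assumption: each stage needs one time unit per micro-batch, and
    micro-batches cross the stages in order (forward pass only).
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("need at least one stage and one micro-batch")
    # The last micro-batch leaves the last stage after this many time units.
    makespan = num_microbatches + num_stages - 1
    busy_slots = num_stages * num_microbatches
    total_slots = num_stages * makespan
    return 1.0 - busy_slots / total_slots


if __name__ == "__main__":
    # With 4 stages, more micro-batches amortize the fill and drain phases.
    for m in (1, 4, 16, 64):
        print(m, round(pipeline_idle_fraction(4, m), 3))
```

In this toy model the idle fraction equals (p - 1) / (m + p - 1) for p stages and m micro-batches, which is why feeding more micro-batches through the pipeline reduces idle time but also increases the amount of data kept in memory.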
Assignment
The goal of this doctoral work is to provide optimization algorithms for low-latency and
memory-efficient inference on large neural networks. We will study how to map the task graph of the
computation onto the hardware platform and how to schedule the computations to optimize the
performance and the resource usage. In particular, we will explore a combination of Tensor and
Pipeline parallelism, including hybrid versions where both types of parallelism are used on
different parts of the graph. We will take into account data movement and optimize data placement
and lifetime to make the best of the available resources.
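As a deliberately simplified illustration of the mapping question, the sketch below treats the network as a linear chain of layers with known costs and splits it into contiguous pipeline stages so that the most loaded stage is as light as possible. The function `partition_chain`, the cost values, and the chain-only model (no tensor parallelism, no communication or memory costs) are assumptions made for this example, not the algorithms to be developed in the thesis.

```python
from functools import lru_cache
from itertools import accumulate


def partition_chain(costs, num_stages):
    """Split a chain of layer costs into contiguous stages.

    Returns (bottleneck, cuts): the minimal achievable maximum stage load
    and the indices at which a new stage starts.
    """
    n = len(costs)
    if not 1 <= num_stages <= n:
        raise ValueError("num_stages must be between 1 and the number of layers")
    prefix = [0] + list(accumulate(costs))  # prefix[j] = sum of costs[:j]

    @lru_cache(maxsize=None)
    def best(i, k):
        # Best split of layers i..n-1 into exactly k non-empty stages.
        if k == 1:
            return prefix[n] - prefix[i], ()
        result = None
        # The first stage holds layers i..j-1; leave at least k-1 layers after it.
        for j in range(i + 1, n - k + 2):
            first = prefix[j] - prefix[i]
            rest, cuts = best(j, k - 1)
            candidate = (max(first, rest), (j,) + cuts)
            if result is None or candidate[0] < result[0]:
                result = candidate
        return result

    return best(0, num_stages)


if __name__ == "__main__":
    bottleneck, cuts = partition_chain([2, 3, 4, 1, 5, 2], 3)
    print(bottleneck, cuts)
```

A dynamic program of this kind stays tractable for chains, which is one reason the granularity of the task graph matters: finer tasks and richer graphs quickly make exact optimization more expensive.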
An important part of the work will also be to gain insight into the appropriate granularity for the
task graph, in interaction with the compiler and runtime parts of the software stack. The questions
here will be how large the individual tasks should be to allow for tractable optimization algorithms
operating in reasonable time, and which decisions should be taken statically at compile time versus
which would benefit from runtime optimization.
If possible, we will also target a training workload with the same questions of hybrid Tensor and
Pipeline parallelism and granularity exploration. Integrating the work and the proposed algorithms
into the software stack of the CamelIA project will allow us to target the CamelIA hardware when it
becomes available at the end of the project.
Collaboration: In addition to the advisors Olivier Beaumont and Lionel Eyraud-Dubois, the recruited
person will work with all members of the CamelIA program, specifically the STORM Inria
team (also in Bordeaux), responsible for developing the runtime, and the CORSE and CAMUS teams (in
Grenoble and Strasbourg), responsible for the compiler framework.
Main activities
Main activities:
- Propose combinatorial optimization models and algorithms for efficient parallel execution of neural networks
- Implement/validate the proposed algorithms in the CamelIA framework
- Write scientific articles
- Present the findings in conferences/workshops
Additional activities:
- Teaching
- Internship advising
Skills
Technical skills and level required: mathematics, programming (C/C++/Python)
Languages: French and/or English
Relational skills: teamwork
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
The gross monthly salary will be €2,300 (before withholding tax).