Published June 25, 2026
PhD Position F/M: Management of complex applications on volatile heterogeneous platforms
Inria
Nancy, Grand-Est 54000, France
Permanent contract (CDI)
Context and advantages of the position
This PhD thesis is part of a collaboration between Hive and the Inria teams Loreley and Magellan. The PhD student will be based at the Inria Center of the University of Lorraine and will visit the Magellan team at the Inria Center of the University of Rennes and the Hive offices in Cannes.
About Hive:
Hive intends to play the role of a next-generation cloud provider in the context of Web 3.0. Hive aims to exploit the unused capacity of computers to offer the general public a greener and more sovereign alternative to existing clouds, one where the true power lies in the hands of the users. It relies on distributed peer-to-peer networks, end-to-end data encryption, and blockchain technology.
About Inria Center of the University of Lorraine:
The Inria Nancy - Grand Est center is one of Inria's eight centers and has twenty project teams, located in Nancy, Strasbourg and Saarbrücken. It employs over 400 people, including scientists and research and innovation support staff, representing 45 nationalities. The Inria Center is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, technological research institutes, etc.
About Inria Center of the University of Rennes:
The Inria Center of the University of Rennes is one of Inria's eight centers and has more than thirty research teams. The Inria Center is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, technological research institutes, etc.
Assignment
In recent years, there has been increased interest in using computing resources outside of data centers. This approach allows for more efficient use of existing resources and brings applications closer to their data sources, which is critical for data-intensive operations. However, executing tasks in this environment presents new challenges that are often overlooked in the traditional cloud computing literature. This PhD will focus on how to facilitate and optimize the execution of batch jobs (e.g., MapReduce and other data-intensive applications) in a geo-distributed, peer-to-peer environment. It will address the following three challenges: (i) platform heterogeneity in terms of computing, memory, and networking; (ii) resource dynamism; and (iii) node availability (churn).
Main activities
The aim of this PhD is to provide reliable and scalable data processing on large-scale, trusted P2P systems. The recruited PhD student is expected to make innovative contributions in the following aspects:
- We first plan to study static allocation strategies from the perspective of P2P systems where, unlike clouds, resources are extremely heterogeneous. Designing a practical, realistic yet tractable model will be one of our first objectives.
- To cope with the dynamic nature of resources, allocation decisions should be made at runtime. The aim is to design a scheduling policy that adapts to resource dynamicity, even in the absence of up-to-date information about the whole P2P system. The balance between static and dynamic approaches (and its potential, as demonstrated in earlier work [1]) will be an important aspect of this work.
- We will also work on a scheduling policy for the case where multiple tasks (belonging to different jobs) share part of their input files. For example, in the hive Net Private context, we can consider that the dataset owned by the company can be used for different purposes at the same time. Similar problems have already been addressed for compute-intensive tasks and in HPC systems [2, 3], but not for data-intensive applications in highly distributed environments.
- Finally, resource dynamicity leads to performance variability and thus to stragglers (slow tasks). Stragglers prolong the execution of applications, because a job's execution time depends on the completion time of its slowest tasks [4]. We plan to investigate how to detect stragglers in such heterogeneous environments and how to deal with them efficiently at runtime, by adapting techniques such as cloning and speculative execution [4, 5].
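To make the last point more concrete, the following is a minimal sketch of one possible straggler-detection heuristic based on observed progress rates. It is an illustration only, not the method of [4] or [5]: the `TaskProgress` record, the `find_stragglers` function, and the `slowdown` threshold of 1.5 are all hypothetical names and values chosen for this example. A task is flagged when its estimated remaining time exceeds `slowdown` times the median over all tasks that have reported progress.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskProgress:
    """Hypothetical progress report sent by a worker for one task."""
    task_id: str
    fraction_done: float  # in [0, 1]
    elapsed_s: float      # seconds since the task started


def estimated_remaining(t: TaskProgress) -> float:
    """Estimate remaining time by extrapolating the observed progress rate."""
    if t.fraction_done <= 0.0 or t.elapsed_s <= 0.0:
        return float("inf")  # no usable progress information yet
    rate = t.fraction_done / t.elapsed_s
    return (1.0 - t.fraction_done) / rate


def find_stragglers(tasks, slowdown=1.5):
    """Return ids of tasks whose estimate exceeds slowdown x the median estimate."""
    finite = [r for r in map(estimated_remaining, tasks) if r != float("inf")]
    if not finite:
        return []
    threshold = slowdown * median(finite)
    return [t.task_id for t in tasks if estimated_remaining(t) > threshold]


if __name__ == "__main__":
    tasks = [
        TaskProgress("a", 0.8, 80.0),
        TaskProgress("b", 0.9, 90.0),
        TaskProgress("c", 0.2, 100.0),
        TaskProgress("d", 0.0, 50.0),
    ]
    # Candidates for cloning or speculative re-execution on another node.
    print(find_stragglers(tasks))
```

In a heterogeneous P2P setting, such a global threshold would likely flag every task placed on a slow node, which is one reason detection needs to be revisited in this context.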
[1] Olivier Beaumont, Thomas Lambert, Loris Marchal, and Bastien Thomas. Performance analysis and optimality results for data-locality aware tasks scheduling with replicated inputs. Future Generation Computer Systems, pages 582-598, 2020.
[2] Kamer Kaya, Bora Uçar, and Cevdet Aykanat. Heuristics for scheduling file-sharing tasks on heterogeneous systems with distributed repositories. Journal of Parallel and Distributed Computing, 67(3):271-285, 2007.
[3] Maxime Gonthier, Loris Marchal, and Samuel Thibault. Memory-aware scheduling of tasks sharing data on multiple GPUs with dynamic runtime systems. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 694-704, 2022.
[4] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008.
[5] Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica. Effective straggler mitigation: Attack of the clones. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), pages 185-198, 2013.
Skills
- Engineering degree and/or Master's degree (M2) in computer science or applied mathematics, with experience in computer networks
- Theoretical expertise: distributed systems, optimization algorithms
- Good collaborative and networking skills, excellent written and oral communication in English
- Good programming skills
- Strong analytical skills
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
€2300 gross/month