Published June 19, 2026

Scientific engineer in Bioinformatics (2 years): New Classification of Enzymatic Functions and Datasets for Hierarchical Annotation Using Deep Learning

Inria
Rennes, Hauts-de-France 60420, France CDI

About the center or functional department

The Inria center at the University of Rennes is one of eight Inria centers and has more than thirty research teams. The Inria center is a major and recognized player in the field of digital sciences. It is at the heart of a rich ecosystem of R&D and innovation, including highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education institutions, centers of excellence, and technological research institutes.

Context and advantages of the position

Large language models, such as those powering ChatGPT, have transformed natural language processing and the analysis of complex sequential data. In biology, protein sequences can be viewed as a language, opening new perspectives for functional annotation.

The ECxit project (Exiting the EC Classification for Better Enzyme Annotation by Deep Learning), an Inria Exploratory Action led by François Coste, focuses on enzyme function annotation. By moving beyond the traditional EC classification, it aims to develop a novel deep learning-based annotation framework built on a redesigned hierarchical classification of enzymatic functions, enabling accurate predictions directly from amino acid sequences and ultimately improving genome annotation.

The project is hosted within the Machine Learning axis of the new Bioinformatics research team BioGraphs (formerly Dyliss) at the Inria Centre at Rennes University and the IRISA research laboratory. It benefits from the Genouest platform and from close collaborations with research groups in bioinformatics, biology, and health sciences.

Assigned mission

With the support of researchers from the BioGraphs team, the recruited person will contribute to the design and development of a novel framework for enzyme function annotation based on deep learning and protein language models.

The recruited person will be involved in the development of a new hierarchical classification of enzymatic functions, the construction of reference datasets and benchmarks, the evaluation of state-of-the-art prediction methods, and the deployment of a new annotation tool for the biological community.

Main activities

Core activities:
  • Design and build a novel hierarchical classification of enzymatic functions by integrating information from major biological knowledge bases, including GO, EC, CAZy, Rhea, Reactome, BioCyc, and KEGG.
  • Develop a high-quality benchmark dataset for training and evaluating machine learning models for enzyme function prediction.
  • Evaluate state-of-the-art deep learning and protein language model approaches on the proposed benchmark in collaboration with machine learning researchers.
  • Construct a comprehensive reference dataset of enzyme functional annotations for large-scale model training.
  • Contribute to the development, evaluation, and deployment of a next-generation enzyme annotation tool based on deep learning approaches.

Additional activities:
  • Contribute to scientific publications, technical reports, and software documentation.
  • Present results and project progress during team meetings, workshops, and scientific events.
  • Interact with biological end users and contribute to the dissemination and adoption of the developed tools.


Skills

Technical skills and level required:
  • Good knowledge of bioinformatics, computational biology, molecular biology, or biological sequence analysis.
  • Knowledge of enzyme biology, enzymology, metabolism, and protein function annotation.
  • Experience with public biological resources such as GO, EC, KEGG, Reactome, Rhea, BioCyc, UniProt, or CAZy.
  • Familiarity with ontology-based data representation and annotation frameworks.
  • Ability to analyze, curate, integrate, and structure heterogeneous biological annotations from multiple data sources.
  • Experience with Python and scientific data processing.
  • Experience with Linux/Unix environments and software development best practices.
  • Experience with version control systems (Git) and collaborative software development.

Other appreciated qualifications:
  • Experience in biological data curation or annotation.
  • Experience in benchmarking biological datasets and evaluating prediction methods.
  • Contributions to scientific publications, databases, or open-source bioinformatics software.
  • Basic understanding of machine learning concepts and their application to biological data.


Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training


Remuneration

Monthly gross salary from 2,695 euros, depending on degree and experience.
