Published June 9, 2026

Lead LLM

Licorne Society

Paris, Île-de-France 75000, France. Permanent contract (CDI)

Licorne Society has been engaged by a fast-growing AI startup to help it find its Lead LLM Engineer.

What you will own

You will be responsible for one thing:
Make our AI outputs reliable, fast, and indispensable in real workflows.
Concretely:

Design and evolve our LLM / agent architecture
Own output quality across key use cases (emails, document analysis, etc.)
Build evaluation systems (datasets, metrics, regression detection)
Drive fast iteration loops from production data
Improve retrieval, reasoning, and tool usage
Ensure production reliability (latency, failure modes, fallbacks)
Work directly with the product team and founders on what to build and why

What this role is really about

Most teams fail because:

they don't know what "good output" means
they don't have evals
they iterate randomly
they overuse agents

Your job is to fix that.
You will turn:

vague user problems
→ into structured AI systems
→ with measurable performance
→ that improve every week

What you need to be excellent at
1. Shipping real LLM systems

You've built systems used in production (not demos)
You understand RAG, tools, agents, structured outputs
You can design full pipelines, not just prompts

2. Evaluation-driven development

You know how to define quality metrics
You build datasets from real usage
You run continuous evals to prevent regressions

3. Debugging complex failures

You can trace issues across:

retrieval
prompts
model behavior

You don't guess; you isolate and fix

4. Speed of iteration

You move from problem → improvement in hours or days, not weeks
You rely on logs, traces, and data, not intuition alone

5. Strong judgment

You know when to:

use an agent vs a pipeline
add complexity vs simplify

You optimize for reliability and user value, not novelty

What we don't care about

Number of years of experience
Whether you've used a specific framework
Fancy research credentials

If you can build, debug, and improve real systems, you're a fit.

What success looks like (first 90 days)

Clear eval framework for core use cases
Measurable improvement in output quality
Faster iteration cycles across the team
Fewer hallucinations and failures
Stronger system architecture decisions

Stack (context, not requirements)

Python (FastAPI)
Postgres
Google Cloud
LangGraph / LangChain (evolving)
PostHog (product analytics)
Langfuse (LLM traces)
LLM APIs (Azure OpenAI)
