Published June 9, 2026

Lead LLM

Licorne Society
Paris, Île-de-France 75000, France · Permanent contract (CDI)

Licorne Society has been engaged by a fast-growing AI startup to help it find its Lead LLM Engineer.
What you will own

You will be responsible for one thing:
Make our AI outputs reliable, fast, and indispensable in real workflows.
Concretely:
  • Design and evolve our LLM / agent architecture
  • Own output quality across key use cases (emails, document analysis, etc.)
  • Build evaluation systems (datasets, metrics, regression detection)
  • Drive fast iteration loops from production data
  • Improve retrieval, reasoning, and tool usage
  • Ensure production reliability (latency, failure modes, fallback)
  • Work directly with the product team and the founders on what to build and why
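To illustrate the "failure modes, fallback" item above, here is a minimal Python sketch (Python matches the listed stack). It is not the company's code: `generate_json_with_fallback`, the `ModelFn` type and the stand-in models are hypothetical, and it assumes each model callable raises an exception on API errors or timeouts (for example via its client's own request timeout).

```python
import json
from typing import Callable, Sequence

# Hypothetical signature: a model is any callable that takes a prompt and returns raw text.
# In practice this would wrap an LLM API client configured with its own request timeout.
ModelFn = Callable[[str], str]


def generate_json_with_fallback(prompt: str, models: Sequence[ModelFn]) -> dict:
    """Try each model in order; fall back to the next one on an exception
    (API error, client-side timeout) or on output that is not a JSON object."""
    errors: list[str] = []
    for model in models:
        name = getattr(model, "__name__", repr(model))
        try:
            raw = model(prompt)
        except Exception as exc:  # e.g. TimeoutError or a client/network error
            errors.append(f"{name}: {type(exc).__name__}: {exc}")
            continue
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"{name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(parsed, dict):
            errors.append(f"{name}: expected a JSON object, got {type(parsed).__name__}")
            continue
        return parsed
    # Every option failed: surface all reasons so the failure mode can be debugged.
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Usage with stand-in models (no real API is called here).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary deployment timed out")  # simulated outage


def chatty_secondary(prompt: str) -> str:
    return "Sure! Here is the summary you asked for."  # not JSON


def strict_backup(prompt: str) -> str:
    return '{"summary": "ok"}'


print(generate_json_with_fallback("Summarize this email", [flaky_primary, chatty_secondary, strict_backup]))
# {'summary': 'ok'}
```

Validating the output before returning it turns a silently malformed answer into a recoverable failure; the order of the fallback list (cheapest first or most capable first) is a product decision this sketch leaves open.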
What this role is really about

Most teams building with LLMs fail because:
  • they don't know what "good output" means
  • they don't have evals
  • they iterate randomly
  • they overuse agents

Your job is to fix that.
You will turn:
  • vague user problems
  • → into structured AI systems
  • → with measurable performance
  • → that improve every week
What you need to be excellent at
1. Shipping real LLM systems
  • You've built systems used in production (not demos)
  • You understand RAG, tool use, agents, and structured outputs
  • You can design full pipelines, not just prompts
2. Evaluation-driven development
  • You know how to define quality metrics
  • You build datasets from real usage
  • You run continuous evals to prevent regressions
3. Debugging complex failures
  • You can trace issues across:
    • retrieval
    • prompts
    • model behavior
  • You don't guess; you isolate the root cause and fix it
4. Speed of iteration
  • You move from problem → improvement in hours or days, not weeks
  • You rely on logs, traces, and data, not intuition alone
5. Strong judgment
  • You know when to:
    • use an agent vs a pipeline
    • add complexity vs simplify
  • You optimize for reliability and user value, not novelty
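The "evaluation-driven development" point above (datasets, quality metrics, continuous evals that catch regressions) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `EvalCase` type, the exact-match metric, the 0.02 tolerance and the toy answers. Real use cases such as email drafting or document analysis would need richer metrics than exact match.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

# Hypothetical types: a "system" maps an input string to an output string;
# a metric scores one output against its expected answer.
System = Callable[[str], str]
Metric = Callable[[str, str], float]


@dataclass(frozen=True)
class EvalCase:
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Simplest possible metric: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(system: System, cases: Sequence[EvalCase], metric: Metric = exact_match) -> list[float]:
    """Score the system on every case of a fixed dataset."""
    return [metric(system(case.input), case.expected) for case in cases]


def detect_regression(baseline: Sequence[float], candidate: Sequence[float],
                      tolerance: float = 0.02, pass_threshold: float = 0.5) -> dict:
    """Flag a regression if the mean score drops by more than `tolerance`
    or if any case that passed on the baseline now fails."""
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must be scored on the same cases")
    newly_failing = [
        i for i, (b, c) in enumerate(zip(baseline, candidate))
        if b >= pass_threshold and c < pass_threshold
    ]
    mean_drop = mean(baseline) - mean(candidate)
    return {
        "baseline_mean": mean(baseline),
        "candidate_mean": mean(candidate),
        "newly_failing": newly_failing,
        "regressed": mean_drop > tolerance or bool(newly_failing),
    }


# Usage with toy systems backed by dictionaries (no model is called).
cases = [
    EvalCase("2 + 2", "4"),
    EvalCase("Capital of France", "Paris"),
    EvalCase("Color of a clear daytime sky", "blue"),
]
baseline_answers = {"2 + 2": "4", "Capital of France": "Paris", "Color of a clear daytime sky": "green"}
candidate_answers = {"2 + 2": "4", "Capital of France": "Lyon", "Color of a clear daytime sky": "Blue"}

report = detect_regression(
    run_eval(baseline_answers.get, cases),
    run_eval(candidate_answers.get, cases),
)
print(report["newly_failing"], report["regressed"])  # [1] True
```

In this toy run both versions average 2/3, so a mean-only check would pass; comparing scores case by case is what catches the newly broken answer.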
What we don't care about
  • Number of years of experience
  • Whether you've used a specific framework
  • Fancy research credentials

If you can build, debug, and improve real systems, you're a fit.
What success looks like (first 90 days)
  • Clear eval framework for core use cases
  • Measurable improvement in output quality
  • Faster iteration cycles across the team
  • Fewer hallucinations and failures
  • Stronger system architecture decisions
Stack (context, not requirements)
  • Python (FastAPI)
  • Postgres
  • Google Cloud
  • LangGraph / LangChain (evolving)
  • PostHog (product analytics)
  • Langfuse (LLM traces)
  • LLM APIs (Azure OpenAI)
