Published July 3, 2026

ML Research Engineer

White Circle

Paris, Île-de-France 75000, France, permanent contract (CDI)

TLDR: We are looking for several ML Engineers to train, post-train, and evaluate the LLMs at the core of our platform. This is hands-on modern model training work: large-scale data pipelines, SFT/RLHF/DPO-style alignment, reward models, distributed multi-GPU training, and evaluation.

About us

White Circle is an AI Safety company building the safety, reliability, and optimization layer for AI systems. At the core of our platform are policies - simple natural-language rules that define what an AI model should and shouldn't do. We automatically test, enforce, and continuously improve these policies at scale.

We've raised $11M from top funds, founders, and senior leaders at OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and others
We process over 100M API calls every month
We fine-tune and train our own LLMs so they run faster and cheaper than any open or proprietary model

We're a small, highly focused team. If you want to work deeply on hard problems, see your work ship to production quickly, and influence how AI safety is actually built - you're the one we need.

You will:

Train and post-train LLMs for safety and moderation tasks: SFT, RLHF, DPO, and related alignment methods
Build and train reward models from human and synthetic preference data
Design and run high-throughput data pipelines: collection, synthetic generation, filtering, deduplication, and quality control at very large scale
Run distributed training on multi-GPU clusters and debug what goes wrong when it does
Build evaluation systems and benchmarks that actually measure model behavior, and use them to drive training decisions
Optimize models for production inference: quantization, speculative decoding, serving with vLLM/TensorRT or similar
Move fast from experiment to production - your models ship, and you see their effect on real traffic

You'll fit right in if you:

Have hands-on experience with modern LLM post-training - SFT, RLHF, DPO, or related methods - on models you trained yourself
Have worked with data at genuinely large scale: building pipelines for training corpora, preference data, or synthetic data generation
Have trained models on distributed multi-GPU setups and are comfortable in PyTorch or JAX
Have built or worked with reward models and preference data
Understand evaluation deeply: you know why benchmarks lie, and how to build ones that don't
Have experience optimizing inference: quantization, speculative decoding, vLLM, TensorRT, Triton, or similar
Are strong in Python and comfortable with SQL-like data tooling for large-scale data work
Have a strong ownership mindset: you can take an ambiguous modeling problem, make it concrete, ship a working model, and improve it from real feedback

A big plus:

A public builder footprint: open-source models, datasets, or training frameworks on HuggingFace/GitHub, benchmarks, papers (workshop or main conference), or technical posts with real usage
Experience training models at a frontier or near-frontier lab, or leading open-source model releases with documented adoption
Experience with RL methods for LLMs beyond standard RLHF: online RL, GRPO-style methods, or novel alignment approaches
Experience with moderation, safety, or classification models at scale
Multilingual model training experience

Why White Circle

Paid time off in line with your local regulations, no matter where you work from
Work from Paris (hybrid) with a relocation package available, or work from London (note: we are currently unable to provide relocation support or medical insurance for London-based roles)
Comprehensive medical insurance for our France-based team
All the hardware, tools, and services you need
Covered subscriptions for AI agents and IDEs
Team off-sites twice a year: we've recently been to the Alps and to Saint-Tropez

How we hire

Introductory call with HR (25 min)
Take-home test task
Technical interview with Head of Applied Research (60 min)
Final conversation with our CEO (45 min)

Please submit your application in English.
