Posted July 3, 2026
ML Research Engineer
White Circle
Paris, Île-de-France 75000, France
Permanent contract (CDI)
TLDR: We are looking for several ML Engineers to train, post-train, and evaluate the LLMs at the core of our platform. This is hands-on modern model training work: large-scale data pipelines, SFT/RLHF/DPO-style alignment, reward models, distributed multi-GPU training, and evaluation.
About us
White Circle is an AI Safety company building the safety, reliability, and optimization layer for AI systems. At the core of our platform are policies - simple natural-language rules that define what an AI model should and shouldn't do. We automatically test, enforce, and continuously improve these policies at scale.
- We've raised $11M from top funds, founders, and senior leaders at OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and others
- We process over 100M API calls every month
- We fine-tune and train our own LLMs so they run faster and cheaper than any open or proprietary model
We're a small, highly focused team. If you want to work deeply on hard problems, see your work ship to production quickly, and influence how AI safety is actually built - you're the one we need.
You will:
- Train and post-train LLMs for safety and moderation tasks: SFT, RLHF, DPO, and related alignment methods
- Build and train reward models from human and synthetic preference data
- Design and run high-throughput data pipelines: collection, synthetic generation, filtering, deduplication, and quality control at very large scale
- Run distributed training on multi-GPU clusters and debug what goes wrong when it does
- Build evaluation systems and benchmarks that actually measure model behavior, and use them to drive training decisions
- Optimize models for production inference: quantization, speculative decoding, serving with vLLM/TensorRT or similar
- Move fast from experiment to production - your models ship, and you see their effect on real traffic
You'll fit right in if you:
- Have hands-on experience with modern LLM post-training - SFT, RLHF, DPO, or related methods - on models you trained yourself
- Have worked with data at genuinely large scale: building pipelines for training corpora, preference data, or synthetic data generation
- Have trained models on distributed multi-GPU setups and are comfortable in PyTorch or JAX
- Have built or worked with reward models and preference data
- Understand evaluation deeply: you know why benchmarks lie, and how to build ones that don't
- Have experience optimizing inference: quantization, speculative decoding, vLLM, TensorRT, Triton, or similar
- Are strong in Python and comfortable with SQL-like data tooling for large-scale data work
- Have a strong ownership mindset: you can take an ambiguous modeling problem, make it concrete, ship a working model, and improve it from real feedback
A big plus:
- A public builder footprint: open-source models, datasets, or training frameworks on HuggingFace/GitHub, benchmarks, papers (workshop or main conference), or technical posts with real usage
- Experience training models at a frontier or near-frontier lab, or leading open-source model releases with documented adoption
- Experience with RL methods for LLMs beyond standard RLHF: online RL, GRPO-style methods, or novel alignment approaches
- Experience with moderation, safety, or classification models at scale
- Multilingual model training experience
Why White Circle
- Paid time off in line with your local regulations, no matter where you work from
- Work from Paris (hybrid) with a relocation package available, or work from London (note: we are currently unable to provide relocation support or medical insurance for London-based roles)
- Comprehensive medical insurance for our France-based team
- All the hardware, tools, and services you need
- Covered subscriptions for AI agents and IDEs
- Team off-sites twice a year: we've recently been to the Alps and to Saint-Tropez
How we hire
- Introductory call with HR (25 min)
- Take-home test task
- Technical interview with Head of Applied Research (60 min)
- Final conversation with our CEO (45 min)
Please submit your application in English.