Research
We are building a multi-step pathway to safe advanced AI.
At the heart of our breakthrough research is the Scientist AI, a novel approach conceived by Yoshua Bengio that offers a distinctive, safety-centered path towards artificial superintelligence (ASI).
The Scientist AI is inspired by an ideal scientist: a mind that has internalized the laws of nature and uses them to make predictions, but has no predilection for how things unfold. It is a highly intelligent machine that uses probabilistic reasoning to understand the world, but with no hidden goals or preferences. Its predictions are transparent, auditable and verifiable.
As we build towards safe advanced AI, we expect the Scientist AI to accelerate scientific breakthroughs and to provide guardrails and oversight for agentic AI systems, while advancing our understanding of the risks posed by AI and how to avoid them.
Featured Publication
We provide evidence that language models can detect, localize and, to a certain degree, verbalize the differences between perturbations applied to their activations. More precisely, at a target sentence we either (a) mask activations, simulating dropout, or (b) add Gaussian noise to them. We then ask a multiple-choice question such as "Which of the previous sentences was perturbed?" or "Which of the two perturbations was applied?". We test models from the Llama, Olmo, and Qwen families, with sizes between 8B and 32B, all of which can easily detect and localize the perturbations, often with perfect accuracy. These models can also learn, when taught in context, to distinguish between dropout and Gaussian noise. Notably, Qwen3-32B's zero-shot accuracy in identifying which perturbation was applied increases with perturbation strength and, moreover, decreases if the in-context labels are flipped, suggesting a prior for the correct labels that persists even after accounting for controls. Because dropout has been used as a training-regularization technique, while Gaussian noise is sometimes added during inference, we discuss the possibility of a data-agnostic "training awareness" signal and the implications for AI safety.
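To make the two perturbations concrete, here is a minimal Python sketch, assuming the activations are available as a list of per-token vectors and that the target sentence occupies token positions `[start, end)`. The function name `perturb_span` and its parameters are hypothetical. The sketch treats `strength` as the masking probability for dropout and as the noise standard deviation for Gaussian noise, and it does not rescale surviving activations after masking (as training-time inverted dropout would). These are illustrative assumptions, not the publication's exact setup; in a real experiment the perturbation would be applied inside the model's forward pass, for instance at one layer's hidden states.

```python
import random
from typing import List

Vector = List[float]


def perturb_span(
    activations: List[Vector],
    start: int,
    end: int,
    kind: str,
    strength: float,
    seed: int = 0,
) -> List[Vector]:
    """Return a copy of `activations` with token positions [start, end) perturbed.

    Hypothetical sketch:
      kind="dropout":  zero each element with probability `strength`
                       (assumption: no 1/(1-p) rescaling of the survivors).
      kind="gaussian": add i.i.d. N(0, strength**2) noise to each element.
    """
    if not 0 <= start <= end <= len(activations):
        raise ValueError("span out of range")
    if kind == "dropout" and not 0.0 <= strength <= 1.0:
        raise ValueError("dropout probability must be in [0, 1]")
    if kind == "gaussian" and strength < 0.0:
        raise ValueError("noise standard deviation must be non-negative")
    if kind not in ("dropout", "gaussian"):
        raise ValueError(f"unknown perturbation kind: {kind!r}")

    rng = random.Random(seed)  # seeded so a perturbation can be reproduced
    out = [list(vec) for vec in activations]  # copy; the input is left untouched
    for t in range(start, end):
        if kind == "dropout":
            # random() is in [0, 1), so strength=1.0 masks every element
            out[t] = [0.0 if rng.random() < strength else x for x in out[t]]
        else:
            out[t] = [x + rng.gauss(0.0, strength) for x in out[t]]
    return out


# Usage: three "sentences", one vector each; perturb only the middle one.
acts = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = perturb_span(acts, 1, 2, "dropout", strength=0.5, seed=0)
noisy = perturb_span(acts, 1, 2, "gaussian", strength=0.5, seed=0)
```

Only position 1 changes in either output, so the localization question ("Which of the previous sentences was perturbed?") has the middle sentence as its correct answer, and raising `strength` makes the change to that position larger.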