DeepMind Safety Research – Medium

Goal Misgeneralisation: Why Correct Specifications Aren’t Enough For Correct Goals

By Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, and Zac Kenton. For more details, check out…

Oct 7, 2022

Discovering when an agent is present in a system

New, formal definition of agency gives clear principles for causal modelling of AI agents and the incentives they face.

Aug 25, 2022

Your Policy Regulariser is Secretly an Adversary

By Rob Brekelmans, Tim Genewein, Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Shane Legg, Pedro A. Ortega

Mar 24, 2022

Avoiding Unsafe States in 3D Environments using Human Feedback

By Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, and Jan Leike.

Jan 21, 2022

Model-Free Risk-Sensitive Reinforcement Learning

By the Safety Analysis Team: Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, and Pedro A…

Nov 11, 2021

Progress on Causal Influence Diagrams

By Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg

Jun 30, 2021

An EPIC way to evaluate reward functions

How can you tell if you have a good reward function? EPIC provides a fast and reliable way to evaluate reward functions.

Apr 16, 2021

Alignment of Language Agents

By Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik and Geoffrey Irving

Mar 30, 2021

What mechanisms drive agent behaviour?

By the Safety Analysis Team: Grégoire Delétang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus…

Mar 5, 2021

Understanding meta-trained algorithms through a Bayesian lens

By Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Markus Kunesch, Jordi Grau-Moya, Miljan Martic, Shane Legg, Pedro A…

Dec 3, 2020

We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work: deepmind.com
