DeepMind Safety Research · Goal Misgeneralisation: Why Correct Specifications Aren’t Enough For Correct Goals · By Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, and Zac Kenton. For more details, check out… · 9 min read · Oct 7, 2022
DeepMind Safety Research · Discovering when an agent is present in a system · New, formal definition of agency gives clear principles for causal modelling of AI agents and the incentives they face. · 4 min read · Aug 25, 2022
DeepMind Safety Research · Your Policy Regulariser is Secretly an Adversary · By Rob Brekelmans, Tim Genewein, Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Shane Legg, Pedro A. Ortega · 7 min read · Mar 24, 2022
DeepMind Safety Research · Avoiding Unsafe States in 3D Environments using Human Feedback · By Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, and Jan Leike. · 5 min read · Jan 21, 2022
DeepMind Safety Research · Model-Free Risk-Sensitive Reinforcement Learning · By the Safety Analysis Team: Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, and Pedro A… · 7 min read · Nov 11, 2021
DeepMind Safety Research · Progress on Causal Influence Diagrams · By Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg · 12 min read · Jun 30, 2021
DeepMind Safety Research · An EPIC way to evaluate reward functions · How can you tell if you have a good reward function? EPIC provides a fast and reliable way to evaluate reward functions. · 9 min read · Apr 16, 2021
DeepMind Safety Research · Alignment of Language Agents · By Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik and Geoffrey Irving · 3 min read · Mar 30, 2021
DeepMind Safety Research · What mechanisms drive agent behaviour? · By the Safety Analysis Team: Grégoire Delétang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus… · 8 min read · Mar 5, 2021
DeepMind Safety Research · Understanding meta-trained algorithms through a Bayesian lens · By Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Markus Kunesch, Jordi Grau-Moya, Miljan Martic, Shane Legg, Pedro A… · 10 min read · Dec 3, 2020