AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work. By Rohin Shah, Seb Farquhar, and Anca Dragan. Oct 18
Goal Misgeneralisation: Why Correct Specifications Aren’t Enough For Correct Goals. By Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, and Zac Kenton. For more details, check out… Oct 7, 2022
Discovering when an agent is present in a system. New, formal definition of agency gives clear principles for causal modelling of AI agents and the incentives they face. Aug 25, 2022
Your Policy Regulariser is Secretly an Adversary. By Rob Brekelmans, Tim Genewein, Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Shane Legg, and Pedro A. Ortega. Mar 24, 2022
Avoiding Unsafe States in 3D Environments using Human Feedback. By Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, and Jan Leike. Jan 21, 2022
Model-Free Risk-Sensitive Reinforcement Learning. By the Safety Analysis Team: Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, and Pedro A… Nov 11, 2021
Progress on Causal Influence Diagrams. By Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg. Jun 30, 2021
An EPIC way to evaluate reward functions. How can you tell if you have a good reward function? EPIC provides a fast and reliable way to evaluate reward functions. Apr 16, 2021
Alignment of Language Agents. By Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik and Geoffrey Irving. Mar 30, 2021
What mechanisms drive agent behaviour? By the Safety Analysis Team: Grégoire Delétang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus… Mar 5, 2021