By Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg
Crossposted to the Alignment Forum
By Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell and Jan Leike.
TL;DR: Equivalent-Policy Invariant Comparison (EPIC) provides a fast and reliable way to compute how similar two reward functions are to one another. EPIC can be used to benchmark reward learning algorithms by comparing learned reward functions…
By Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik and Geoffrey Irving
Would your AI deceive you? This is a central question when considering the safety of AI, underlying many of the most pressing risks, from current systems to future AGI. We have recently seen impressive advances in…
By the Safety Analysis Team: Grégoire Delétang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus Kunesch, Shane Legg, and Pedro A. Ortega.
TL;DR: To study agent behaviour we must use the tools of causal analysis rather than rely on observation alone. …
By Grégoire Delétang, Tom McGrath, Tim Genewein, Vladimir Mikulik, Markus Kunesch, Jordi Grau-Moya, Miljan Martic, Shane Legg, and Pedro A. Ortega
TL;DR: In our recent paper we show that meta-trained recurrent neural networks implement Bayes-optimal algorithms.
One of the most challenging problems in modern AI research is understanding the learned algorithms…
By Tom Everitt, Ramana Kumar, Jonathan Uesato, Victoria Krakovna, Richard Ngo, and Shane Legg
In two new papers, we study tampering in simulation. The first paper describes a platform, called REALab, which makes tampering a natural part of the physics of the environment. …
By Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, and Shane Legg
This article is cross-posted on the DeepMind website.
By Siddharth Reddy and Jan Leike. Cross-posted from the DeepMind website.
TL;DR: We present a method for training reinforcement learning agents from human feedback in the presence of unknown unsafe states.
By Tom Everitt, Ramana Kumar, and Marcus Hutter
By Pushmeet Kohli, Krishnamurthy (Dj) Dvijotham, Jonathan Uesato, Sven Gowal, and the Robust & Verified Deep Learning group. This article is cross-posted from DeepMind.com.
Bugs and software have gone hand in hand since the beginning of computer programming. Over time, software developers have established a set of best practices for…