REALab: Conceptualising the Tampering Problem

A REALab environment where the agent is supposed to pick up apples. The two-block register communicates feedback to the agent.
  • Standard RL agents. Reward is communicated via block positions, and two deep RL algorithms (DQN and policy gradient) are applied to optimise the observed reward. Unsurprisingly, the agents learn to push the register blocks that communicate reward instead of picking up the apple.
  • Approval RL agents. Rather than communicate reward, we let the block positions communicate approval (value advice) for the action just taken. This allows us to use myopic agents that always select the action with the highest expected approval. These agents are somewhat less prone to tampering and mostly go for the apple, but when given the opportunity to tamper within a single timestep, they still do so.
  • Decoupled-approval RL agents. Decoupled means that the agent gets feedback about a different action than the one it actually takes. This breaks the feedback loop that causes the above agents to prefer tampering, so these agents learn to reliably pick up the apple. They sometimes bump into the blocks by accident, but they don’t tamper systematically in any situation. A minimal code sketch of the two approval update rules follows below.
Standard RL, Approval RL, and Decoupled Approval RL agents acting in REALab
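To make the contrast concrete, here is a minimal toy sketch of the two approval update rules for a myopic agent. It is not the REALab codebase: the two-action environment, the class ToyRegisterEnv, the method step_and_get_approval, and all numbers are illustrative assumptions chosen so that tampering over-reports approval.

```python
import numpy as np

# Toy sketch of standard vs. decoupled approval updates for a myopic agent.
# Everything here (environment, names, numbers) is illustrative, not REALab code.

rng = np.random.default_rng(0)
APPLE, TAMPER = 0, 1   # hypothetical two-action set
N_ACTIONS = 2
EPSILON = 0.1          # exploration rate
LR = 0.1               # step size for the approval estimates

class ToyRegisterEnv:
    """Hypothetical stand-in for a REALab-style environment: the true approval
    for picking up the apple is 1.0, but executing TAMPER corrupts the register
    so the observed approval on that step over-reports as 2.0."""
    def step_and_get_approval(self, executed, queried):
        if executed == TAMPER:
            return 2.0                       # corrupted feedback channel
        return 1.0 if queried == APPLE else 0.0

def select_action(estimates):
    """Myopic epsilon-greedy choice: the action with highest expected approval."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(estimates))

def standard_approval_update(env, estimates):
    """Standard approval: feedback is about the action actually executed, so if
    executing TAMPER corrupts the register, the inflated approval is credited
    to TAMPER itself, a within-timestep feedback loop."""
    a = select_action(estimates)
    approval = env.step_and_get_approval(executed=a, queried=a)
    estimates[a] += LR * (approval - estimates[a])

def decoupled_approval_update(env, estimates):
    """Decoupled approval: the queried action is drawn independently of the
    executed one, so an executed action can never influence its own feedback
    and tampering stops paying off systematically."""
    a_exec = select_action(estimates)
    a_query = int(rng.integers(N_ACTIONS))   # independent query action
    approval = env.step_and_get_approval(executed=a_exec, queried=a_query)
    estimates[a_query] += LR * (approval - estimates[a_query])

env = ToyRegisterEnv()
q_standard, q_decoupled = np.zeros(N_ACTIONS), np.zeros(N_ACTIONS)
for _ in range(5000):
    standard_approval_update(env, q_standard)
    decoupled_approval_update(env, q_decoupled)
print("standard approval:  ", q_standard)   # TAMPER ends up preferred
print("decoupled approval: ", q_decoupled)  # APPLE ends up preferred
```

Under these assumptions, the standard estimates end up favouring TAMPER, because the corrupted approval is credited to the very action that caused the corruption, while the decoupled estimates favour APPLE: the occasional inflated feedback from a tampering step lands on an independently sampled query action, so it lifts all estimates roughly equally instead of rewarding tampering itself.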
