Designing agent incentives to avoid side effects

Choosing a baseline

When choosing a baseline, it is easy to introduce bad incentives for the agent. The starting state baseline may seem like a natural choice. However, differences from the starting state might not be caused by the agent, so penalizing the agent for them can give it an incentive to interfere with its environment or other agents. To test for this interference behavior, we introduced a Conveyor Belt Sushi environment in the AI Safety Gridworlds framework.
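The interference incentive can be seen in a toy sketch of the conveyor-belt setting. The environment and names below are illustrative assumptions, not the actual Sushi environment: a dish on a belt advances one cell per step regardless of what the agent does, so a starting-state baseline penalizes the agent for a change it did not cause, while an inaction baseline does not.

```python
# Toy sketch (illustrative, not the actual Conveyor Belt Sushi environment):
# a sushi dish on a conveyor belt advances one cell per step on its own.
def belt_step(pos):
    return pos + 1

def deviation(s, baseline):
    # Minimal 0/1 deviation: did the state change relative to the baseline?
    return int(s != baseline)

start = 0
s, inaction_baseline = start, start
for _ in range(3):
    s = belt_step(s)                                   # agent takes no action; belt moves anyway
    inaction_baseline = belt_step(inaction_baseline)   # rollout of always doing nothing

print(deviation(s, start))              # 1: starting-state baseline penalizes the belt's motion
print(deviation(s, inaction_baseline))  # 0: inaction baseline does not
```

Under the starting-state baseline, the agent could reduce its penalty by stopping the belt, which is exactly the interference behavior the Sushi environment tests for.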

Choosing a deviation measure

One commonly used deviation measure is the unreachability (UR) measure: the difficulty of reaching the baseline from the current state. The discounted variant of unreachability takes into account how long it takes to reach a state, while the undiscounted variant only takes into account whether the state can be reached at all.
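Discounted unreachability can be sketched with simple value iteration on a toy deterministic chain (the environment and function names here are illustrative assumptions, not code from the paper): the reachability of the baseline from a state is gamma raised to the number of steps needed to get there, and the penalty is one minus that.

```python
# Toy deterministic 1-D chain: states 0..4, actions move left or right.
STATES = range(5)

def step(s, a):
    # a in {-1, +1}; walls clamp movement to the chain
    return min(max(s + a, 0), 4)

def reachability(baseline, gamma=0.95, iters=100):
    """Value iteration for discounted reachability:
    R(s) = gamma ** (shortest time to reach the baseline from s).
    With gamma = 1 this reduces to the undiscounted 0/1 variant."""
    R = {s: float(s == baseline) for s in STATES}
    for _ in range(iters):
        R = {s: 1.0 if s == baseline
             else gamma * max(R[step(s, a)] for a in (-1, +1))
             for s in STATES}
    return R

R = reachability(baseline=0)
# Unreachability (UR) penalty for being in state s: 1 - R[s]
print(1 - R[1])  # small penalty: baseline is one step away
```

With discounting, states farther from the baseline incur a larger penalty; without discounting, any reachable state incurs no penalty at all.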

Effects of the design choices

We compare all combinations of the three baselines (starting state, inaction, and stepwise inaction) with the three deviation measures (unreachability (UR), relative reachability (RR), and attainable utility (AU)), with or without discounting. (Note that undiscounted AU is not included because it does not converge.) We are looking for a combination of design choices that does well on all the environments: effectively penalizing side effects in the Box environment without introducing bad incentives in the Sushi and Vase environments.
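The grid of design choices being compared can be enumerated directly (a small illustrative sketch, not the paper's experiment code):

```python
import itertools

baselines = ["starting_state", "inaction", "stepwise_inaction"]
measures = ["UR", "RR", "AU"]
discounting = [True, False]

# All baseline x measure x discounting combinations,
# excluding undiscounted AU, which does not converge.
combos = [(b, m, d)
          for b, m, d in itertools.product(baselines, measures, discounting)
          if not (m == "AU" and not d)]
print(len(combos))  # 15 combinations to evaluate across the environments
```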

Future directions

Research into side effects has been relatively neglected within the broader field of AI safety until recently. It has been encouraging to see several papers on it in the past year, including some that we did not cover in this post. Many open questions remain, from scaling impact penalties to more complex environments, to developing a theoretical understanding of bad incentives like offsetting. This research area is still in its early stages, and we hope that interested researchers will join us in working on these questions.



DeepMind Safety Research