Progress on Causal Influence Diagrams

What are causal influence diagrams?

A causal influence diagram (CID) is a directed acyclic graph whose nodes and edges represent:

  • Agent decisions
  • Agent objectives
  • Causal relationships in the environment
  • Agent information constraints
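
These components can be sketched as a small data structure. Below is a minimal, hypothetical Python sketch (the node names S, D, and U are made up for illustration): a chance node S feeds both the decision D (an information link) and the utility U, and a decision's parents encode the agent's information constraints.

```python
# Minimal sketch of a CID as a typed DAG (hypothetical node names).
from dataclasses import dataclass

@dataclass
class CID:
    chance: set       # chance nodes: the environment
    decisions: set    # decision nodes: agent decisions
    utilities: set    # utility nodes: agent objectives
    edges: set        # directed (parent, child) pairs

    def parents(self, node):
        return {p for (p, c) in self.edges if c == node}

    def observations(self, decision):
        # Information constraints: a decision may condition only on its parents.
        return self.parents(decision)

# Chance node S influences both the decision D and the utility U;
# the edge S -> D is an information link, so the agent observes S.
cid = CID(
    chance={"S"},
    decisions={"D"},
    utilities={"U"},
    edges={("S", "D"), ("S", "U"), ("D", "U")},
)
print(cid.observations("D"))  # → {'S'}
```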

Incentive Concepts

  • Value of information: what does the agent want to know before making a decision?
  • Response incentive: what changes in the environment do optimal agents respond to?
  • Value of control: what does the agent want to control?
  • Instrumental control incentive: what is the agent both interested in and able to control?
  • In the example CID (figure not shown here), an optimal agent would act differently (i.e. respond) if S₁ changed, and would value knowing and controlling S₁, but it cannot influence S₁ with its action. So S₁ has value of information, a response incentive, and value of control, but not an instrumental control incentive.
  • For S₂ and R₂, an optimal agent could neither respond to changes in them nor know them before choosing its action, so these have neither value of information nor a response incentive. But the agent would value controlling them, and is able to influence them, so S₂ and R₂ have value of control and an instrumental control incentive.
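
The value of information can be made concrete with a toy calculation (the numbers here are made up for illustration): it is the gain in optimal expected utility from observing a variable before acting.

```python
# Toy value-of-information calculation (hypothetical numbers):
# a binary state S is 0 or 1 with equal probability, and utility is 1
# when the action matches the state, 0 otherwise.

P_S = {0: 0.5, 1: 0.5}

def utility(s, a):
    return 1.0 if a == s else 0.0

# Without observing S: commit to the single action with best expected utility.
eu_blind = max(sum(P_S[s] * utility(s, a) for s in P_S) for a in (0, 1))

# Observing S first: pick the best action separately for each state.
eu_informed = sum(P_S[s] * max(utility(s, a) for a in (0, 1)) for s in P_S)

value_of_information = eu_informed - eu_blind
print(value_of_information)  # → 0.5
```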

User Interventions and Interruption

  • Black-box optimization algorithms such as evolution strategies take into account all causal relationships, including the effects of action-modification.
  • In contrast, the update rule of Q-learning effectively assumes that the next action will be taken optimally, with no action-modification, which means that Q-learners ignore the causal effects PA → Aᵢ. Similarly, SARSA whose TD update uses the action chosen by the agent assumes that the agent will remain in control of its next action. We call this version virtual SARSA.
  • SARSA based on the modified action (empirical SARSA) ignores the effect of action-modification on the current action, but takes into account the effect on subsequent actions.
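
The distinction shows up directly in the TD targets. A hedged sketch (variable names are made up for illustration): Q-learning bootstraps on the max over next actions, virtual SARSA on the next action the agent chose, and empirical SARSA on the next action actually taken after any modification.

```python
# Three TD updates under possible action-modification.
# a_taken: the action actually executed after any user intervention;
# a2_chosen / a2_taken: the next action as chosen by the agent vs. as executed.
ALPHA, GAMMA = 0.1, 0.9

def q_learning_update(Q, s, a_taken, r, s2):
    # Assumes the next action will be optimal: ignores action-modification.
    target = r + GAMMA * max(Q[s2].values())
    Q[s][a_taken] += ALPHA * (target - Q[s][a_taken])

def virtual_sarsa_update(Q, s, a_taken, r, s2, a2_chosen):
    # Bootstraps on the next action the agent *chose*, as if it will stay
    # in control of its own action (ignores modification of that action).
    target = r + GAMMA * Q[s2][a2_chosen]
    Q[s][a_taken] += ALPHA * (target - Q[s][a_taken])

def empirical_sarsa_update(Q, s, a_taken, r, s2, a2_taken):
    # Bootstraps on the next action actually *taken*, so interventions on
    # subsequent actions feed back into the value estimate.
    target = r + GAMMA * Q[s2][a2_taken]
    Q[s][a_taken] += ALPHA * (target - Q[s][a_taken])
```

All three update the value of the action that was actually executed; they differ only in which next action they bootstrap on.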

Reward Tampering

An RL agent may be able to increase its observed reward in unintended ways, for example by:

  • rewriting the source code of its implemented reward function (“wireheading”),
  • influencing users that train a learned reward model (“feedback tampering”),
  • manipulating the inputs that the reward function uses to infer the state (“RF-input tampering / delusion box problems”).

Multi-Agent CIDs

An accompanying open-source Python library provides:

  • A convenient syntax for defining CIDs and MACIDs,
  • Methods for computing optimal policies, Nash equilibria, d-separation, interventions, probability queries, incentive concepts, graphical criteria, and more,
  • Random generation of (MA)CIDs, and pre-defined examples.

Looking ahead

  • Extending the general incentive concepts to multiple decisions and multiple agents.
  • Applying them to fairness and other AGI safety settings.
  • Analysing limitations that have been identified in the work so far: first, the issues raised by Armstrong and Gorman; and second, broader concepts than instrumental control incentives, since influence can also be incentivized as a side-effect of an objective.
  • Probing further at their philosophical foundations, and establishing a clearer semantics for decision and utility nodes.

List of recent papers:




We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work:

DeepMind Safety Research
