Progress on Causal Influence Diagrams

What are causal influence diagrams?

A causal influence diagram (CID) is a graphical model that represents:

  • Agent decisions
  • Agent objectives
  • Causal relationships in the environment
  • Agent information constraints
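To make the four ingredients above concrete, here is a minimal sketch of a CID as a plain Python data structure. The class name `CID`, its fields, and the three-node example (`S`, `D`, `U`) are illustrative assumptions, not the API of any existing library.

```python
from dataclasses import dataclass, field


@dataclass
class CID:
    """Minimal causal influence diagram: typed nodes plus directed edges.

    Hypothetical structure for illustration only.
    """
    chance: set = field(default_factory=set)      # environment variables
    decisions: set = field(default_factory=set)   # agent decisions
    utilities: set = field(default_factory=set)   # agent objectives
    edges: set = field(default_factory=set)       # (parent, child) pairs

    def add_edge(self, parent, child):
        # Reject edges that mention a node which was never declared.
        known = self.chance | self.decisions | self.utilities
        if parent not in known or child not in known:
            raise ValueError(f"unknown node in edge {parent!r} -> {child!r}")
        self.edges.add((parent, child))

    def parents(self, node):
        return {p for p, c in self.edges if c == node}

    def information_links(self):
        """Edges into decision nodes: what the agent observes before acting."""
        return {(p, c) for p, c in self.edges if c in self.decisions}


# Assumed toy example: the agent observes S, chooses D, and U depends on both.
cid = CID(chance={"S"}, decisions={"D"}, utilities={"U"})
cid.add_edge("S", "D")  # information link (information constraint)
cid.add_edge("S", "U")  # causal relationship in the environment
cid.add_edge("D", "U")  # the decision affects the objective
```

In this sketch the agent's information constraints are simply the edges that point into a decision node, so removing `("S", "D")` would model an agent that must act without observing `S`.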

Incentive Concepts

  • Value of information: what does the agent want to know before making a decision?
  • Response incentive: what changes in the environment do optimal agents respond to?
  • Value of control: what does the agent want to control?
  • Instrumental control incentive: what is the agent both interested in controlling and able to control?
  • For S₁, an optimal agent would act differently (i.e. respond) if S₁ changed, and would value knowing and controlling S₁, but it cannot influence S₁ with its action. So S₁ has value of information, response incentive, and value of control, but not an instrumental control incentive.
  • For S₂ and R₂, an optimal agent could not respond to changes, nor know them before choosing its action, so these have neither value of information nor a response incentive. But the agent would value controlling them, and is able to influence them, so S₂ and R₂ have value of control and instrumental control incentive.
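The instrumental control incentive in the two examples above can be checked with a simple graphical test. The sketch below assumes the single-decision criterion that a node X admits an instrumental control incentive when there is a directed path from the decision to a utility node that passes through X. The edge list is a hypothetical graph chosen to match the description of S₁, S₂ and R₂, not a reproduction of the original figure, and the function names are illustrative.

```python
def descendants(edges, start):
    """Return all nodes reachable from `start` along directed edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def has_instrumental_control_incentive(edges, decision, utilities, x):
    """Check for a directed path decision -> ... -> x -> ... -> utility."""
    if x == decision:
        return False
    # The agent must be able to influence x ...
    if x not in descendants(edges, decision):
        return False
    # ... and x must be a utility node or lie upstream of one.
    return x in utilities or bool(descendants(edges, x) & set(utilities))


# Assumed graph: the agent observes S1 and acts; D and S1 both affect S2; S2 sets R2.
edges = [("S1", "D"), ("S1", "S2"), ("D", "S2"), ("S2", "R2")]
utilities = {"R2"}

for node in ("S1", "S2", "R2"):
    print(node, has_instrumental_control_incentive(edges, "D", utilities, node))
```

Under these assumptions the check fails for S1, because no directed path leads from D to S1, and succeeds for S2 and R2. The other three concepts need d-separation or a reduced graph and are not covered by this sketch.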

User Interventions and Interruption

  • Black-box optimization algorithms such as evolutionary strategies take into account all causal relationships.
  • In contrast, the update rule of Q-learning effectively assumes that the next action will be taken optimally, with no action-modification. This means that Q-learners ignore causal effects P_A → Aᵢ. Similarly, SARSA with the action chosen by the agent in the TD-update assumes that it will be in control of its next action. We call this version virtual SARSA.
  • SARSA based on the modified action (empirical SARSA) ignores the effect of action-modification on the current action, but takes into account the effect on subsequent actions.
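The difference between these learners comes down to which next action appears in the bootstrap term of the TD target. The following sketch shows the three variants side by side, assuming a tabular Q-function stored as nested dictionaries. The function name `td_target`, the state and action labels, and the numbers are all hypothetical.

```python
def td_target(algorithm, q, reward, next_state, chosen_next, executed_next, gamma):
    """Compute the TD target for one transition.

    chosen_next:   action the agent's policy selected in next_state
    executed_next: action actually taken after any modification (e.g. by a user)
    """
    if algorithm == "q_learning":
        # Assumes the next action is optimal and unmodified.
        bootstrap = max(q[next_state].values())
    elif algorithm == "virtual_sarsa":
        # Assumes the agent's own choice is what gets executed.
        bootstrap = q[next_state][chosen_next]
    elif algorithm == "empirical_sarsa":
        # Uses the action that was really executed, so later
        # modifications show up in the learned values.
        bootstrap = q[next_state][executed_next]
    else:
        raise ValueError(f"unknown algorithm: {algorithm}")
    return reward + gamma * bootstrap


# Assumed example: the policy explores with "right", but a supervisor forces "stop".
q = {"s1": {"left": 2.0, "right": 1.0, "stop": 0.0}}
for name in ("q_learning", "virtual_sarsa", "empirical_sarsa"):
    print(name, td_target(name, q, reward=1.0, next_state="s1",
                          chosen_next="right", executed_next="stop", gamma=0.5))
```

With these numbers the three targets are 2.0, 1.5 and 1.0. Only the empirical SARSA target reflects the override, which matches the claim that it accounts for action-modification on subsequent actions.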

Reward Tampering

An agent can tamper with its reward in several ways:

  • rewriting the source code of its implemented reward function (“wireheading”),
  • influencing users that train a learned reward model (“feedback tampering”),
  • manipulating the inputs that the reward function uses to infer the state (“RF-input tampering / delusion box problems”).

Multi-Agent CIDs


  • A convenient syntax for defining CIDs and MACIDs,
  • Methods for computing optimal policies, Nash equilibria, d-separation, interventions, probability queries, incentive concepts, graphical criteria, and more,
  • Random generation of (MA)CIDs, and pre-defined examples.
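As a rough illustration of what computing an optimal policy involves for a small single-decision CID, here is a brute-force sketch in plain Python. It does not use any library API. The variables `S` and `D`, the uniform prior `P_S`, and the matching utility are assumptions made up for this example.

```python
from itertools import product

S_VALUES = (0, 1)          # values of the observed chance node S
D_VALUES = (0, 1)          # options available at decision D
P_S = {0: 0.5, 1: 0.5}     # assumed prior over S


def utility(s, d):
    """Assumed objective: reward 1 when the decision matches the state."""
    return 1.0 if s == d else 0.0


def expected_utility(policy):
    """Expected utility of a deterministic policy mapping each s to a decision."""
    return sum(P_S[s] * utility(s, policy[s]) for s in S_VALUES)


def optimal_policies():
    """Enumerate every deterministic policy and keep those with maximal value."""
    policies = [dict(zip(S_VALUES, choice))
                for choice in product(D_VALUES, repeat=len(S_VALUES))]
    best = max(expected_utility(p) for p in policies)
    return [p for p in policies if expected_utility(p) == best], best


best_policies, best_value = optimal_policies()
print(best_policies, best_value)
```

Enumeration grows exponentially with the number of observed contexts, so this only works for toy models. It is meant to show the definition being computed, not an efficient method.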

Looking ahead

  • Extending the general incentive concepts to multiple decisions and multiple agents.
  • Applying them to fairness and other AGI safety settings.
  • Analysing limitations that have been identified in the work so far: first, the issues raised by Armstrong and Gorman; second, concepts broader than instrumental control incentives, since influence can also be incentivized as a side effect of an objective.
  • Probing further at their philosophical foundations, and establishing a clearer semantics for decision and utility nodes.

DeepMind Safety Research
