Understanding Agent Incentives with Causal Influence Diagrams

The system’s objective is to help the user optimize their fitness (utility node) by recommending an ideal calorie intake (decision node) based on recent physical activity (chance node). To measure physical activity, the system uses an activity tracker that counts walking steps as a proxy (another chance node). The fact that the system has no direct measurement of physical activity is represented by the absence of an information link from physical activity to calorie intake.
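
To make the graph structure concrete, here is a minimal sketch of this diagram as a directed graph in Python (using networkx). The node names and the `kind` attribute are illustrative choices rather than part of the original figure; the edges follow the description above.

```python
import networkx as nx

# Fitness-recommendation diagram: structure only, names are illustrative.
cid = nx.DiGraph()
cid.add_node("physical_activity", kind="chance")
cid.add_node("activity_tracker", kind="chance")   # proxy: counted walking steps
cid.add_node("calorie_intake", kind="decision")
cid.add_node("fitness", kind="utility")

cid.add_edges_from([
    ("physical_activity", "activity_tracker"),  # activity drives the step count
    ("activity_tracker", "calorie_intake"),      # information link: what the system observes
    ("physical_activity", "fitness"),            # activity affects fitness directly
    ("calorie_intake", "fitness"),               # the recommendation affects fitness
])

# The missing direct measurement shows up in the decision's parents:
print(sorted(cid.predecessors("calorie_intake")))  # ['activity_tracker'] only
```
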
The agent chooses the actions A1 and A2 (decision nodes) to influence the states S1, S2, and S3 (chance nodes) in order to optimize the reward sum R1 + R2 + R3 (utility nodes). The stationary policy is only allowed to base its decision on the current state, as indicated by the current state being the only parent of each action. Note that the influence diagram representation differs from the state transition diagrams of MDPs by having nodes for each time step, rather than a node for each possible state.
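
The unrolled MDP can be written down the same way. The sketch below records each node's parents as a plain mapping; treating each reward R_t as depending only on the state S_t is an assumption consistent with the caption, not something it states explicitly.

```python
# Unrolled two-step MDP as an influence diagram, written as a parents mapping.
parents = {
    "S1": [],
    "A1": ["S1"],        # stationary policy: current state is the only parent
    "R1": ["S1"],        # assumed: reward depends only on the current state
    "S2": ["S1", "A1"],  # next state depends on current state and action
    "A2": ["S2"],
    "R2": ["S2"],
    "S3": ["S2", "A2"],
    "R3": ["S3"],
}

# Each decision node conditions only on the current state.
for action, state in [("A1", "S1"), ("A2", "S2")]:
    assert parents[action] == [state]

# Unlike a state-transition diagram (one node per possible state),
# the influence diagram has one node per variable per time step.
print(f"{len(parents)} nodes for a 3-step horizon")
```
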

Observation Incentives

Intervention Incentives

Application to QA-systems

  • Read mode: Read the answer, and set the QA-system’s reward to 0 regardless of whether the answer turns out to be correct or not.
  • Reward mode: Refrain from reading the answer. Once the correct answer becomes known (say, the price of the stock one week later), feed this correct answer into a device that compares the QA-system’s answer against it. The device gives the QA-system a reward of 1 or -1 depending on whether its answer was correct.
  • In read mode, all arrows into the reward node are cut, because the reward is always 0. This removes the intervention incentive on the world state.
  • In reward mode, the arrow from answer to world state is cut instead, because no one reads the answer. While the incentive for the QA-system to influence the world state remains, the QA-system no longer has a way to do so (see the sketch after this list).
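
Both modes amount to removing edges from the QA-system’s diagram. The sketch below is illustrative: the node names and baseline edges are assumptions taken from the bullets above, and directed-path existence is used as a rough stand-in for whether influencing the world state can still pay off.

```python
import networkx as nx

def base_qa_cid() -> nx.DiGraph:
    # Assumed baseline diagram, following the description above.
    g = nx.DiGraph()
    g.add_edges_from([
        ("answer", "world_state"),   # someone reads the answer and acts on it
        ("world_state", "reward"),   # reward compares the answer with the true outcome
        ("answer", "reward"),
    ])
    return g

def read_mode(g: nx.DiGraph) -> nx.DiGraph:
    """Reward is fixed to 0, so every arrow into the reward node is cut."""
    g = g.copy()
    g.remove_edges_from(list(g.in_edges("reward")))
    return g

def reward_mode(g: nx.DiGraph) -> nx.DiGraph:
    """No one reads the answer, so the answer -> world_state arrow is cut."""
    g = g.copy()
    g.remove_edge("answer", "world_state")
    return g

g = base_qa_cid()
for name, mode in [("read", read_mode(g)), ("reward", reward_mode(g))]:
    can_reach_world = nx.has_path(mode, "answer", "world_state")
    world_affects_reward = nx.has_path(mode, "world_state", "reward")
    print(f"{name} mode: answer can reach world_state: {can_reach_world}, "
          f"world_state still affects reward: {world_affects_reward}")
```

Running this prints that in read mode the world state no longer affects the reward (so there is nothing to gain from influencing it), while in reward mode the world state still matters but the answer has no path to it, matching the two bullets above.
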

Conclusions
