Model-Free Risk-Sensitive Reinforcement Learning

Why risk-sensitivity?

Risk-neutral policies, trusting their own knowledge, can confidently afford to “put all their eggs in one basket”. 16th-century painting “Girl with a basket of eggs” by Joachim Beuckelaer.

How do agents learn risk-sensitive policies?

Model-Free Risk-Sensitive RL

Estimation of the value for Gaussian- (left) and uniformly distributed (right) observed target values (grey dots). Each plot shows 10 estimation processes (9 in pink, 1 in red) for each choice of the risk parameter β ∊ {-4, -2, 0, +2, +4}. Notice how the estimate settles on a different quantile for each β.
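The update rule itself did not survive in this copy of the post, so the following is only a minimal sketch under an assumed form: the ordinary prediction-error update is reweighted by a logistic sigmoid of βδ, giving V ← V + α · 2σ(βδ) · δ with δ = target - V. The factor 2 (chosen so that β = 0 recovers the standard rule) and the helper names `risk_sensitive_update` and `estimate` are our own hypothetical choices, not the authors' stated algorithm. The script mimics the figure's setup by feeding Gaussian and uniform targets through the rule for each β in the caption.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_sensitive_update(v: float, target: float, alpha: float, beta: float) -> float:
    """One risk-sensitive prediction-error step (ASSUMED sigmoid-weighted form).

    The error delta is weighted by 2 * sigmoid(beta * delta):
    beta < 0 enlarges steps after negative errors (risk-averse),
    beta > 0 enlarges steps after positive errors (risk-seeking),
    beta = 0 gives the ordinary update v += alpha * delta.
    """
    delta = target - v
    return v + alpha * 2.0 * sigmoid(beta * delta) * delta


def estimate(samples, alpha: float, beta: float, burn_in: int) -> float:
    """Run the update over i.i.d. targets and average the iterates after burn_in."""
    v, total, count = 0.0, 0.0, 0
    for t, x in enumerate(samples):
        v = risk_sensitive_update(v, x, alpha, beta)
        if t >= burn_in:
            total += v
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    n, alpha = 100_000, 0.005
    gauss = [rng.gauss(0.0, 1.0) for _ in range(n)]   # N(0, 1) targets
    unif = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # U(-1, 1) targets
    for beta in (-4, -2, 0, 2, 4):
        g = estimate(gauss, alpha, beta, burn_in=n // 2)
        u = estimate(unif, alpha, beta, burn_in=n // 2)
        # For N(mu, sigma^2) targets the fixed point is mu + beta * sigma^2 / 2.
        print(f"beta={beta:+d}  gaussian={g:+.3f} (theory {beta / 2:+.3f})  uniform={u:+.3f}")
```

Under this assumed form, a negative β takes larger steps after negative errors and smaller ones after positive errors, so the estimate settles below the mean; for Gaussian targets N(μ, σ²) its fixed point works out to μ + βσ²/2. Averaging the later iterates only smooths the constant-step-size noise in the printout.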

Dopamine Signals, Free Energy, and Imaginary Foes

  • The risk-sensitive update rule can be linked to findings in computational neuroscience [6, 7]. Dopamine neurons appear to signal a reward prediction error similar to the one in temporal-difference learning. Further studies suggest that humans learn differently from positive and negative reward prediction errors, with higher learning rates for negative errors. This asymmetry is consistent with the risk-sensitive learning rule.
  • In the special case where the target value is Gaussian-distributed, the estimate converges precisely to the free energy with inverse temperature β. Using the free energy as an optimization objective (or, equivalently, using exponentially transformed rewards) has a long tradition in control theory as an approach to risk-sensitive control [8].
  • One can show that optimizing the free energy is equivalent to playing a game against an imaginary adversary who attempts to shift the environment’s rewards against the agent’s expectations. Thus, a risk-averse agent can be thought of as choosing its policy by playing out imaginary pessimistic scenarios.
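For reference, the standard definitions behind the free-energy and adversary points can be written out. The notation (p for the distribution of the target value X, q for the distribution picked by the imaginary opponent) is ours, not the post's, and we assume the convention under which β < 0 is risk-averse. The first line defines the free energy and gives its Gaussian closed form; the second is the Gibbs (Donsker-Varadhan) variational principle, which makes the adversary explicit.

```latex
F_\beta(X) = \frac{1}{\beta} \log \mathbb{E}_{p}\!\left[e^{\beta X}\right],
\qquad
X \sim \mathcal{N}(\mu, \sigma^2) \;\Rightarrow\; F_\beta(X) = \mu + \frac{\beta \sigma^2}{2}

F_\beta(X) =
\begin{cases}
\max_{q} \left\{ \mathbb{E}_{q}[X] - \tfrac{1}{\beta}\, \mathrm{KL}(q \,\|\, p) \right\}, & \beta > 0, \\[4pt]
\min_{q} \left\{ \mathbb{E}_{q}[X] + \tfrac{1}{|\beta|}\, \mathrm{KL}(q \,\|\, p) \right\}, & \beta < 0,
\end{cases}
\qquad q^{*}(x) \propto p(x)\, e^{\beta x}
```

For β < 0 the opponent lowers the expected value by reweighting outcomes, but pays a KL penalty scaled by 1/|β|; as β → 0 the penalty becomes prohibitive, q* → p, and F_β(X) → E[X], the risk-neutral value.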

Final thoughts

References

We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work: deepmind.com

DeepMind Safety Research