Understanding meta-trained algorithms through a Bayesian lens

What is meta-learning?

  • Training experience can easily induce strong inductive biases that govern how the trained system behaves in new situations. In the best case, this can be exploited to shape inductive biases so that they have desired safety properties, but doing so is highly non-trivial and needs careful consideration³.
  • In memory-based systems, meta-training induces an adaptive algorithm as the solution. This means that the recent observation history has a significant influence on the behaviour of the system: two systems that are identical when deployed may rapidly diverge because of different experiences.
Illustration of two coin-flip environments: the “fair coins” and the “bent coins” environment.
Two predictors were trained on the “fair coins” and “bent coins” environments respectively. When faced with a new coin, both predictors initially guess that the coin is fair (grey). But note how both predictors then change their prediction after observing the first “H” or “T”: the predictor trained in the fair coins environment mildly adjusts its prediction to green or orange (indicating a slightly biased coin) whereas the bent coins predictor immediately predicts a heavily biased coin.
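
To make the coin-flip illustration concrete, here is a minimal sketch of how a Bayes-optimal predictor would behave under two different priors. It assumes each environment can be summarised by a Beta prior over the coin's bias; the specific prior parameters (`FAIR_PRIOR`, `BENT_PRIOR`) are hypothetical values chosen for illustration and are not taken from the original experiments.

```python
def predict_heads(alpha, beta, heads, tails):
    """Posterior predictive P(next flip = H) under a Beta(alpha, beta) prior."""
    return (alpha + heads) / (alpha + beta + heads + tails)

# Hypothetical priors standing in for the two training environments.
FAIR_PRIOR = (50.0, 50.0)  # probability mass concentrated near a bias of 0.5
BENT_PRIOR = (0.2, 0.2)    # probability mass concentrated near 0 and 1

for name, (alpha, beta) in [("fair coins", FAIR_PRIOR), ("bent coins", BENT_PRIOR)]:
    before = predict_heads(alpha, beta, heads=0, tails=0)
    after = predict_heads(alpha, beta, heads=1, tails=0)
    print(f"{name}: P(H) before = {before:.3f}, after one H = {after:.3f}")
```

Both predictors start at 0.5. After a single "H", the fair-coins predictor moves only to about 0.505, whereas the bent-coins predictor jumps to about 0.857, mirroring the mild versus immediate adjustment described in the figure.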

Optimal prediction and decision-making in the face of uncertainty

  • Minimises expected log-loss (prediction tasks) or maximises expected return (decision-making tasks).
  • Minimal sample complexity: given the distribution over tasks, the Bayes-optimal solution converges fastest (on average) to any particular task.
  • Optimal (and automatic) trade-off between exploration and exploitation in decision-making tasks.
  • The task’s minimal sufficient statistics are the smallest possible compression of the observation history (without loss in performance); any Bayes-optimal solution must at least keep track of these.
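
The role of sufficient statistics can be illustrated with a small sketch. It assumes a coin-flip prediction task with a uniform Beta(1, 1) prior, which is our choice for illustration rather than a setting taken from the experiments; the helper names are hypothetical. The predictor keeps only the pair (heads, tails) as its internal state, and two histories with the same counts lead to the same state and the same cumulative log-loss.

```python
import math

def update(counts, flip):
    """Fold one observation into the sufficient statistic (heads, tails)."""
    heads, tails = counts
    return (heads + 1, tails) if flip == "H" else (heads, tails + 1)

def predict_heads(counts, alpha=1.0, beta=1.0):
    """Posterior predictive P(next flip = H) given only the counts."""
    heads, tails = counts
    return (alpha + heads) / (alpha + beta + heads + tails)

def cumulative_log_loss(history, alpha=1.0, beta=1.0):
    """Predict each flip before seeing it and accumulate the log-loss."""
    counts, loss = (0, 0), 0.0
    for flip in history:
        p_heads = predict_heads(counts, alpha, beta)
        loss -= math.log(p_heads if flip == "H" else 1.0 - p_heads)
        counts = update(counts, flip)
    return counts, loss

# Two different orderings of the same observations.
counts_a, loss_a = cumulative_log_loss("HHTH")
counts_b, loss_b = cumulative_log_loss("THHH")
print(counts_a, counts_b)  # identical sufficient statistics
print(loss_a, loss_b)      # identical cumulative log-loss (up to rounding)
```

The full history is never stored: everything the predictor needs for future predictions is contained in the two counts.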

Comparing RNNs against known Bayes-optimal algorithms

RNN agents behave Bayes-optimally

Three typical episodes of the three-sided die prediction task.
Known Bayes-optimal algorithm vs. meta-trained RNN on our “three-sided die” prediction task.
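
As a rough sketch of what such a behavioural comparison could look like, the snippet below computes the predictions of a Bayes-optimal predictor for a three-sided die and measures how far a second predictor's outputs are from them. Several things here are assumptions made for illustration: the Dirichlet prior over face probabilities, the use of mean KL divergence as the dissimilarity measure, and the `candidate` array, which is only a perturbed copy of the reference predictions standing in for the outputs of a trained RNN.

```python
import numpy as np

def bayes_predictions(rolls, alpha):
    """Dirichlet-categorical posterior predictive before each roll."""
    counts = np.zeros_like(alpha)
    predictions = []
    for face in rolls:
        predictions.append((alpha + counts) / (alpha + counts).sum())
        counts[face] += 1
    return np.array(predictions)

def mean_kl(p, q):
    """Mean KL divergence KL(p || q) over timesteps (rows)."""
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))

rng = np.random.default_rng(0)
alpha = np.ones(3)                   # assumed uniform Dirichlet prior
true_probs = rng.dirichlet(alpha)    # one sampled die
rolls = rng.choice(3, size=20, p=true_probs)

reference = bayes_predictions(rolls, alpha)

# Stand-in for a trained RNN: reference predictions, slightly perturbed
# and renormalised so that each row is still a probability distribution.
noisy = reference + 0.01 * rng.random(reference.shape)
candidate = noisy / noisy.sum(axis=1, keepdims=True)

print("behavioural dissimilarity:", mean_kl(reference, candidate))
```

A value close to zero indicates that the two predictors behave almost identically on this episode; in practice one would average over many episodes.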

Peeking under the hood: comparing computational structure

Establishing whether one FSM can be simulated by another by comparing state-transitions and outputs.
Comparing the algorithms implemented by finite state machines via simulation.
  1. Produce a “matching state” in the simulating machine (B) by mapping the original state (in A) through the learned neural network regressor (dashed cyan line).
  2. Feed the same input to both agent-types. This gives us an original and a simulated state-transition and output.
  3. The new states “match” if we observe low regression error.
  4. The outputs “match” if we observe low behavioural dissimilarity, using the metrics we defined earlier to perform behavioural comparison.
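
The four steps above can be sketched on a toy example. Everything in this snippet is an illustrative assumption: the two hand-written machines (A tracks heads and tails, B tracks the total count and the difference), the Beta(1, 1) output rule, and in particular the use of a linear least-squares map in place of the learned neural network regressor. Only the direction "B simulates A" is shown.

```python
import numpy as np

# Machine A: state = (heads, tails). Machine B: state = (total, heads - tails).
def step_a(state, flip):
    heads, tails = state
    return np.array([heads + flip, tails + 1 - flip])

def step_b(state, flip):
    total, diff = state
    return np.array([total + 1, diff + 2 * flip - 1])

def output_a(state):
    heads, tails = state
    return (heads + 1) / (heads + tails + 2)

def output_b(state):
    total, diff = state
    return ((total + diff) / 2 + 1) / (total + 2)

rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=200)  # 1 = heads, 0 = tails

# Collect paired states by running both machines on the same inputs.
state_a, state_b = np.zeros(2), np.zeros(2)
states_a, states_b = [], []
for flip in flips:
    states_a.append(state_a)
    states_b.append(state_b)
    state_a, state_b = step_a(state_a, flip), step_b(state_b, flip)
states_a, states_b = np.array(states_a), np.array(states_b)

# Fit the state regressor A -> B (linear least squares instead of a network).
W, *_ = np.linalg.lstsq(states_a, states_b, rcond=None)

state_errors, output_gaps = [], []
for s_a, flip in zip(states_a, flips):
    s_b = s_a @ W                                   # step 1: matching state in B
    next_a, next_b = step_a(s_a, flip), step_b(s_b, flip)  # step 2: same input
    state_errors.append(np.linalg.norm(next_a @ W - next_b))      # step 3
    output_gaps.append(abs(output_a(next_a) - output_b(next_b)))  # step 4

max_state_error = max(state_errors)
max_output_gap = max(output_gaps)
print(max_state_error, max_output_gap)
```

Because B's state is an exact linear re-encoding of A's state, both the regression error and the output gap are at the level of floating-point rounding here. For a trained RNN the fit would be approximate, and the comparison would be repeated in the opposite direction.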
Illustration of comparing the algorithm implemented by the meta-trained RNN to the Bayes-optimal algorithm via simulation.
Each panel shows the agents’ internal states (2D PCA projection); each dot is a single timestep, and colours represent agent outputs (predictions). The three white lines correspond to the three illustrative episodes shown earlier. The top-left panel shows the internal states of the known Bayes-optimal agent and the bottom-right panel those of the RNN agent; the off-diagonal panels show states and outputs obtained via simulation. Qualitatively, the illustration shows good correspondence between the computational structure of the two agent types; in the related paper, we assess this with appropriate quantitative metrics.

Meta-trained agents implement Bayes-optimal agents





We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work: deepmind.com

DeepMind Safety Research
