Scalable agent alignment via reward modeling

The agent alignment problem

How can we create agents that behave in accordance with the user’s intentions?

Alignment via reward modeling

Schematic illustration of reward modeling: a reward model is trained from the user’s feedback to capture their intentions; this reward model provides rewards to an agent trained with reinforcement learning.
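
The loop in this schematic can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the method used in the underlying research: the user's feedback is assumed to arrive as pairwise comparisons, the reward model is a single weight fitted with a logistic (Bradley-Terry style) loss, and the agent is an epsilon-greedy bandit rather than a full reinforcement learning agent. All names (`simulated_user_prefers`, `train_reward_model`, `train_agent`) are hypothetical.

```python
import math
import random


def simulated_user_prefers(a, b):
    """Hypothetical stand-in for human feedback: True if outcome a beats b."""
    return a > b  # assumed hidden intention: larger outcomes are better


def train_reward_model(num_pairs=500, lr=0.5, seed=0):
    """Fit r(x) = w * x to pairwise comparisons with a logistic loss."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(num_pairs):
        a, b = rng.random(), rng.random()
        label = 1.0 if simulated_user_prefers(a, b) else 0.0
        # Model's probability that the user prefers a over b.
        p = 1.0 / (1.0 + math.exp(-w * (a - b)))
        # Gradient ascent step on the log-likelihood of the observed label.
        w += lr * (label - p) * (a - b)
    return w


def train_agent(w, actions, steps=2000, epsilon=0.1, seed=1):
    """Epsilon-greedy bandit whose only reward signal is the learned model."""
    rng = random.Random(seed)
    values = [0.0] * len(actions)
    counts = [0] * len(actions)
    for _ in range(steps):
        if rng.random() < epsilon:
            i = rng.randrange(len(actions))  # explore
        else:
            i = max(range(len(actions)), key=values.__getitem__)  # exploit
        reward = w * actions[i]  # reward comes from the model, not the user
        counts[i] += 1
        values[i] += (reward - values[i]) / counts[i]  # running mean
    return max(range(len(actions)), key=values.__getitem__)


w = train_reward_model()
actions = [0.1, 0.5, 0.9]
best = train_agent(w, actions)
print("learned weight:", w, "chosen action:", actions[best])
```

The point of the separation is that the agent never queries the user directly: feedback shapes the reward model, and the reward model shapes the agent.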

Scaling up

Schematic illustration of recursive reward modeling: agents trained with recursive reward modeling (smaller circles on the right) assist the user in the evaluation process of outcomes produced by the agent currently being trained (large circle).
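
The recursive structure of assisted evaluation can be illustrated with a small sketch. It covers only how evaluations nest, not how any agent is trained, and it rests on loud simplifications: each helper agent is collapsed into a fixed scoring function, and the user is assumed to combine helper reports by averaging them. `EvalTask`, `evaluate`, and the example checks are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EvalTask:
    name: str
    # Leaf tasks: simple enough for the (simulated) user to judge unaided.
    direct_judgment: Optional[Callable[[str], float]] = None
    # Non-leaf tasks: each subtask stands in for one helper agent's report.
    subtasks: List["EvalTask"] = field(default_factory=list)


def evaluate(task: EvalTask, outcome: str) -> float:
    """Score an outcome, recursing into helper evaluations when they exist."""
    if not task.subtasks:
        if task.direct_judgment is None:
            raise ValueError(f"leaf task {task.name!r} has no judgment function")
        return task.direct_judgment(outcome)
    # Assumption: the user aggregates helper reports with a plain average.
    reports = [evaluate(sub, outcome) for sub in task.subtasks]
    return sum(reports) / len(reports)


no_typos = EvalTask("spelling", lambda text: 0.0 if "teh" in text else 1.0)
concise = EvalTask("length", lambda text: 1.0 if len(text.split()) <= 10 else 0.0)
writing = EvalTask("writing quality", subtasks=[no_typos, concise])

score_good = evaluate(writing, "the agent wrote a short clean sentence")
score_bad = evaluate(writing, "teh agent wrote a sentence")
print(score_good, score_bad)
```

In the scheme the caption describes, each helper would itself be an agent trained with reward modeling on a narrower evaluation task, and the assisted judgment would then serve as feedback for training the larger agent.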

Research challenges

Challenges we expect to encounter when scaling reward modeling (left) and promising approaches to address them (right).

Outlook

We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work: deepmind.com

DeepMind Safety Research
