Scalable agent alignment via reward modeling

The agent alignment problem

How can we create agents that behave in accordance with the user’s intentions?

Alignment via reward modeling

Schematic illustration of reward modeling: a reward model is trained from the user’s feedback to capture their intentions; this reward model provides rewards to an agent trained with reinforcement learning.
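The loop in this schematic can be illustrated with a toy example. What follows is a minimal sketch under stated assumptions, not the authors' implementation: the user's feedback is assumed to take the form of noise-free pairwise comparisons, the reward model is a simple Bradley-Terry score table, and the "agent" is a softmax policy on a one-step (bandit) problem updated with the exact policy gradient, standing in for a full reinforcement learning algorithm. All names (true_utility, train_reward_model, train_agent) are hypothetical.

```python
import math


def train_reward_model(comparisons, n_actions, lr=0.5, epochs=200):
    """Fit one scalar score per action from (preferred, rejected) pairs.

    Assumption: feedback is pairwise and modeled with a Bradley-Terry
    (logistic) preference model; the source does not fix a feedback type.
    """
    scores = [0.0] * n_actions
    for _ in range(epochs):
        for winner, loser in comparisons:
            # Probability the model currently assigns to the user's choice.
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # Gradient ascent on the log-likelihood of that choice.
            scores[winner] += lr * (1.0 - p)
            scores[loser] -= lr * (1.0 - p)
    return scores


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_agent(reward_model, n_actions, lr=0.5, steps=500):
    """Train a softmax policy using rewards from the learned model only.

    The agent never sees the user's true utility, only reward_model.
    """
    logits = [0.0] * n_actions
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, reward_model))
        for a in range(n_actions):
            # Exact policy gradient of expected reward w.r.t. each logit.
            logits[a] += lr * probs[a] * (reward_model[a] - baseline)
    return softmax(logits)


# Hypothetical hidden intentions: the user likes action 2 best.
true_utility = [0.0, 1.0, 3.0, 2.0]
n = len(true_utility)

# Simulated user feedback: every pair, labeled by the hidden preference.
comparisons = [
    (a, b) for a in range(n) for b in range(n)
    if true_utility[a] > true_utility[b]
]

reward_model = train_reward_model(comparisons, n)
policy = train_agent(reward_model, n)
best_action = max(range(n), key=lambda a: policy[a])
print("learned rewards:", [round(r, 2) for r in reward_model])
print("agent's preferred action:", best_action)
```

In this toy setting the agent ends up preferring action 2, the one the simulated user values most, even though it was trained only on the learned rewards. The separation between learning what the user wants (the reward model) and learning how to achieve it (the policy) is the point of the sketch; real systems would use learned function approximators and noisy, limited feedback.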

Scaling up

Schematic illustration of recursive reward modeling: agents trained with recursive reward modeling (smaller circles on the right) help the user evaluate outcomes produced by the agent currently being trained (large circle).

Research challenges

Challenges we expect to encounter when scaling reward modeling (left) and promising approaches to address them (right).

Outlook

We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work: deepmind.com

DeepMind Safety Research
