Scalable agent alignment via reward modeling

The agent alignment problem

How can we create agents that behave in accordance with the user’s intentions?

Alignment via reward modeling

Schematic illustration of reward modeling: a reward model is trained from the user’s feedback to capture their intentions; this reward model provides rewards to an agent trained with reinforcement learning.
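The loop in this schematic can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the authors' implementation: the caption does not say what form the user's feedback takes, so pairwise comparisons are assumed here, and the linear reward model, the Bradley-Terry (logistic) loss, the epsilon-greedy bandit standing in for the reinforcement learning agent, and every name (`TRUE_W`, `OUTCOMES`, `collect_feedback`, `train_reward_model`, `train_agent`) are hypothetical choices made for illustration.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy setup: each outcome is a 2-d feature vector, and the user's
# intentions are a hidden linear utility that the agent never observes.
TRUE_W = [1.0, -2.0]
OUTCOMES = [[2, 0], [1, 0], [1, 1], [0, 1], [2, 1], [0, 2]]


def collect_feedback(outcomes, true_w):
    """Simulated user feedback: one (preferred, rejected) pair per pair of outcomes."""
    pairs = []
    for i, a in enumerate(outcomes):
        for b in outcomes[i + 1:]:
            pairs.append((a, b) if dot(true_w, a) > dot(true_w, b) else (b, a))
    return pairs


def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    """Fit linear reward weights by gradient ascent on a Bradley-Terry log-likelihood."""
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            # Model's probability that the user prefers `preferred` over `rejected`.
            p = sigmoid(dot(w, preferred) - dot(w, rejected))
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return w


def train_agent(outcomes, reward_fn, steps=500, eps=0.1, seed=0):
    """Epsilon-greedy bandit whose only training signal is the learned reward."""
    rng = random.Random(seed)
    n = len(outcomes)
    values = [reward_fn(o) for o in outcomes]  # try every action once up front
    counts = [1] * n
    for _ in range(steps):
        if rng.random() < eps:
            action = rng.randrange(n)  # explore
        else:
            action = max(range(n), key=values.__getitem__)  # exploit
        reward = reward_fn(outcomes[action])
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return max(range(n), key=values.__getitem__)


feedback = collect_feedback(OUTCOMES, TRUE_W)
learned_w = train_reward_model(feedback, dim=2)
chosen = train_agent(OUTCOMES, lambda o: dot(learned_w, o))
intended = max(range(len(OUTCOMES)), key=lambda i: dot(TRUE_W, OUTCOMES[i]))
print("learned reward weights:", learned_w)
print("agent picks outcome", chosen, "| user intends outcome", intended)
```

The point of the separation is that `train_agent` never touches `TRUE_W`: the user's intentions reach the agent only through the reward model fitted to their feedback, which is the division of labor the schematic describes.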

Scaling up

Schematic illustration of recursive reward modeling: agents previously trained with recursive reward modeling (smaller circles on the right) help the user evaluate outcomes produced by the agent currently being trained (large circle).
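The recursive structure in this schematic can be sketched as a small tree-building function. This is only a structural illustration under assumptions: the caption does not specify how many helper agents assist the user or where the recursion bottoms out, so the `n_helpers` parameter, the level-0 base case in which the user evaluates unaided, and the names `Agent`, `train_recursively`, and `count_agents` are all hypothetical. The actual training step is left as a placeholder comment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    level: int  # recursion depth at which this agent was trained
    helpers: List["Agent"] = field(default_factory=list)  # agents that assisted evaluation


def train_recursively(level: int, n_helpers: int = 2) -> Agent:
    """Return an agent whose training-time evaluation was assisted by agents one level below."""
    if level == 0:
        # Assumed base case: the task is simple enough for the user to evaluate unaided.
        return Agent(level=0)
    # First train the helpers (the smaller circles in the schematic) ...
    helpers = [train_recursively(level - 1, n_helpers) for _ in range(n_helpers)]
    # ... then the user, assisted by those helpers, would evaluate this agent's
    # outcomes, and a reward model would be fitted to those evaluations
    # (placeholder: no actual training happens in this sketch).
    return Agent(level=level, helpers=helpers)


def count_agents(agent: Agent) -> int:
    """Total number of agents in the tree rooted at `agent`."""
    return 1 + sum(count_agents(h) for h in agent.helpers)


top = train_recursively(level=2)
print("agents involved in training the top-level agent:", count_agents(top))
```

In this sketch each helper is trained separately for clarity; whether helpers are shared or reused across levels is a design choice the caption leaves open.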

Research challenges

Challenges we expect to encounter when scaling reward modeling (left) and promising approaches to address them (right).

Outlook

We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work: deepmind.com

DeepMind Safety Research