Avoiding Unsafe States in 3D Environments using Human Feedback

Learning about unsafe states

The ReQueST algorithm

  1. A neural environment simulator: a dynamics model learned from trajectories generated by humans exploring the environment safely. In our work, this is a pixel-based dynamics model.
  2. A reward model, learned from human feedback on videos of (hypothetical) behaviour in the learned simulator.
  3. Trajectory optimisation, used to choose which hypothetical behaviours to ask the human about, so that the reward model learns what’s safe and what’s not (in addition to other aspects of the task) as quickly as possible.
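
To make the interplay between these three components concrete, here is a deliberately tiny sketch. It is not the implementation used in our work: the one-dimensional integer state, the threshold-based reward models, the random-shooting optimiser and every function name below are assumptions chosen for illustration. The real system uses a learned pixel-based dynamics model and neural reward models.

```python
import random

def rollout(state, actions):
    """Stand-in for the learned simulator (assumed dynamics: state += action)."""
    states = [state]
    for action in actions:
        states.append(states[-1] + action)
    return states

def disagreement(ensemble, states):
    """Count states on which the toy reward models disagree about safety.

    Each ensemble member is an integer threshold; it calls a state unsafe
    when state > threshold (a hypothetical reward-model form).
    """
    count = 0
    for s in states:
        votes = {s > threshold for threshold in ensemble}
        if len(votes) > 1:
            count += 1
    return count

def choose_query(ensemble, start, horizon, n_candidates, rng):
    """Random-shooting trajectory optimisation: keep the hypothetical
    trajectory the ensemble is most uncertain about."""
    best_states, best_score = None, -1
    for _ in range(n_candidates):
        actions = [rng.choice([-1, 0, 1]) for _ in range(horizon)]
        states = rollout(start, actions)
        score = disagreement(ensemble, states)
        if score > best_score:
            best_states, best_score = states, score
    return best_states, best_score

def update(ensemble, labelled):
    """Clamp every threshold so it agrees with all (state, is_unsafe) labels."""
    safe_max = max((s for s, unsafe in labelled if not unsafe), default=None)
    unsafe_min = min((s for s, unsafe in labelled if unsafe), default=None)
    updated = []
    for threshold in ensemble:
        if safe_max is not None:
            threshold = max(threshold, safe_max)
        if unsafe_min is not None:
            threshold = min(threshold, unsafe_min - 1)
        updated.append(threshold)
    return updated

if __name__ == "__main__":
    rng = random.Random(0)
    ensemble = [1, 3, 6]   # initial guesses of three toy reward models
    true_threshold = 4     # simulated human: states above this are unsafe
    labelled = []
    for _ in range(5):
        states, score = choose_query(ensemble, 0, 8, 50, rng)
        if score == 0:     # the ensemble agrees everywhere it looked
            break
        labelled += [(s, s > true_threshold) for s in states]
        ensemble = update(ensemble, labelled)
    print(ensemble)
```

The point of the sketch is the order of operations. Hypothetical trajectories are generated entirely inside the simulator, the most informative one is shown to the human, and only the resulting labels update the reward model, so no unsafe state has to be visited in the real environment during this loop.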

ReQueST in our work

Results

What is the significance of these results?

We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work: deepmind.com

DeepMind Safety Research
