RL Foundation Models Are Coming!

Published 2023-02-26
AdA is a new agent from DeepMind that combines interesting ideas like curriculum learning, meta reinforcement learning (via RL^2), model-based reinforcement learning, attention, and memory models to build a prototype reinforcement learning foundation model. The results look promising, and the future of this area looks bright!
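
If you want a feel for the meta-RL piece mentioned above, here is a minimal sketch of the RL^2 idea, assuming a made-up recurrent policy. This is an illustration only, not the AdA architecture (AdA uses a Transformer-based memory rather than an LSTM), and all names and sizes below are placeholders.

```python
# Minimal RL^2-style sketch: a recurrent policy consumes (obs, prev action, prev reward)
# and keeps its hidden state across trials of the same task, so adaptation happens
# in the forward pass rather than through gradient updates.
import torch
import torch.nn as nn

class RL2Policy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 128):
        super().__init__()
        # Input = observation + one-hot previous action + previous reward.
        self.rnn = nn.LSTM(obs_dim + n_actions + 1, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, n_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs, prev_action_onehot, prev_reward, hidden=None):
        x = torch.cat([obs, prev_action_onehot, prev_reward], dim=-1)
        out, hidden = self.rnn(x, hidden)  # hidden state carries the task context
        return self.policy_head(out), self.value_head(out), hidden

# The key design choice: reset `hidden` only when the *task* changes, not at episode
# boundaries, so the agent can exploit what it learned in earlier trials on that task.
policy = RL2Policy(obs_dim=8, n_actions=4)
obs = torch.zeros(1, 1, 8)        # (batch, time, features)
prev_a = torch.zeros(1, 1, 4)
prev_r = torch.zeros(1, 1, 1)
logits, value, hidden = policy(obs, prev_a, prev_r)
```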

Outline
0:00 - Intro
1:07 - Example Video
2:40 - ClearML
3:48 - How It Works Overview
4:20 - Meta-Learning & RL
8:20 - Attention & Memory
9:55 - Distillation
12:01 - Auto-Curriculum Learning
15:18 - Results
27:14 - Takeaways & Future Work


ClearML - bit.ly/3GtCsj5

Social Media:
YouTube - youtube.com/c/EdanMeyer
Twitter - twitter.com/ejmejm1

Sources:
AdA Paper: arxiv.org/abs/2301.07608
Muesli Paper: arxiv.org/abs/2104.06159
Primacy Bias Paper: arxiv.org/abs/2205.07802

All Comments (21)
  • This is a really high-quality video, on par with Two Minute Papers but with a more detail-oriented approach. Also you have a lovable vibe king, keep it up
  • Edan bro makes my dopamine policy gradients high every time. Fingers crossed we get open RL foundation models.
  • Just give this environment to speed runners, watch the true potential of what humans can do with games. Thanks for the video!
  • @tchlux
    Another way to frame the problem of neural network representations becoming “too specific” to learn new tasks at 25:59 is to consider exactly how the gradient of weights is computed. It’s the matrix multiplication between the directional error after a layer and the directional values before the layer. When the values become totally orthogonal to the error (they contain no information relative to the error), then it’s impossible to reduce the error by changing the weights in that layer. The reason weight randomization helps with this problem is it introduces new values after the layer that was randomized. However, a much more efficient way to do this is to instead reduce the existing weights in a layer with linear regression over a representative sample of data to “pack” the good information into fewer existing neurons. Then you’re free to randomly initialize the remaining neurons, or even better to initialize weights that produce values already aligned with the directional error! I’ve got some ongoing research in this area if anyone is interested in collaborating. 🤓
    (A toy sketch of the gradient argument is included after the comment section below.)
  • @exoqqen
    amazing breakdown, thank you for making this paper accessible to me!
  • @vsiegel
    At 7:10, the first pronunciation of Muesli is right. In German it is Müsli; Muesli may be the Swiss-German spelling.
  • Sounds like RL is progressing? Maybe I should jump back in!
  • @chickenp7038
    since wandb doesn’t work for me i will actually try clearml thanks to you
  • @zxgrizzly3401
    Thanks for your videos, but at 7:44, EfficientZero and MuZero do not reconstruct the raw observation/image. MuZero learns its latent representation based on value equivalence only, while EfficientZero also cares about temporal consistency, so it takes the next observation to supervise the representation and dynamics parts of the model in a self-supervised manner (SimSiam).
    (A rough sketch of this consistency loss is included after the comment section below.)
  • @Kram1032
    I wonder if there is any benefit to be had at all from, like, across multiple full training iterations, distilling a large model into a smaller one and then distilling the small one back into a larger one (vs. just repeatedly distilling a large model into a model of the same size)
  • @zigzag4273
    My 2nd petition on this matter. Please make a video of how you read and implement papers. Thank you *kiss*
  • @ch1n3du3
    Do you think the approaches here could be applied to Dreamer V3?
  • If you're ever interested in collaborations, let me know. I'd love to have you on my newsletter to cover some of your most interesting ideas.
  • @kemalware4912
    I really liked the VS Code theme in the ClearML section. Can you share it?
  • @before7048
    7:10 Myu-slee. It's a quick, easy, and tasty breakfast so that you, too, can be reinforced!
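
Sketch referenced in the @tchlux comment: a toy numpy construction of why a layer stops learning when its activations become orthogonal to the error signal. This is my own illustration of the argument, not code from the paper or the commenter, and it assumes a single linear layer with the batch gradient written as error^T @ activations.

```python
# When the activations entering a linear layer carry no component aligned with the
# error leaving it, the weight gradient vanishes and that layer can no longer
# reduce the loss. Re-randomizing (or re-packing) the upstream weights restores it.
import numpy as np

rng = np.random.default_rng(0)
B = 64                                   # batch size

X = np.zeros((B, 16))                    # activations entering the layer ("values")
X[:, 0] = np.sin(np.arange(B))           # only one feature carries any signal
Delta = rng.normal(size=(B, 8))          # error signal leaving the layer
# Remove the component of the error that lies along the informative feature, making
# the activations orthogonal to the error across the batch.
Delta -= X[:, :1] * (X[:, :1].T @ Delta) / (X[:, 0] @ X[:, 0])

grad_W = Delta.T @ X                     # dL/dW for Y = X @ W.T (summed over batch)
print(np.abs(grad_W).max())              # ~1e-16: no change to W reduces the error

# Re-randomizing the weights that *produce* X (or "packing" the useful information
# into fewer neurons first, as the comment suggests) gives the layer fresh activations
# with a nonzero component along the error, which restores a useful gradient.
```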
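Sketch referenced in the @zxgrizzly3401 comment: a rough version of the SimSiam-style temporal-consistency objective that EfficientZero adds on top of MuZero's value-equivalent representation learning. The function and the placeholder module names are my assumptions for illustration, not EfficientZero's actual implementation.

```python
# Push the dynamics model's predicted next latent toward the encoder's latent of the
# real next observation, with a stop-gradient on the target branch (SimSiam-style).
import torch
import torch.nn.functional as F

def consistency_loss(pred_next_latent: torch.Tensor, target_next_latent: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity with gradients blocked on the target branch."""
    pred = F.normalize(pred_next_latent, dim=-1)
    target = F.normalize(target_next_latent.detach(), dim=-1)  # stop-gradient
    return -(pred * target).sum(dim=-1).mean()

# Hypothetical usage alongside MuZero's value-equivalence losses, where `encoder`,
# `dynamics`, and `proj` are placeholder modules:
#   z_t        = encoder(obs_t)
#   z_hat_next = dynamics(z_t, action_t)     # model's predicted next latent
#   z_next     = encoder(obs_next)           # latent of the actual next observation
#   loss = value_loss + policy_loss + reward_loss \
#          + consistency_loss(proj(z_hat_next), proj(z_next))
z_pred, z_target = torch.randn(32, 256), torch.randn(32, 256)
print(consistency_loss(z_pred, z_target))    # scalar near 0 for random vectors
```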