What is ChatGPT doing...and why does it work?

2,149,177 views
Published 2023-02-17
Stephen Wolfram hosts a live and unscripted Ask Me Anything about ChatGPT for all ages. Find the playlist of Q&A's here: wolfr.am/youtube-sw-qa

Originally livestreamed at: twitch.tv/stephen_wolfram

9:55 SW starts talking

Follow us on our official social media channels.

Twitter: twitter.com/WolframResearch/
Facebook: www.facebook.com/wolframresearch/
Instagram: www.instagram.com/wolframresearch/
LinkedIn: www.linkedin.com/company/wolfram-research/

Contribute to the official Wolfram Community: community.wolfram.com/
Stay up-to-date on the latest news and interests at Wolfram Research through our blog: blog.wolfram.com/
Follow Stephen Wolfram's life, interests, and what makes him tick on his blog: writings.stephenwolfram.com/

All Comments (21)
  • @LeakedCone
    I fell asleep with YouTube on and I'm at this
  • @martinsriggs2441
    The teachings on this channel are always top notch: so informative and easy to understand. It's very hard to find good content online these days.
  • Watching these videos is a great way to review all these things and understand them again, maybe a little better. Thank you very much.
  • @carson_tang
    video timestamps
    0:09:53 – start of presentation, intro
    0:12:16 – language model definition
    0:15:30 – “temperature” parameter
    0:17:20 – Wolfram Desktop demo of GPT-2
    0:18:50 – generate a sentence with GPT-2
    0:25:56 – unigram model
    0:31:10 – bigram model
    0:33:00 – n-gram model
    0:38:50 – why a model is needed
    0:39:00 – definition of a “model”
    0:39:20 – early modeling example: Leaning Tower of Pisa experiment
    0:43:55 – handwritten digit recognition task
    0:47:40 – using neural nets to recognize handwritten digits
    0:51:31 – key idea: attractors
    0:53:35 – neural nets and attractors
    0:54:44 – walking through a simple neural net
    1:01:50 – what’s going on inside a neural net during classification
    1:06:12 – training a neural net to correctly compute a function
    1:09:10 – measuring “correctness” of a neural net with “loss”
    1:10:41 – reducing “loss” with gradient descent
    1:17:06 – escaping local minima in higher-dimensional space
    1:21:15 – the generalizability of neural nets
    1:28:06 – supervised learning
    1:30:47 – transfer learning
    1:32:35 – unsupervised learning
    1:34:40 – training LeNet, a handwritten digit recognizer
    1:38:14 – embeddings, representing words with numbers
    1:42:12 – softmax layer
    1:42:47 – embedding layer
    1:46:22 – GPT-2 embeddings of words
    1:47:40 – ChatGPT’s basic architecture
    1:48:00 – transformers
    1:52:50 – attention block
    1:59:00 – amount of text training data on the web
    2:03:35 – relationship between trillions of words and weights in the network
    2:09:40 – reinforcement learning from human feedback
    2:12:38 – why does ChatGPT work? Regularity and structure in human language
    2:15:50 – ChatGPT learns syntactic grammar
    2:19:30 – ChatGPT’s limitation in balancing parentheses
    2:20:51 – ChatGPT learns [inductive] logic based on all the training data it’s seen
    2:23:57 – what regularities Stephen Wolfram guesses ChatGPT has discovered
    2:24:11 – ChatGPT navigating the meaning space of words
    2:34:50 – ChatGPT’s limitation in mathematical computation
    2:36:20 – ChatGPT possibly discovering semantic grammar
    2:38:17 – a fundamental limit of neural nets is performing irreducible computations
    2:41:09 – Q&A
    2:41:16 – Question 1: are constructed languages like Esperanto more amenable to a semantic-grammar AI approach?
    2:43:14 – Question 2
    2:32:37 – Question 3: token limits
    2:45:00 – Question 4: tension between superintelligence and computational irreducibility. How far can LLM intelligence go?
    2:52:12 – Question 5
    2:53:22 – Question 6: pretraining a large biologically inspired language model
    2:55:46 – Question 7: five-senses multimodal model
    2:56:25 – Question 8: the creativity of AI image generation
    2:59:17 – Question 9: how does ChatGPT avoid controversial topics? Taught through reinforcement learning + possibly a list of controversial words
    3:03:26 – Question 10: neural nets vs. other living multicellular intelligence, principle of computational equivalence
    3:04:45 – human consciousness
    3:06:40 – Question 11: automated fact checking for ChatGPT via an adversarial network. Train ChatGPT with Wolfram|Alpha?
    3:07:25 – Question 12: can ChatGPT play a text-based adventure game?
    3:07:43 – Question 13: what makes GPT-3 so good at language?
    3:08:22 – Question 14: could feature impact scores help us understand GPT better?
    3:09:48 – Question 15: ChatGPT’s understanding of implications
    3:10:34 – Question 16: the human brain’s ability to learn
    3:13:07 – Question 17: how difficult will it be for individuals to train a personal ChatGPT that behaves like a clone of the user?
  • @Verrisin
    Here's a question: how much does the wording of the questions affect its answers? Presumably, since it just tries to continue the text, if you make errors it ought to make more errors afterwards too, right? How about if you ask with "uneducated" language vs. scientific language? Rather than just affecting the tone, would it also affect the contents? What if you speak in a way it has associated with certain biases? Who knows what kinds of patterns it has come up with, considering it "discovered" those "semantic grammars" we as humans aren't even aware of...
  • This took me a few days to get through... in a good way. So much good stuff here, and such a great instructor... great ways of explaining, and good visual aids. Amazed that Mr. Wolfram is as generous with his time as to share his insights and be this open with everyone, given he has many companies to run and problems to solve. I love engineering😊
  • This was the most fascinating and informative discussion, particularly your responses to commenters! Please post the link to the paper you recently wrote (?) that inspired this live video discussion. And thank you!
  • @Anders01
    Amazing presentation. If I were to experiment with machine learning, I would examine small-world networks instead of layered networks, and try genetic algorithms: randomly adjust the network into a number of variations, pick the best candidate, repeat the adjustment for the new candidate, and continue iterating until a desired outcome is found.
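    [Editorial note] A minimal sketch, in Python, of the mutate-and-select loop the comment above describes; the tiny weight vector and the toy loss function here are hypothetical stand-ins for a real network and objective, not anything from the video.

        import random

        def loss(weights):
            # Hypothetical objective: drive the weights toward a fixed target vector.
            target = [0.5, -1.0, 2.0]
            return sum((w - t) ** 2 for w, t in zip(weights, target))

        def mutate(weights, scale=0.1):
            # Randomly perturb every weight by a small Gaussian amount.
            return [w + random.gauss(0.0, scale) for w in weights]

        def evolve(weights, variants=10, iterations=200):
            best = weights
            for _ in range(iterations):
                # Generate several mutated candidates and keep the best scorer.
                candidates = [mutate(best) for _ in range(variants)]
                best = min(candidates + [best], key=loss)
            return best

        print(evolve([0.0, 0.0, 0.0]))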
  • Thank you so much! I learned more in these 3 hours than in months of watching other videos about this subject. It would be great if more knowledgeable people used YouTube to share their experience. 🙏🏻🙏🏻🙏🏻
  • Could the randomness process for choosing the next probable word within a certain temperature parameter be handled by a quantum random process? If so, an essay could be viewed as a flat plane or an entire terrain with peaks and troughs. Within this paradigm, a certain style of writer could be viewed as a dynamic sheet, similar to how different materials, when laid over a ragged topology, comply or fail to comply with what they are laid on top of. With this quantum process, an overall view of the essay could be judged at an aesthetic level, from most pleasing to least, on several different qualities concurrently and not mutually exclusively, making something approximating a conscious viewer.
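    [Editorial note] For reference, a minimal sketch of the temperature-controlled next-word sampling the comment above asks about, assuming a hypothetical toy probability table rather than a real language model; whether the random draw comes from a pseudorandom or a quantum source is orthogonal to the mechanics shown here.

        import math
        import random

        def sample_next_word(probs, temperature=0.8):
            # probs: dict mapping candidate next words to model probabilities.
            # Rescale the distribution by the temperature, then draw one word at random.
            # As temperature -> 0 this approaches always picking the most probable word;
            # temperature = 1 leaves the distribution unchanged.
            words = list(probs)
            scaled = [math.exp(math.log(probs[w]) / temperature) for w in words]
            total = sum(scaled)
            return random.choices(words, weights=[s / total for s in scaled], k=1)[0]

        # Hypothetical next-word distribution for some prompt:
        print(sample_next_word({"learn": 0.5, "predict": 0.3, "surprise": 0.2}, temperature=0.8))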
  • @carlhopkinson
    Expertly explained in a way understandable to a large set of people. Bravo.
  • @BradCordovaAI
    The weights are Gaussian because they are constrained to be during training via layer normalisation. It makes the gradient signal flow better.
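    [Editorial note] A minimal sketch of the layer normalization the comment above mentions; strictly speaking it normalizes a layer's activations (rather than the weights directly) to zero mean and unit variance, which helps gradients flow during training. The sample values are hypothetical.

        import math

        def layer_norm(activations, gamma=1.0, beta=0.0, eps=1e-5):
            # Normalize the activations to zero mean and unit variance,
            # then apply a learned scale (gamma) and shift (beta).
            mean = sum(activations) / len(activations)
            var = sum((a - mean) ** 2 for a in activations) / len(activations)
            return [gamma * (a - mean) / math.sqrt(var + eps) + beta for a in activations]

        print(layer_norm([2.0, -1.0, 0.5, 3.5]))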
  • @dockdrumming
    At 33:49, it's interesting how the text looks more like English the longer the character length. Great video.
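    [Editorial note] On the 33:49 observation: in an n-gram model, longer contexts make generated character strings look more English-like. A minimal sketch of character-level n-gram generation on a hypothetical toy corpus, assuming nothing beyond what an n-gram model is.

        import random
        from collections import defaultdict

        def train_ngram(text, n):
            # Map each (n-1)-character context to the characters observed after it.
            model = defaultdict(list)
            for i in range(len(text) - n + 1):
                model[text[i:i + n - 1]].append(text[i + n - 1])
            return model

        def generate(model, seed, length=80):
            # Repeatedly sample a next character given the last (n-1) characters.
            out = seed
            context_len = len(seed)
            for _ in range(length):
                choices = model.get(out[-context_len:])
                if not choices:
                    break
                out += random.choice(choices)
            return out

        corpus = "the cat sat on the mat and the dog sat on the log "  # hypothetical tiny corpus
        model = train_ngram(corpus, n=4)
        print(generate(model, seed="the"))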
  • @JustinHedge
    I'd love to see more in-depth analysis like this on the current LLM topic, utilizing Dr. Wolfram in this format. Exceptional content. As an aside, I've really been missing the physics project livestreams.