How To Build Generative AI Models Like OpenAI's Sora

Published 2024-03-28
If you read articles about companies like OpenAI and Anthropic training foundation models, it would be natural to assume that without a billion dollars or the resources of a large company, you can't train your own foundation model. But the opposite is true.

In this episode of the Lightcone Podcast, we discuss strategies for building a foundation model from scratch in less than 3 months, with examples of YC companies doing just that. We also get an exclusive look at OpenAI's Sora!

Read more about the YC AI companies from this episode on our blog: www.ycombinator.com/blog/building-ai-models

Chapters (Powered by bit.ly/chapterme-yc) -
00:00 - Coming Up
01:13 - Sora Videos
05:05 - How Sora works under the hood?
08:19 - How expensive is it to generate videos vs. texts?
10:01 - Infinity AI
11:23 - Sync Labs
13:41 - Sonauto
15:44 - Metalware
17:40 - Guide Labs
19:29 - Phind
24:21 - Diffuse Bio
25:36 - Piramidal
27:15 - K-Scale Labs
28:58 - DraftAid
30:38 - Playground
33:20 - Outro

All Comments (21)
  • @chapterme
    Chapters (Powered by ChapterMe) -
    00:00 - Coming Up
    00:49 - Intro: Generative AI for Video
    01:13 - Sora Videos
    05:05 - How Sora works under the hood?
    08:19 - How expensive is it to generate videos vs. texts?
    08:55 - How do YC companies build foundation models with just $500K?
    10:01 - Demos: Infinity AI
    11:23 - Sync Labs' hack to train a Lip Sync Model with a single A100 GPU
    12:45 - YC deal with Azure
    13:41 - How Sonauto Built a Text-to-Song Model
    15:44 - Metalware: Hardware Co-Pilot
    17:40 - Guide Labs: Explainable Foundation Model
    18:20 - Building your own models vs. Using existing open source models
    19:29 - Phind's Clever Hack: Synthetic Data
    22:03 - Simulating real-world physics: Atmo (Foundational model for weather prediction)
    24:21 - AI in Biology: Diffuse Bio
    25:36 - Piramidal: Foundational model for the human brain
    27:15 - AI in Robotics: K-Scale Labs
    28:58 - DraftAid: AI Models for CAD Design
    30:38 - Playground going against giants and Suhail Doshi Background
    31:42 - Companies pivoting into AI
    32:44 - Takeaway Message
    33:20 - Outro
  • @BrianMPrime
    The lip-syncing on Tim Ferriss looked way off. There was a bit of an uncanny valley with the deepfake switchover as well.
  • @theniii
    All you're really saying here is that people can build any foundation model as long as OpenAI isn't also building it. That's not very reassuring to hear. We started with words, now pictures and videos; why would anyone not expect music, robotics, hardware, etc. down the line?
  • @juanortega7509
    I've been waiting for a new episode for weeks!! Thanks for the content guys!
  • @jks234
    20:15 I personally find the concept of synthetic data to be a fascinating spur for more neuroscientific research. People dream about what they study and are constantly reviewing problems they are working on in their heads. In other words, I feel that humans use simulations in their own minds to build out the models they use to understand their world. We might be able to think of this as "generating 1000x more data" than can be directly extracted from the real world. Another example of this that was done to awesome effect is AlphaGo's self-play training.
  • @alicapwn
    They didn’t source robotics papers for Sora’s architecture. They combined Diffusion Transformers (developed by Peebles) with the video diffusion methods released by Stability/Google/Meta/Nvidia.
  • @drgoldenpants
    Are there links to the sora videos they are showing?
  • @atchutram9894
    11:40 The Hindi demo is perfect. Hindi is not my first language, but I can definitely tell it is a great translation.
  • @xilluminati
    f̶i̶r̶s̶t̶… no… early adopter
  • I think Sora is better positioned to imagine a new and totally different world than to simulate our perception of what the world is and what the world was.
  • @awesomeo4510
    Yes, but how do you find the datasets to train new foundation models? Like their EEG example: how did she acquire that data to train the models?
  • @kog0824
    17:20 here seems like an interesting approach… but sorry, I'm new to this AI space; what does it mean to build its own foundation model but with gpt2.5? Does it mean it fine-tunes gpt2.5 with its own data?
  • @jess-e
    Can anyone share the papers needed to reach an actionable level of understanding, as explained in the video? :)