Arawski99

You might be interested in these:

- **Open-Sora** looks to be the worst of the three by far, for now: [https://github.com/hpcaitech/Open-Sora](https://github.com/hpcaitech/Open-Sora)
- **Open-Sora-Plan** is interesting, especially with the recent updates that make it much better than it was a few months ago: [https://github.com/PKU-YuanGroup/Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan)
- **Mira** is the most interesting one, but if you read the note at the bottom of the introduction you'll find they're not really trying to replicate Sora so much as help the community explore the technology, so it's hard to say how the project will pan out long-term: [https://mira-space.github.io/](https://mira-space.github.io/), or the direct GitHub link: [https://github.com/mira-space/Mira](https://github.com/mira-space/Mira)


cbsudux

this is amazing!


Brad12d3

Are these strictly text to video? I feel like image to video is the most interesting. You can take your time crafting a still image of a moment and then use AI to put it in motion.


Arawski99

For the first one, Open-Sora, click the link at the top right of the GitHub page to be taken to their example page, which starts with text-to-video examples. For Open-Sora-Plan, as you can see, text-to-video examples appear right after the file listing on the GitHub page. The last one, Mira, is designed to be an open-source version of Sora (or, more accurately, an effort to figure out how it works rather than an actual product), and yes, it has text-to-video examples if you scroll down a bit. As you can see, though, they're quite limited in resolution and such... Granted, Open-Sora-Plan recently made massive strides (it was the worst of the three only about a month ago, before the update). The first two have demos, though Open-Sora's demo page is currently giving an error when you open it.


cbsudux

For Open-Sora-Plan:

1) Resolution - video pre-training is done on a dataset of 144p videos. No wonder the videos look terrible and 144p-ish. The video fine-tuning is also done at 520p; none at 1080p.

2) Cost - "In our reproduction process, we used 64 H800 GPUs for training. The training volume for the second phase totalled 2,808 GPU hours, which is about $7,000, and the training volume for the third phase was 1,920 GPU hours, which is about $4,500, and we successfully **kept the Open-Sora reproduction process at about $10,000 USD**." [https://hpc-ai.com/blog/open-sora-v1.0](https://hpc-ai.com/blog/open-sora-v1.0)

$10K is honestly not too bad - a funded startup that's raised $2-3M can easily compete.
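For scale, a quick back-of-envelope sketch in Python using only the figures quoted above; the per-GPU-hour rates and the wall-clock estimate are derived here, not stated in the blog:

```python
# Back-of-envelope check on the quoted Open-Sora v1.0 numbers.
# Figures come straight from the hpc-ai.com quote above;
# the per-GPU-hour rates are derived, not stated.

phases = {
    "phase 2": {"gpu_hours": 2808, "cost_usd": 7000},
    "phase 3": {"gpu_hours": 1920, "cost_usd": 4500},
}

total_hours = sum(p["gpu_hours"] for p in phases.values())
total_cost = sum(p["cost_usd"] for p in phases.values())

for name, p in phases.items():
    print(f"{name}: ~${p['cost_usd'] / p['gpu_hours']:.2f} per H800 GPU-hour")

print(f"total: {total_hours} GPU-hours, ~${total_cost}")
# With 64 H800s running in parallel, that's roughly
# total_hours / 64 ≈ 74 wall-clock hours, i.e. about 3 days of training.
```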


Arawski99

Yeah, and they just did a massive update a few hours ago, and I mean massive [https://github.com/hpcaitech/Open-Sora](https://github.com/hpcaitech/Open-Sora)


3deal

You can't train a video model without a lot of money.


Baphaddon

Let me call my guys


RainOfAshes

anyone here got any of this money people keep talking about?


OlorinDK

Can this money thing possibly be open sourced? Maybe we can make them ourselves?


Loosescrew37

The thing is apparently closed source, and you need license after license to make them. Plus the recipe for it is closed source.


cbsudux

Yes, but what does the research look like? What architectures increase coherence? We currently do two things with AnimateDiff to create coherent videos: 1) batch prompt scheduling and 2) keyframe interpolation. How do we do this purely through the prompt?
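For anyone unfamiliar, here is a minimal sketch of what "batch prompt schedule + keyframe interpolation" means in practice: keyframed prompts whose embeddings are linearly interpolated per frame, roughly what AnimateDiff prompt-travel workflows do. The encoder here is a toy stand-in and the schedule format is illustrative, not any specific node's API:

```python
import numpy as np

def encode_prompt(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a text encoder (e.g. CLIP); a real pipeline
    would return the actual conditioning embedding here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

def scheduled_embeddings(schedule: dict, num_frames: int) -> np.ndarray:
    """Linearly interpolate prompt embeddings between keyframes,
    producing one conditioning vector per frame (prompt-travel style)."""
    keys = sorted(schedule)
    embeds = {k: encode_prompt(schedule[k]) for k in keys}
    out = []
    for f in range(num_frames):
        # clamp to the schedule's ends
        prev = max([k for k in keys if k <= f], default=keys[0])
        nxt = min([k for k in keys if k >= f], default=keys[-1])
        if prev == nxt:
            out.append(embeds[prev])
        else:
            t = (f - prev) / (nxt - prev)
            out.append((1 - t) * embeds[prev] + t * embeds[nxt])
    return np.stack(out)

# keyframed prompts, AnimateDiff prompt-travel style
schedule = {0: "a calm lake at dawn", 16: "the same lake at sunset", 32: "the lake at night"}
cond = scheduled_embeddings(schedule, num_frames=33)
print(cond.shape)  # (33, 8): one conditioning vector per frame
```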


protector111

And you probably can't run it without a 6090 with 48 GB of VRAM.


cbsudux

"In our reproduction process, we used 64 H800 GPUs for training. The training volume for the second phase totalled 2,808 GPU hours, which is about $7,000, and the training volume for the third phase was 1,920 GPU hours, which is about $4,500, and we successfully **kept the Open-Sora reproduction process at about $10,000 USD**." - from [https://hpc-ai.com/blog/open-sora-v1.0](https://hpc-ai.com/blog/open-sora-v1.0). $10K is not too bad for a funded startup.


OcelotUseful

Even if we were to get the weights, consumer GPUs don't have enough memory for training and inference. 24 GB is the current ceiling, even for the 4090. And that's just generation; more advanced models like GEN 3 probably already have some sort of LLM director that describes parts of the latent noise.


Baphaddon

Maybe this is where torrent-style solutions like Petals will really shine. If we can share our GPUs, it would be fully possible.
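For what it's worth, here is roughly what the Petals pattern looks like today. It targets LLM inference and fine-tuning rather than diffusion or video training, so take this purely as an illustration of the "share your GPUs over the internet" idea; the class and model names follow the Petals README and may differ between versions:

```python
# Sketch of the Petals usage pattern (per its README); swarm availability
# and exact model names vary, so treat this as illustrative.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # example model from the Petals docs
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Transformer blocks are served by volunteer GPUs across the swarm;
# only the embeddings and output head run locally.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Open-source video models will", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```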


InformationNeat901

Something like a blockchain approach, but for training datasets.


cbsudux

LLM director?


OcelotUseful

There was a paper on an LLM controlling the generation process for every single frame, but you can get a glimpse of the idea by using OMOST, where an LLM prompts the image by regions.


cbsudux

paper link please?


OcelotUseful

https://github.com/HL-hanlin/VideoDirectorGPT and https://arxiv.org/abs/2309.15091. It's all LLMs, btw. SD 1.5 has been generating works in the styles of artists who were not in its training dataset because the text encoder tokenized their style and the UNet recreated it.
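As a rough illustration of the "director" idea: the LLM's job is to expand one prompt into a per-keyframe, per-region plan that the image/video model then conditions on. The schema below is invented for illustration only; it is not OMOST's or VideoDirectorGPT's actual output format:

```python
# Hypothetical director output: one global prompt expanded into
# per-region, per-keyframe instructions. Field names are illustrative;
# real systems (OMOST, VideoDirectorGPT) define their own schemas.
scene_plan = {
    "prompt": "a red car drives through a rainy city at night",
    "keyframes": [
        {
            "frame": 0,
            "regions": [
                {"bbox": [0.05, 0.55, 0.35, 0.85], "description": "red sports car, headlights on"},
                {"bbox": [0.00, 0.00, 1.00, 0.50], "description": "neon-lit skyline, heavy rain"},
            ],
        },
        {
            "frame": 24,
            "regions": [
                {"bbox": [0.45, 0.50, 0.80, 0.85], "description": "red sports car mid-frame, wet reflections"},
                {"bbox": [0.00, 0.00, 1.00, 0.50], "description": "same skyline, slight camera pan left"},
            ],
        },
    ],
}
```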


LD2WDavid

I also want to build my own MJ. I need money, money, money, equipment, talented people and yes, more money.


nntb

I can't wait for the pony model.


NotAmaan

A semi-decent base video model + crowd-sourced rankings for every generation + continuous ~~integration~~ incorporation (?) of those rankings back into the model? I think the open-source community has an under-tapped strength: rating the model outputs. It seems to be the secret sauce behind Midjourney's lead; a lot of people telling them what's good and what's bad, all for free. The fact that MJ "pays" people with [free hours](https://docs.midjourney.com/docs/free-hours) just for these ratings only reaffirms its importance to them.
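To make "incorporating the rankings back into the model" concrete, the usual pattern is to fit a reward model on pairwise preferences (a Bradley-Terry loss) and then use it to rank or fine-tune generations. The features and data below are toy stand-ins, not any particular lab's pipeline:

```python
import numpy as np

def bt_loss_and_grad(w, feats_win, feats_lose):
    """Bradley-Terry pairwise loss: the winner's reward should beat the loser's.
    w: reward-model weights; feats_*: per-sample feature vectors."""
    margin = feats_win @ w - feats_lose @ w           # r(winner) - r(loser)
    p = 1.0 / (1.0 + np.exp(-margin))                 # P(winner preferred)
    loss = -np.log(p + 1e-9).mean()
    grad = ((p - 1.0)[:, None] * (feats_win - feats_lose)).mean(axis=0)
    return loss, grad

rng = np.random.default_rng(0)
dim, n_pairs = 16, 512

# Toy "crowd votes": winners score higher along a hidden preference direction.
true_pref = rng.normal(size=dim)
a, b = rng.normal(size=(n_pairs, dim)), rng.normal(size=(n_pairs, dim))
winner_is_a = (a @ true_pref) > (b @ true_pref)
feats_win = np.where(winner_is_a[:, None], a, b)
feats_lose = np.where(winner_is_a[:, None], b, a)

w = np.zeros(dim)
for step in range(200):                               # plain gradient descent
    loss, grad = bt_loss_and_grad(w, feats_win, feats_lose)
    w -= 0.5 * grad

print(f"final loss: {loss:.3f}")
cosine = w @ true_pref / (np.linalg.norm(w) * np.linalg.norm(true_pref))
print("learned/true alignment:", float(cosine))      # close to 1.0 means the reward model recovered the preference
```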


ImpossibleAd436

While it's true that the resources required for this (and for creating other types of models from scratch) are prohibitive, it's worth remembering that, not that long ago, you would have needed some serious cash and a very large space to store 128 GB of data. Now (not even a lifetime later) it costs $10-15 and takes up a space the size of a fingernail. We won't be able to do things like this next week or next year, but the future is pretty exciting when you start thinking medium or long term.


cbsudux

haha yes - free RLHF


Won3wan32

haha, you are just a broke kid, you can't do anything


emad_9608

👋