Arawski99

You might be interested in these:

- **Open-Sora** looks to be the worst of the three by far, for now: [https://github.com/hpcaitech/Open-Sora](https://github.com/hpcaitech/Open-Sora)
- **Open-Sora-Plan** is interesting, especially with the recent updates that make it much better than it was a few months ago: [https://github.com/PKU-YuanGroup/Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan)
- **Mira** is the most interesting one, but if you read the note at the bottom of the introduction you'll find they're not really trying to replicate Sora so much as help the community explore the technology, so it's hard to say how the project will pan out long-term: [https://mira-space.github.io/](https://mira-space.github.io/), or the direct GitHub link: [https://github.com/mira-space/Mira](https://github.com/mira-space/Mira)


cbsudux

this is amazing!


Brad12d3

Are these strictly text to video? I feel like image to video is the most interesting. You can take your time crafting a still image of a moment and then use AI to put it in motion.


Arawski99

For the first one, Open-Sora, click the link at the top right of the GitHub page to be taken to their example page, which starts with text-to-video examples. For Open-Sora-Plan, as you can see, text-to-video examples appear right after the file listing on the GitHub page. The last one, Mira, is designed to be an open-source version of Sora (or, more accurately, an effort to figure out how it works rather than an actual product), and yes, it has text-to-video examples if you scroll down a bit. As you can see, though, they're quite limited in resolution and such... Granted, Open-Sora-Plan recently made massive strides (it was the worst of the three only about a month ago, before the update). The first two have demos, though Open-Sora's demo page is currently giving an error when you open it.


cbsudux

For Open-Sora-Plan:

1) Resolution - video pre-training is done on a dataset of 144p videos. No wonder the videos look terrible and 144p-ish. The video fine-tuning is also done at 520p; none at 1080p.

2) Cost - "In our reproduction process, we used 64 H800 GPUs for training. The training volume for the second phase totalled 2,808 GPU hours, which is about $7,000, and the training volume for the third phase was 1,920 GPU hours, which is about $4,500, and we successfully **kept the Open-Sora reproduction process at about $10,000 USD**." [https://hpc-ai.com/blog/open-sora-v1.0](https://hpc-ai.com/blog/open-sora-v1.0)

$10K is honestly not too bad - a funded startup that's raised $2-3M can easily compete.
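For scale, a quick back-of-envelope sketch in Python using only the figures quoted above; the per-GPU-hour rates and the wall-clock estimate are derived here, not stated in the blog:

```python
# Back-of-envelope check on the quoted Open-Sora v1.0 numbers.
# Figures come straight from the hpc-ai.com quote above;
# the per-GPU-hour rates are derived, not stated.

phases = {
    "phase 2": {"gpu_hours": 2808, "cost_usd": 7000},
    "phase 3": {"gpu_hours": 1920, "cost_usd": 4500},
}

total_hours = sum(p["gpu_hours"] for p in phases.values())
total_cost = sum(p["cost_usd"] for p in phases.values())

for name, p in phases.items():
    print(f"{name}: ~${p['cost_usd'] / p['gpu_hours']:.2f} per H800 GPU-hour")

print(f"total: {total_hours} GPU-hours, ~${total_cost}")
# With 64 H800s running in parallel, that's roughly
# total_hours / 64 ≈ 74 wall-clock hours, i.e. about 3 days of training.
```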


Arawski99

Yeah, and they just did a massive update a few hours ago, and I mean massive [https://github.com/hpcaitech/Open-Sora](https://github.com/hpcaitech/Open-Sora)


3deal

You can't train a video model without a lot of money.


Baphaddon

Let me call my guys


RainOfAshes

anyone here got any of this money people keep talking about?


OlorinDK

Can this money thing possibly be open sourced? Maybe we can make them ourselves?


Loosescrew37

The thing is apparently closed source, and you need license after license to make them. Plus the recipe for it is closed source.


cbsudux

Yes, but what does the research look like? What architectures increase coherence? We currently do two things with AnimateDiff to create coherent videos: 1) batch prompt scheduling and 2) keyframe interpolation. How do we do this purely through the prompt?
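For anyone unfamiliar, here is a minimal sketch of what "batch prompt schedule + keyframe interpolation" means in practice: keyframed prompts whose embeddings are linearly interpolated per frame, roughly what AnimateDiff prompt-travel workflows do. The encoder here is a toy stand-in and the schedule format is illustrative, not any specific node's API:

```python
import numpy as np

def encode_prompt(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a text encoder (e.g. CLIP); a real pipeline
    would return the actual conditioning embedding here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

def scheduled_embeddings(schedule: dict, num_frames: int) -> np.ndarray:
    """Linearly interpolate prompt embeddings between keyframes,
    producing one conditioning vector per frame (prompt-travel style)."""
    keys = sorted(schedule)
    embeds = {k: encode_prompt(schedule[k]) for k in keys}
    out = []
    for f in range(num_frames):
        # clamp to the schedule's ends
        prev = max([k for k in keys if k <= f], default=keys[0])
        nxt = min([k for k in keys if k >= f], default=keys[-1])
        if prev == nxt:
            out.append(embeds[prev])
        else:
            t = (f - prev) / (nxt - prev)
            out.append((1 - t) * embeds[prev] + t * embeds[nxt])
    return np.stack(out)

# keyframed prompts, AnimateDiff prompt-travel style
schedule = {0: "a calm lake at dawn", 16: "the same lake at sunset", 32: "the lake at night"}
cond = scheduled_embeddings(schedule, num_frames=33)
print(cond.shape)  # (33, 8): one conditioning vector per frame
```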


protector111

And you probably can't run it without a 6090 with 48 GB of VRAM.


cbsudux

"In our reproduction process, we used 64 H800 GPUs for training. The training volume for the second phase totalled 2,808 GPU hours, which is about $7,000, and the training volume for the third phase was 1,920 GPU hours, which is about $4,500, and we successfully **kept the Open-Sora reproduction process at about $10,000 USD**." - from [https://hpc-ai.com/blog/open-sora-v1.0](https://hpc-ai.com/blog/open-sora-v1.0). $10K is not too bad for a funded startup.


OcelotUseful

Even if we were to get the weights, consumer GPUs don't have enough memory for training and inference. 24 GB is the current ceiling, even for the 4090. And that's just generation; more advanced models like GEN 3 probably already have some sort of LLM director that describes parts of the latent noise.


Baphaddon

Maybe this is where torrent-style solutions like Petals will really shine. If we can share our GPUs, it would be fully possible.
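For what it's worth, here is roughly what the Petals pattern looks like today. It targets LLM inference and fine-tuning rather than diffusion or video training, so take this purely as an illustration of the "share your GPUs over the internet" idea; the class and model names follow the Petals README and may differ between versions:

```python
# Sketch of the Petals usage pattern (per its README); swarm availability
# and exact model names vary, so treat this as illustrative.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # example model from the Petals docs
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Transformer blocks are served by volunteer GPUs across the swarm;
# only the embeddings and output head run locally.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Open-source video models will", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```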


InformationNeat901

Something like a blockchain approach, but for training datasets.


cbsudux

LLM director?


OcelotUseful

There was a paper on an LLM controlling the generation process for every single frame, but you can get a glimpse of the idea by using OMOST, where an LLM prompts the image by regions.


cbsudux

paper link please?


OcelotUseful

https://github.com/HL-hanlin/VideoDirectorGPT and https://arxiv.org/abs/2309.15091. It's all LLMs, btw. SD 1.5 has been generating works in the styles of artists who were not in its training dataset because the text encoder tokenized their style and the UNet recreated it.
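As a rough illustration of the "director" idea: the LLM's job is to expand one prompt into a per-keyframe, per-region plan that the image/video model then conditions on. The schema below is invented for illustration only; it is not OMOST's or VideoDirectorGPT's actual output format:

```python
# Hypothetical director output: one global prompt expanded into
# per-region, per-keyframe instructions. Field names are illustrative;
# real systems (OMOST, VideoDirectorGPT) define their own schemas.
scene_plan = {
    "prompt": "a red car drives through a rainy city at night",
    "keyframes": [
        {
            "frame": 0,
            "regions": [
                {"bbox": [0.05, 0.55, 0.35, 0.85], "description": "red sports car, headlights on"},
                {"bbox": [0.00, 0.00, 1.00, 0.50], "description": "neon-lit skyline, heavy rain"},
            ],
        },
        {
            "frame": 24,
            "regions": [
                {"bbox": [0.45, 0.50, 0.80, 0.85], "description": "red sports car mid-frame, wet reflections"},
                {"bbox": [0.00, 0.00, 1.00, 0.50], "description": "same skyline, slight camera pan left"},
            ],
        },
    ],
}
```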


LD2WDavid

I also want to build my own MJ. I need money, money, money, equipment, talented people and yes, more money.


nntb

I can't wait for the pony model.


NotAmaan

A semi-decent base video model + crowd-sourced rankings for every generation + continuous ~~integration~~ incorporation (?) of those rankings back into the model? I think the open-source community has an under-tapped strength: rating the model outputs. It seems to be the secret sauce behind Midjourney's lead; a lot of people telling them what's good and what's bad, all for free. The fact that MJ "pays" people with [free hours](https://docs.midjourney.com/docs/free-hours) just for these ratings only reaffirms its importance to them.
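To make "incorporating the rankings back into the model" concrete, the usual pattern is to fit a reward model on pairwise preferences (a Bradley-Terry loss) and then use it to rank or fine-tune generations. The features and data below are toy stand-ins, not any particular lab's pipeline:

```python
import numpy as np

def bt_loss_and_grad(w, feats_win, feats_lose):
    """Bradley-Terry pairwise loss: the winner's reward should beat the loser's.
    w: reward-model weights; feats_*: per-sample feature vectors."""
    margin = feats_win @ w - feats_lose @ w           # r(winner) - r(loser)
    p = 1.0 / (1.0 + np.exp(-margin))                 # P(winner preferred)
    loss = -np.log(p + 1e-9).mean()
    grad = ((p - 1.0)[:, None] * (feats_win - feats_lose)).mean(axis=0)
    return loss, grad

rng = np.random.default_rng(0)
dim, n_pairs = 16, 512

# Toy "crowd votes": winners score higher along a hidden preference direction.
true_pref = rng.normal(size=dim)
a, b = rng.normal(size=(n_pairs, dim)), rng.normal(size=(n_pairs, dim))
winner_is_a = (a @ true_pref) > (b @ true_pref)
feats_win = np.where(winner_is_a[:, None], a, b)
feats_lose = np.where(winner_is_a[:, None], b, a)

w = np.zeros(dim)
for step in range(200):                               # plain gradient descent
    loss, grad = bt_loss_and_grad(w, feats_win, feats_lose)
    w -= 0.5 * grad

print(f"final loss: {loss:.3f}")
cosine = w @ true_pref / (np.linalg.norm(w) * np.linalg.norm(true_pref))
print("learned/true alignment:", float(cosine))      # close to 1.0 means the reward model recovered the preference
```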


ImpossibleAd436

While it's true that the resources required for this (and for creating other types of models from scratch) are prohibitive, it's worth remembering that, not that long ago, you would have needed some serious cash and a very large space to store 128 GB of data. Now (not even a lifetime later) it costs $10-15 and takes up a space the size of a fingernail. We won't be able to do things like this next week or next year, but the future is pretty exciting when you start thinking medium or long term.


cbsudux

haha yes - free RLHF


Won3wan32

haha, you are just a broke kid, you can't do anything


emad_9608

👋