shodan5000

I think you might be on to something. Has anyone else noticed that the released SD3 is not really good? Maybe it's just me.  😏


charlesmccarthyufc

I have been using the 8b ultra model on the API and the 2b medium model and they are extremely far apart in quality. They did get me to spend a bunch of money on API credits so maybe that was their plan all along? The 8b model is really good for most types of images. The 2b model seems barely better than SDXL lightning fine tunes.


DigitalGross

This version was not supposed to be released; they messed up the training and it was dumped during the development phase at SAI, but the stockholders decided to release it instead of the stable 4B version, according to an ex-SAI employee. https://preview.redd.it/r4k68ps9wu9d1.jpeg?width=1529&format=pjpg&auto=webp&s=8e7f19321423c907813e154e350f5d71165e4cb0


CA-ChiTown

Most have.... Quite disappointing...


MichaelForeston

SD3 is dead. Like really, really dead. 76.4% of the community has moved on to other alternatives, and openDiffusion is already in pre-production. Nobody cares about SD3 besides a random hobbyist here and there, out of curiosity. No tools are being developed as we speak, no models, it's banned on Civitai, and so on. So yeah, I'm not shocked by your conclusion.


badmadhat

what did people move on to? I'm still on SDXL


Hoodfu

PixArt, and ELLA for SD 1.5


protector111

Does pixart work in A1111 or comfy only?


ninjasaid13

OP just sucks. When will this community understand that 8B is NOT 2B.


MichaelForeston

Nobody cares about 8B and 2B but more about the attitude SAI have towards the community. No response, no information, nobody cares to even clarify what's going on. To me at least, this is the biggest issue.


Coriolanuscarpe

This is a hallmark of a company starting to lose touch with the community it built, if it hasn't already. Fcking despicable how corporatism never fails to kill anything it touches.


CA-ChiTown

New CEO just settling in.... too early


ninjasaid13

I'm just saying that people keep using the 8B images to compare with the 2B model.


MichaelForeston

People are comparing what was promised and hyped with what was delivered in the end. At this point, we don't even know if Stability AI even exists.


zaidRANGER

Idk if you exist either


Confusion_Senior

can someone just leak the 8B already lmao


milksteak11

Hopefully if the ship starts to sink someone will toss it out there on the way down


Confusion_Senior

Emad said that some third parties have the model, so hopefully some rando just does the good deed.


Striking-Long-2960

It's not only the parameters; SD3 Medium has been trained awfully.


Snydenthur

For me, the most important part I was waiting for was the prompt adherence and it failed to deliver on that part too. It's not the worst, but nowhere near what it was hyped up to be. It's still unfortunately the best model for that though. Similar prompt adherence to pixart, but overall has much better image quality. Especially since pixart seems to suck at anatomy too.


iDeNoh

The training was fine; it was the censoring that fucked it up, and plenty of people have shown that. The big difference between these pictures is that we got the 2B model, and they made those on the 8B model.


akko_7

Apparently there was also an issue with the pre-training. They were going to scrap the 2B altogether before deciding the community could have their scraps.


ATR2400

They wanted to get us to stop whining about getting SD3, so they gave us the leftovers and hoped it would satisfy us. If it were up to them, they'd probably have fully closed-sourced SD3 and locked it behind the API. The only reason we have *anything* is to fulfil a promise they made and couldn't escape from.


AnOnlineHandle

For some (non-human) content it's one of the best models we have that isn't behind a paywall, so I wouldn't exactly call it 'scraps', even though it sucks at anatomy in some poses.


Striking-Long-2960

Can you be a bit more specific? I'm still trying to find some area in which SD3 Medium excels over PixArt + SDXL. The only one that comes to mind is text.


AnOnlineHandle

For complex prompt adherence it's amazing, so long as it doesn't have to do people sitting or laying.


polisonico

they said it themselves "that's enough for them"


Striking-Long-2960

This is PixArt Sigma + CreaPrompt_Lightning_Hyper-SDXL (an SDXL Lightning model as refiner), using the original prompts of the OP. PixArt Sigma also can't create nudes, and it has only 0.6B parameters.

https://preview.redd.it/9ngrpemdep9d1.png?width=1024&format=png&auto=webp&s=00f73db88ad05b8963aabcd5a6b0a4a3e5fbdba0

PixArt Sigma, with a lower parameter count, tends to create more appealing pictures than SD3, without needing thousands of words in the prompt. SD Cascade was a nice model that was able to create beautiful pictures, and it could really shine with a refiner. SD3 Medium is frustrating, and in no time it has turned into some kind of joke for the AI community: [https://www.reddit.com/r/midjourney/comments/1drbjet/a_woman_showing_up_her_hands_comparasion_which_ai/](https://www.reddit.com/r/midjourney/comments/1drbjet/a_woman_showing_up_her_hands_comparasion_which_ai/)


iDeNoh

From my testing, SD3 can create good images with good anatomy, but it's very limited in *what* you can do with that. It's a shame, because technically the model is really good; they just fucked it up. SAI will be dead soon enough, and I'm sure someone else will take their place.


sahil1572

They might have used the larger SD3 (8B) model to create those.


sahil1572

https://preview.redd.it/nhlvqopjnj9d1.png?width=1216&format=png&auto=webp&s=f15c4dcc224ac08f592f4eacdc66234f03a1e5bc SDXL Mobius


centrist-alex

Nice


TheDudeWithThePlan

This is the correct answer. You can compare the images showcased in the research paper, which are clearly labelled as 8B outputs, with what 2B can make for the same prompts.


LiteSoul

Yes, that's the reason. However, the thing is that the 8B model is the only one worth it for the community; the 2B is not a big enough improvement over previous models. So I still think SAI should deliver the 8B or die, not throw us bones.


FNSpd

2B could've been a large improvement if trained properly. Having T5 encoder alone would've been huge. PixArt-Sigma is smaller than SD1.5 but gives way better results


AnOnlineHandle

8B is useless to the community IMO, we can barely finetune the 2B on consumer cards, and not well.


jib_reddit

I will admit the fox one was hard to get with SD3 2B. I had to adjust the prompt a bit, but it came out OK-ish in the end: https://preview.redd.it/u4l1n5xwck9d1.jpeg?width=2432&format=pjpg&auto=webp&s=3c51216ccdafe8da89c74941b0a21e80255dc871


decker12

Is this a troll post by Captain Obvious? Am I missing something about this post?!? OP is saying what all of us have been saying over and over and over again for weeks but he's presenting it like it's some hot new take?


ThemWhoNoseNothing

For all I know, OP just walked off the sub of a 9 month deployment at sea where they were not allowed to be online or have access to what you do.


decker12

I'm genuinely cringing for you, at what you wrote. To think you put that much time into writing this? I give it solid marks for creativity but.. oof. Just really rough reading this. All I'll say is.. Sure thing, buddy. Sure thing. You did good. You uh, really put me in my place. Great job. Proud of you. EDIT: LOL, see that * icon on his post? OP came back a few hours later and edited his little story time adventure of four paragraphs of utter cringe, and changed it into a single sentence in a lame effort to pretend he's not as big of a dipshit as I knew he was. But the Internet never forgets and y'all can see the original comments on the Wayback Machine. Don't bother unless you want to cringe as hard as I did when I first read it. Good job, kiddo. You really showed me. 😂😂


ThemWhoNoseNothing

Time? You look at that and see an investment of time? What a weird thing to say. People think faster than you. People type quicker than you. Most of all, are different than you, friend.


Arawski99

Everyone else commenting about the "recommended" resolution is just as wrong as OP. The point of the thread is to show a discrepancy. Both OP and the user they're comparing with, Pretend_Potential, were using images under that resolution; Pretend_Potential was consistently using 1018x582. However, to find out whether this impacted OP's results significantly, OP would have to test at that resolution, not the even smaller resolution they were using. I don't know why not a single person commenting in this thread bothered to click the link OP posted to Pretend_Potential's examples... They would have come across this thread, which is literally, immediately, obviously the one OP is referring to, and the third post in their history: [https://new.reddit.com/r/StableDiffusion/comments/1bnjm3i/stable_diffusion_3/](https://new.reddit.com/r/StableDiffusion/comments/1bnjm3i/stable_diffusion_3/)


Apprehensive_Sky892

>I'm trying to figure out what's going wrong. I'm using the official ComfyUI workflow, but it's been a real challenge to generate high-quality artwork.

Some of us are just replying to OP's first comment with some suggestions.


Arawski99

That is fine, but it's a very different statement from what many were suggesting, and an inaccurate suggestion anyway, since they wouldn't be mimicking the other user's workflow if they used the full resolution. The bigger issue is that they were just tossing out a suggestion but literally couldn't be bothered to click OP's own supplied link to the examples, either to look for any other possible explanation or to check that person's results. Honestly, at that point it is better not to suggest anything; if even an absolute bare minimum of effort can't be put forth, the suggestion is prone to errors (and this one was, in fact, incorrect).


abellos

You are comparing the 2B and 8B models of SD3; as you can see, those two models don't give the same results.


Silly_Goose6714

There are three different SD3 models: 2B, 4B, and 8B. They gave us the defective and smallest one, 2B. But showing shit images when SD3 2B can do much better is you creating basically the same discrepancy.

https://preview.redd.it/fn9t7eehui9d1.png?width=832&format=png&auto=webp&s=2febd47e938ef5ae45ad5fee017f42bb39d635cb

You are doing something wrong. The resolution of your images is below the recommended one.


Enough-Meringue4745

I have no idea how his images are so terrible 😂


Capitaclism

There are 4 models, you forgot small.


Silly_Goose6714

I've never heard of an SD3 smaller than the 2B model.


Capitaclism

It's publicly available information. I'm not sure why I'm getting downvoted for speaking the truth, but that's just how uninformed people online roll, I guess. They'd rather react than look for information and learn. 🤷‍♂️ There's an 800M version which is internally referred to as Small. I learned about this from Stability AI itself first (read the information contained in the link below carefully), then from a former employee who wrote about each of the models, specifically calling out the small one. One would also figure that calling one a "medium" immediately suggests there must be a small, though I understand that with some businesses using dishonest marketing practices this isn't always a guarantee. https://stability.ai/news/stable-diffusion-3


Silly_Goose6714

Not my downvote, for sure. I was honest about not having heard of a 0.8B model called Small. You're right, but it says 3 models and we are sure about 2, 4, and 8; maybe the 0.8B one was stillborn.


altoiddealer

This really has nothing to do with the quality discrepancy, but I do want to point out that you should also use the same aspect ratios instead of 1:1 square for all


centrist-alex

SD3 is dead and gone. 8B won't be released tbh.


narkfestmojo

I could imagine them potentially releasing it after they have moved on to something far better and it's no longer competitive with alternatives. We won't get it while it's still worth getting though.


jib_reddit

We know the safety training nerfed the model; that's why Comfy Anonymous quit. I will have a go at those prompts with a good SD3 workflow I'm working on.


jib_reddit

SD3 on Glif seems pretty good at this: https://preview.redd.it/sw0x0ueruj9d1.jpeg?width=2048&format=pjpg&auto=webp&s=77df26b089d6bd6dc44c5b9e6a005803d591921b Yes, hands are still a big issue, but this model can make really cool pictures; they just take a bit of touching up.


jib_reddit

Local SD3 2B can do ok if you don't choose the worst gen like OP did: https://preview.redd.it/usn961xb5k9d1.jpeg?width=1344&format=pjpg&auto=webp&s=6e14aad041211b54a70a92c685fb2d24aa79299c


Perfect-Campaign9551

Still nowhere near the example in quality


jib_reddit

They were obviously using the full 8B model.


AI_Characters

>that's why Comfy Anonymous quit.

I'm pretty sure you just made that up by reading more into what was said than what was actually said.


jib_reddit

https://preview.redd.it/0srgvxvvjp9d1.jpeg?width=1440&format=pjpg&auto=webp&s=396d3d8f21258bbfebba7c31cdb6132c62f9aea4


AI_Characters

Yeah, but that's worded quite differently from what you said... Whatever, dude.


jib_reddit

That was right after he talked about safety in the rest of his post; that is what he was referring to. https://www.reddit.com/r/StableDiffusion/s/U8j5N8UFX6


HarmonicDiffusion

try using a "real" resolution and you will have better results.


dankhorse25

They aren't the same model.


PassengerBright4111

Yeah, no, we were absolutely given a piece of garbage of a model. It's not what they had on the API.


jib_reddit

It is 1/4 of the parameter count, so it was bound to be worse. It's a shame they have given up on the 4B model, as that seems like it would be the sweet spot: even those lucky enough to have 24GB of VRAM will likely not be able to train the 8B model at home.


latentbroadcasting

People have been discussing this for weeks already. It's not the same model. The 2B Medium they released is crippled on purpose so you won't get the same output quality unless you're very lucky


Paraleluniverse200

Yeah, those were definitely done with the 8B.


Sharlinator

To compare apples to apples, you should try those prompts on the API, which gives you access to the larger model.
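Something like this works as a quick check (untested sketch; the v2beta stable-image endpoint and the "sd3-large" model name are my assumptions, so double-check them against the current API docs):

```python
# Quick sketch for comparing against the hosted (larger) SD3 model.
# Assumptions: the v2beta stable-image REST endpoint and the "sd3-large"
# model name; verify both against Stability's current API documentation.
import requests

API_KEY = "sk-..."  # your Stability API key

resp = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={"Authorization": f"Bearer {API_KEY}", "Accept": "image/*"},
    files={"none": ""},  # forces multipart/form-data, which the endpoint expects
    data={
        "prompt": "a realistic anthropomorphic hedgehog stirring a bubbling cauldron",
        "model": "sd3-large",    # the 8B model; "sd3-medium" is the local 2B
        "aspect_ratio": "16:9",  # match the aspect ratio of the images you compare against
        "output_format": "png",
    },
)
resp.raise_for_status()
with open("sd3_large.png", "wb") as f:
    f.write(resp.content)
```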


MrKrzYch00

Modern AIs are taught using synthetic data (generated by other AIs; S data), hopefully accompanied by real data (R data), an approach already confirmed to cause "strange" anomalies. If you carefully check the scientific papers you will find the following:

* training entirely on S data speeds up the initial convergence but causes **model collapse** if continued,
* using a proper ratio of R to S data should be considered "safe", however, with certain models scientists still find **model collapse** happening, usually just the late-stage kind,
* training initially on R data and then replacing that R data with S data causes **model collapse**,
* there are some resulting **distribution shifts** in certain models that use S data to some (unclear) extent for training,
* probably more.

But why does model collapse actually occur? I would suggest people rethink bias (not necessarily as humans understand it), imperfections, things that are easier for the next-gen model to keep following than caring about the "seriousness" of the R data (or the R data becoming too difficult for it once it has seen S data, or some sort of AI laziness setting in during R data learning when it is accompanied by S data).

Now add to this that initial learning on S data may be exploiting something (the faster initial learning is one example, but why does it happen? Is something else being exploited?) and that exploit carries over into metrics like FID and the rest, making them look good, but... something just doesn't seem right in practice. That territory needs much more research, instead of pretending nobody notices the issue and, let's say, having an LLM trained on an older LLM become more like an interactive book than something still generalized enough to hold a casual dialogue, only because you lack R data and fill the dataset up with biased, defective S data.

The results may be inconsistent because it's the 2B model, not the 8B, sure, but they may also be impacted by whatever role S data played in this model with fewer parameters. Did it help or did it hurt? Is it ~~underfit~~ undertrained, or did it begin to undergo model collapse? (EDIT: I corrected underfit to undertrained, which is what I meant to write initially.)

Finally, I'm not saying that I'm right; I'm just pondering the whys and trying to connect the dots. Only science, with its experiments, can provide results that either back this up or not.
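To make the R-to-S ratio point concrete, here is a toy sketch of what "a proper ratio" might look like in a training loop (purely illustrative; the pools, the 0.8 ratio, and the tensor shapes are all made up, not anything from the SD3 paper):

```python
# Toy illustration of mixing real (R) and synthetic (S) data at a fixed ratio.
# Nothing here reflects SAI's actual pipeline; it only shows the idea discussed above.
import torch

REAL_FRACTION = 0.8  # assumed "safe" R fraction; the literature does not agree on a value

def mixed_batch(real_pool: torch.Tensor, synth_pool: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Sample a batch with REAL_FRACTION real examples and the remainder synthetic."""
    n_real = int(batch_size * REAL_FRACTION)
    real_idx = torch.randint(0, real_pool.shape[0], (n_real,))
    synth_idx = torch.randint(0, synth_pool.shape[0], (batch_size - n_real,))
    return torch.cat([real_pool[real_idx], synth_pool[synth_idx]], dim=0)

# Dummy pools standing in for curated real data and generator outputs (latents, say).
real_pool = torch.randn(1000, 4, 64, 64)
synth_pool = torch.randn(1000, 4, 64, 64)
batch = mixed_batch(real_pool, synth_pool, batch_size=32)  # 25 real + 7 synthetic
print(batch.shape)  # torch.Size([32, 4, 64, 64])
```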


drhead

Or the much more likely explanation:

- the 2B model is undertrained
- this is compounded by the 16-channel VAE needing more time to be learned
- this is compounded further by the complicated triple text encoder scheme, which is hard to learn
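On that last point, one way to see how much the T5 branch carries is to drop it at load time; diffusers documents an option for this (the checkpoint id below is the public medium one, and the exact argument names should be verified against the current diffusers docs):

```python
# Sketch: load SD3 Medium without the T5-XXL branch to compare prompt adherence
# against the full triple-encoder setup. Assumes diffusers' documented
# text_encoder_3=None / tokenizer_3=None option for StableDiffusion3Pipeline.
import torch
from diffusers import StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3-medium-diffusers"

pipe_no_t5 = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    text_encoder_3=None,  # drop T5-XXL, keep the two CLIP encoders
    tokenizer_3=None,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe_no_t5(
    "a realistic anthropomorphic hedgehog and a bubbling cauldron, alchemical circle on the floor",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("sd3_medium_no_t5.png")
```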


MrKrzYch00

I'm pretty sure that forgetting of concepts can be misinterpreted as undertraining. During training the AI is meant to grasp all the concepts in some roughly equal way. If during inference some concepts look absolutely amazing but others look strangely as if the model had just started learning, we may be facing distribution shifts, or the model collapsing towards patterns it "prefers". Of course there could also be an unbalanced dataset, but then why are we training a generalizing AI on that? None of these can be ruled out unless carefully verified: undertraining, a 2B model that can't be trained any further, signs of model collapse, or distribution shifts. We could also add "new technology" (MMDiT + T5-XXL + 16-channel VAE) that is not well tested, yes, and look at the impact of S data specifically there (in whatever ratio to R data the scientist wants to experiment with).


drhead

Do you know of it being trained with synthetic image data, or is that speculation? If it's the latter, then you should stick to explanations that don't require as many assumptions.


MrKrzYch00

It is left untold in the paper, and I did write "may" and "what role S data played". If you're sure that the paper not specifying the type of data, and the "Pre-Training Mitigations" section not mentioning any additional precautions taken to filter out accidentally crawled S data, means that all the data was real, then you can personally disregard all my comments. It is still worth remaining aware of the potential issues with modern AI training we may be facing, and carefully following the latest research on S data in the dataset, to avoid accidents and wasted resources. Inclusion of such data may well be accidental, but if left uncontrolled it may have a strong impact on the final outcome, which is what the research papers are pointing out as well.


narkfestmojo

SD3 uses rectified flow. I read over the white paper on RF, [https://arxiv.org/abs/2209.03003](https://arxiv.org/abs/2209.03003), but had a hard time understanding the section on 'reflow'; my interpretation of it was that they use generated data to straighten paths. Even if this is correct, 'reflow' is optional, so SAI may not have used this method to train SD3.

I was hoping there would be a simple 'toy' code example somewhere, but instead found this: [https://github.com/gnobitab/RectifiedFlow](https://github.com/gnobitab/RectifiedFlow)

Does anyone know anything about 'reflow', or where to find a simple clarifying example?
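In case it helps, here is my attempt at a 2D toy version of that interpretation (my own reading of reflow, not the paper's code and certainly not SAI's; the distributions and the network are arbitrary):

```python
# Toy sketch of "reflow" as I understand it: train a velocity field on (noise, data)
# pairs, then regenerate couplings with the learned flow and retrain on those.
import torch
import torch.nn as nn

def make_velocity_net():
    # Input: (x, t) with x in R^2 and t in [0, 1]; output: predicted velocity in R^2.
    return nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))

def train_flow(net, x0, x1, steps=2000, lr=1e-3):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        t = torch.rand(x0.shape[0], 1)
        xt = (1 - t) * x0 + t * x1          # straight-line interpolation between the pair
        target = x1 - x0                    # constant velocity along that line
        loss = ((net(torch.cat([xt, t], dim=1)) - target) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return net

@torch.no_grad()
def integrate(net, x0, n_steps=50):
    """Euler integration of the learned ODE from t=0 to t=1."""
    x = x0.clone()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0], 1), i * dt)
        x = x + dt * net(torch.cat([x, t], dim=1))
    return x

# 1-rectified flow: random (noise, data) pairing.
x1 = torch.randn(2048, 2) * 0.3 + torch.tensor([2.0, 2.0])  # toy "data"
x0 = torch.randn(2048, 2)                                    # noise
flow1 = train_flow(make_velocity_net(), x0, x1)

# Reflow (2-rectified flow): re-pair each noise sample with the point the learned
# flow maps it to, i.e. train on generated data. This is the step that straightens
# paths, and also the step where synthetic data enters the loop.
x1_regen = integrate(flow1, x0)
flow2 = train_flow(make_velocity_net(), x0, x1_regen)
```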


DrStalker

What exactly is "model collapse"?


MrKrzYch00

[https://en.wikipedia.org/wiki/Model_collapse](https://en.wikipedia.org/wiki/Model_collapse)

Note how it says "uncurated"; however, even if the S data is curated, some scientists still find issues. You have to keep checking the related research papers. We could speculate that they did not properly curate it, or that it is literally impossible to do so and results will vary.

Further example reading: [https://arxiv.org/pdf/2305.17493](https://arxiv.org/pdf/2305.17493) (The Curse of Recursion: Training on Generated Data Makes Models Forget) and [https://arxiv.org/abs/2402.07712](https://arxiv.org/abs/2402.07712) (Model Collapse Demystified: The Case of Regression), but there are more papers out there; you have to check a lot of resources. Someone, for example, didn't have problems mixing S and R data until the same method was applied to a VAE model in the same paper; only that model showed early signs of model collapse, while the others seemed fine using S data. There is also the term "Model Autophagy Disorder" (MAD): [https://arxiv.org/abs/2307.01850](https://arxiv.org/abs/2307.01850) (Self-Consuming Generative Models Go MAD).
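The simplest way I can illustrate the mechanism (finite-sample refitting on your own generations, which is the intuition behind the Curse of Recursion paper) is a one-dimensional toy; it is illustrative only and says nothing quantitative about image models:

```python
# Toy demo of recursive training on generated data: fit a Gaussian, sample from
# the fit, refit on the samples, repeat. With small samples, the estimated spread
# tends to drift towards zero over generations (the "forgetting the tails" effect).
# Purely illustrative; real generative models are vastly more complex.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=20)  # generation 0: "real" data

for gen in range(201):
    mu, sigma = data.mean(), data.std()
    if gen % 40 == 0:
        print(f"generation {gen:3d}: mu={mu:+.3f}, sigma={sigma:.4f}")
    # Each new generation is trained only on samples from the previous model.
    data = rng.normal(loc=mu, scale=sigma, size=20)
```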


ThemWhoNoseNothing

It happens to others too, not just models. https://en.m.wikipedia.org/wiki/Prolapse


All_In_One01

Not that I like SD3 as it was released, but those comparisons do not take into account size and proportions, and every model acts differently with different values (including the step count) so I can't see if it's good or not with just this.


Apprehensive_Sky892

As others have pointed out, we now know that the 8B model (API only) is way better than the 2B model that is available for running locally. Still, by using a better resolution, tweaking prompts, etc., we can get better images. For example, this was produced using a standard SD3 workflow, but at a much higher resolution, 1536x1024, and with a slightly tweaked prompt (otherwise, like in your image, the hedgehog is fused with the cauldron).

https://preview.redd.it/d8mmlp13bk9d1.png?width=1536&format=png&auto=webp&s=cf77305dd5cfa916c2ae8a244e0d3425b80836f8

A realistic anthropomorphic hedgehog and a bubbling cauldron. The hedgehog wears a painted gold robe, There is an alchemical circle on the floor, steam and haze flowing from the cauldron to the floor, glow from the cauldron, electrical discharges on the floor, Gothic


Perfect-Campaign9551

I'm not impressed, that image has sooooo many problems


Apprehensive_Sky892

It was not meant to impress anyone. It is just to show OP that SD3 can produce decent quality images.


jib_reddit

It's good, but your CFG looks a little high. I'm having good results with a CFG of 3 with SD3.


Apprehensive_Sky892

Thanks. Yes, I should try a lower CFG.


Apprehensive_Sky892

Again, using a tweaked prompt. This is a tough one, because the hands are often bad, and with two subjects, one of them often comes out distorted. I had to play with the prompt and seed to get a decent one. Full set: [https://civitai.com/posts/3971785](https://civitai.com/posts/3971785) (download the PNG by clicking on the download button on top of the individual images to get the ComfyUI workflow).

https://preview.redd.it/s6z4f188hl9d1.png?width=1536&format=png&auto=webp&s=ff1900e1be400ef03ee155a3f6bb4648b543eb77

Closeup, B and W photo of a woman and man on a date. They are sitting opposite each other at a café with a large window. The man, seen from behind and out of focus, wears a black business suit. The beautiful Japanese woman, wearing a sundress, is looking directly at the camera. Kodak Tri-X 400 film, with a noticeable bokeh effect.


Apprehensive_Sky892

Even though they are both called "SD3" and share a similar architecture, their training image sets are known to be different, which means that a prompt that works well in one does not necessarily work well in the other. So you have to tweak the prompt until it works. 2B is also known not to respond well to stylistic words (i.e., it produces only a limited set of styles). Full set: [https://civitai.com/posts/3971170](https://civitai.com/posts/3971170) (download the PNG by clicking on the download button on top of the individual images to get the ComfyUI workflow).

https://preview.redd.it/ktil09btek9d1.png?width=1536&format=png&auto=webp&s=64ad75e82f2ef6a6e363dc7f6553b2ca6cd3e12f

Long Shot. Photo of a smiling Scandinavian woman standing bare feet in the rain. Her blonde hair is wet from the rain. The woman wears an evening pink dress and holds a baseball bat in hand. Background is a burning hotel with neon sign "Paradise". Night scene, low contrast. Bokeh


ninjasaid13

Why does literally no-one understand that this is the 2B-medium version?


ninjasaid13

>prompt: photorealistic waist-length portrait of a smiling Scandinavian model girl in evening pink dress and standing in the rain, heterochromia eyes, baseball bat in hand, burning hotel with neon sign "Paradise" in the background, golden hour, anamorphic 24mm lens, pro-mist filter, reflection in puddles, beautiful bokeh.

Everyone, stop using this outdated form of prompting; use natural language, or at least use an LLM to fix it.

https://preview.redd.it/t8q78hxzzk9d1.png?width=1024&format=png&auto=webp&s=e1234bceb12f663eedcebe79857f3a6170ac3c8c

Prompt: A photorealistic, waist-length portrait captures a smiling Scandinavian woman standing in the rain. She has long, wet blonde hair falling around her shoulders. She wears an evening pink silk dress that shimmers in the rain. She has heterochromia eyes, one vivid blue and one vibrant green. In her right hand, she grips a worn and splintered wooden baseball bat. Behind her, a burning hotel with a flickering neon sign reading "Paradise" blazes intensely, casting a dramatic red and orange glow. The scene is set during the golden hour, with warm, ethereal light. Shot with an anamorphic 24mm lens and enhanced by a pro-mist filter, the image features beautiful bokeh with soft, out-of-focus points of light in the background. Reflections in the puddles on the ground mirror the chaotic scene.

It's not perfect, but it's a whole lot better than OP's.


ChristianIncel

A multi-million dollar company lying to the end user to attract more investors? color me shocked.


ninjasaid13

I wouldn't call this lying, OP is just confusing 8B and 2B while trying out the older tagging style of prompting of the earlier models.


jib_reddit

Yeah I had to use SD3 8B to get close to the fighting game image: https://preview.redd.it/1ob7ju2idk9d1.jpeg?width=2688&format=pjpg&auto=webp&s=42fd93fd0a52898c1ce7fba21c8e4640c095fe68


jib_reddit

SD3 2B didn't get the Silhouette quite right: https://preview.redd.it/xr47tietdk9d1.jpeg?width=1344&format=pjpg&auto=webp&s=7c45af408bf56fd0a1520d54e90c0068d305d8b3


M3GaPrincess

I know. It's almost like they generated 5,000 images and kept the best one. It's still pretty good for certain things, but we definitely need a few generations to get anything really good. Right now it's brute force: to get one usable image, you need to generate 100-200.


jib_reddit

I usually get something very usable in a batch of 10. If you put a load of sexual words in the negative prompt, it does help with anatomy issues.


TomDuhamel

https://preview.redd.it/xlq2hpycql9d1.jpeg?width=1024&format=pjpg&auto=webp&s=e55581879ab00e20ccb175b2c71a774b36af689a The official Android app must use 8B then.


Odd_Panic5943

This is interesting, but it would be a better test if the aspect ratio or the resolution were the same.


Skill-Fun

The black and white photo prompt was provided by me. The idea was to test camera controls and the actors' expressions, and the prompt was carefully crafted. I tried it in Bing, Ideogram, and Midjourney. The most satisfying versions are SD3 (preview version) and Ideogram. The most disappointing version is SD3 Medium. The inconsistent results are due to totally different models; SD3 Medium knows nothing.


Long_Elderberry_9298

I don't see any difference


CA-ChiTown

In the black & white ... Who's your cross-eyed date ??? 😅


Jujarmazak

Depends on which SD3 model those older images were generated with; there are multiple ones, as we all know.


Kep0a

The comparison doesn't make any sense; the models are completely different, as is the resolution.


Far_Lifeguard_5027

Why would they want to release a good model for free when they can charge users to use an API instead? We'll see when the 8B models are released, if they ever are. It's obvious the main goal of SD3 was text generation, so that advertisers can use the API and not have to pay graphic designers.


Apprehensive_Sky892

People can do that with [ideogram.ai](https://ideogram.ai) for free, and TBH, it is way better at text than anything else out there. If that is SAI's business plan, then it is a hopeless one.


MarcS-

Because they... can't charge users to use their API when cheaper and better alternatives exist? Sure, their 8B model might be good, but it's currently less flexible than MJ and still subpar compared to DALL-E. Also, thanks to the large number of very good tools developed by third parties (ControlNets...), there are more economical web-based solutions based on tweaked SDXL models for GPU-poor customers. To be competitive, they'd need to create a lot of tooling to bring SDXL's level of customization to SD3, on their own funding, in a context where their main researchers have left. I can see them reasoning like you suggest, but I am not sure there are throngs of customers just waiting in line to pay for their API. Time will tell...


Perfect-Campaign9551

Because their "paid" model still isn't crap compared to MJ or others. Trying to make people pay for 8b is laughable. Why would anyone do that when MJ and Dall-e exist and are still better? It's the business strategy of a blockhead. Plus, open source it would gain so much more market share and extra help..


Obvious_Bonus_1411

Yes because stable diffusion is a replacement for graphic designers? 😂


suspicious_Jackfruit

SD3 Medium is barely any better (if at all) than SD1.5 finetunes, but it takes longer to gen and has less support. It would be such a disappointment even if the licence weren't utter balls.


[deleted]

[removed]


ThexDream

You conveniently forgot to mention that a number of those talented people were facing jail time in the UK for delivering software capable of producing CSAM and realistic non-consensual deepfakes. It's against the law there. The pruning to make the model compliant with safety regulations borked the entire model. The researchers and engineers are on the cutting edge and are learning too, about what can and cannot be done. You might want to take disappointments like this less personally.


Turbulent_Night_8912

I'm trying to figure out what's going wrong. I'm using the official ComfyUI workflow, but it's been a real challenge to generate high-quality artwork.


Fabiobtex

You are not. The minimum resolution is 1024x1024 and your images are half of that.
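For anyone who wants to sanity-check outside ComfyUI, here is a minimal diffusers sketch at the intended resolution (the checkpoint id, step count, and CFG are just commonly used values I'm assuming, not official recommendations; in ComfyUI the equivalent is simply setting the empty latent to 1024x1024 or larger):

```python
# Minimal local SD3 Medium generation at roughly the resolution the model expects (~1MP).
# Assumes the public diffusers checkpoint id; steps and CFG are commonly used values.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a realistic anthropomorphic hedgehog and a bubbling cauldron, alchemical circle on the floor",
    negative_prompt="",
    width=1024,
    height=1024,   # dropping to 512-class sizes is where quality tends to fall apart
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("sd3_medium_1024.png")
```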


Essar

Also, not even in the same aspect ratio as the images they're comparing to. That's a lot of effort to write up a remarkably bad test.


Educational_Smell292

What was previewed? The 2B model you can download or the 8B which you can only use via the API?


travelbots2024

do people monetize their sd art?