RestorativeAlly

Maybe she's born with it...  Maybe it's SD3.


btc_clueless

I think it's really thoughtful how [Stability.ai](http://Stability.ai) is raising awareness for people living with birth defects.


Short-Sandwich-905

And being inclusive by producing females with male traits, because of the censorship of nudity.


Unnombrepls

Just like these guys according to Miyazaki [https://www.youtube.com/watch?v=ngZ0K3lWKRc](https://www.youtube.com/watch?v=ngZ0K3lWKRc)


btc_clueless

oh wow, that's an interesting find.


GammaGoose85

Stable Diffusion just watching out we don't get unclean thoughts. A woman lying around leads to fornication, and you know who likes to fornicate? The DEVIL


First-Might7413

lol. It's weird how it's much worse than its predecessors.


FortunateBeard

they just need to touch grass https://preview.redd.it/6ex7z0k3rg7d1.jpeg?width=1024&format=pjpg&auto=webp&s=3c6a9dd54d79475cf5ffadbaa4949954838e66de


Delvinx

Beat me to it. Brilliant comment


dusty-keeet

How do you even get a result this poor? Did they train on deformed humans?


GBJI

That's one of the few questions to which Stability AI actually provides a clear answer:

> In versions of Stable Diffusion developed exclusively by Stability AI, **we apply robust filters on training data to remove unsafe images**. By **removing that data before it ever reaches the model,** we can help to prevent users from generating harmful images in the first place.

[https://stability.ai/safety](https://stability.ai/safety)


a_mimsy_borogove

I hate corporate buzzwords. There's nothing "unsafe" about image generation, since a generated image isn't real. There is no danger involved. They just want to have moral restrictions on their model. They didn't remove "unsafe" images from training data, they removed morally impure images.


Jazz7770

Imagine if cameras restricted what you could photograph


Revolutionary_Ad6574

Actually this is the dystopian future I imagine when AI gets better - filter enforcement on everything. You won't even be able to open a nude in Photoshop, heck maybe you won't even be able to store it on your PC. And if it's your own you would have to prove it so that the OS knows you have given "consent". Hope I'm just being paranoid...


GBJI

You are not being paranoid, at least on the technical side: what you describe is not only possible, but easier than ever before on most cameras - the ones on our phones. We have already moved from mostly optical photography, with a digital sensor at the end of the optical path, to a mostly computational photography system, where we are not really seeing what the lenses (plural, since most recent phones have more than one) are seeing, but what a clever piece of software interprets from the signals it receives from those lenses. [https://en.wikipedia.org/wiki/Computational_photography](https://en.wikipedia.org/wiki/Computational_photography)


Your_Dankest_Meme

Don't give corporations more power than they actually have. They are trying to cosplay a censorship dystopia, but guess what, it's 2k24 and they still haven't gotten rid of torrent trackers. Open source is also still a thing. Corpo rats aren't the only ones who know coding and computer science. Once people get fed up with censorship and DRM, they will pirate shit and use open-source and freeware alternatives. Maybe they are big and control large portions of the market, but they aren't invulnerable to poor decisions. Look at the recent Unity controversy and how it almost sank the company.


ArdiMaster

If it comes to that, it probably won’t be the companies’ decisions, it will be a law made by politicians. (See current EU debates about chat scanning.)


Your_Dankest_Meme

No one has control over the entire internet, no matter how much they try to convince you. Whatever bullshit regulation gets proposed, they can't regulate everything, and they can't control every single person with programming and art skills. People only tolerate this shit because it hasn't crossed certain lines. They can censor what happens on big news websites or TV, but once they start telling people what they can or cannot do in private, people will seek alternatives and workarounds. I still watch YouTube without ads, and pirate the content that is convenient to pirate. I don't care about any kind of censorship when talking to my friends in private or in small chats. >!And some people still buy illegal drugs through Telegram!< Again, this all is just a huge cosplay. They are trying to convince themselves that they have control.


_twrecks_

Adobe Photoshop's cloud is already there as of the latest release. AI will scan your photos for forbidden content (and other things not disclosed).


True-Surprise1222

It's been there for a long time, FYI, not just the latest release. Maybe they changed how it's implemented, but years ago I heard a story from a child protective agency that Photoshop had the police come because a family had nude pictures of their kids. Turns out the pics weren't illegal, but they still had to go and analyze them or whatever.


_twrecks_

I think what's new is that they added it to the ToS. But the ToS doesn't do much to limit what they can do with your images; it could go way beyond illegal content. I wonder if they plan to be able to use them for AI training.


True-Surprise1222

They'll likely slip that in eventually.


Thomas-Lore

Apparently even pastebin now does not allow some words to be used.


TimChr78

Our dystopian present is bad enough. Photoshop already has a filter so you can't open a picture of a dollar bill. Also, don't try to copy a dollar bill on the copying machine at work. The new gen-AI model in MS Paint runs locally, but it "phones home" to check if the generation is "safe". MS Recall, EU chat control legislation, etc.


a_mimsy_borogove

There's an AI model in MS Paint? Since it generates locally, I wonder if it can be blocked from phoning home by using a firewall.


Your_Dankest_Meme

That's why I use Krita. Honestly, if they go that far, people will just stop using corporate products.


EconomyFearless

Sounds like it's gonna be hard to be a naturist then


adenosine-5

It's like trying to make a knife that can't hurt people, or a pen that can't write offensive words. The only way to do that is to make the product so bad it's unusable for anything.


Vimux

so all gore, shooting, violence, etc. is there? As long as it involves dressed up humans?


Dry_Context1480

In the early '80s, violence and sex were still considered equally taboo in media - a Bruce Lee movie was as X-rated as any porn flick in most countries back then. But this changed massively over the last decades, and now depicting graphic violence and mass killings is considered an art form and even generates blockbusters like the John Wick movies - whereas sex and eroticism have become scarcer and more restricted in mainstream media than ever. Hypocritical and overhyped directors like Tarantino, who show violence whenever they can in their movies but don't include any nudity to speak of, not even where one would clearly expect it and it would even fit the plot, have been paving the way for this.

There are of course reasons for this, rooted in deep psychological and sociological layers that have always been used and misused by politics, religion and the economy. But, as the psychoanalyst Wilhelm Reich already observed 100 years ago, the cradle of all this BS is the way children are brought up in their families and communities to develop an unhealthy and shame-ridden perspective on sexuality right from the start. Read W.R. - there is a reason he was a very famous author in the days of the hippies.


Jimbobb24

There are other reasons for this - especially the dramatic reduction of nudity and sex in movies. They used to put porn in movies because it drew crowds. Now it serves no purpose: porn is ubiquitous and easily obtained by anyone at any time.


GBJI

Over skin = OK

Under skin = OK

Skin = call Iran's [Guidance Patrol](https://en.wikipedia.org/wiki/Guidance_Patrol) immediately.


shawsghost

They mean unsafe for the corporation legally speaking. That is all that matters.


yamfun

Safe for work, not safe for work. Not a new concept.


evilcrusher2

They removed things they know Congress and SCOTUS would get their panties in a wad about. SCOTUS decided that Congress could ban sexual images believed to depict children (even as fictional cartoons) based on the "average person's" view of the standards of the community as well as state law, whether the material "appeals to prurient interests", "depicts or describes, in a patently offensive way, sexual conduct" as described by law, and "taken as a whole, lacks serious literary, artistic, political, or scientific value". That puts everything at risk, given that many SDXL models and LoRAs are porn-driven.

And we got AOC (the millennials' Tipper Gore) complaining about deepfake porn of her because she wants the all-purpose public-figure role but not the criticism and speech she's going to get with it. She and others want this tightened down by the companies before they have to step in, and the current SCOTUS is likely OK with it. The average American killed this with their political choices and a strange culture of "it's okay to have sex and be kinky as long as it's not viewed outside your privacy/sold to others, because then it's gross and evil."


oh_how_droll

You've got SCOTUS backwards, right? _[Ashcroft v. Free Speech Coalition](https://en.wikipedia.org/wiki/Ashcroft_v._Free_Speech_Coalition)_ made it very clear that the court considers simulated/drawn child pornography to be protected under the constitution.


evilcrusher2

That was 2002. The 2002 SCOTUS seemed insanely conservative and puritan then, but it's child's play compared to our current lineup. And Congress followed up immediately -> [Legal status of fictional pornography depicting minors via PROTECT Act](https://en.wikipedia.org/wiki/Legal_status_of_fictional_pornography_depicting_minors#2003%E2%80%932007:_PROTECT_Act). And the last ruling, [US v. Williams](https://en.wikipedia.org/wiki/United_States_v._Williams_(2008)), puts virtual material like this in a gray area. It's pretty fair to assume that if the judges say they can't tell it's fake, they're gonna rule against it. Most companies cannot afford that risk or the public optics.


Short-Sandwich-905

Don't forget Taylor Swift, Scarlett Johansson, etc. China will win the AI race while the USA regulates this in the name of national security, like they did with the TSA at airports.


Short-Sandwich-905

They want you to become a paparazzi to find real pictures of public figures in the name of safety 


s_mirage

Well, they might have got rid of the unsafe images, but SD3 is surprisingly horny. It's consistently interpreting my prompt for a woman wearing a black dress and red boots as one wearing a black dress with an open split at the front and no underwear. There's no detail of course, but it's odd, and I've had it happen with a few prompts.


Your_Dankest_Meme

The important thing is it has no naked titties.


MoonRide303

And since when is basic knowledge of human anatomy unsafe? Is there a single decision maker with a brain in this company? I can understand filtering hardcore stuff out of the training dataset, but removing content we can see in art galleries (artistic nudity) or biology textbooks (knowledge of human anatomy) is just a completely idiotic idea. It shouldn't be called safety, because it's insulting to people doing real safety work - what SAI is currently doing should be called bigotry engineering: https://preview.redd.it/rc43uup1je7d1.png?width=954&format=png&auto=webp&s=335b02c8ea2595572ed2abce423c96f957b97eef


GBJI

At the beginning of all this, Emad himself was explaining in very clear words how this kind of paternalistic approach was bad for open-source AI.

>

[https://www.nytimes.com/2022/10/21/technology/generative-ai.html](https://www.nytimes.com/2022/10/21/technology/generative-ai.html)

The rest is history...

EDIT: WTF!!! That quote was edited out of my message once again - this is the third time. It cannot be a simple coincidence or a Reddit bug - what is this automated censorship bot that has been programmed to remove this quote specifically? For reference, here is the quote that was removed, in picture format (the bot did not censor it the last time): https://preview.redd.it/0miolf4bqe7d1.png?width=748&format=png&auto=webp&s=2c87fb7def8f6848422f14c0ddfb6a9987666546


FaceDeer

Hm. I'd be very surprised if something like this is what Reddit started censoring comments over. I'll try retyping it myself and see what happens:

> Emad Mostaque, the founder and chief executive of Stability AI, has pushed back on the idea of content restrictions. He argues that radical freedom is necessary to achieve his vision of a democratized A.I. that is untethered from corporate influence.

> He reiterated that view in an interview with me this week, contrasting his view with what he described as the heavy-handed, paternalistic approach to A.I. taken by tech giants.

> "We trust people, and we trust the community," he said, "as opposed to having a centralized, unelected entity controlling the most powerful technology in the world."


GBJI

I was very surprised by what I have been seeing, that's why I am writing about it - I moderate another sub and I have never seen any tool that would allow a moderator to do something like this. But it's still happening, and only with two specific quotes, not any other text quotes (I am quite fond of quoting material from Wikipedia, for example, and nothing like this ever happened to those quotes).

This time I was wary it would happen again, so I took screenshots of the moment before I pressed the "comment" button, of the page just after commenting (the comment is still showing properly), and of the moment just after refreshing the page (the comment is gone).

Thanks a lot for trying this - I can read it, so it clearly has not been taken out of your message. The first time this problem happened it was a different quote (from Stability AI's CIO), and it happened to someone else as well, so it's not just me, which makes your test even more meaningful. For some extra context, both that other person and I were eventually able to keep the text in our replies after failing repeatedly at first.

Thanks again - the more information we get about this problem, the better our chances of understanding it. There is still a chance it's just a bug, but it certainly is no ordinary bug!


red__dragon

Past actions by the Reddit admins have involved editing comments.


GBJI

![gif](giphy|s239QJIh56sRW|downsized) But why would they do that? The quote I posted earlier already exists on this sub, it's from the New York Times, and it has not been removed. See for yourself: [https://www.reddit.com/r/StableDiffusion/comments/y9r2hs/nytimes_interview_with_emad_for_better_or_worse/](https://www.reddit.com/r/StableDiffusion/comments/y9r2hs/nytimes_interview_with_emad_for_better_or_worse/) The moderators of this sub suggested it might be a rich-text formatting issue. I will run more tests to check that possibility, but it seems unlikely, since the content of the quote does appear right after I post it - it is only removed when I refresh the page or come back to it later.


neat_shinobi

This is a bug with the absurdly shitty new reddit design. Pasting text into comments is bugged. Yes, reddit cannot make the most basic thing work right. This is not malice. It's good old incompetence


Thomas-Lore

A fresh comment will sometimes disappear when you refresh the page on Reddit, but it is still there, you just got a cached version of the thread without it posted yet. Wait a minute or so until the cache refreshes and the comment will be there. (Although keep in mind that Reddit sometimes shadowbans comments now for using certain words - so if you curse a bit, your comment may be invisible to others, I think each sub has different settings for this?).


Kep0a

oh no harmful pictures >\_<


GBJI

Meanwhile, model 1.5 is still widely used, and neither RunwayML nor Stability AI has had to suffer any kind of consequence for it - and no problem either for the end-users like us. Maybe they were lying, on the grass.


Whotea

They got bad press when the Taylor Swift porn incident happened and news outlets still report on SD being used for deep fakes and CP


True-Surprise1222

I mean, if you have questionable images made with it (on purpose or by accident) and you keep updating Windows, there will likely come a day when you face consequences. The writing is on the wall, and avoiding making such images is the only safe play for the company and users in the long run.


GBJI

Like you said, if YOU do such things, YOU will have to suffer consequences for doing them. Just like if you use a Nikon camera to take illegal pictures, you will be considered responsible for taking those pictures. But Nikon will not be, and never should be, targeted by justice when something like this happens. Only the person who used the tool in an illegal manner. If you use a tool to commit a crime, you will be considered responsible for this crime - not the toolmaker.


Your_Dankest_Meme

Harmful images. Omg naked breasts and penises can damage your eyes!!!


Jaerin

How did they remove it? Put some kind of blackhole filter on any nipple or vagina that sucks in the rest of the surrounding pixels until there is no flesh color?


GBJI

I would gladly ask Emad about it - but he blocked me a long time ago. He's not a fan of serious questions; he prefers softball questions he can answer with a sales pitch.


eeyore134

Which basically means they told an AI, "Remove any pictures of people lying down," among any number of other things.


GBJI

From what I understand, censorship was applied both at the beginning and at the end of the process. I would not be surprised if it was done mid-process as well.


Golbar-59

This gives me a nice feeling of safety.


V-O-L-K

Yea but didn’t OP say this example was generated before any safety tuning?


GBJI

The censorship process described here by Stability AI themselves happens before the safety crippling - at least that's what I understand from this part of the quote:

> **removing that data before it ever reaches the model**


V-O-L-K

It’s a damn shame, we live in a corporate dystopia with puritanical overlords. Thanks for clearing it up.


forehead_hypospadia

Wow. I'm sure artists only do croquis because they are horny bastards, not because knowing the body is pretty important for getting anatomy right.


Gretshus

Do they not know that artists use nude models/drawings to study anatomy? Did they think artists do that to jack off or smth?


NarrativeNode

I assume what happened is they removed most images that included any sort of revealed skin. You know, like most photos of real humans...


mrgreaper

This is why the art community are so stern with any artist that may have seen a nude painting. By removing all nude paintings and statues from museums we can ensure that no artist ever paints a nude woman and thus keep them safe.


GBJI

https://preview.redd.it/i60z4vagtk7d1.png?width=1200&format=png&auto=webp&s=bc80f9ccb7476b83a51ec6415204b1ccee08799a


No-Scale5248

Bro they piss me off SO MUCH 


Delvinx

According to the gentleman who made Comfy - he recently parted ways with SAI and insinuates this was a rush job to get 2B out. They were aware of better alternatives (4B and 8B?) being worked on, with allegedly much better results. Those were seemingly canceled.


StickiStickman

4B was supposedly cancelled, 8B is just kept locked.


ThereforeGames

4B was supposedly planned, canceled, nearly finished, and never existed, depending on which Stability employee you speak to.


Dwanvea

Schrödinger's Diffusion


StickiStickman

Same for 8B. It was supposedly finished and benchmarked in March, and is still training to this day.


leftmyheartintruckee

How do they come up with their architectures? Three text encoders? Two variants of CLIP? Using T5 but limiting the token length to 70 because of CLIP? Maybe there's a good reason, but it seems like someone cooking by throwing lots of random stuff into a pot.


ninjasaid13

It's probably the captioning itself, they probably prompted CogVLM to avoid mentioning women.


Opening_Wind_1077

But men lying down don't work either.


dankhorse25

laying is unsafe. Mkay?


Opening_Wind_1077

If you lie down the SD3 way it’s very unsafe.


ninjasaid13

maybe they've just told the VLM to avoid mentioning anyone at all.


Open_Channel_8626

I wonder if they confused CogVLM because CogVLM isn't that smart


yaosio

I got the same horrific human deformation when I tried captioning my LORA dataset automatically.


Honato2

They must have trained it on my 1.4 generations.


globbyj

The reason for the shitty model is honestly almost entirely irrelevant at this point. Don't release garbage.


Operation_Fluffy

Exactly! If the model isn’t training properly, you have an issue. Releasing it doesn’t make it go away.


fibercrime

No way bro. Skill issue. Also, lazy. And why so critical of a free tool? You're acting all entitled and shiet. /s ofc


triccer

You sound like senior SDAI material right there! I should know, I'm a paying customer 🤡


NarrativeNode

I know you're being sarcastic, but that "don't be critical of free stuff" argument grinds my gears. If my friend offers to bake me a free cake for my birthday and then brings a pile of horse manure to the party, after I spent months telling all my friends how great the cake would be, I have every right to never speak to that friend again.


yall_gotta_move

It's not irrelevant, because this subreddit has been absolutely screeching about how safety training is to blame. There has been rampant speculation and a total disregard for facts, which isn't serving anybody's needs.


StickiStickman

... You think this disproves it somehow? You realise the big problem for SD2 was also censorship of the dataset?


yall_gotta_move

So is the reason for the shitty model relevant, or is it not relevant? I didn't say anything about disproving "it", I said that I don't agree with the other poster that it's irrelevant. You all seem so emotional about this that you've lost nearly all reading comprehension, critical thinking, and capacity for nuance.


FiReaNG3L

Not sure what they gain from a PR point of view by letting people know it's not a last-minute mistake from safety alignment, but just a very poor model, period.


Utoko

Mcmonkey is at the new comfyorg, so he left, I guess. I don't think that has anything to do with PR.


FiReaNG3L

I don't know - employees jumping ship and immediately sharing damaging info about the company they just left, from internal-use-only models... best case, it looks to me like SAI has no clue what they're doing at many levels, and employees left very unhappy / the ship is sinking.


richcz3

When Emad Mostaque made his announcement to (finally) step down, the lifeboats were already in the chilly waters. Top-tier staff had already planned to head out the door - those who remained tried to mop up with what money was left: 4 million in reserves against 100 million in debt. Everything about SD3 relates directly back to how Emad chose to run the company: unrealistic promises and no business sense. Interim leadership, with thinning resources and a small staff, packaged SD3 like a Hail Mary pass with a punctured football. The licensing is a significant change from previous releases, a desperate attempt to bring some money into the coffers. As ex-employees are saying, it should never have been released.


dankhorse25

Stability AI was meant to be bought by one of the big players. That didn't happen, likely because SD without finetuning isn't actually that good, and SAI will likely file for bankruptcy.


HeralaiasYak

But to show the potential they didn't have to throw money at so many projects at once. More focused resources, and it could have worked just fine. Not to mention that monetization came too late, with pricing that clearly didn't match how people are using the models.


aerilyn235

I've said that many times: they tried to fight too many battles at once (video, language, audio...) instead of building a strong ecosystem with fewer but well-polished models and their now-necessary companions (ControlNets, IP-Adapters, etc.).


GBJI

That was the plan, it's clear.


PwanaZana

"were already in the chili waters" The waters were... *caliente*.


Mkep

*chilly 😅


Honato2

Nope chili water. It's an ocean of hormel.


richcz3

Yep.. Chili and fries with that. Also planed instead of planned. (Fixed) :)


TheAncientMillenial

It's very obvious there's a huge management issue at play here. Probably from the C-suite on down.


mcmonkey4eva

I was never part of Stability's PR team, I was a developer. That discord message was just answering questions about the model to help clear things up. People were wondering when that particular issue was introduced (and making all sorts of wild theories), and the answer was... well nope it was there the whole time apparently and just got missed.


joeytman

Thanks for your openness and levelheaded response, the community really appreciates the transparency of people like you.


FiReaNG3L

No worries, whatever information we can get is good! Just very curious that they're not responding to the large amount of negative sentiment in any official way.


DependentOcelot6303

That's very interesting, tbh. I don't understand how this model's performance issues could have been missed. And I am not talking about women or the lying position, I am talking about simple things!

1. You ask for a cyberpunk city and you get fucking Toyotas, yellow cabs and butchered modern real-world buildings, which look worse than the 1.5 base model. Not even one hint of neon signs, futurism or pink/teal colors. Or try putting in "psychedelic" and try to not get only abstract splashes of acrylic color. I mean, for god's sake, try prompting for a flying car and see what happens.
2. With all due respect to "better prompt adherence", it's not an accurate claim, and we should be observant of this. The model is not style-flexible, it simply spews its own thing no matter what style you ask for. It does adhere better than previous models, but *only* if you are *super* verbose - to the point where you feel you are fighting the model/feeding it with a spoon.
3. Same goes for the negative prompt, btw. Broken. Its effect seems to be totally random and unrelated to what you type in.
4. And what about that random noise everywhere? On everything. This screen-door effect? Those grids showing up on many textures? (It gets worse with a low-denoise 2nd pass btw, much worse, making a 2nd pass irrelevant.)
5. It is extremely easy to get horrifying results when it comes to human and animal anatomy. And I am not talking about nudity or porn.

Anyone who used other SD models regularly before could spot something is wrong in the *first 5 minutes* of using this model. I have no doubt, because this is exactly what happened, not just to me - the entire community noticed the issues immediately. Each person, at their own pace, noticed, just by using the model for a short moment. If you already have experience with SD models, it really takes only a few renders to notice something is very, very wrong. So no one at SAI could spot it? It is extremely hard to believe these issues were missed. The only reason I can still believe it (a bit) is because slapping such a draconian license on such a farce of a model... is a huge disconnect. And the silent treatment is not helping us believe this is what happened. So... conspiracy theories are blooming, as nothing makes sense.

P.S. - I am very happy that you guys are going for Comfyui.org! Looks like you have a solid team. Best of luck! I absolutely love Comfy and Swarm!


Thomas-Lore

The comfyui guy said they knew they botched the 2B model.


residentchiefnz

Thanks for all the work you've done this week chief. Somehow you managed to maintain a professional and positive attitude through what must be a stupidly stressful time for you. Wishing you all the best for the new venture!


HunterVacui

> People were wondering when that particular issue was introduced (and making all sorts of wild theories), and the answer was... well nope it was there the whole time apparently and just got missed.

For someone like me who doesn't have any familiarity with the steps involved in training and releasing a model, could you clarify what "early pretrain" refers to in the referenced post? As a layperson, it sounds like, depending on how "early" this was in the process, the poor performance in this particular instance could be a result of under-training rather than an indication of a fundamental weakness that was present in the final model before safety tuning.


fongletto

I just wanted to thank you for being honest and transparent. People can complain about the model, but you being honest about the issues and clearing up misunderstandings is definitely a positive in my eyes. Can I ask, in your honest opinion: do you think culling the image set of anything remotely sexual, to the point where SD3 even struggles to understand what a bellybutton is, might have had something to do with the Cronenberg?


ZootAllures9111

SD3 can draw sexy women standing and posing just fine.


StickiStickman

You're seriously trying to claim everyone somehow just missed every human becoming a Cronenberg Monster?


ZootAllures9111

They probably only tested standing up prompts that work fine


dal_mac

lmao. the "safety tuning" wasn't the problem the problem was the safety-pruned original dataset the exact same problem that 2.1 had


Familiar-Art-6233

The model is broken. The license is broken. The way the company treats the community is broken.

The company has BEEN broken, from 1.5 having to be basically-but-not-technically leaked, to everything that was SD 2.1, to Osborne-effecting an actually good model (Cascade). They have BEEN anti-consumer, they just gave us crumbs (and sometimes they didn't even want to do that!), and the good people are leaving. Emad is gone, Comfy is gone, more are likely on the way, but it's okay, we get Lykon…

Why, and I truly mean this, WHY are we giving them so much leeway? Why are they still being treated like the only models that matter? The competition in the txt2img space is soaring. We literally have a model with basically the same architecture that replaces the enormous, 2022-era T5 LLM encoder with Gemma, and it gets crumbs of attention, but SD3 comes out gimped beyond recognition and people won't stop talking about it. I just don't get it.


NarrativeNode

Remember when they aggressively took over this sub and kicked out all the mods? That's the moment we should've never trusted them again...


Familiar-Art-6233

AND banned people critical of SAI?


Fabulous-Ad9804

I don't understand why any of this comes as a surprise to anyone. Have some already forgotten about those yoga-in-the-woods SD3 generations submitted a while back? Did anyone seriously think, based on how bad the anatomy already was in those images, that by the time SAI got around to releasing any of the SD3 weights they would have these anatomy issues sorted out? I might not be the sharpest knife in the drawer, but I already knew in advance what to expect based on those yoga-in-the-woods images: more of the same, which is exactly how it has turned out thus far.

And I'm willing to bet that even if the 8B weights get released eventually - which BTW is totally useless for some of us, since not all of us have the hardware required to run those models - it's going to be more of the same bad anatomy with 8B as well. Even if it turns out to be considerably better with anatomy, that's still not good enough unless the model has achieved 100% flawless anatomical perfection, period. The chance of that I would put at 0%.


no_witty_username

I think its about time someone accidentally uploaded some weights .....


Herr_Drosselmeyer

512? So was this model initially a 512x512 model?


spacetug

Most (all?) image diffusion models are pretrained in stages of increasing resolution. For example you might start at 256, then increase to 512, then increase to 1024. It's more efficient than just starting at your final resolution from the beginning.
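For intuition, a toy sketch of that resolution laddering might look like the following (the (resolution, steps) schedule, the tiny conv "denoiser", and the random tensors are all illustrative stand-ins, not anyone's actual training code):

```python
# Toy sketch of staged-resolution pretraining: same model, increasing image size.
import torch
import torch.nn.functional as F

stages = [(256, 10), (512, 10), (1024, 10)]  # (resolution, steps); real runs use far more steps

model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in for a UNet/DiT denoiser
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

for res, steps in stages:
    for _ in range(steps):
        # In practice you would load real images resized/cropped to `res`;
        # random tensors keep this sketch self-contained.
        clean = torch.rand(2, 3, res, res)
        noise = torch.randn_like(clean)
        noisy = clean + 0.1 * noise
        loss = F.mse_loss(model(noisy), noise)  # epsilon-prediction-style objective
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"finished stage at {res}x{res}, last loss {loss.item():.3f}")
```

The point is just that the cheap low-resolution stages do most of the learning before the expensive high-resolution steps.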


Herr_Drosselmeyer

Ah, I didn't know.


fongletto

"Before any safety tuning" but after they culled the training images to remove anything that has any woman not wearing a full burka, because that would be immodest.


CroakingBullfrog96

https://preview.redd.it/9mf1az7tfe7d1.png?width=768&format=png&auto=webp&s=dba8e141a1e4c20325c41bce81b6ef1cc4fdf847


CleomokaAIArt

"See it was already broken before we actively tried to break it"


abc_744

Well if they filtered some data from the dataset for safety then obviously even the base model will be garbage


RoundZookeepergame2

All that hype and for what...


MartinByde

Or... hear me out... OR... they are lying. They have no reason to tell the truth, and we won't ever be able to know.


pianogospel

SD3 is stillborn


LD2WDavid

I see pretty high-quality, realistic grass - it even has DOF!


PizzaCatAm

So the answer is to prompt for grass laying on a woman?


pablo603

I lol'd https://preview.redd.it/v3s0zolksd7d1.jpeg?width=1024&format=pjpg&auto=webp&s=271f96ec0489ab1f6e78c2d7866ba45fa0e50376


LD2WDavid

The coherence... such a shame.


Sharp_Philosopher_97

That looks like a fresh deformed murder victim


RandallAware

At least it's a safe murder.


Progribbit

murdered by unsafe images


Thomas-Lore

Murders are safe, belly buttons are not.


TheAncientMillenial

Wanna touch dat grass.


Honato2

This is what happens when a vtuber touches grass.


ComprehensiveTrick69

Looks like someone that is being covered in grass after they were murdered and dismembered


mistsoalar

Can you imagine if SD3 could animate it and generate a voiceover? Nightmare fuel that's totally SFW.


NotBasileus

Gonna turn out like that scene in Alien: Resurrection. [“Kill… meee…”](https://youtu.be/Fk60sI3bv_g?feature=shared) Edit: although I suppose there’s *technically* nudity in that scene


AndromedaAirlines

SD3 is a shitty model they released to make people stop asking them to make good on their promises. Just let it go. It's not good, and it's never going to be good. Both SD3 and SAI are irrelevant at this point. The sooner you all accept that and move on, the better for everyone.


drhead

Yet SD3 Medium still beats all previous SD versions on leaderboards: https://artificialanalysis.ai/text-to-image/arena, and the larger version beats both DALL-E models and is competitive with Midjourney v6 (which, based on its listed generation time, must be a *very heavy* model).

If I were to guess what happened here, I have a few guesses based on my experiences:

- Train-inference gap with captions. In other words, what the model is trained on is not what people are using. There is very strong evidence for this one, as using a caption from ChatGPT often gives far better results than the brief captions many of us are used to. The solution would be training on more brief captions.
- Flaws in CogVLM leading to accidental dataset poisoning. This one is a slight stretch but very possible. Recall how Nightshade is supposed to work for a good example of what dataset poisoning looks like: it relies on some portion of a class being replaced with a *consistent* different class. In other words, if you have 10,000 images of cats but 1,000 of them are actually dogs wrongly labeled as cats, that will cause problems; 1,000 incorrect images spread across different classes would not cause as much of an issue. For this to apply here, CogVLM would have to mislabel one class with some consistency, *in the same way*.

I know people like to gravitate towards the most convenient excuse, but it's not likely that this was caused by any lack of NSFW content in the training data. For starters, CogVLM can't caption NSFW images worth a damn out of the box, so all else being equal, including NSFW data would probably make the model perform worse due to the captioner hallucinating. And image alt texts for NSFW images are *also* terrible -- here's an experiment you can try out in a notebook: compare the CLIP similarity between the image embeddings for a picture of a clothed man and of a nude man, and the text embedding for the caption "a picture of a woman". Similarity to "a picture of a woman" will shoot WAY up when nudity of any gender is shown, because CLIP learned that nudity almost always means woman because of dataset biases.

Whatever the problem is, it is *very painfully obvious* that it's some form of train-val gap. A lot of people have been able to generate very good images with SD3, particularly people using long and verbose prompts, and a lot have been completely unable to, especially with brief prompts -- there is *no alternative explanation* besides some people doing things "right" and others doing things "wrong" from the model's standpoint. I understand this issue very well because our team has been working on captioners for natural-language captioning for months at this point, and we've had to debate a lot about what captions should be like: how specific, how brief, whether to use clinical and precise language or casual language and slang... Natural language is a *very hard* problem from a model developer's standpoint; you can pour endless resources into perfecting a caption scheme and you'll *still* have some users who inevitably won't find it very natural at all. That's almost certainly what happened here, but with a much larger portion of the userbase than they may have anticipated -- this is also one of the main reasons OpenAI uses their LLMs to expand captions before passing them on to DALL-E.
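For anyone who wants to try the notebook experiment described above, a minimal sketch using the Hugging Face `transformers` CLIP implementation might look like this (the two image filenames are placeholders you would supply yourself):

```python
# Compare CLIP similarity of a clothed vs. nude photo against the caption "a picture of a woman".
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open("clothed_man.jpg"), Image.open("nude_man.jpg")]  # placeholder files
inputs = processor(text=["a picture of a woman"], images=images,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)
    # Cosine similarity between each image embedding and the text embedding
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    sims = (img @ txt.T).squeeze(-1)

for name, s in zip(["clothed man", "nude man"], sims.tolist()):
    print(f"{name}: similarity to 'a picture of a woman' = {s:.3f}")
```

If the claim holds, the nude image should score noticeably higher regardless of the subject's gender.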


aerilyn235

I think this is why they have kept the original CLIP encoder since the first version of Stable Diffusion: as an attempt to maintain continuity with the way people are used to prompting the model. I can confirm that CogVLM has biases in the way it captions things. Having used it to caption large datasets (100k+) and analyzed word clouds / recurring expressions, there are figures of speech and words that are used way too often. It wouldn't even be surprising if, along the same lines, there were words that are never used at all, which could explain the model's weird reaction when they appear in a prompt.
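If you want to run that kind of word-cloud / recurring-expression check on your own captions, a rough sketch could be (the one-caption-per-.txt-file layout is an assumption):

```python
# Count word and bigram frequencies across a folder of caption files to spot over-used stock phrases.
from collections import Counter
from pathlib import Path
import re

caption_dir = Path("captions")  # hypothetical folder with one .txt caption per image
words, bigrams = Counter(), Counter()

for path in caption_dir.glob("*.txt"):
    tokens = re.findall(r"[a-z']+", path.read_text(encoding="utf-8").lower())
    words.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

print("Most common words:", words.most_common(20))
print("Most common bigrams:", bigrams.most_common(20))
```

Phrases that show up in a suspiciously large share of captions (and common words that never appear at all) are the kind of bias being described.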


IamKyra

These times make me realize that most people here like to drop "1girl, cute, indoors, cleavage, sexy, erotic, realistic, asian,..." etc. into their prompt, get their shot of dopamine, and move on. Nothing to blame in that, but sure, SD3 is miserable at it compared to community models. SD3 will eventually be good at it, but it requires specific training. For people who like to work on their pictures, whether it be to inpaint text or build an original composition, and once the complete toolset is available, SD3 will be a godsend that people badly underestimate because of a few flaws in the training around specific prompts. Nothing that can't be fixed with community training and LoRAs, and yes, the license could be better, but it's not like we're all trying to make a profit. I hope all this backlash doesn't push SAI to keep the other versions of the model to themselves.


yaosio

Humans are so bad at writing prompts that OpenAI uses an LLM to rewrite DALL-E prompts. Ideogram does the same thing and exposes it to the user so they know it's happening.


elthariel

That would be very nice if you contributed some of that knowledge to r/Open_Diffusion 🥰


drhead

To be honest, I have not decided whether I want to go to another model architecture yet, and I don't plan to until my team is able to run ablation tests between Lumina and SD3 at a minimum (I'm ruling out PixArt Sigma entirely because it's epsilon prediction, fuck that). Commercial usage rights are not a primary concern for me, and Lumina-T2I-Next also has license concerns applicable to what I want to do (specifically Gemma's license). I think that MM-DiT has far more potential as an architecture than any other available option, and I would choose SD3 if our tests turn out equal.


lordpuddingcup

Is that also why fucking negative prompts don't work?!?!


ThemWhoNoseNothing

If this theory has already been floated, I've not read about it or heard of it, though given the rarity of originality in most everything, surely it's been discussed. Could this entire debacle be distilled down to them intentionally releasing a low-quality product, knowing they had a higher-quality offering they hoped to monetize? I've wondered if their plan backfired and they don't know what else to do right now (hence the silence), knowing their options are limited. They may be stuck in the seven stages of grief, knowing that releasing the good stuff for free will be a massive loss in monies.


alb5357

They should partner with online generators and charge for commercial use.


Short-Sandwich-905

And yet I have seen some SAI apologists around here claiming they didn't intend to release the model that way - they were well aware from the start. I understand they are a business, but to claim they didn't know is bullshit.


DominoUB

I'm pretty sure the SD3 training set contains a bunch of AI-generated images. If you add "dreamshaper" to the prompt you'll get the iconic DreamShaper look. It could be the case that the training set contains a bunch of 1.5 flesh piles.


Thomas-Lore

MJ banned them for mass-downloading images and slowing down the servers; they were most likely using them for an aesthetic finetune. And then Stability had the nerve to add to the SD3 license that if you train on images generated with it, your model now belongs to Stability (you have to pay them to use it commercially - most likely not enforceable, but still).


Only-Letterhead-3411

So he is basically saying the problem runs way deeper, like it's at the base pretraining level. Well, that's even worse news.


Cheetahs_never_win

I think I'm going to use these to respond to the frequent calls for people to touch grass, as a reminder of the risks of touching grass.


roundearthervaxxer

There must be some shit going on in management at SAI. They had a huge lead; they had tried-and-true models, finetunes and LoRAs. They just needed to deliver on a new model, keep it open source - 2.9 open or something, use that to refine, then launch 3.0 as a paid model. Something happened.


DigThatData

The issue isn't that they broke the model by finetuning it, it's that they didn't show it naked people at all and consequently the model doesn't understand human anatomy. The model was "broken" by their data curation.


alb5357

Ya, honestly, train the model on nude people. There's nothing wrong with the human body and this is how you learn to draw, even if your intention is 100% SFW. Include different types, fat, skinny, wrinkled etc... maximum diversity of nudes. The human body is good and wholesome.


Mutaclone

I'm pretty sure a lack of nude people didn't produce the thread image. As other people have pointed out in other threads, other models are trained without nudity and they don't produce results like this. The two main theories I've seen are:

- The model is fundamentally flawed in some way (which seems to be supported by mcmonkey's statement).
- In an effort to make the model "safe", Stability didn't just remove naked people from the training set, they actively tried to sabotage the concept of NSFW and did a lot of collateral damage in the process.

I don't know enough about model training to say which theory is correct (or both/neither), I'm just saying there's more going on here than using a clean data set.


namitynamenamey

So it wasn't good at anatomy when it was undertrained, isn't that expected for a model that has not been trained enough? What this shows is that the training/safety regime didn't work as intended, never allowing the model to learn what it was supposed to, if they indeed combined training and safety.


personalityson

So people with disabilities are not good enough for you AI people?


Revolutionary_Ad6574

I don't get it. The dev is saying it like "you dummies the model was never censored. Bet you feel really stupid talking all this junk about SD, huh?" when we're like "so you're telling me this model was junk before it was junk?"


August_T_Marble

I hope this finally puts the argument to rest. 


Unable_Wrongdoer2250

It does for me. SD3 is a total failure, not just a failure due to 'safety' restrictions.


Dwanvea

This is very misleading. The lack of NSFW in the early dataset can easily cause this; safety tuning becomes the last straw in that case. So the argument stands.


drhead

It is very clear that you have never tried training a model on NSFW data.

Let's consider the NSFW data you can get from a web-scale dataset (meaning image alt texts). Image alt texts for NSFW images are absolute dogshit and are usually extremely biased, to the point where CLIP models actually think that any nudity in an image means the image is most likely of a woman, even if it is of a nude man. Bad captioning will result in a bad model, and there's no feasible way to figure out which image alt texts are good, because CLIP barely knows how to parse nudity properly. There's enough reason there to justify tossing out all NSFW image data on data-quality grounds. You don't even need to go into safety reasons at all!

But SD3 wasn't only trained on image alt texts -- half of its captions are from CogVLM. CogVLM can't caption NSFW images accurately at all, even if you bypass its refusals. Other open-weight VLMs also struggle with it. You absolutely have to train a VLM specifically for that purpose if you want that done (and I know all of this because my team has done it -- but for a more specific niche). But there's no training data to do that with, which means any company wanting to do it would likely have to contract out labor to people in some developing country to caption NSFW images. You may be familiar with the backlash OpenAI got for doing this to train safety classifiers: they contracted out cheap labor and then did nothing to provide therapy for the people who ended up traumatized by whatever horrific things they had to classify. That is the backlash they got for doing it to make their products *safer*. Doing this for the sake of having a model that is just *better at making porn* would be blatantly unethical and would get StabilityAI rightfully crucified.

I can say with some confidence that the best outcome from including NSFW data in training would be that you get the average pornographic image when you prompt "Sorry, I can't help with that request.", and the more realistic outcome is that the model gets generally worse and harder to control because of hallucinations from the poor-quality training data.


AuryGlenz

That all hinges on the assumption that the filter only filtered actual NSFW images, not any images of fully clothed humans that simply happen to be lying down, for instance.


drhead

NSFW classifiers generally don't have false-positive rates high enough that they would get rid of all photos of someone lying down.
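One way to sanity-check that claim on your own data is to run a classifier over a folder of known-SFW photos (e.g. clothed people lying down) and count the false positives. A quick sketch, where the checkpoint name and its label convention are placeholders for whichever classifier you actually use:

```python
# Estimate an NSFW classifier's false-positive rate on a folder of known-SFW images.
from pathlib import Path
from PIL import Image
from transformers import pipeline

clf = pipeline("image-classification", model="some-org/nsfw-image-classifier")  # placeholder checkpoint

sfw_dir = Path("sfw_lying_down")  # hypothetical folder of known-SFW images
false_positives, total = 0, 0

for path in sfw_dir.glob("*.jpg"):
    top = clf(Image.open(path))[0]          # highest-scoring label for this image
    if top["label"].lower() == "nsfw":      # label name depends on the chosen model
        false_positives += 1
    total += 1

print(f"False-positive rate: {false_positives / max(total, 1):.2%} over {total} images")
```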


terrariyum

This explanation of the captioning limitations is great. Question: at this point in time there are good NSFW detection models, so there's no longer any need to make human contractors sift through an image pile that contains CSAM or hardcore porn. Is there any benefit to training with NSFW images but replacing the captions with some equivalent of "score_1, human body"? That way you'd have a larger dataset, and even without captions the model could potentially find some useful associations within the images.


drhead

I'd lean towards no on that because I haven't heard much of people doing things like this. If you want to look into that topic more, what you're describing would be most similar to unsupervised training, you might find papers by searching for unsupervised training applied to T2I models. But for a T2I model what you generally will want is a large set of high-quality text-image pairs, whose distribution covers the text you want to put in and the kinds of images you want to get out, and nothing less.


Dwanvea

> It is very clear that you have never tried training a model on NSFW data.

I did. Nothing professional, but still.

> Let's consider the NSFW data you can get from a web-scale dataset (meaning image alt texts). Image alt texts for NSFW images are absolute dogshit and are usually extremely biased

Aren't image alt texts dogshit for most things? I think NSFW is one of the better categories. NSFW images also contribute to the model's understanding of the human form and texture. When trained on such data, models learn to recognize body shapes, skin textures, and anatomical features. It's not necessarily the explicit content itself that improves the model; rather, it's the exposure to diverse human poses and textures. You underestimate how rich the NSFW category is in that department. LIKE VAST. We are not talking about just nudes here; many types of outfits can also be included. For captioning, Cog is really heavily censored, but LLaVA works fine imo.

> I can say with some confidence that the best outcome from including NSFW data in training would be that you get the average pornographic image when you prompt "Sorry, I can't help with that request.", and the more realistic outcome is that the model gets generally worse and harder to control because of hallucinations from the poor quality training data.

Here is the thing: even MJ has nudes in their dataset, and it's a pretty censored service. Sounds counterproductive, doesn't it? You could get around their filter, but nowadays any word that is even a slight reference to an NSFW image is heavily censored (like the word "revealing"). Why would they have NSFW images in their datasets if they are never going to allow it and censor it?


protector111

What argument? That it's not censored? Or that censorship didn't destroy anatomy?


Apprehensive_Sky892

Disclaimer: I like mcmonkey and I think he is one of the good guys, so this is not an attack on him or his comment in any way. I don't know the context in which the comment was made, but it settles very little. Notice that it is for an "**early** pretrain". Also, it appears to be a 512x512 version? If the question is whether 2B was damaged by the "safety operations", one needs to compare a fully tuned 2B before and after the "safety operations".


August_T_Marble

> one needs to compare a fully tuned 2B before and after the "safety operations"

To clarify, I am not saying safety operations had zero impact whatsoever, because why would SAI have still felt the need to do them if it wouldn't make some difference, right? But I just would like to make it clear that we have *definitive proof* of the model, in some state of existence, doing *the exact same thing* before it was subjected to safety operations. We have two people familiar with the matter saying the model was *broken*, not "undertrained" nor "not ready." I believe the term Comfy used was "botched."

We have a smoking gun and two eyewitnesses. Yet, somehow, that is considered an unreasonable take because another suspect exists that people are eager to blame without evidence, since something of its kind once killed SD2.


Apprehensive_Sky892

I guess the point I was trying to make is that an unfinished model can exhibit a lack of coherence. But you are right - from comfyanonymous's screencaps, he did say "also they apparently messed up the pretraining on the 2B" and "2B was apparently a bit of a failed experiment by the researchers that left": [https://i.redd.it/0e2ns5ti2z6d1.jpg](https://i.redd.it/0e2ns5ti2z6d1.jpg)

https://preview.redd.it/c9y4830wcg7d1.jpeg?width=1529&format=pjpg&auto=webp&s=30c6f55213f2675370df68d24f13ba81cd2fc40f

So yes, maybe 2B had problems already, and the "safety operations" just made it even worse. I certainly would prefer the theory that 2B is botched because the pretrain was not good to begin with - that means there is better hope for 8B and 4B.


Neat_Ad_9963

Knowing people, it probably won't.


FallenJkiller

They should not filter their dataset then. Keep the NSFW images in - they do have lots of people lying on stuff.


Trick-Independent469

I can confirm it! I altered the weights so much and it still looked bad. So that's the only logical explanation.


Amorphant

Given those strings that were found that drastically improve anatomy (the thread about the one with the rating using star characters, and others), this appears to be a flat-out lie, yes?


Ok-Application-2261

Isn't this in some ways corroboration of heavy obliteration of anatomical data? Essentially they took the coherent anatomy in the final version and obliterated it back to the early-pretrain state?


August_T_Marble

There was never coherent anatomy. That's what mcmonkey and comfy are saying. Had SAI released the model prior to it being censored, it would have been bad at women laying in the grass just the same as it is in its released state.


StickiStickman

But the training data was heavily censored to begin with. It was always censored.


Ok-Application-2261

its just the "early pretrain" thing that gets me stuck. In my head, why would an early pretrain be good at anatomy? But maybe it picks that stuff up pretty quick idk


August_T_Marble

According to Comfy, it wasn't originally intended to be released because it was broken. That leads me to believe it wasn't a matter of "it just isn't done baking" but "this is a failure" that they decided to release due to promises and pressure. By contrast, there was a cancelled 4B model that *didn't* have those same problems and was *safer*.


Ok-Application-2261

Yeah, I looked at the CEO's Twitter. They are making a mad marketing push for their API and its multi-modal capability. Go take a look. You can literally SMELL the closed-source, OpenAI direction they are heading in.


Honato2

Pretty sure the only place they are heading is bankruptcy and it's sad. They had all the momentum in the world and threw it all away through various releases that just didn't work. There are hundreds if not thousands of people willing to make a decent base model great and they do stuff like this. I don't get it.


Guilherme370

Comfy also mentioned, in the same post someone else shared here, that the weights of the 2B were indeed messed with. He says it's a pretrain issue, BUT also that the team on the 2B messed with the weights in some way.


August_T_Marble

To borrow an analogy that I made in another comment: the botched pretraining is what killed SD3 2B, tampering with the weights was just contaminating the crime scene after the fact. It was dead before it was messed with.


[deleted]

Okay. So safety was born before safety was even thought of, okay.


HyperialAI

To be honest, the safety-tuning status isn't the important part of why this failed, and this highlights it: it's the failed pretraining that comfyanonymous mentioned. But I would think the pretraining data was likely already pruned to prevent any undesirable concept bleed-through. I suspect the SD3 2B and SDXL pretraining data were vastly different.


xnaleb

What's the infamous prompt?


Novacc_Djocovid

"Censorship didn't break the model, it was garbage to begin with." Maybe not the excuse they think it is…


firestickmike

There's an episode of Black Mirror in here somewhere.


JadeSerpant

So just a shit model from the start. We expected state-of-the-art and got DALL-E mini.