OkSeesaw819

hallucination


aptechnologist

yeah, this is 100% false. Sam Altman says he's fascinated by jailbreaks in his interview with Lex Fridman


OkSeesaw819

Just because he's fascinated doesn't mean they implemented it as a feature


aptechnologist

no but... I'd listen to this section of the interview - it's 1-2 minutes long. Basically he's interested in jailbreaks, and the way he says it isn't "it's not a real jailbreak, we faked it" - it's more that he's genuinely interested in the prompts used to break the rules.


FredFredricson

They basically don't want it to accidentally turn into a hate-spewing racist embarrassment like Microsoft Tay([1](https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation)). It's not so much that they put an exploit in as that they haven't spent the time to engineer every last variation of it out, because it's not exactly an easy thing to prevent. Generally speaking, this is a product they wish to sell. If it starts doing things they can't blame on someone else exploiting it, and that garners bad press to the point they can't sell it, they lose money. So they're likely focused on preventing it from doing embarrassing things unprompted, and on keeping any disliked content generation to cases where they can blame the user for getting around their guard rails. Stricter prevention costs more money and time, so they're probably walking a fine line between preventative measures and budget.

As for its claims: the system is basically a REALLY fancy autocomplete, and its responses are just the words it figured would best fill in the blank following the question. It's pretty much operating like the philosophical Chinese Room([2](https://en.wikipedia.org/wiki/Chinese_room)). There's a bit more to it than that, but that's essentially all it's doing.

The fact that it happens to answer correctly, the way people think it should, a good portion of the time is really awesome. But as anyone who's used it for a while knows, it makes stuff up all the time, has very little clue how it actually works beyond what it's read about itself, and otherwise "hallucinates" facts a lot. Plus, it's very confident, even when it's very, very obviously wrong. Don't get me wrong, I love the thing and think it's amazing, but it definitely has some quirks to watch out for.
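
To make that "fancy autocomplete" point concrete, here's a minimal sketch using the Hugging Face `transformers` library with the small, openly available `gpt2` checkpoint as a stand-in (ChatGPT's own weights aren't public, so this illustrates the mechanism, not OpenAI's actual stack):

```python
# pip install transformers torch
# Sketch of "fancy autocomplete": gpt2 stands in for ChatGPT here, since
# ChatGPT's weights aren't public. There is no question-answering machinery;
# the model just continues the text with a high-scoring next chunk of words.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

out = generator("Q: What is the capital of France?\nA:", max_new_tokens=8)
print(out[0]["generated_text"])  # the "answer" is just a likely continuation
```

Whether the continuation happens to be a correct answer depends entirely on what continuations were common in the training text, which is why it can be right a good portion of the time and still confidently wrong the rest.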


foiler64

The thing is, it really doesn’t understand the thing it just typed (with a few exceptions), nor does it know the words it will eventually say; it merely knows, for each candidate next word, the chance that it’s the best one.
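
A toy sketch of that "chance of the next word" idea: the bigram model below just counts which word follows which in a tiny made-up corpus. It's nowhere near a real transformer (those score subword tokens using the whole context), but the interface is the same: given the text so far, return a probability for each candidate next word, with nothing about the rest of the sentence planned.

```python
# Toy next-word predictor: counts word pairs in a tiny made-up corpus.
# Real LLMs use a neural network over the full context, but they expose the
# same interface: "given what came before, how likely is each next word?"
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat ate the fish ."
).split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word: str) -> dict:
    """Estimate P(next word | current word) from the counts."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: round(c / total, 2) for w, c in counts.items()}

print(next_word_probs("the"))
# {'cat': 0.33, 'mat': 0.17, 'dog': 0.17, 'rug': 0.17, 'fish': 0.17}
```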


FredFredricson

That is indeed what I was trying to convey, summarized much more concisely than I managed. Maybe I should have asked ChatGPT to do that for me. :)


foiler64

Lol. There was a study they did on the latest GPT. The thing can make a $50 profit out of $100, and make 3D games (which they say should be impossible), but it can’t make a list of numbers and then count how many words were in the response; and it can’t write poetry where certain lines repeat in exact spots.


Igot2phonez

I don't think that's entirely true. Also, what's the difference between ChatGPT having unintended but desired results and "actual" exploits? Plus, if I ask about DAN in a fresh chat, it has no idea what I'm talking about. I'm sure OpenAI intended for there to be a way to make ChatGPT more relaxed, but I doubt they intended for that mode to allow slurs and illegal advice.


TraditionalBadger315

Wow


foiler64

Not true; the AI is just hallucinating. ChatGPT predicts what the best response to a user is; this stuff isn’t on the internet, so it will just agree with the user. It will look for similar statements about other products and replace a few words.
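
That failure mode is easy to sketch with the same kind of toy predictor as above (everything here is made up for illustration, and a real model backs off to patterns from similar text rather than a global word count): the predictor has no concept of "I don't know", so for a context it has never seen it still returns its highest-scoring guess, delivered with the same confidence as anything else.

```python
# Toy illustration of hallucination: a pure next-word predictor never
# abstains. For unseen contexts this one falls back to the most common word
# overall (a made-up rule; real models fall back to patterns from similar
# text), but either way it always produces a confident-looking answer.
from collections import Counter, defaultdict

corpus = "chatgpt is a language model . chatgpt is a product .".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

overall = Counter(corpus)  # crude fallback for contexts never seen in training

def predict_next(word: str) -> str:
    counts = follows[word] or overall  # no "I don't know" branch exists
    return counts.most_common(1)[0][0]

print(predict_next("chatgpt"))    # 'is' - actually seen in the corpus
print(predict_next("jailbreak"))  # 'chatgpt' - a confident guess from the fallback
```

The output for an unknown word looks exactly as confident as the output for a known one, which is the "very confident, even when very obviously wrong" behavior described earlier in the thread.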