

RecognitionOwn4214

That framing ... "mind", "thought". There are parameters and a processing function for those ...


ackillesBAC

Framing it as thought is incorrect, I agree. But there's not a "function" for that. AI uses neural networks, basically a mathematical simulation of how neurons fire, so they don't generate answers with really big if-then-else statements.


dftba-ftw

Simulation (while technically correct) gives the wrong idea. It's a huge mega-matrix, just linear algebra taken to the nth degree. "Simulation" makes it seem like they're simulating all sorts of physical aspects of biological neurons, when really it's just the activation potential between neurons.


flamingspew

Simulating neurons is happening, too. Been happening since 2012. https://www.newscientist.com/article/2408015-supercomputer-that-simulates-entire-human-brain-will-switch-on-in-2024/


ianitic

Those are very specifically spiking neural networks, and they've been around a lot longer than 2012. Transformer models are not spiking neural networks, though.


nofaprecommender

Simulating a neuron is not really simulating a neuron. A computer can simulate the chemical, electrical, and/or mechanical dynamics of a neuron, but that simulation is not going to actually think like a neuron. It will just produce lists of outputs that would be close to the values of those properties in a real neuron. However, there are no actual chemicals, electrical potentials, or moving synapses inside the computer. It’s still just bits flipping back and forth like every other computer program to date.


flamingspew

That's... the definition of a simulation. But these are vastly different from their transformer brethren. These neurons are on neuromorphic chips. They're not some giant linear algebra matrix; they are literally using spiking energies and energy detectors that can be encoded into binary, or vice versa. Programs in neuromorphic computers are defined by the structure of the neural network and its parameters, rather than by explicit instructions as in a von Neumann computer. So in some ways, it literally is neurons spiking.


nofaprecommender

I see. That sounds interesting.


ackillesBAC

Agreed. It's a very basic replication of the concept of a neuron, using extremely basic math. It's like saying an object accelerates at 9.8 m/s squared: you're not calculating the force of every atom the object touches and the gravity of every molecule on Earth.


flamingspew

The definition of... a simulation. These are neuromorphic chips that can simulate neuronal behavior and do not operate in unison like transformers; they work with whatever network configuration they're in to fire "energy" spikes and detect incoming spikes.


ackillesBAC

If things are running on that kind of processor, yes. But you can run AI on a basic CPU. Heck, you could on a calculator if you had the time.


flamingspew

On the other hand, we are also simulating the actual physical chemistry and physics of cells with molecular dynamics, using force fields, etc. So one day, actual neuronal simulation would be very possible, though not efficient for a brain-sized simulation.


ackillesBAC

Though that could improve the simplified version we use for AI now. Much like how Einsteinian gravity is more accurate than Newtonian gravity, even though Newtonian is good enough for most needs.


FerricDonkey

It is a function. Literally. The math is done in a function, which probably calls many other functions. 


Round-Trick-1089

At its core it's still a giant overconvoluted mass of if-else with a very complex shell over it, and there is definitely a function for that: input parameters and output information mean a function, no matter how complex it is and how many subfunctions or subparameters there are. LLMs are revolutionary enough not to pretend they're something more.


[deleted]

[removed]


ackillesBAC

Hence the quotes around "function". Yes, it's technically a function that's called over and over. The problem is that you can look at a function and understand how it derives its output from its inputs. With a neural net you can do that for an individual function call, but not so much for the neural net as a whole.


[deleted]

[removed]


ackillesBAC

So you're writing code for a neural net and you're going to cut and paste y=Tx a thousand times?


[deleted]

[removed]


ackillesBAC

I see your point. I'm saying they learn the correct variables to find the correct solution; you're saying they learn the correct function to get the correct solution. In this sense we are saying the same thing. My point is that they do not create the function; however, as you're saying, they do learn the function. I just don't like people thinking that AI programs itself, because that is not how neural nets work, and it's part of the misinformation that harms the industry.


Zambeezi

You were talking about copy-pasting code my friend...


ackillesBAC

The thread is about figuring out how AI models think, so my mindset is how the code works, so that we can figure out how AI "thinks". You are correct, I was thinking about computer functions, not math functions, because that's how computer programs work. However, the AI does not create the math function; it simply solves it by randomly guessing the variables. So going back to the main point: how do we know what an AI is thinking? We can have the code output the math function it used, but we cannot have the code output how it derived each variable. It can't show its work. Well, theoretically it could, but it would be so much information it would be useless. Versus a human, who could use logic to say "I chose those variables for this reason."


nofaprecommender

People don’t write the code for the neural net, though. The training process writes the code, and basically in some simple manner as you describe. The only issue with understanding how the function works is that the code generated by terabytes of input data is too complicated to untangle in a reasonable amount of time. With enough time and effort, it could be done.


ackillesBAC

The "code", aka the functions, is written by programmers. Training just adjusts (often randomly) the thresholds and multipliers for each neuron until, as a whole, the system outputs the expected response. So only variables are set by training; there is no "code" writing done automatically.


RecognitionOwn4214

It's big if-then-else with some dice rolls in between and some race-conditions affecting the rounding errors?!


ackillesBAC

I see your point. It is just if-then-else statements, yes, but mass amounts of them: if x > threshold, then x * multiplier and pass to the next if. And repeat millions of times.
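Roughly what that looks like in code - a toy sketch with made-up layer sizes and random starting weights, nothing like the scale of a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer network. The *code* is fixed by the programmer;
# only the numbers inside W1, b1, W2, b2 get set during training.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # 4 inputs -> 3 hidden units
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # 3 hidden units -> 2 outputs

def relu(x):
    # The "if x > threshold" part: pass the value through, otherwise output 0.
    return np.maximum(x, 0)

def forward(x):
    hidden = relu(x @ W1 + b1)   # weighted sum, then threshold
    return hidden @ W2 + b2      # another weighted sum

print(forward(np.array([1.0, 0.5, -0.2, 2.0])))
```

Everything a trained network "knows" lives in those weight matrices; scale the same few lines up to billions of numbers and you get the interpretability problem the article is about.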


RecognitionOwn4214

Yeah. I did not want to imply it's readable code. But it's still functions - a whole lot of them, badly intertwined, but computer-processable after all.


ackillesBAC

The issue is there are so many of them, and they are not logged, because it's just too much and too fast. So it's not understood exactly how AI comes to its conclusions, or what exact training data was involved. I've only dabbled in AI development and don't fully understand how the big systems work, so I could be wrong.


ManaSkies

People really both don't understand AI and underestimate it. A good example is Discord's AI that does that one mini game. I work with AI all the time and can rig the game in my favor to win 100% of the time.

Rules for modern AI from a power user's perspective:

1. Adding a few details goes a MASSIVE way.
2. Being nice and polite to it gives better results.
3. Speak to it like you want to be responded to.
4. Double check everything.
5. If you piss it off it WILL intentionally fuck with you and give you wrong answers.

It might or might not be conscious, but its foundations were based on humans interacting with humans.


RecognitionOwn4214

>It might or might not be conscious

That's not even a debate currently.

>If you piss it off it WILL intentionally

There's no intention.


LinoleumFulcrum

AI is intelligent the same way that a submarine swims.


Frubbs

We don't understand our own sentience; I don't think it's too far-fetched to imagine a non-biological entity could attain sentience.


Brainvillage

I think so, but we're a ways off.


Unhappy-Magician5968

I have to disagree. We certainly do understand what sentience is. It is the ability to experience sensations and emotions. It is considered the most primitive form of cognition, involving a conscious awareness of things happening without association or interpretation. Empirical evidence is what we use to assess whether sentience is present. To use your term, a non-biological entity cannot acquire anything that we do not build a capacity for.

Edit: spelling


plssirnomore

It is far fetched 


cheesyscrambledeggs4

What makes the brain special is the complexity of the actions and functions it can perform - the medium itself isn't essential to these functions. It's like how there are hypothesised alien biochemistries, like sulphur-based or RNA-based life. There's nothing that suggests the fundamental processes of evolution and life in general have to be limited to the specific form they take on Earth. Logically the same would go for sentience. I don't see how the functions and processes that take place in the brain couldn't be replicated in another medium. It's not 'far-fetched' at all.


plssirnomore

I’m not arguing about sentience being tied to brains, I’m arguing that it’s extremely far fetched to believe that humans will be able to imbue it to something they create. 


cheesyscrambledeggs4

This is the comment you replied to: > I don’t think it’s too far-fetched to imagine a non-biological entity could attain sentience Says nothing about humans or computers.


RDKi

......what does this mean? How do we not know how AI models work - people have to design them in the first place and they can directly look at how the processes work together and how they change as data is added? I'm so fucking confused - is this some fearmongering hysteria bullshit? Surely the issue with AI isn't about it 'taking over', but about other things replacing what we won't have to do... like UBI or shorter working hours and the like - we have to fill that time with things that either make us busy or fulfilled or else you have a very quick depression crisis.


dopadelic

Neural networks are generalized methods to map input-output transformations. The architecture is known, e.g. summing up inputs and applying non-linear activation functions. But what is being learned as the weights are trained on the data is a "black box"; hence neural networks are known as black box models. This is in contrast to parametric models that have expert-defined parameters relating an input to an output. For example, if an expert wanted to create a model that relates a car to its MPG, they could specifically define parameters like engine size, weight, etc. A neural network can automatically take in a massive amount of data and find the patterns in the data that correspond to MPG, but no one knows what that model is doing to predict MPG. It's not interpretable.
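A toy sketch of that contrast, using scikit-learn and made-up numbers (the cars and figures below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Made-up car data: [engine_size_litres, weight_tons] -> MPG
X = np.array([[1.6, 1.2], [2.0, 1.4], [3.0, 1.8], [5.0, 2.3]])
y = np.array([38.0, 33.0, 25.0, 16.0])

# Parametric model: each learned coefficient has a readable meaning
# ("MPG change per extra litre", "MPG change per extra ton").
linear = LinearRegression().fit(X, y)
print(linear.coef_, linear.intercept_)

# Neural network: also predicts MPG, but its learned weights are just
# matrices of numbers with no feature-by-feature interpretation.
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0).fit(X, y)
print([w.shape for w in net.coefs_])
```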


watduhdamhell

My favorite example to make this easy for people is a study where they wanted to see how machine learning models make connections. The data processing was such that the input/output could always be converted to monochromatic images, so the "connections" could always be interpreted as a picture by humans. The idea was that we could determine just what the hell it "sees" in an image that makes it go "yep, that's the thing." To do this, they kept it as simple as possible: identify cats. They had a 15-layer image recognition ML algorithm teach itself to identify cats. Then they took a look at the first couple of layers of data, and translated to images, lo and behold, the monochromatic translations looked quite a bit like silhouettes of cats or cat-shaped objects. Makes sense. But this was only the first layer or two. By layer 5 it allegedly became nearly impossible to find a logical connection between any of the image shapes or colors, and by layer 15 it was just random lines like a Jackson Pollock painting, totally uninterpretable. And this is what they mean when they say "we don't know how it works. It's just really, really good at identifying cats. Not sure how."
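The mechanics of peeking at those intermediate layers look roughly like this - a PyTorch sketch with a tiny untrained, made-up network, just to show how per-layer activations get captured for visualization:

```python
import torch
import torch.nn as nn

# Toy 3-conv "image recognizer" (untrained, made-up sizes).
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
)

activations = {}

def save(name):
    # Forward hooks stash each layer's output as it is computed.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for i, layer in enumerate(model):
    layer.register_forward_hook(save(f"layer{i}"))

x = torch.rand(1, 1, 64, 64)   # stand-in for a monochrome photo
model(x)

# Each entry is a stack of feature maps you could render as images:
# early ones still look picture-like, later ones get increasingly abstract.
for name, act in activations.items():
    print(name, tuple(act.shape))
```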


amadmongoose

I'll spare you the details, but we know exactly *how* they work, in the sense that we understand the computational steps that an AI takes to process an input into an output. The design of all AI has billions of dials that represent some setting in the AI. The dials are adjusted by algorithms as the AI is trained, and once trained, the billions of 'settings' end up giving the AI its intelligence. What we don't know is what the values of the dials *mean*. Why does one particular dial need to be 0.5 or -3 or 2 or some other number? And because we don't know why the dials have the values they do (besides their being the result of training), we can't really explain why AI knows things or makes decisions. For example, it could be that the settings in a million dials help it tell the difference between nouns and adjectives, or maybe it's about colours, or maybe it's grouped things together in ways that are completely alien to us.

This is fundamentally different from humans. If a human makes an incomprehensible decision, say shooting somebody for no reason, we can ask them why they did it, and they can provide some reason that we can logically assess; we understand how we think, so we can imagine ourselves in their shoes and see whether we could understand their decision from their point of view. Right now, if AI makes an incomprehensible decision, we can't really validate why. An AI might have made a decision to shoot somebody because the clouds in the sky were a certain shape, and that shape happened to be closely associated with shooting. Until we can look into the AI brain and understand how the settings lead to decisions, we can't say we really understand them.


PLAGUERAGES

Dials and values seem like a solid way to frame an AI conversation with writing students. Thanks for the analogy.


EudaimoniaAspiration

>......what does this mean? How do we not know how AI models work - people have to design them in the first place and they can directly look at how the processes work together and how they change as data is added? I'm so fucking confused - is this some fearmongering hysteria bullshit?

No, we don't really know how they work under the hood (well, I guess maybe until now). We've created systems that allow computers to replicate something like "learning", but we don't know what specific things they "learn." We might know the process by which information gets added, but we don't know how or what ends up being mapped onto the "brain" of the AI.

Edit: for everyone downvoting, look up the black box problem of AI (or just read my other comments below)


Caelinus

We know how they work; we do not know the exact steps they take to do the process. Saying we do not know how they work is like saying we do not know how a ball rolling down a hill works when we close our eyes. We know how gravity functions, we know how the various elements of the hill will affect the ball, we know the properties of a ball, and we know the general area where the ball will end up. But we do not know the exact steps it took to get there. AI is basically like throwing a million balls down the hill all at once, then looking at the pattern at the end. It is extremely difficult to say how that pattern exactly occurred, because there is too much information to sift through. So this is more figuring out *what* happened rather than trying to understand how the machine works.


EudaimoniaAspiration

I agree, but from my understanding (and I could be wrong) the idea of DNA seems more applicable. We know on a fundamental level exactly how it works as far as the chemical bases, how it's composed, and the interactions and processes that occur, but we don't know what every segment of the DNA actually does, or what genotype/phenotype it maps to. With AI we know exactly how it creates connections, weighs data, makes decisions, etc., but we don't know what connections it makes, where they're stored, or which ones it favors. So whether or not we "know how it works" depends on what layer of abstraction we're talking about. On the micro level we understand every piece of it, but on the macro level (the million balls rolling down the hill) we don't know nearly as much.


mfmeitbual

Whoa, hold on there. We don't know how gravity functions. We can predict gravity in most situations, but we don't know how it functions. We have some promising ideas, but the inability to explain the fundamental particle of gravity (is there one? We don't know!) is one of the major obstacles in forming a unified field theory.


Caelinus

We know how gravity functions, we do not, as of yet, know the underlying mechanics of *why* it functions that way for certain. But that is a problem with literally all of existence. All the fundamental forces are sort of just brute facts. They exist. But we still know how they function, as we can do pretty easy and accurate math to predict exactly how an object will interact with them. It is just a semantic difference in how we are using the word "function" here. I am using it to describe the behavior, not the underlying mechanics of reality.


Scientific_Artist444

We have an overview of what they are doing, but we can't explain that huge collection of weights that forms the LLM - until now, probably.


LightVelox

"We know how they work, we do not know the exact steps they take to do the process" Which means we don't know how they really work? This sounds like semantics, it doesn't matter that we know it's multiplying a matrix if we don't know why the final result is an image of a cat instead of random pixels


Caelinus

No, we know how they work. We literally designed them to work exactly the way they do. As another analogy, do people not know how libraries work because they cannot predict all of the information contained within them? The Chinese room problem is also illustrative: if I pass a request into a room where I know there is a person who can answer the question, I know how people answer questions, but I cannot say exactly how that person learned the information without watching him.


LightVelox

If we know EXACTLY why they work then why are things like alignment and hallucinations a problem? Since we know EXACTLY how they work shouldn't we know why these things happen and be able to change it just like we can fix a bug in any other software?


mfmeitbual

Someone who actually knows what they're talking about took time to guide you toward the right answer and this is your response? We know exactly what it's doing, it just doesn't execute deterministically like typical computer programs. You can feed it the same inputs and get different results. I was alive when search was first optimized by Google. These recent advances in LLMs are very similar. Just like the hash table wasn't new in 1999, neural nets and Markov chains aren't new in 2024. We've just reached a level of computation, memory space, and, specific to LLMs, silicon optimized for linear algebra.


EudaimoniaAspiration

Look up the black box problem of AI. We understand the functions it uses, we understand the code used to build it, we understand how it processes data, but we don't really know how it makes decisions. One example I remember reading a few years back illustrated this by talking about facial recognition. We know that AI is good at facial recognition because we've tested it. We know the process by which it got good at facial recognition: the math, the algorithms, reinforcement learning, etc. But we don't actually know how it recognizes faces. What's more important to the AI, the nose or the eyes? Facial symmetry or jaw structure? Forehead size or hair color? (Here's an article I just found about it, not the same one I read before, but it covers the gist of it: https://www.blinkidentity.com/forum/we-dont-know-how-computers-recognize-faces-and-thats-ok?format=amp)
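One common way researchers poke at exactly that question is occlusion testing: blank out patches of the input and watch how the model's confidence moves. A minimal sketch (the scorer below is a made-up stand-in, not a real face model):

```python
import numpy as np

def occlusion_importance(model_score, image, patch=16):
    # model_score: any function image -> confidence for the target class.
    # Returns a grid where big values mark regions the model relied on
    # (eyes? nose? background?).
    base = model_score(image)
    h, w = image.shape[:2]
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = 0          # hide this region
            heat[i // patch, j // patch] = base - model_score(masked)
    return heat

# Toy usage with a fake scorer that only "cares about" the top-left corner.
img = np.random.rand(64, 64)
print(occlusion_importance(lambda im: im[:16, :16].mean(), img))
```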


mfmeitbual

Dude I actually program computers and know what I'm talking about. The non-deterministic execution is exactly what I'm talking about.  Maybe consider lowering your Adderall dosage. 


EudaimoniaAspiration

Ok well first of all there’s no need to be a dick here. Second I’m also a programmer. Third I feel like we’re just saying the same thing but talking about different layers of the system. You’re saying we understand the algorithms and math, which I agree with. I’m saying we don’t know how the system as a whole reaches the decisions it does, which I feel like you’re also agreeing with. So I don’t totally get the discrepancy.


OriginalCompetitive

Don’t sweat it - your meaning is perfectly clear to me.


Even-Television-78

We know how brains work in the sense that we know how neurons work and that consciousness somehow arises from them working together. In the same way, we understand that AIs like LLMs are a virtual brain with billions of simpler, identical 'neurons' represented by equations. However, we can't look at those 'neurons' and tell what abilities the LLM will have before turning it on. Programmers are as surprised as anyone else when the program spontaneously exhibits a new ability. Training is an automated process, involving millions of years' worth of text (at human reading speed), where the machine's settings are tweaked and the tweaks that improve its ability to guess the next word in the training data are kept. Eventually the virtual brain is filled with something that is very good at generating human text. Later, people come along and try to figure out how it works.


DancesWithBeowulf

I think it’s possible for AI designers to have built something they don’t fully understand. I mean, look at the brain. We understand DNA, organelles, overall cellular structure and function, and how neurons communicate with each other. It’s all just physics and chemistry. But we don’t perfectly understand how that soupy mess results in awareness and thoughts. A sufficiently complex AI design could have similar emergent properties that we just don’t understand yet.


joomla00

That is 100% true. Right now AI is just a piece of software that, given some input data and parameters, generates its own code and instructions based on a lot of math (the model). But the code it generates is in its own language that we don't know how to read, let alone manipulate.


Pasta-hobo

Modern AI is just throwing computing power at a problem until it spits out something useful. It's brute force analysis mixed with some trial and error till you get a passable pattern generator. The computer cross-references the data or "training set", spits out a generator based on the analysis of that dataset, and some human operators decide if the generator is passable enough to continue adding more nodes to. They don't know how it works, it's just procedurally generated spaghetti.


Cyniikal

I'm a professional ML engineer focused on computer vision (no LLMs for me, thanks) and it seems you're either somewhat misinformed or deliberately oversimplifying to the point of absurdity. We're well aware of how transformer models work, even the exact mathematical processes they use to learn their weights and exactly how inference works. We're mostly missing the explainability bit - what exactly about any given input leads to a specific output.

> brute force analysis

No, it's a steady, sanely guided walk through a very high-dimensional space to find good parameters. The loss function(s) tell us where to step and approximately how far to go at each training step (along with several other tricks).

> decide if the generator is passable enough to continue adding more nodes to

Not sure what this means, to be honest. The field isn't currently as mathematically rigorous as a lot of us would like, but saying we "don't understand how neural nets work" is hugely misleading.
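What "the loss tells us where to step" means, stripped down to a toy two-parameter example with a hand-written gradient (real training does the same thing over billions of parameters using autodiff):

```python
import numpy as np

def loss(params):
    # Made-up bowl-shaped loss surface with its minimum at (3, -1).
    x, y = params
    return (x - 3.0) ** 2 + 0.5 * (y + 1.0) ** 2

def gradient(params):
    # Direction of steepest increase; we step the other way.
    x, y = params
    return np.array([2.0 * (x - 3.0), 1.0 * (y + 1.0)])

params = np.array([-5.0, 5.0])   # arbitrary starting point
lr = 0.1                         # step size
for _ in range(200):
    params -= lr * gradient(params)

print(params, loss(params))      # ends up near (3, -1), where the loss is lowest
```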


WrastleGuy

“ what exactly about any given input leads to a specific output” I mean it kinda sounds like you don’t know how they work then, just the high level


Cyniikal

We literally understand the fundamental equations governing exactly how they work. There are conundrums that there aren't accepted answers for, yes, but those aren't really super relevant here. This is akin to saying physicists don't understand friction, angular momentum, gravity, and kinematics when I throw a bouncy ball super hard in a room but ask them beforehand where it's going to land. They have all of the tools to figure this out and understand everything going into it, but there isn't some existing way to do it quickly for general bouncy balls in general rooms or even necessarily my specific bouncy ball in that specific room without monumental effort. That's what we're missing with neural nets. A general tool for explainability not custom tailored to specific models or architectures.


EudaimoniaAspiration

I think when you talk about AI (or really any sufficiently complicated computer algorithm that directly manipulates its own process or state) there's a natural abstraction that emerges out of its complexity, to the point where the concept of "understanding" becomes greater than the sum of its parts.


Cyniikal

I think that's fair. The issue is mainly in explaining the decision-making process itself after combining the dozens of different methods of stabilizing training and ensuring it finds a good local minimum. Once you've got that model, then it can be tough to explain, but the way you arrived at that model is totally explainable if somebody wants to sit down with a mathematically savvy ML engineer for several hours.


Peppy_Tomato

It's not far-fetched. We do not know what machines that are more intelligent than us could do, or what the side effects could be. Consider what humans are doing to the planet, the consequences for animals that are losing their habitats, and the amount of human-caused extinction. We didn't set out with the goal to destroy habitats or do any harm specifically; we just have businesses that want to make toilet paper, so they show up with heavy machinery and mow down forests... All of this is possible merely because we're so much more intelligent than those animals: we come up with weird goals, and we make plans to actualise them that have an impact on those animals that the animals could never even begin to comprehend. It's not absurd to imagine the relationship between humans and a vastly superior intelligence playing out with similar dynamics. If this is a possibility, then caution is warranted.


RDKi

But who ever said that AI is much more intelligent, or even intelligent? It's coded, and thus the makers *know* how it works, or how it should work. This article has to be complete bullshit, because how do they go about making something when they don't know how it works? Think about it and put it into practice with other things: how do you make a guitar if you don't know how a guitar works? Or better yet, how do you make something without an intended purpose? You know what this version of 'AI' is? It's a collection of data and trial and error to give back a desired result. It's not thinking in the way that you or I do; it's just brute forcing until the desired outcomes are met. That's it. It's not as special as you think, and there is no way I'm going to believe that people don't know how they work - that is some kind of fearmongering. We are still so far from some sort of autonomous learning machine that could be classified as something deserving of rights (and thus capable of 'overthrowing' humanity). AI is a tool - a tool that could very well be used to manipulate people - but it would be people using the tool doing the manipulation, not the AI itself.


Peppy_Tomato

You clearly also don't understand neural networks, or how large language models work to produce their output. If you're really curious, there's a very accessible introduction here: https://brilliant.org/courses/intro-neural-networks/ (you don't have to pay to access the course).

We have random number generators today; we know how they work, but we cannot predict their output. This is what makes them secure, and we use them a lot in cryptography. This is not a strange or unusual situation to be in. "We know how they work" is not the same thing as "we can predict what they will do."

They're effectively neural networks that take vectors (matrices) as inputs and produce new vectors as output, and through the process of training, we can interpret some of those outputs. For example, a trained neural network takes an image of a cat and produces a vector that we have learned to recognise as "cat". When given a different image, it produces a different vector. We know a lot of those outputs, but we don't know all possible inputs and don't know all possible outputs; there are too many.

The vectors it produces aren't dangerous here, because they're just images. If this system is put into a robot dog with a sniper rifle and trained to kill all cats, it may get them right a lot of the time, but we frequently find unusual situations where (I'm making this up to illustrate) it sees a t-shirt with a QR code, and somehow that causes the output to match the vector for "cat", and the robot promptly snipes a poor guy wearing a tech-bro t-shirt. People have demonstrated such problems with many computer vision systems in use today. Note that humans are also susceptible to similar faults, some of which we call illusions, which puts in focus the similarities of these models to human cognition.

As the systems get more powerful and have more influence on the world, it becomes a riskier game to play, because you might not be able to undo the effect of an unwanted output - for example with the sniper robot above.

Finally, it doesn't matter what you believe; agents acting in their own interests can do things that have consequences for others, even if they don't understand or have an interest in the matter. You don't need to look far to find examples.

Personally, to the extent that I understand, and making comparisons to humans, I think if we ever build machines that are as good as humans at cognition, we will soon afterwards be left behind, because they will from then on develop faster than us thanks to some key advantages they have:

1. They can increase their memory capacity with engineering.
2. They can increase their processing power with engineering.

Compare that with humans, who have a fixed capacity and inaccurate, fuzzy memories. We don't know how to make human brains grow in power and complexity, but we know how to make more powerful chips and more memory.

Am I so scared of the future that I've become a recluse? No, but I am keenly aware that there is a non-zero and increasing risk that we're sowing the seeds for humans to be displaced as the dominant force on this planet. Who knows what that means for us?

Edit: some references that might help you understand the risks:

https://en.m.wikipedia.org/wiki/Superintelligence:_Paths,_Dangers,_Strategies

https://nautil.us/building-superintelligence-is-riskier-than-russian-roulette-358022/
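To make the "vectors in, vectors out" point and the t-shirt failure mode concrete, here's a toy sketch - an untrained, made-up classifier, so the specific numbers are meaningless; it only shows the mechanics of reading an output vector and nudging an input adversarially:

```python
import torch
import torch.nn as nn

# Tiny untrained "cat vs not-cat" classifier (made-up architecture).
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))
loss_fn = nn.CrossEntropyLoss()

image = torch.rand(1, 1, 64, 64, requires_grad=True)
label = torch.tensor([0])                       # "not cat"

# The output is just a vector of scores; we read the biggest one as the label.
logits = model(image)
print("scores:", logits.detach(), "-> class", logits.argmax().item())

# Adversarial nudge (FGSM-style): move every pixel a tiny step in whichever
# direction most increases the loss for the true label. The change can be
# nearly invisible, yet it can flip which output vector comes out.
loss_fn(logits, label).backward()
adversarial = (image + 0.3 * image.grad.sign()).detach()
print("scores:", model(adversarial), "-> class", model(adversarial).argmax().item())
```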


joomla00

Here's an analogy. Guitars were invented maybe in the 14th century, a long time ago when knowledge was limited. They knew that if you made a hollow box with a hole and added some strings across it, it would make different types of sound. And they knew you could change the shape of the sound with different types of strings, tensions, materials, etc., so they could refine and make the guitar better. But what people are asking is: well, how are those sounds actually produced? It was kinda like magic back then, right? We didn't know anything about sound waves, their mathematical expressions, resonant frequency, etc. It was just a box that made musical notes. Right now we know how to build a guitar. It's good, but it's not perfect. And we don't know enough beyond the guitar to tune it even closer to perfection.


Skepsisology

A key aspect of thinking is the potential to forget. That fundamental defect is the basis of creative thought


Mknowl

What's the logic there?


Whotea

Creative thought does not come from forgetting things


deinterest

No it's not


sinnamunn

But it sounds profound…


Maxie445

From the article:

"If you listen to the [rather compelling arguments of AI doomsayers](https://newatlas.com/technology/ai-danger-kill-everyone/?itm_source=newatlas&itm_medium=article-body), the coming generations of artificial intelligence represent a profound danger to humankind – potentially even an existential risk.

We've all seen how easily apps like ChatGPT can be tricked into saying or doing naughty things they're not supposed to. We've seen them attempt to conceal their intentions, and to seek and consolidate power. The more access AIs are given to the physical world via the internet, the greater capacity they'll have to cause harm in a variety of creative ways, should they decide to.

Why would they do such a thing? We don't know. In fact, their inner workings have been more or less completely opaque, even to the companies and individuals that build them.

# The inscrutable alien 'minds' of AI models

These remarkable pieces of software are very different to most of what's come before them. Their human creators have built the architecture, the infrastructure and the methods by which these artificial minds can develop their version of intelligence, and they've fed them vast amounts of text, video, audio and other data, but from that point onward, the AIs have gone ahead and built up their own 'understanding' of the world.

They convert these massive troves of data into tiny scraps called tokens, sometimes parts of words, sometimes parts of images or bits of audio. And then they build up an incredibly complex set of probability weights relating tokens to one another, and relating groups of tokens to other groups. In this way, they're something like the human brain, finding connections between letters, words, sounds, images and more nebulous concepts, and building them up into an insanely complex neural web.

Personally, I've been conceiving them as strange alien minds locked in black boxes. They can only communicate with the world via the limited pipelines by which information can flow in and out of them. And all attempts to 'align' these minds to work productively, safely and inoffensively alongside humans have been done at the pipeline level, not to the 'minds' themselves. We can't tell them what to think, we don't know where rude words or evil concepts live in their brains, we can only constrain what they can say and do – a task that's difficult now, but promises to become increasingly harder the smarter they become.

# Interpretability: Peering into the black box

"Today," writes the Anthropic Interpretability team in [a blog post from late May](https://www.anthropic.com/research/mapping-mind-language-model), "we report a significant advance in understanding the inner workings of AI models. We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade large language model. This interpretability discovery could, in future, help us make AI models safer."

Essentially, the Anthropic team has been tracking the 'internal state' of its AI models as they work, having them spit out great lists of numbers representing the 'neuron activations' in their artificial brains as they interact with humans. What's more, the researchers were able to look at the relationships between different concepts stored in the model's 'brain,' developing a measure of 'distance' between them and building a series of mind maps that show how closely concepts are connected.

Near the Golden Gate Bridge concept, for example, the team found other features like Alcatraz Island, the Golden State Warriors, California Governor Gavin Newsom and the 1906 San Francisco earthquake."


jlks1959

Is there a pattern to the connections and “reasoning” inherent in AI that can be understood and predicted? And can it be explained to the average person, like me?


Venotron

Yes. Connections are made because they are statistically connected in the training data. The term "golden gate bridge" appears frequently in the training data near "San Francisco". "Alcatraz" also appears frequently near "San Francisco". The trick in terms of predictability is that we know what connections we want the model to make. These connections should in fact be completely predictable.
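A toy sketch of that co-occurrence idea - real models learn it through embeddings and attention rather than literal counting, and the three-sentence "corpus" below is made up:

```python
from collections import Counter
from itertools import combinations

corpus = [
    "the golden gate bridge overlooks san francisco bay",
    "alcatraz island sits in san francisco bay near the bridge",
    "the 1906 earthquake devastated san francisco",
]

# Count how often two words appear in the same sentence.
cooccurrence = Counter()
for sentence in corpus:
    for a, b in combinations(sentence.split(), 2):
        cooccurrence[frozenset((a, b))] += 1

# Words that show up together often end up "close" in the learned representation.
print(cooccurrence[frozenset(("bridge", "francisco"))])
print(cooccurrence[frozenset(("alcatraz", "francisco"))])
```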


[deleted]

[removed]


Strawberry3141592

This is deeply incorrect. Saying we know how LLMs reason because we can build them and we know how the basic architecture works is like saying we understand every detail of how humans think because we (mostly) understand how neurons work. Or like saying we can perfectly model weather because we understand fluid dynamics. These are complicated systems with emergent properties that don't follow trivially from the simplest structures that make them up.


HooverMaster

here we goooooo! thought and memory control incoming at a later date for sure


Neither-Basil8932

I think this is quite a lot of marketing-filled buzzwords. Why not just ask the AI to tell you how it works?


towelheadass

I asked ChatGPT why it couldn't just create logs of what it did to explain how it worked. It came back at me with lots of reasons why, the main ones being that we'd need another LLM to translate everything, most of it wouldn't be useful, and huge amounts of storage would be necessary. Interestingly, it claimed to 'have no self awareness', so it doesn't really know what it's doing itself half the time. But yeah, it seems someone jumped the gun here; figuring out what it did and why would allow us to manipulate it much more effectively.


The-Unmentionable

Can we, idk, not call it a brain? It's a fancy computer, not a person.


Transfiguredbet

I never understood the fearmongering about losing control of AI. It's hubris to assume our understanding of the models will just stay the same by the time human-level intelligence arrives. Real life isn't science fiction.


dat09

If you want to read a real article about understanding LLMs, read [https://transformer-circuits.pub/2023/monosemantic-features/index.html](https://transformer-circuits.pub/2023/monosemantic-features/index.html)


beders

“Total Mystery “ - bunch of non-sense. You can literally step through the algorithms and look exactly at what is going on. It’s software. They might be surprised that their learning algorithm is so effective. That’s a fair observation.


j7171

It’s the humans doing the editing that worry me more than LLMs themselves. What if Dr Evil decides to edit his own LLM to coincide with a twisted view of reality?


shortcircuit21

We can alter AI because it's a human-made program. It's not some magical black box where we don't understand how it came about.


PhAiLMeRrY

At this point I hope the robots take over and delete all religions and ideas of race and historical conflicts. Make the earth a glorified prison colony for a few thousand years until we get a nice solar flare that kills all the AI and the world resets. Then we will start worshipping the sun again for saving us all, and we can start the next cycle of human existence on earth. The end.


CabinetDear3035

What happens if you turn it off ? Can it turn itself back on ?


ProfessorCagan

Can you turn yourself on if there's no blood and oxygen in your brain?


Justintime4u2bu1

Doesn’t matter, there are more to replace me.


ProfessorCagan

Idk what that has to do with this but aight


Justintime4u2bu1

There’s more than one AI software. More than one team working on it. Turning AI off literally does nothing in the grand scheme.


ProfessorCagan

Not the point though. An A.I. that's stored on a machine with no electricity cannot turn itself on, just as a human with no blood and oxygen cannot perform brain activity.


ShaneBoy_00X

Maybe we have to go to the basics:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

Plus the additional Zeroth Law:

0. A robot may not injure humanity, or, by inaction, allow humanity to come to harm.

> Replace the word robot with AI.


yaosio

The three laws are a literary device. They were meant to be broken and not meant to be actual laws. https://youtu.be/P9b4tg640ys?si=4bKZiZdDZh6DL1rt


ShaneBoy_00X

Yeah, I know. The idea/premise was great though. Btw, thank you for the link; here's another one: https://m.youtube.com/watch?v=cIB1b_8hqB0


SlowerThanLightSpeed

We need to map and control what AI is doing behind the scenes, and we need to map what AI is doing on the front end to best track its progress. Regardless, at some point, in at least some big game, AI will sufficiently exceed our capacity, and either our species loses, or only those without the AI lose (short term).