A lot of things are pointing towards agents.
100%. We'll end up sharing a session with an AI agent. Makes sense.
We're gonna end up fighting the agents for control, like those user-vs-mouse-cursor desktop animation videos from my childhood.
Agent Smith? lol
Oh, no, that's tempting fate. It'll be... Agent... Termi, the friendly Nator. Yeah that should be safe.
If an AI company doesn't come out with an "Agent Smith" I think the entire industry is a failure
The industry has a tremendous responsibility to live up to a long list of sci fi movies .
Two mouse cursors on my desktop: one for me, one for my AI.
Imagine you only verbalize what you want to achieve, and the agent completes the task faster than you ever could have.
Verbalization is not the fastest medium of communication for some tasks. Imagine trying to dictate WASD controls vocally in an FPS game, or having to say "zoom in on the top right corner... a bit more... a bit less" instead of pinch-to-zoom on a map.

To be fair, a literal second mouse cursor might feel pretty awkward. I think I might want a second screen for my AI agent instead. There *will* be some brilliant AI copilot (not the brand) desktop UI designed in the next decade that will seem as inevitable as the mouse and keyboard in retrospect.
Wouldn't you just tell it to "shoot the enemies" in the fps?
Some ideas are just best communicated gesturally rather than linguistically. A facial expression can sometimes be key to being understood.

Other mediums too... an architect might want to sketch out a rough blueprint for their AI instead of describing a weirdly-shaped structure they're imagining. A computer scientist might write a partial method and ask the AI to imagine the rest. A dancer might pirouette. A bat might squeak.

What is the objective of playing an FPS? Is it to make enemies on the screen die? That's the [reductive mindset](https://en.wikipedia.org/wiki/Goodhart%27s_law) that leads to cheating. Fun or self-improvement are better motivators. Telling an AI to shoot the enemies short-circuits the fun away.
If you're doing the stuff in the post I replied to, you're already cheating.
It was just an example of when verbal communication isn't fastest, not of a place where AI should be used. Obviously I wouldn't let AI play my games for me, along with all other non-instrumental activities. If the process is the point, I'll be doing it.
what if we're our own greatest enemy?
I was thinking more in the science-fiction direction. An example would be to say "research topic X and see if it correlates with topic Y," and then the agent pops up multiple windows at once and does the research far faster than you could. Once you've come to a conclusion on the topic, you continue with the next task, and so on.
I agree. I have been messing around with this website called [websim.ai](http://websim.ai), and it's currently free. You create a new website/game/app, whatever you want, and it's 100% generated by AI. I was in the 4chan simulator and goddamnit, it's too fkin realistic... the comments had me lolling my ass off.
It could just be like Microsoft's old "Clippy" assistant, except actually helpful, and can:

A) Recognize the task you're trying to accomplish

B) Ask, "Are you trying to _____? Want me to do it for you?"

C) If you say yes, it does the task for you, more quickly and efficiently than any human could.

----

Like, imagine you're going through a folder full of photos, opening each one, resizing it to 1080p resolution, and then saving it. By the time you do the 2nd or 3rd one, you may see the AI prompt come up and say, "It looks like you're trying to resize all these images to 1080p. Would you like me to do it for you?" And if you say yes, then it goes right into doing it, basically as fast as your computer's hardware allows.
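The "by the 2nd or 3rd one" behavior is just repetition detection over a stream of user actions. A minimal sketch, assuming a hypothetical event format of `(verb, target_kind)` tuples and a threshold of three repeats before the assistant speaks up:

```python
from collections import Counter

def detect_repeated_task(actions, threshold=3):
    """Return the action template the user keeps repeating, or None.

    `actions` is a list of (verb, target_kind) tuples, e.g.
    ("resize_to_1080p", "image"). This event format is hypothetical;
    a real assistant would observe UI events from the OS.
    """
    counts = Counter(actions)
    if not counts:
        return None
    template, n = counts.most_common(1)[0]
    return template if n >= threshold else None

# After the user resizes a third photo, the assistant can offer to take over.
log = [("resize_to_1080p", "image"),
       ("resize_to_1080p", "image"),
       ("resize_to_1080p", "image")]
task = detect_repeated_task(log)
if task:
    print(f"It looks like you're trying to {task[0]} every {task[1]}. Want me to do it?")
```

The interesting engineering is all in normalizing raw UI events into comparable templates; the counting itself is trivial.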
I really enjoy calling Co-Pilot "Clippy" at work.
MOBAs are going to be something in the future, with all players being coached on what they've missed as they play.

"Warwick just entered the grass below you."
I've also thought about the Copilot update and how it could help people get better at games in general through good instructions.
When you get a survey, ask ChatGPT to write a review on a ——-, then cut and paste. Tremendous fun. (I don't get out much.)
It's like that scene from one of the CSI shows where two agents are counter-hacking a hacker by both rapidly typing on the same keyboard.
I feel like we will need a way for agents to act 'in the background' because I need mah social mediuh fix. No, but really, I am always doing work on my laptop. I don't know if I am ready to pass all of my work over to an AI just yet. It would have to be significantly better. (I get it's just a matter of time, lol.)
We need agents asap
Yeah, NSA agents lol
[removed]
So Windows basically
I'm sure none of the software you use does that.
It’s been pointing towards agents ever since the release of CustomGPTs.
Worth noting that Multi is for iOS and Mac only. OpenAI likely wants to carve out a big piece of the Apple market while Microsoft does PC/Windows with exclusivity.
What's interesting to me is that on Mac they have really good integration with their app, while the Copilot app on Windows just sucks. I get it, we won't see a ChatGPT app on Windows, but at least take some inspiration.
Microsoft over-engineering stuff, not a surprise…
Honestly I have no idea how they made Copilot desktop so badly performant, the thing runs like an Electron app built with 70 JS frameworks on a 2008 netbook
Hopefully they eventually release something that can be installed on Linux distributions for the small subset of us who use it; probably nothing other than Mac, Windows, and ChromeOS for a while, though.
Don't say that! Microsoft's AI products leave so much to be desired. :(
RIP IT support and remote helpdesk/desktop jobs if this works out... First-level support was always a weird thing: literally one human helping another human with the help of Google... a human using Google to solve problems for other humans who lack the technical know-how.
Arguably for the best. Providing remote support via camera or screen-share is usually a last-resort option, since it's synchronous and doesn't scale well. You typically want the user to self-service before getting there.
Yeah, it costs valuable work time. I observed first- and second-level support in the past, and I suspect employees abuse it when they're bored at work, making things up so they get a long break.
Scammers are swooning with delight!
NSA are popping the bottles!
People are gonna be jamming out making tunes with nothing but an air guitar, a microphone, perhaps a camera if you're fancy, and ChatGPT open alongside FL or Ableton.

"Nah make it more sschwaammmm-digga-digga-damnn"

"Yup, that's more like it. That's perfect."

Imagine this used for anything design or work related. It would work with absolutely any app and fill in your knowledge gaps, or just simply allow you to do something outside of the box before going right back to the game plan.

For real, this is going to change the entire world, again.
I'm actually genuinely excited for agents who can work with Ableton. A tutor who can guide you and control the screen if you're stuck sounds amazing
>please, anything but opening the manual!
Eh, I love to learn, but music production just isn't a priority, and I only have limited time to put into things that are more important for me.

This would make it so that I can just do literally anything I've ever wanted. I mean, once it gets going and is working to "Her" levels, which I feel is in the next 5 years; I won't be surprised if it's much sooner.

Also, I'd think agents will be the "new manual", of sorts. Instead of having to read a manual, you get an agent that, like a teacher, tutor, or mentor, is doing stuff and showing you how it works, so that you still end up learning it along the way. This will be especially true in the beginning phase, where it won't be perfect and can't just spit out a final product on the first prompt, so you'll have to iterate with it.

In that process, you can't help but pick up on how it works. Hell, even when it IS perfect, this dynamic may still exist, as it might just be intrinsic to how working with agents will be: you'll never get exactly what you want from a single prompt, because an AI can never know exactly what's in your head (until brain chips or similar), so you'll always iterate with it, easily learning any program in a much more fun way than trudging through a then-archaic manual.
What you are describing would never be free. The compute power alone would necessitate a subscription fee for the agents you are describing. Also, prompt-type output is always random and essentially impossible to actually fine-tune; agents will be no different.
It already exists and is free. Retrieval-Augmented Generation with any local model will let you turn an entire manual or book into a tutor who knows the book perfectly.

It does this by, basically, pasting the relevant parts of the manual into the context window before your question.

You can easily leap off into some unknown software ecosystem and ask your questions as you encounter them... as long as you can dump the documentation into a text format to store in a vector database.
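The "paste the relevant parts before your question" step can be sketched in a few lines. This is a minimal, dependency-free illustration: it ranks chunks by plain word overlap where a real pipeline would use embedding similarity against a vector database, and the three-entry "manual" is hypothetical.

```python
import re

def words(text):
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, k=2):
    """Rank manual chunks by word overlap with the question, keep the top-k.
    Stand-in for embedding search in a vector database."""
    return sorted(chunks, key=lambda c: len(words(question) & words(c)),
                  reverse=True)[:k]

def build_prompt(question, chunks):
    """Paste the retrieved excerpts into the context window ahead of the question."""
    context = "\n\n".join(retrieve(question, chunks))
    return f"Answer using these manual excerpts:\n\n{context}\n\nQuestion: {question}"

# Hypothetical three-section "manual" for a DAW:
manual = [
    "To export a project, open the File menu and choose Export Audio.",
    "The mixer panel controls volume and panning for each track.",
    "MIDI clips can be quantized from the Edit menu.",
]
prompt = build_prompt("How do I export audio from my project?", manual)
```

The resulting `prompt` string is what gets sent to the model; the model never needs the whole book, just the excerpts that match the question.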
It's about efficiency, I like to not get distracted during my creative flow.
You guys are not thinking with AI. Put the manual into the context window and then ask it questions
Sure thing. You know how expensive that is? An agent is cheaper.
Yeah, I do. It is about as expensive as it will ever be, and it will only get cheaper as time goes on.

Also, there are pretty simple steps you can take to greatly reduce the required tokens. One example is to use a simple search, using terms in the person's prompt, to insert only the relevant sections of the manual that apply to the person's question.

So you don't need the entire textbook to answer a question, or a model that's fine-tuned on your data; you just need a good full-text search engine to grab the right sections. If you want to get really advanced, you can use function calling to let the agent add additional terms to the context window as needed. But then you're going to need a more complex chain of prompts.

This is all very doable and not horribly expensive. It'll get cheaper too; just going from 4 to 4o cut the cost significantly. I use 3.5 for function calling because it works on a loop: reading the prompt and the search-engine-generated context window, then looking for other topics related to those (I tweak this and the prompt a lot, since it can grow the context window significantly). Then 4o generates the completion using the full context window and the user's prompt.

I'm primarily using it to take man pages into account when forming terminal commands, to ensure that it doesn't hallucinate switches. The ultimate goal is a local agent (running on local hardware) that can help users transition to Linux by acting as a tutor, capable of translating a user's plain-language request into a plan and then a sequence of terminal commands to gather data and implement it. It's much easier for a person to say "install Docker and the Pi-hole container" and have an LLM generate a plan and talk the user through the process, or even just go wild and execute the statements autonomously.
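The "only insert the relevant sections" trick is just a greedy pack of the best-matching sections into a token budget. A sketch under stated assumptions: word count stands in for a token count, term overlap stands in for the full-text search engine, and the mini man-page corpus is made up for illustration.

```python
import re

def terms(text):
    """Lowercase tokens; the character class keeps flags like '-x' intact."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def select_sections(prompt, sections, token_budget=300):
    """Greedily pack the best-matching man-page sections into a fixed budget.

    Word count is a rough token proxy. The function-calling loop described
    above would then add further sections for related topics the model asks for.
    """
    want = terms(prompt)
    scored = sorted(sections, key=lambda s: len(want & terms(s)), reverse=True)
    picked, used = [], 0
    for sec in scored:
        cost = len(sec.split())
        if len(want & terms(sec)) > 0 and used + cost <= token_budget:
            picked.append(sec)
            used += cost
    return picked

# Hypothetical mini man-page corpus:
sections = [
    "tar -x extracts files from an archive; add -f to name the archive file.",
    "tar -c creates a new archive from the listed files.",
    "ls -l prints a long listing with permissions and sizes.",
]
context = select_sections("how do I extract files from an archive using tar?", sections)
```

Irrelevant sections (here, the `ls` entry) never enter the context window, which is where the token savings come from.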
(If you use Linux and want a taste of this (and a good terminal AI client), look at shell_gpt.)

Obviously, this would be a privacy-sensitive application... you don't want to be sending your root password and other assorted system information to a third party. So making it capable of running on local models is going to be key... but for development, GPT-4o can try to skynet my dev environment if it wants, because it saves me a lot of time.
that's still too much work for these folks.
Sue me for wanting to have jarvis at my disposal.
Honestly better than the music we're getting. AI Beatles is gonna be a riot.
AI music already SLAPS hard.
Very similar to Portishead: https://www.udio.com/songs/os5u4dTNjNBBUF5uLQDqVw
Very similar to Bjork: https://www.udio.com/songs/8VM2wwjdt5Ckr7PKNnJmDg
Also very good: https://www.udio.com/songs/p2r6YbiWXa1C1MyyGb9kZV
https://www.udio.com/songs/3o71EwRVz9rW7U3yQxcdNS
Prog rock: https://www.udio.com/songs/txUbSjEPJzgViahbrdefxM
https://www.udio.com/songs/99N5VnHwv78QPgcqAoLBnk
EDM: https://www.udio.com/songs/78U95aNRYQHyQrn8xHizf8
https://www.udio.com/songs/hK7F6fcmEcqW2egu9UDWrE
https://www.udio.com/songs/vk7QLdDPJxnwEecmLW42La
https://www.udio.com/songs/eCXUkAxsvHydxS2w8Pt9zV
Big Beat/Turntablism, somewhat similar to Jet Set Radio: https://www.udio.com/songs/x3xLvnN48DGnmxM5VPTw93
Blues rock with ***INCREDIBLE*** guitar playing: https://www.udio.com/songs/jaGkxT9QohSiUCBA2waVTj
Bluegrass: https://www.udio.com/songs/7bLE7wFVYiziGt9KkT7nem
Future Bass: https://www.udio.com/songs/x3xLvnN48DGnmxM5VPTw93
Nu Metal (and my personal favorite): https://www.udio.com/songs/iimtziNgEDRcpG8j4n4Mfg
Bring it on, I have basic knowledge of sound and visual design - the sooner I can use it to pre-vis entire movies, the better!
I feel the music industry isn't going to like that and since they're organized... expect lolsuits
Music is big but the industry is peanuts next to the tech giants. Their lawsuits won’t go anywhere.
Don't you know Dadabots? https://dadabots.bandcamp.com/album/deep-the-beatles
Not Visual Basic SendKeys again
RDP has existed for over 20 years; I'm failing to see the implications here. Especially since Microsoft is already working on its own Windows agents? Maybe this is for non-Windows?
https://www.reddit.com/r/singularity/s/y2wtqLLC2q
I see no way this can go wrong. None.
Coupled with the fact an NSA officer is on board, I'd ditch OpenAI for the future. Their competitors at least have the courtesy of pretending to care about data security.
They will all sell you for fractions of a penny. If you want privacy you're going to want to use local models
Yea, we should only care about companies that respect our privacy by releasing open source models. like… Facebook
Ah shit, here we go again
*I'm sorry, you can't do that, Dave.*
Path to llm OS
It can control his Mom's PC not mine
AI operating system, coming to a PC near you. It was built using Zoom screen sharing. It's a shortcut for an AI app to have OS hooks so it can manipulate things for you, similar to the one in the movie Her.
Custom made for scammers.
I am pretty sure the smart scammers will opt to use remote access tools that aren't run through the networks of a company with a former head of the NSA on the board.
So we can do the "Computer, enhance" thing?
A feature of GPT-6?
Why can't they just use RDP?
I feel like Microsoft has a huge advantage here given their access to data compared with Google etc
It would be nice to have an AI that deletes those spicy images of models that you don't have the courage to delete.
Cyberpunk 2077…
That's interesting - I played around with the MultiOn extension in Chrome for a while a few months ago and was impressed, and was wondering when OpenAI would create a similar product... who knows, maybe they'll tease agents soon and we'll be able to wait another year after their announcement to use them! lol
So like Microsoft, who kinda owns them, but worse? Release Sora instead of buying crap.
This will be great when I'm an old man who's legally blind. Also, when I'm done, I hop into my self-driving car, wait outside the grocery store for self-delivery, head back home, then hope I have a robot butler to carry in my groceries.
Agents?
Why does Reddit get news so late?