Tuesday, December 10, 2024

OpenAI Rolls Out ChatGPT Voice and Image Processing Features


In its most significant update since the launch of GPT-4, OpenAI is adding voice and image processing capabilities to ChatGPT. Soon ChatGPT Plus and Enterprise subscribers will be able to show the AI model images and even have entire voice conversations.

The features will roll out over the next 2 weeks for Plus and Enterprise subscribers and shortly after for everyone else. Voice features will be limited to the iOS and Android ChatGPT app, but image processing will be available on all platforms from the start.

While the addition of image processing is likely a significant improvement for some use cases of ChatGPT, the new voice chat features are an even more exciting addition. You’ll soon be able to have a full spoken conversation with one of the best AI models in the world.

You will be able to record audio prompts at the press of a button (or enter text prompts) and ChatGPT will speak back using one of a handful of distinct voices.

Some onlookers are skeptical, expecting ChatGPT’s voice functionality to sound robotic and unnatural. The critics may prove to be wrong, judging by the few voice snippets that OpenAI provided in its blog post announcing the features.

The post included a clip of someone asking ChatGPT to read them a bedtime story, and the AI model responded within seconds with an almost entirely realistic voice. Those paying close attention can tell that it’s not a real person because of oddities like small pronunciation errors, but it sounds remarkably good.

The blog post included samples of 5 different AI voices (including both male and female) reading 5 different text samples, and they all sound great. OpenAI mentioned that it partnered with professional voice actors to create these voices, and that it uses Whisper, its own speech recognition model, to transcribe spoken prompts into text.

It’s important to note that the samples may have been cherry-picked and that the final feature won’t always sound as good. ChatGPT Plus and Enterprise subscribers will soon be able to test it themselves.

OpenAI also announced that it partnered with the king of music streaming, Spotify, to build a new feature that will translate podcasts into other languages using the same voice technology. It’s currently only available for a few large podcasts from podcasters like Dax Shepard, Lex Fridman, and Monica Padman, but more will be added soon.

OpenAI and Spotify likely decided to make this feature available only to large podcasts to avoid giving voices to potentially harmful content.

OpenAI Looks to Impress With Image Processing Capabilities

One pioneering new feature apparently wasn’t enough for OpenAI. ChatGPT’s new image processing capabilities look quite impressive too. The example that OpenAI gave in its blog post showed someone using the app to learn how to lower his bicycle seat.

The sample video showed that users will be able to snap a quick picture and draw on it to tell the model what to focus on. Once they’re happy with the picture, they can add a text prompt to ask the model something about the image. You can also add multiple images to a message.

The sample video showed the user going back and forth with ChatGPT, sending pictures of parts of the bike to give the model enough information to help him. With just a few pictures of the bike, the manual, and the user’s toolbox, it was able to walk him through the process, even telling him which tool he needed to use.

[Image: ChatGPT image processing prompts. Screenshot of OpenAI’s sample video]

Of course, this particular session was picked specifically for the blog post announcement, and ChatGPT may not be as helpful for many prompts.

Will These New Features Be Safe?

Whenever a pioneering tech company like OpenAI releases potentially world-changing products or features, we have to ask: is this safe? OpenAI clearly considered these risks closely and tailored the implementation of its impressive voice tech to avoid them for the most part.

Releasing a public version of the base voice model, which can generate a voice from a few samples, could be quite dangerous. Bad actors could try to use it to impersonate public figures, perhaps combining the voice with a matching ‘deepfake’ video.

OpenAI avoided this entirely by restricting the feature to a few set voices that couldn’t be used to impersonate public figures.

The AI firm clearly made sure that its image processing feature was as safe as possible too. It said that it tested the model with red teamers (also known as white hat hackers), cybersecurity professionals who try to attack the model to find its vulnerabilities, as well as many alpha testers.

Interestingly, the blog post mentioned that it restricted the model from analyzing and making direct statements about people in images because “ChatGPT is not always accurate and these systems should respect individuals’ privacy.” Perhaps it was inadvertently insulting people in the images it received.

The quality of the voice and image processing features, as well as their potential flaws, will soon be explored by millions of curious AI enthusiasts. If they are as powerful as the blog post makes them out to be, they may have a tremendous impact on AI and the world in general.
