Meta’s new ‘Voicebox’ AI is a text-to-speech tool that learns like ChatGPT



Meta AI recently unveiled a “breakthrough” text-to-speech (TTS) generator that it claims produces results 20 times faster than state-of-the-art artificial intelligence models with comparable performance.

The new system, dubbed Voicebox, eschews traditional TTS architecture in favor of a model similar to OpenAI’s ChatGPT or Google’s Bard.

The main difference between Voicebox and similar TTS models, such as ElevenLabs’ Prime Voice AI, is that Meta’s offering can generalize through in-context learning.

Like ChatGPT or other Transformer models, Voicebox uses a large-scale training dataset. Previous attempts to use large chunks of audio data have resulted in severely garbled audio outputs. For this reason, most TTS systems use small, highly curated, labeled datasets.

Meta overcomes this limitation through a novel training scheme that eschews labels and curation in favor of an architecture capable of “in-filling” audio information.

As Meta AI put it in a June 16 blog post, Voicebox is “the first model that can generalize to speech-generation tasks that it was not specifically trained to accomplish with state-of-the-art performance.”

This makes it possible for Voicebox to translate text to speech, remove unwanted noise by synthesizing replacement speech, and even apply the speaker’s voice to different language outputs.

According to an accompanying research paper published by Meta, its pre-trained Voicebox system can accomplish all this using only the desired output text and a three-second audio clip.

The arrival of robust speech generation comes at a particularly sensitive time, as social media companies continue to struggle with moderation and the US presidential election threatens to once again test the limits of online misinformation detection.

Former US President Donald Trump, for example, is currently facing allegations that he improperly handled confidential government material after leaving office. Among the alleged evidence cited in the case against him are audio recordings in which he allegedly admitted to possible wrongdoing.

While there is currently no indication that the former president intends to disavow the material described in the audio files, his case suggests that data integrity resides at the core of the US legal system and, by extension, its democracy.

Voicebox isn’t the first tool of its kind, but it appears to be one of the most robust. As such, Meta has developed a tool to determine whether speech was generated by it, which the company claims can “trivially detect” the difference between real and simulated audio. According to the blog post:

“As with other powerful new AI innovations, we recognize that this technology brings the potential for misuse and unintended harm. In our paper, we detail how we built a highly effective classifier that can differentiate between authentic speech and audio generated with Voicebox to mitigate these potential future risks.”

In the world of cryptocurrency, AI has become as integral to the day-to-day operations of most businesses as the internet or electricity. The largest exchanges rely on AI chatbots for customer interactions and sentiment analysis, and trading bots have become commonplace.

The advent of robust text-to-speech systems such as Voicebox, combined with automated trading, could help bridge a gap for cryptocurrency traders who rely on TTS systems, which can currently struggle with crypto jargon or multilingual support.