TikTok owner ByteDance’s “self-controlled memory system” can access a data bank of hundreds of turns of dialogue, and thousands of characters’ worth of text, to give any language model better capabilities than ChatGPT for answering questions about past events.
When you type things into the prompt of a generative artificial intelligence (AI) program such as ChatGPT, the program gives you a response based not only on what you’ve just typed, but also on everything else you’ve typed earlier in the conversation.
You can think of that chat history as a kind of memory. But that’s not enough, according to researchers at several institutions, who are trying to equip generative AI with something like an organized memory that could enhance its output.
Also: How to use ChatGPT: Everything you need to know
A paper titled “Augmenting Language Models with Long-Term Memory,” posted this month on the arXiv pre-print server by researcher Weizhi Wang of the University of California at Santa Barbara and colleagues at Microsoft, adds a new component to the language model.
The problem is that ChatGPT and similar programs can’t take in enough text at any one moment to maintain very long references to past material.
As Wang and team observe, “the input length limitation of existing LLMs prevents them from being generalized to real-world scenarios where the ability to process long-form information beyond a session of a certain size is critical.”
For example, OpenAI’s GPT-3 takes an input of up to 2,000 tokens, meaning characters or words. You can’t feed the program a 5,000-word article, say, or a 70,000-word novel.
Also: This new technology could blow away GPT-4 and everything like it
It’s possible to keep expanding the input “window,” but that turns into a thorny computing problem. The attention operation, the essential tool of all large language programs, including ChatGPT and GPT-4, has “quadratic” computational complexity (see the “time complexity” of computing). That complexity means the time it takes for ChatGPT to produce an answer grows as the square of the amount of data it is fed as input, so the computation required balloons as the window grows.
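That quadratic growth is easy to see in a minimal NumPy sketch of scaled dot-product attention (an illustration only, not any model's production implementation): the score matrix that attention must build has one entry for every pair of tokens, so doubling the window quadruples the work.

```python
import numpy as np

def attention(queries, keys, values):
    """Scaled dot-product attention: every token attends to every token.

    For a sequence of n tokens the score matrix is n x n, so both the
    memory and the arithmetic grow quadratically with sequence length.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ values

rng = np.random.default_rng(0)
for n in (1_000, 2_000):                  # doubling the window...
    x = rng.standard_normal((n, 64))
    out = attention(x, x, x)
    print(n, "tokens ->", n * n, "pairwise scores")  # ...quadruples the scores
```

With 2,000 tokens the program already computes 4 million pairwise scores; at 65,000 tokens it would be more than 4 billion, which is why simply widening the window does not scale.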
And so some scholars, Wang and team note, have already tried to come up with a rudimentary memory. Yuhuai Wu and co-workers at Google last year introduced what they call the Memorizing Transformer, which stores a copy of earlier content so that it can be drawn upon later. That process lets it work on 65,000 tokens at a time.
But Wang and team believe that data can go “stale.” As the Memorizing Transformer continues training and its neural weights, or parameters, are updated, certain things in memory fall out of sync with the neural network.
Wang and team’s solution, called “Language Models Augmented with Long-Term Memory,” or LongMem, uses a conventional large language model that does two things. As it examines input, it stores some of it in a memory bank. It also hands the output of each current prompt to a second neural network, called the SideNet.
Also: How I tricked ChatGPT into lying to me
The SideNet, which is also a language model like the first network, is tasked with comparing the current prompt a person types against the contents of memory to see whether there’s a relevant match. The SideNet, unlike the Memorizing Transformer, can be trained on its own, apart from the main language model. That way, it gets better and better at retrieving contents of memory that won’t be stale.
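A toy sketch of the idea, with hypothetical names and none of the real system's neural machinery, might cache vector “keys” for past inputs alongside their content and retrieve the closest matches for a new prompt:

```python
import numpy as np

class ToyMemoryBank:
    """Illustrative cache of (key, value) pairs from past inputs.

    This is a simplified stand-in for a LongMem-style memory bank: the
    real system caches attention key-value pairs from a frozen backbone
    model and lets a trainable side network do the retrieval.
    """

    def __init__(self):
        self.keys = []
        self.values = []

    def store(self, key, value):
        # Cache a vector representation of a past input and its content.
        self.keys.append(key)
        self.values.append(value)

    def retrieve(self, query, top_k=2):
        # Dot-product similarity between the current prompt's vector
        # and every cached key; return the best-matching contents.
        sims = np.array([query @ k for k in self.keys])
        best = np.argsort(sims)[::-1][:top_k]
        return [self.values[i] for i in best]
```

The separately trainable SideNet in the paper plays the role of `retrieve` here, except that it learns which cached entries are relevant rather than relying on a fixed similarity measure.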
Wang and team ran tests comparing LongMem to both the Memorizing Transformer and OpenAI’s GPT-2 language model. They also compared LongMem against results reported in the literature for other language models, including the 175-billion-parameter GPT-3.
They use tasks based on three datasets made up of very long texts, including whole articles and books: Project Gutenberg, the arXiv file server, and ChapterBreak.
To give you an idea of the scale of those works: ChapterBreak, introduced last year by Simeng Sun and colleagues at the University of Massachusetts Amherst, takes whole books and tests a language model to see whether, given one chapter as input, it can accurately identify, from among many candidate passages, the one that is the beginning of the next chapter. Such a task “requires a rich understanding of long-range dependencies,” such as shifts in the place and time of events, and techniques including “analepsis,” or flashback, “where the next chapter opens at a point of time earlier in the narrative.”
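The evaluation loop behind such a task can be sketched as follows. Here `overlap_score` is a crude lexical stand-in for a real model's likelihood scoring, used only to keep the example self-contained; an actual ChapterBreak evaluation would score each candidate with the language model itself.

```python
def pick_next_chapter(score_fn, context, candidates):
    """ChapterBreak-style evaluation: given the text of a chapter and
    several candidate passages, return the index of the candidate the
    model scores as the most likely beginning of the next chapter.

    score_fn(context, candidate) stands in for a language model's
    likelihood of the candidate given the context (hypothetical).
    """
    scores = [score_fn(context, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

def overlap_score(context, candidate):
    # Crude proxy for model likelihood: shared-vocabulary overlap.
    a = set(context.lower().split())
    b = set(candidate.lower().split())
    return len(a & b) / max(len(b), 1)

context = "the detective walked back to the harbor at dusk"
candidates = [
    "the detective reached the harbor at dawn",   # continues the thread
    "recipes for apple pie vary widely",          # unrelated
]
print(pick_next_chapter(overlap_score, context, candidates))  # prints 0
```

A model with no grasp of long-range dependencies does little better than guessing among the candidates, which is why the reported accuracies on this benchmark are so low.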
Also: AI is more likely to cause world destruction than climate change, according to an AI expert
And this involves processing tens or even hundreds of thousands of tokens.
When Sun and team ran those ChapterBreak tests, as they reported last year, the prevailing language models “struggled.” For example, the large GPT-3 was correct only 28% of the time.
But the LongMem program “surprisingly” beat all the standard language models, Wang and team report, including GPT-3, achieving a state-of-the-art score of 40.5%, even though LongMem has only 600 million neural parameters, far fewer than GPT-3’s 175 billion.
“The substantial improvements on these datasets demonstrate that LongMem can comprehend past long-references in cached memory to well perform language modeling toward future inputs,” Wang and team wrote.
Microsoft’s work matches recent research from ByteDance, the parent company of social media app TikTok.
In a paper posted on arXiv in April, titled “Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System,” ByteDance researcher Xinnian Liang and colleagues developed an add-on program that gives any large language model the ability to store very long sequences of things already discussed.
Also: MongoDB CTO says AI will change software development in a big way
In practice, they argue, the add-on can dramatically improve a program’s ability to place each new prompt in context, and thus make apt statements in response, even better than ChatGPT can.
In the “self-controlled memory system,” or SCM, as it’s called, the input the user types at the prompt is evaluated by a memory controller to see whether it requires dipping into an archival memory system, called the memory stream, which contains all the past interactions between the user and the program. It’s akin to Wang and team’s SideNet and its accompanying memory bank.
If memory is needed, that collection of prior inputs is accessed via vector-database tools such as Pinecone. The user’s input serves as a query, and it is matched for relevance against what’s in the database.
Some user queries don’t require memory, such as “tell me a joke,” a one-off request that any language model can handle. But a prompt such as “Do you remember the conclusion we reached last week about fitness diets?” does require access to earlier chat contents.
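A minimal sketch of that relevance matching, using simple word overlap in place of the learned embeddings and the vector database (such as Pinecone) a real system would use; all names here are illustrative:

```python
def relevance(query, turn):
    # Crude lexical relevance: Jaccard overlap of word sets. A real
    # system would compare learned embedding vectors stored in a
    # vector database rather than raw words.
    a = set(query.lower().split())
    b = set(turn.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def most_relevant(query, memory_stream, top_k=1):
    """Rank every past turn in the memory stream against the query."""
    ranked = sorted(memory_stream, key=lambda t: relevance(query, t),
                    reverse=True)
    return ranked[:top_k]

memory_stream = [
    "our conclusion about fitness diets was to cut sugar",
    "tell me a joke",
]
query = "do you remember our conclusion about fitness diets"
print(most_relevant(query, memory_stream))
```

The fitness-diet question scores far higher against the matching past turn than against the joke, so that turn is what gets pulled out of the memory stream.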
In a neat twist, the user prompt and the memory it retrieves are combined in what the paper calls “input fusion,” and it is this combined text that becomes the actual input from which the language model generates its response.
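A sketch of what such input fusion might look like; the template below is a guess at the general idea, not the paper's actual prompt format:

```python
def fuse_input(retrieved_memory, user_prompt):
    """'Input fusion': concatenate the retrieved past turns with the
    current prompt so the language model sees both at once.

    The wording of this template is illustrative only.
    """
    memory_block = "\n".join(f"- {turn}" for turn in retrieved_memory)
    return (
        "Relevant earlier conversation:\n"
        f"{memory_block}\n\n"
        f"Current user message: {user_prompt}"
    )

fused = fuse_input(
    ["our conclusion about fitness diets was to cut sugar"],
    "do you remember our conclusion about fitness diets?",
)
print(fused)
```

Because the fused text is an ordinary prompt, this trick needs no change to the underlying model, which is what lets SCM bolt onto any large language model.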
Also: This new AI system can read minds accurately about half the time
The end result is that SCM can beat ChatGPT on tasks that involve referring back to context from hundreds of turns earlier in a dialogue, Liang and team write. They hooked their SCM to a version of GPT-3 called text-davinci-003, and tested how it performed versus ChatGPT when given the same inputs.
In one dialogue of more than 100 turns, comprising 4,000 tokens, when the human prompted the machine to recall the hobby of a person discussed at the beginning of the session, “the SCM system provides accurate answers to the query, demonstrating its exceptional memory-enhanced capabilities,” they write, whereas, “in contrast, it appears that ChatGPT was distracted by a considerable amount of irrelevant historical data.”
The program can also produce summaries, thousands of words long, of lengthy texts such as work reports. It does so by recursively summarizing the text: the first summary is stored in the memory stream, and then that prior summary is combined with the next block of text to be summarized, and so on.
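The recursive loop can be sketched as follows, with `summarize` standing in for a call to the language model (the function name and the crude word-truncating stand-in below are both hypothetical):

```python
def summarize_long_text(chunks, summarize):
    """Recursive summarization: summarize the first chunk, store the
    summary in the memory stream, then fold each subsequent chunk into
    the running summary.

    `summarize(text)` stands in for a language-model call.
    """
    memory_stream = []
    running = ""
    for chunk in chunks:
        running = summarize((running + "\n" + chunk).strip())
        memory_stream.append(running)   # keep every stage in memory
    return running, memory_stream

# Crude stand-in for an LLM summarizer: keep the first few words.
crude = lambda text: " ".join(text.split()[:4])

final, stream = summarize_long_text(["alpha beta gamma", "delta epsilon"],
                                    crude)
print(final)
```

Because each pass only ever sees the previous summary plus one new chunk, the input stays within the model's window no matter how long the source document is.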
SCM can also make large language models that are not chatbots behave like chatbots. “Experimental results show that our SCM system enables LLMs, which are not optimized for multi-turn dialogue, to achieve multi-turn dialogue capabilities that are comparable to ChatGPT,” they write.
The work of both Microsoft and TikTok can be considered an extension of the original intent of the language model. Before ChatGPT and its predecessor, Google’s Transformer, natural-language tasks were often handled by what are called recurrent neural networks, or RNNs. A recurrent neural network is a kind of algorithm that can go back over earlier input data in order to compare it to the current input.
Also: GPT-4: A new capacity for offering illicit advice and displaying ‘risky emergent behaviors’
LLMs such as the Transformer and ChatGPT replaced the RNN with a simpler approach, attention. Attention automatically compares everything typed to everything typed before, so that the past is always brought into play.
The Microsoft and TikTok research, then, simply pursues algorithms that are explicitly designed to recall elements of the past in a more organized fashion.
The addition of memory is such a basic adjustment that it’s likely to become a standard aspect of large language models going forward, making it more common for programs to be able to refer back to prior content, such as chat histories, or to hold in memory the complete text of very long works.