OpenAI’s artificial intelligence-powered chatbot ChatGPT is getting worse over time, and researchers don’t understand why.
A study published on July 18 by researchers at Stanford and UC Berkeley found that ChatGPT’s latest models have become significantly less capable of accurately answering an identical series of questions over the span of a few months.
The authors of the study could not give a clear answer as to why the capabilities of AI chatbots have deteriorated.
To test how reliable the different ChatGPT models were, three researchers, Lingjiao Chen, Matei Zaharia and James Zou, asked the GPT-3.5 and GPT-4 models to solve a series of math problems, answer sensitive questions, write new lines of code and conduct spatial reasoning from prompts.
We evaluated #ChatGPT behavior over time and found substantial differences in its responses to the *same questions* between the June and March versions of GPT4 and GPT3.5. The newer versions got worse on some tasks. w/ Lingjiao Chen @matei_zaharia pic.twitter.com/FEiqrUVbg6
— James Zou (@james_y_zou) July 19, 2023
According to the research, in March GPT-4 was able to identify prime numbers with a 97.6% accuracy rate. In the same test conducted in June, GPT-4’s accuracy had dropped to just 2.4%.
In contrast, the earlier GPT-3.5 model improved prime number recognition within the same time frame.
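The kind of check the researchers ran can be reproduced in outline: ask a model whether each number in a list is prime, then score its answers against a deterministic primality test. Below is a minimal Python sketch; `ask_model` is a hypothetical stand-in for whatever LLM client is in use, not an API from the study.

```python
def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def score_primality_answers(numbers, ask_model):
    """Return the fraction of numbers the model classifies correctly.

    `ask_model(n)` is a hypothetical callable returning True if the
    model says n is prime -- swap in a real LLM call here.
    """
    correct = sum(1 for n in numbers if ask_model(n) == is_prime(n))
    return correct / len(numbers)

# Example with a perfect "model" (the ground truth itself):
print(score_primality_answers([2, 3, 4, 15, 17, 19, 21], is_prime))  # → 1.0
```

Running the same fixed question set against the March and June model snapshots and comparing the two scores is essentially how the accuracy drop above was measured.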
Related: The SEC’s Gary Gensler believes AI can strengthen its enforcement regime
When it came to generating lines of new code, the capabilities of both models degraded substantially between March and June.
The study also found that ChatGPT’s responses to sensitive questions, in some instances those focusing on ethnicity and gender, later became more terse in refusing to answer.
Earlier iterations of the chatbot provided extensive reasoning for why it could not answer some sensitive questions. However, in June the models apologized to the user and refused to respond.
“The behavior of a ‘same’ (large language model) service can change substantially in a relatively short period of time,” write the researchers, noting the need for continuous monitoring of AI model quality.
The researchers recommend that users and companies relying on LLM services as a component of their workflows implement some form of monitoring analysis to ensure the chatbot’s quality is maintained.
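One lightweight form of that monitoring is to re-run a fixed benchmark periodically and flag any regression against a recorded baseline. A minimal sketch follows; the 5% threshold and function names are illustrative assumptions, not part of the study.

```python
def detect_regression(baseline_accuracy: float,
                      current_accuracy: float,
                      max_drop: float = 0.05) -> bool:
    """Flag a regression when accuracy falls by more than `max_drop`
    (absolute). The 0.05 default is an arbitrary illustrative choice."""
    return (baseline_accuracy - current_accuracy) > max_drop

# GPT-4's prime-identification accuracy in the study fell from 97.6% to 2.4%:
print(detect_regression(0.976, 0.024))  # → True, a drop this large triggers an alert
print(detect_regression(0.976, 0.950))  # → False, a small fluctuation does not
```

In practice the benchmark scores would come from scheduled evaluation runs, with alerts wired to whatever incident tooling the team already uses.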
On June 6, OpenAI unveiled plans to create a team that would help manage risks arising from a superintelligent AI system, something that is expected to arrive within this decade.
AI Eye: AIs trained on AI content go mad, is Threads a loss leader for AI data?