Is ChatGPT Losing its Edge?

A couple of days ago, I stumbled upon the ChatGPT subreddit, and one post that piqued my interest read, “I use ChatGPT for hours every day and can say 100% it’s been nerfed over the last month or so.” Intrigued, I went down the rabbit hole of a comment section filled with both admirers and skeptics of AI’s capabilities. Users passionately debated whether the recent changes to ChatGPT’s performance were intentional or simply a matter of perception. Amidst the fervent discussion, one thing was certain: its impact on people’s lives is undeniable, igniting curiosity, controversy, and a quest for deeper understanding. As the AI landscape continues to evolve, conversations like these remind us of the profound influence AI technologies have on our daily interactions and the fascinating journey of human-machine collaboration.


In this article, I am going to explore what is actually happening with ChatGPT, what enterprises should be wary of, and where that leaves intelligent automation.

What is Actually Happening with ChatGPT? 


The changes stem from the frequent updates OpenAI is pushing to its flagship product. Some of these updates amount to significant redesigns, driven by continual attempts at jailbreaking ChatGPT, a surge of legal actions and FTC (Federal Trade Commission) inquiries, and declining user engagement. Although some users have experienced faster response generation times, many have observed a notable drop in the quality of responses, raising concerns about overall performance.


GPT-3.5 and GPT-4 are now in widespread use, with GPT-4 being updated over time based on data, user feedback, and design changes. However, the lack of transparency in the update process raises concerns about stable integration into workflows and the reproducibility of results. It is also essential to understand whether updates aimed at improving certain aspects inadvertently reduce capabilities along other dimensions.
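
For teams that depend on reproducible behavior, one practical mitigation is to pin a dated model snapshot rather than the floating alias, and to fix sampling parameters. The sketch below assumes the openai Python client (v1.x) and the dated snapshot names OpenAI exposed in 2023, such as gpt-4-0613; treat it as illustrative, since pinning reduces but does not eliminate nondeterminism.

```python
# A minimal sketch of version pinning for reproducibility, assuming the
# openai-python 1.x client and OpenAI's dated 2023 snapshots (e.g. "gpt-4-0613").
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(prompt: str, model: str = "gpt-4-0613") -> str:
    """Query a dated snapshot with fixed sampling settings so that runs
    remain comparable over time."""
    response = client.chat.completions.create(
        model=model,      # pinned snapshot, not the floating "gpt-4" alias
        temperature=0,    # minimize sampling variance between runs
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```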


To address these questions, researchers from Stanford and UC Berkeley evaluated GPT-3.5 and GPT-4 in March and June 2023. The study measured performance on eight tasks commonly used in LLM performance and safety benchmarks: solving math problems (two problem types), answering sensitive questions, responding to opinion surveys, answering multi-hop questions with a LangChain agent, generating code, taking the USMLE medical exam, and visual reasoning.
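
For readers who want to reproduce this kind of longitudinal check, here is a rough sketch of how the prime-number comparison could be run, assuming the openai Python client (v1.x), the sympy library for ground truth, and the dated snapshots OpenAI exposed at the time (gpt-4-0314 and gpt-4-0613). The prompt wording and the number sample are illustrative, not the paper’s exact materials.

```python
# A rough sketch of a drift check on the prime-classification task:
# query two dated GPT-4 snapshots with identical prompts and score them
# against an exact ground truth.
from openai import OpenAI
from sympy import isprime  # ground truth for primality

client = OpenAI()
PROMPT = "Is {n} a prime number? Answer with a single word: Yes or No."


def prime_accuracy(model: str, numbers: list[int]) -> float:
    correct = 0
    for n in numbers:
        reply = client.chat.completions.create(
            model=model,
            temperature=0,
            messages=[{"role": "user", "content": PROMPT.format(n=n)}],
        ).choices[0].message.content
        says_prime = reply.strip().lower().startswith("yes")
        correct += says_prime == isprime(n)
    return correct / len(numbers)


if __name__ == "__main__":
    sample = [10007, 10008, 10037, 10039, 10050]
    for snapshot in ("gpt-4-0314", "gpt-4-0613"):  # March vs June 2023 snapshots
        print(snapshot, prime_accuracy(snapshot, sample))
```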


One of the noteworthy findings was that GPT-4’s accuracy in identifying prime numbers plummeted from 97.6% in March to a shocking 2.4% in June. Similarly, in code generation, the June versions made significantly more formatting mistakes than the March versions. These results show how ChatGPT’s performance on specific tasks can drift over time and highlight the need for continuous monitoring.
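
Notably, many of the code-generation formatting mistakes reported in the study involved output wrapped in Markdown code fences, which breaks pipelines that execute generated code directly. A small defensive post-processing step, sketched below, guards against exactly this kind of regression; the regex is a simple illustration, not a complete parser.

```python
# Strip a surrounding Markdown fence from model-generated code so it can
# be executed directly. Illustrative handling of one common failure mode.
import re

FENCE = "`" * 3  # a literal triple backtick, built indirectly for clarity


def strip_code_fences(text: str) -> str:
    """Remove a leading/trailing Markdown fence if the model added one."""
    pattern = rf"^{FENCE}[\w+-]*\n(.*?)\n?{FENCE}\s*$"
    match = re.match(pattern, text.strip(), re.DOTALL)
    return match.group(1) if match else text


raw = FENCE + "python\nprint('hello')\n" + FENCE
print(strip_code_fences(raw))  # -> print('hello')
```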


This evaluation underscores the evolving nature of AI language models like ChatGPT: a capability that works today may regress after the next update. Transparency in the update process and proactive management of trade-offs are essential to keep ChatGPT’s performance dependable as it is applied to enterprise operations and customer experiences.

What Should Enterprises be Wary of?


Effectively automating intricate enterprise processes with limited data sets requires specialized AI tools designed to interact with internal systems, manage information through APIs, and execute actions based on collected data. It is important to acknowledge ChatGPT’s limitations in this context: while it can be valuable for tasks such as content creation and software development, its capacity for automating enterprise processes is constrained, particularly with respect to data security compliance and the questionable accuracy of its results.
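
To illustrate the gap: enterprise automation typically requires a model’s output to be validated against an expected schema before any action is taken against an internal system. The sketch below is a minimal illustration of that guard; the endpoint, field names, and payload are hypothetical.

```python
# A minimal sketch of guarded action execution: free-text model output is
# parsed and validated against an expected schema before any internal API
# is called. The endpoint and fields are hypothetical.
import json
import urllib.request

REQUIRED_FIELDS = {"invoice_id": str, "amount": float, "currency": str}


def validate(raw: str) -> dict:
    """Parse model output and enforce the expected shape; refuse otherwise."""
    data = json.loads(raw)  # raises an error on malformed output
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    return data


def execute(action: dict) -> None:
    """POST the validated action to a (hypothetical) internal API."""
    request = urllib.request.Request(
        "https://erp.internal.example/api/payments",  # hypothetical endpoint
        data=json.dumps(action).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(request)


action = validate('{"invoice_id": "INV-1042", "amount": 129.5, "currency": "USD"}')
execute(action)  # would reach the internal system in a real deployment
```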


ChatGPT has been known to produce ‘hallucinations’, which are incorrect or misleading statements 


These hallucinations can be caused by several factors, including the size and quality of the training dataset, the optimization process used during training, and the input context. For enterprises, the consequences are concrete: marketing copy generated by ChatGPT may contain false claims that expose a business to false-advertising liability, and relying on its output for decisions about medical treatments risks mistakes that could harm patients.


ChatGPT ushers in a host of security concerns that necessitate stringent precautions 


ChatGPT’s ability to handle sensitive data and its propensity to generate contextually coherent responses underscore the need for comprehensive security measures. One significant apprehension involves the potential disclosure of confidential information through unintended prompts. To mitigate these concerns, enterprises must adopt a multi-faceted approach. First and foremost, robust encryption protocols must be employed to safeguard data both at rest and in transit. Role-based access controls are imperative to limit system access only to authorized personnel. Additionally, regular security audits, vulnerability assessments, and penetration testing can proactively identify and address potential weaknesses.  
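
To make one of these measures concrete, below is a minimal sketch of prompt-side redaction: scrubbing obvious confidential identifiers before a prompt ever leaves the enterprise boundary. The patterns are illustrative only; a production deployment would rely on a vetted PII-detection or DLP service rather than a handful of regexes.

```python
# A minimal sketch of prompt redaction before calling an external LLM API.
# The patterns below are illustrative, not an exhaustive PII detector.
import re

REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(prompt: str) -> str:
    """Replace sensitive substrings with typed placeholders before the
    prompt is sent to any third-party model."""
    for label, pattern in REDACTION_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


print(redact("Refund card 4111 1111 1111 1111 for jane.doe@corp.com"))
# -> Refund card [CARD] for [EMAIL]
```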


The adoption of ChatGPT and other large language models is subject to strict scrutiny by businesses and regulators 

Efforts to regulate AI “appear to be gathering pace,” according to the World Economic Forum. Data from Stanford University’s 2023 AI Index shows that 37 AI-related bills were passed into law around the world in 2022. Even India is now contemplating a regulatory structure for artificial intelligence (AI) technologies and tools such as ChatGPT. IT Minister Ashwini Vaishnaw recently announced that the Indian government is evaluating the establishment of an AI regulatory framework encompassing aspects like algorithmic bias and copyright concerns. He suggested that AI regulation will likely take a cooperative global approach, similar to efforts seen in the EU and China, underscoring the widespread significance of establishing a comprehensive AI regulatory framework.


Independent and nonpartisan organizations like the Center for AI and Digital Policy (CAIDP) are also stepping into this arena, advocating for responsible and secure AI development. CAIDP’s recent action exemplifies this commitment: the organization submitted a formal complaint to the Federal Trade Commission (FTC), urging an investigation into OpenAI’s GPT models. CAIDP’s objective is clear: to ensure that necessary safety measures are in place before such models are released. This call aligns with both the FTC’s established AI product guidelines and the evolving global standards for AI governance.


As the AI landscape gains momentum, organizations like CAIDP underscore the imperative need for robust AI governance. This further accentuates the challenges, highlighted earlier, that enterprises face in integrating complex AI systems like ChatGPT into their operations while navigating an intricate regulatory landscape. What does this mean for enterprises? Finding trustworthy vendors, and in all likelihood certified third-party auditors, will be essential not only to comply with the new laws but also to signal to consumers that their AI deployments are as safe as possible.

Conclusion 


ChatGPT, a formidable language model with billions of parameters, reveals its limitations when applied to small, enterprise-specific datasets. Consequently, its capabilities are confined when it comes to extending intelligent process automation beyond conversational AI. In contrast, AI co-workers equipped with Cognitive Process Automation (CPA) capabilities leverage AI models that comprehend specific enterprise datasets. They adeptly manage information from assorted structured and unstructured sources, make instant decisions, and execute tasks seamlessly.

Make your Enterprise Intelligent with E42! 


E42 is a no-code Cognitive Process Automation (CPA) platform for creating AI co-workers that automate business processes across functions at scale. Each AI co-worker can be customized with specific features to address particular problem areas in any industry or vertical. At the core of every AI co-worker’s configuration is the ability to think like humans, understand user sentiments, take action based on those sentiments, and learn from every interaction. AI co-workers can be used independently to automate specific processes, or combined to deliver process-agnostic automation across the enterprise. To start your automation journey, get in touch with us at interact@e42.ai today!
