AI Research: ChatGPT: Jack of all trades, master of none

Information Fusion, Volume 99, November 2023

About:

One of the first articles in the world aimed at exploring the capabilities of a relatively new tool – ChatGPT. In 2023, this work was included in the list of the 100 most-cited publications in the field of artificial intelligence.

Our study evaluated 25 tasks using over 48,000 prompts. Even at that stage, we observed ChatGPT’s potential in context awareness and personalization—features that have since proven to be crucial. Although ChatGPT and GPT-4 still lagged behind the state-of-the-art (SOTA) methods, especially on more challenging reasoning tasks, our observations were a precursor to what has now become evident: these models have the potential to accelerate AI development and profoundly impact our daily lives.


Abstract:

OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT’s capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection. In contrast, the other tasks require more objective reasoning, such as word sense disambiguation, linguistic acceptability, and question answering. We also evaluated the GPT-4 model on five selected subsets of NLP tasks.

We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses. Our comparison of its results with available State-of-the-Art (SOTA) solutions showed that the average loss in quality of the ChatGPT model was about 25% for zero-shot and few-shot evaluation. For the GPT-4 model, the loss for semantic tasks is significantly lower than for ChatGPT. We showed that the more difficult the task (lower SOTA performance), the higher the ChatGPT loss. This applies especially to pragmatic NLP problems like emotion recognition. We also tested the ability to personalize ChatGPT responses for selected subjective tasks via Random Contextual Few-Shot Personalization, and we obtained significantly better user-based predictions. Additional qualitative analysis revealed a ChatGPT bias, most likely due to the rules imposed on human trainers by OpenAI. Our results provide the basis for a fundamental discussion of whether the high quality of recent predictive NLP models can indicate a tool’s usefulness to society and how the learning and validation procedures for such systems should be established.
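To make the notion of "loss in quality" concrete, here is a minimal sketch that assumes the loss is the relative shortfall of a model's score against the SOTA score on the same task, expressed as a percentage. That definition, the function name `relative_loss`, and the task names and scores are illustrative assumptions, not figures from the study.

```python
def relative_loss(sota_score: float, model_score: float) -> float:
    """Percentage by which model_score falls short of sota_score.

    Assumed definition: 100 * (SOTA - model) / SOTA.
    """
    if sota_score <= 0:
        raise ValueError("sota_score must be positive")
    return 100.0 * (sota_score - model_score) / sota_score


# Hypothetical (SOTA, model) scores for illustration only.
example_scores = {
    "task_a": (0.90, 0.81),
    "task_b": (0.60, 0.30),
}

# Per-task loss, then the unweighted average across tasks.
losses = {task: relative_loss(sota, model)
          for task, (sota, model) in example_scores.items()}
average_loss = sum(losses.values()) / len(losses)
```

Under this assumed definition, a task with a low SOTA score and an even lower model score yields a large relative loss, which matches the reported pattern of harder tasks showing higher loss.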

Conclusions:

Based on ChatGPT’s responses to 48k+ prompts related to 25 different NLP tasks, we can conclude that ChatGPT can solve most of the problems considered quite well. On the other hand, it loses to the best models currently available (SOTA) by 4% to over 70%. Its loss is relatively greater for more difficult and pragmatic tasks, especially when evaluating emotional texts. All this makes ChatGPT a master of none of the tasks. However, it is still an open question what would happen if ChatGPT were fine-tuned using the datasets from these tasks, and what the results would look like then. At the moment it is not possible to perform such a study, but it would be worthwhile to do so as soon as it is possible.

The context awareness and ability to implement Contextual Few-Shot Personalization proposed in this paper are valuable features of ChatGPT. It also provides a unique self-explanation capability that facilitates human understanding and adaptation to the expected outcome. We plan to develop and systematize the qualitative analysis of the model’s performance on subjective tasks (primarily emotion recognition), e.g., by comparing ChatGPT responses with the estimated annotation controversy for texts and dimensions. We strongly believe that ChatGPT can accelerate the development of various AI-related technologies and profoundly change our daily lives.
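As a rough illustration of the personalization idea, the sketch below shows one plausible reading of Random Contextual Few-Shot Personalization: a few randomly chosen texts previously labeled by the same user are placed in the prompt as context before the new text. The function name `build_personalized_prompt`, the prompt wording, and the default of three examples are assumptions for illustration, not the prompts used in the study.

```python
import random


def build_personalized_prompt(user_history, new_text, k=3, seed=None):
    """Build a prompt containing k random past annotations by one user.

    user_history: list of (text, label) pairs annotated by that user.
    The wording of the prompt is hypothetical.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    examples = rng.sample(user_history, min(k, len(user_history)))

    lines = ["Here are texts this user labeled earlier:"]
    for text, label in examples:
        lines.append(f'Text: "{text}" -> Label: {label}')
    lines.append(f'Predict the label this user would assign to: "{new_text}"')
    return "\n".join(lines)


# Hypothetical annotation history for a single user.
history = [
    ("What a lovely day", "joy"),
    ("I can't believe they did that", "anger"),
    ("Nothing ever works out", "sadness"),
    ("That was unexpected", "surprise"),
    ("I'm so proud of you", "joy"),
]
prompt = build_personalized_prompt(history, "Well, that went badly", k=3, seed=0)
```

The resulting string would then be sent to the model in place of a generic zero-shot prompt, so that the prediction is conditioned on how this particular user tends to annotate.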
