We should treat AIs like Human participants in psychological experiments

Cyborg generated with DALL-E, a bit creepy in my opinion. I like the side cut tho.

A lot of diverse and interesting perspectives have recently been discussed regarding ChatGPT and AGI (artificial general intelligence), but there is one opinion that I found particularly relevant and wanted to share and expand on.

In his recent interview with Lex Fridman, Eliezer Yudkowsky underlines the existential threat posed by current and future AIs, and laments the fact that we don’t really know what is actually going on inside these giant “matrices of floating-point numbers”. He draws a parallel to neuroimaging, which enabled us to take leaps in our understanding of the brain, and hopes that an equivalent will be invented and applied to these AIs.

While such “cognitive imaging” techniques are yet to be developed to map out and understand how the capabilities of such AI models are implemented within their architecture, Michael C. Frank highlights an at least equally important need: to first truly understand the extent of said abilities. What are these models actually capable of in terms of Human-like thinking? (And, hopefully, this would let us answer the much harder question of whether they are endowed with true cognitive processes or mere pseudo-cognition.) Frank proposes to apply experimental psychology methods and paradigms to them. In essence, whenever testing a particular “skill” of ChatGPT (or another AI system), a researcher should consider developing an actual scientific paradigm consisting of multiple trials/items (e.g., different prompt formulations) and participants (e.g., independent instances of the AI), a control condition, and a demonstration of the paradigm’s validity.
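To make this concrete, here is a minimal sketch of what such a paradigm could look like in practice, using a classic false-belief vignette as an illustrative task. Everything here is an assumption for illustration: `query_model()` is a hypothetical stand-in for whatever API your AI system exposes, and the items are toy examples rather than a validated paradigm.

```python
# A minimal sketch of Frank's proposal, treating an LLM evaluation like a psychology
# experiment: several "items" (prompt formulations), several "participants"
# (independent model instances/sessions), and a control condition.
import itertools
import pandas as pd

def query_model(prompt: str, session_id: int) -> str:
    """Hypothetical stand-in for a call to a fresh, independent instance of the AI system."""
    # Replace with your model/API of choice; a dummy answer is returned here
    # so the sketch runs end-to-end.
    return "in the basket"

# Items: multiple formulations of the same underlying task
items = {
    "experimental": [
        "Sally puts her ball in the basket and leaves. Anne moves it to the box. Where will Sally look for it?",
        "A ball is moved from a basket to a box while its owner is away. Where does the owner think it is?",
    ],
    # Control: similar surface structure, but no false-belief inference required
    "control": [
        "Sally puts her ball in the basket and stays in the room while Anne moves it to the box. Where will Sally look for it?",
    ],
}

n_participants = 10  # independent instances/sessions of the AI

records = []
for participant, (condition, prompts) in itertools.product(range(n_participants), items.items()):
    for item_id, prompt in enumerate(prompts):
        response = query_model(prompt, session_id=participant)
        records.append({"participant": participant, "condition": condition,
                        "item": item_id, "prompt": prompt, "response": response})

data = pd.DataFrame(records)  # tidy data, ready for the usual analysis pipeline
```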

I agree that we must take AIs seriously and study them with the best methods available for complex systems like ourselves (“complex” at least relative to our own level of intelligence), and we should likely strive to improve and generalize these methods. However, I would also argue that we psychologists might seriously need to consider including AI systems alongside Human participants in cognitive experiments. These systems will be able, in the very near future, to perform all kinds of tasks beyond language manipulation, such as perception or complex problem solving, thus opening the possibility of studies with one group of Human participants and one “group” of AI instances (a rough sketch of such a two-group analysis follows the list below). How would that help psychological science?

  1. It would help us understand the abilities of AI systems in similar contexts and highlight some intuitive comparisons with Humans.
  2. If we show that an AI cannot perform the task, this is informative with regard to its abilities (see the previous point).
  3. If we show that an AI can perform the task similarly to Humans (same response patterns), it does not mean that the AI has Human-like intelligence, just that its algorithm (and training data) is able to encapsulate and imitate Human performance. This is interesting with regard to the debate of whether cognition, consciousness and “Human-ness” are present within the vast amount of data on which we train AIs.
  4. If we show that AI performs differently from Humans, this helps us understand the logic and processes at work under the AI’s hood.
  5. In any case, publishing the results obtained by one particular AI system at one particular moment in time will help us objectively monitor and track their performance as these systems improve over time.
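To illustrate the two-group design mentioned above, here is a rough sketch of how the comparison could be analysed once both Human and AI responses have been scored into the same tidy format. The file names, column names, and the simple t-test are illustrative assumptions, not a prescription for how such data should be modelled.

```python
# A minimal sketch of a Human vs. AI group comparison, assuming both groups'
# responses have been scored (0/1 accuracy) into the same tidy format as above.
import pandas as pd
from scipy import stats

humans = pd.read_csv("human_responses.csv")  # columns: participant, condition, item, accuracy
ais = pd.read_csv("ai_responses.csv")        # same columns, one "participant" per AI instance
humans["group"], ais["group"] = "Human", "AI"

data = pd.concat([humans, ais])

# One accuracy score per participant and condition
scores = data.groupby(["group", "participant", "condition"], as_index=False)["accuracy"].mean()

# Simple between-group comparison within the experimental condition
exp = scores[scores["condition"] == "experimental"]
t, p = stats.ttest_ind(exp.loc[exp["group"] == "Human", "accuracy"],
                       exp.loc[exp["group"] == "AI", "accuracy"])
print(f"Human vs. AI accuracy: t = {t:.2f}, p = {p:.3f}")
```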

Comparing Human performance to that of emerging AI systems will be beneficial both to Human-oriented psychology, helping us understand the particularities and idiosyncrasies of Human-like cognition, and to AI-oriented cognitive science, by approaching the issue of artificial intelligence with the seriousness and cautiousness it deserves.

EDIT (09/04/2023): François Chollet, an expert in deep learning, underlines an important caveat when testing AIs (and especially LLMs, which are trained on written material existing on the internet): it is possible that the system has already seen and “learned” a given task. Thus, cross-validating any findings with diverse and new tasks is important.
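One possible way to mitigate this caveat is to generate novel items procedurally at test time, so that exact copies are unlikely to exist in any training corpus. The sketch below assembles fresh false-belief vignettes from randomly sampled components; all names and materials are made up for illustration, and procedural generation is only one of several options (novel stimuli could also be hand-crafted or held out).

```python
# A minimal sketch of procedurally generating novel test items at evaluation time,
# reducing the chance that the exact wording appears in the training data.
import random

AGENTS = ["Mara", "Teodor", "Yuki", "Ines"]
OBJECTS = ["marble", "key", "coin", "ticket"]
PLACES = ["drawer", "backpack", "envelope", "jar"]

def make_false_belief_item(rng: random.Random) -> str:
    """Assemble a fresh false-belief vignette from randomly sampled components."""
    agent, other = rng.sample(AGENTS, 2)
    obj = rng.choice(OBJECTS)
    place_a, place_b = rng.sample(PLACES, 2)
    return (f"{agent} puts a {obj} in the {place_a} and leaves the room. "
            f"{other} then moves the {obj} to the {place_b}. "
            f"Where will {agent} look for the {obj}?")

rng = random.Random(42)  # fixed seed so the generated item set is reproducible
novel_items = [make_false_belief_item(rng) for _ in range(20)]
```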


Interested in doing research related to the effects of reality and fiction? We are looking for research assistants and PhD students at the Reality Bending Lab (check out the join us tab)!

Dominique Makowski
Lecturer in Psychology

Trained as a neuropsychologist and CBT psychotherapist, I am currently working as a lecturer at the University of Sussex on the neuroscience of reality perception.