SAIL Workshop: Fundamental Limits of Large Language Models

Organizers: Özge Alacam, Benjamin Paaßen, Michiel Straat

On November 23, 2023, the SAIL network invited international experts to the workshop “Fundamental Limits of Large Language Models” (LLMs). In a time of anecdotal reports on both impressive capabilities and failures of LLMs, rigorous and systematic research into the underlying laws governing LLMs is sorely needed. The workshop brought together interdisciplinary expertise from linguistics, computer science, and cognitive science and presented the state of the art in theoretical and empirical research.

Leonie Weissweiler (Photo by Adia Khalid)

Leonie Weissweiler (LMU Munich) investigated the (lack of) ability of LLMs to recognize linguistic constructions. Constructions are an alternative to traditional syntactic grammars in linguistics, intended to better capture edge cases and unusual grammatical phenomena in natural language. However, it is challenging to investigate such constructions systematically, as the number of well-annotated example cases is low and typical examples tend to be already contained in the training data of LLMs (“training data contamination”). Therefore, Weissweiler and colleagues argue for developing new datasets and methodologies that are capable of answering the fundamental research questions in the field.

  • Weissweiler, He, Otani, Mortensen, Levin, and Schütze (2023). Construction Grammar Provides Unique Insight into Neural Language Models. CxGs + NLP Workshop, GURT 2023. https://arxiv.org/abs/2302.02178
  • Weissweiler, Hofmann, Kantharuban, …, and Mortensen (2023). Counting the Bugs in ChatGPT’s Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model. Empirical Methods in Natural Language Processing. https://arxiv.org/abs/2310.15113
  • Weissweiler, Hofmann, Köksal, and Schütze (2022). The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative. Empirical Methods in Natural Language Processing. https://arxiv.org/abs/2210.13181

William Merrill (Photo by Adia Khalid)


William Merrill (NYU, USA) analyzed the computational capabilities of transformers with the tools of theoretical computer science. Merrill and colleagues showed that transformers (under reasonable assumptions) can be simulated by constant-depth, logspace-uniform threshold circuits, which means that they are severely limited in executing even simple computational tasks (such as graph connectivity or checking systems of linear inequalities). More broadly, they argue that these limitations apply to all machine learning models that are highly parallelizable (such as transformers), which leaves us with a fundamental tradeoff: either we obtain models that are efficient to train on very large datasets (such as transformers), or we obtain models that are computationally powerful (expressive) – but not both.
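Schematically, and as a rough paraphrase rather than the precise theorem statement, the complexity-theoretic picture looks as follows (the class inclusions below are standard results; which of them are strict remains open, but strictness is widely conjectured):

```latex
% Schematic summary (paraphrase; the precise assumptions, e.g. on numerical
% precision and uniformity, are part of Merrill and colleagues' formal statements):
\[
  \text{transformers} \;\subseteq\; \text{logspace-uniform } \mathsf{TC}^0
  \;\subseteq\; \mathsf{NC}^1 \;\subseteq\; \mathsf{L} \;\subseteq\; \mathsf{NL} \;\subseteq\; \mathsf{P}.
\]
% If, as widely conjectured, $\mathsf{TC}^0$ is strictly weaker than the classes above it,
% then problems complete for $\mathsf{L}$/$\mathsf{NL}$ (e.g., graph connectivity) or for
% $\mathsf{P}$ (e.g., feasibility of systems of linear inequalities) lie outside $\mathsf{TC}^0$
% and hence beyond a single forward pass of such a transformer.
```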

Panel Session (Photo by Adia Khalid)

Leonie Weissweiler, William Merrill, Prof. Philipp Cimiano, and Prof. Silke Schwandt participated in a panel discussion on the fundamental and practical limitations of LLMs – as well as possible future research directions. Silke Schwandt highlighted that current LLMs are unsuitable for research on historical language samples. However, training new models on such data is challenging because of its sparsity. As such, different methods using smaller or more specialized models are required for many tasks in the digital humanities. Leonie Weissweiler emphasized the lack of systematic linguistic research into transformers and the challenges in acquiring sufficient computing time and power to perform such research. William Merrill pointed out that theoretical computer science on transformers is still in its initial phase and might grow into its own research field – similar to the theoretical investigation of recurrent neural networks in the past. On the issue of the lack of (explicit) knowledge in LLMs, Philipp Cimiano argued that, rather than injecting knowledge into LLMs, the more promising research direction is to treat LLMs as an interface technology that translates human natural-language queries into formal queries for semantic knowledge bases and translates the formal answers of such systems back into natural language.

Iris van Rooij (Photo by Adia Khalid)


Prof. Iris van Rooij (Radboud University, Netherlands & Aarhus University, Denmark) re-introduced the notion of artificial intelligence as part of cognitive science and argued that, rather than trying to build models that mimic human intelligence (a practice she and her colleagues dub “Makeism”), one should treat computational models of human cognition as theoretical tools or formal hypotheses. Van Rooij and colleagues underlined this view by presenting a theorem showing that, without additional assumptions on the class of models, learning a model of human cognition (or an approximation thereof) from example data is NP-hard.
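The flavor of the result can be sketched as follows (our schematic paraphrase of the claim as presented in the talk; the formal problem definition and proof are due to van Rooij and colleagues):

```latex
% Schematic paraphrase of the intractability claim, not the formal statement:
\begin{description}
  \item[Given:] finitely many situation--behavior examples $(s_1,b_1),\dots,(s_n,b_n)$
    generated by some human(-like) behavior $B$.
  \item[Task:] output a computational model $M$ that agrees with $B$ (exactly or
    approximately) on new situations, not only on the observed examples.
  \item[Claim:] without additional assumptions on the class of admissible models,
    this problem is NP-hard, i.e.\ not solvable in polynomial time unless $\mathsf{P}=\mathsf{NP}$.
\end{description}
```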

Nouha Dziri (Photo by Adia Khalid)


Nouha Dziri (Allen Institute for AI, USA) presented a series of investigations into the ability of LLMs to perform compositional computational tasks, in particular multi-digit multiplication. With extensive experiments, Dziri and colleagues showed that even the most advanced LLMs fail to reliably multiply four-digit numbers, although they do tend to get the first and last digits right. They explained these findings by analyzing the computational graph required to execute such multiplications: while the middle digits require traversing deep paths through the graph (which transformers in general and LLMs in particular fail to do), the first and last digits can be obtained via shortcuts through the graph and are thus achievable for LLMs.
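The intuition for why the outermost digits are easier can be illustrated with a small script (an illustrative sketch of ours, not code from Dziri and colleagues; the function names are made up): the last digit of a product depends only on the last digits of the operands, whereas the middle digits emerge from long chains of partial products and carries.

```python
# Illustrative sketch (not from Dziri et al.): the last digit of a product is a
# "shallow" computation, while the middle digits require a long chain of
# dependent steps (carry propagation).

def last_digit_of_product(a: int, b: int) -> int:
    # The last digit of a*b depends only on the last digits of a and b:
    # a local computation with no carry propagation.
    return (a % 10) * (b % 10) % 10

def schoolbook_multiply(a: int, b: int) -> int:
    # Middle digits require summing many partial products and propagating
    # carries from right to left, i.e. a deep path through the computational graph.
    digits_a = [int(d) for d in str(a)][::-1]
    digits_b = [int(d) for d in str(b)][::-1]
    result = [0] * (len(digits_a) + len(digits_b))
    for i, da in enumerate(digits_a):
        carry = 0
        for j, db in enumerate(digits_b):
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(digits_b)] += carry
    return int("".join(map(str, reversed(result))))

a, b = 4321, 8765
assert schoolbook_multiply(a, b) == a * b            # full product needs the deep computation
assert last_digit_of_product(a, b) == (a * b) % 10   # last digit is reachable via a shortcut
```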

Overall, the works presented at the workshop cover a diversity of interdisciplinary methods and angles for investigating the fundamental limits of large language models, and they provide deeper insight into why LLMs succeed at some tasks and fail at others. This research provides crucial orientation for researchers and practitioners across fields, and we are excited to see it continue.

Due to the interdisciplinary nature of the domain, we believe that bringing researchers from different sub-disciplines together yields fruitful discussions and stimulates collaborations.