Representational Alignment

We may live in the same world, but do we represent it in the same way? If not, how do we still manage to communicate and cooperate effectively in a system about which we fundamentally disagree?

From Plato's Sophist to contemporary studies comparing LLMs to human brains, the study of diverging representations has fascinated researchers for millennia and continues to be an active area of research in neuroscience, cognitive science, and machine learning.

In this session, we will discuss how we can measure and manipulate the representational alignment of intelligent entities, both biological and artificial (e.g. humans and neural networks). We will also explore the implications of representational (mis)alignment between intelligent entities for their ability to communicate, cooperate, and compete.

Session Chair

Dr Ilia Sucholutsky (Princeton University)

How and why we should study representational alignment

Invited Talks

Professor Bradley Love (UCL)

Aligning embedding spaces for model evaluation and learning

Professor Iris Groen (University of Amsterdam)

Are DNNs representationally aligned with human scene-selective cortex? Elucidating the influence of image dataset, network training and cognitive task demands

Contributed Talks

Professor Mayank Kejriwal (University of Southern California)

On using Fodor's theory of modularity for situating large language models within a larger artificial general intelligence architecture

Dr Andreea Bobu (Boston Dynamics AI Institute)

Aligning Robot and Human Representations

Invited Talks

Professor Bradley Love (UCL)

Aligning embedding spaces for model evaluation and learning

For decades, psychologists and neuroscientists have used embeddings to characterize human knowledge. Images, text, activity patterns, and brain measures can all be captured in embedding spaces. First, I will consider some virtues and pitfalls of aligning embedding spaces to characterize representations. Second, I will consider whether people, and in particular children, learn by aligning embedding spaces across modalities in an unsupervised fashion.
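As a hedged illustration of the kind of operation the abstract refers to (not the speaker's specific method or data), the sketch below aligns a toy "model" embedding onto a toy "human" embedding using orthogonal Procrustes, one common way of mapping one embedding space onto another over a shared set of items. All data, dimensions, and variable names are invented for the example.

```python
# Minimal sketch: align one embedding space onto another with orthogonal Procrustes.
# The "human" and "model" embeddings are synthetic placeholders.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)

n_items, dim = 50, 16
human_emb = rng.normal(size=(n_items, dim))              # e.g. embeddings derived from similarity judgments
true_rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
model_emb = human_emb @ true_rotation + 0.1 * rng.normal(size=(n_items, dim))

# Find the orthogonal map R that best takes the model space onto the human space.
R, _ = orthogonal_procrustes(model_emb, human_emb)
aligned = model_emb @ R

err_before = np.linalg.norm(model_emb - human_emb)
err_after = np.linalg.norm(aligned - human_emb)
print(f"Frobenius error before alignment: {err_before:.2f}, after: {err_after:.2f}")
```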

Professor Iris Groen (University of Amsterdam)

Are DNNs representationally aligned with human scene-selective cortex?

Elucidating the influence of image dataset, network training and cognitive task demands

Identifying the neural computations that enable our brains to represent our visual environments is a major goal of cognitive neuroscience. In recent years, the unprecedented ability of deep neural networks (DNNs) to predict neural responses to visual inputs in human cortex has led to a great deal of excitement about these models’ potential to capture human mental representations of the outside world. However, many studies showing representational alignment of visual DNNs with humans use brain responses to (isolated) object stimuli, with human participants typically performing passive viewing tasks. Real-life visual perception requires processing of complex scenes containing a multitude of objects and high-level semantics, which humans can dynamically represent depending on task demands. In this talk, I will zoom in on visual cortex regions known as ‘scene-selective’ to discuss to what extent contemporary DNNs used for object and scene classification can be said to be representationally aligned with human neural and behavioral scene representations. In particular, I will highlight how image dataset, neural network training, and cognitive task demands affect DNNs’ representational alignment with human scene perception.
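One standard way of asking whether a DNN layer is representationally aligned with a brain region is representational similarity analysis (RSA): compare the dissimilarity structure of the two representations over the same stimuli. The sketch below is a generic, hedged illustration with random stand-in "voxel" and "feature" matrices; it is not the analysis pipeline from the talk.

```python
# Generic RSA sketch: correlate the dissimilarity structures of a brain region
# and a DNN layer over the same set of scenes. Data here are random placeholders.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

n_scenes = 40
brain_responses = rng.normal(size=(n_scenes, 200))   # scenes x voxels (e.g. a scene-selective ROI)
dnn_features = rng.normal(size=(n_scenes, 512))      # scenes x units from one DNN layer

# Representational dissimilarity matrices (condensed upper triangles).
brain_rdm = pdist(brain_responses, metric="correlation")
dnn_rdm = pdist(dnn_features, metric="correlation")

# Alignment score: rank correlation between the two dissimilarity structures.
rho, p = spearmanr(brain_rdm, dnn_rdm)
print(f"RSA alignment (Spearman rho): {rho:.3f} (p = {p:.3f})")
```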

Contributed Talks

Professor Mayank Kejriwal (University of Southern California)

On using Fodor's theory of modularity for situating large language models within a larger artificial general intelligence architecture

Mental modules have been of interest in the philosophy of cognition since the early 1980s, with the release of Fodor's seminal work on the subject. More recently, 'post-Fodorian' theorists have even hypothesized that Fodor's modularity may not have gone far enough and that even higher-level cognitive processes in the mind (such as decision making) may be informationally encapsulated enough to be considered modular. With the recent advent of large language models (LLMs) and generative AI, such as ChatGPT and Bard, there is a new line of research seeking to explore the ‘cognitive’ properties of such models, e.g., does a (specific) LLM have a theory of mind, and is it conscious? Unfortunately, much of the thinking on this topic, especially by experts in natural language processing and AI, has been ad hoc rather than theoretically grounded.

In this talk, I suggest that Fodor’s general framework offers us a systematic and unbiased way of exploring such properties in LLMs. I propose that an LLM should be conceived of not only as a neuronal black box, but also as a complex ‘network’ of interacting cognitive modules. Recovering the structure of this network, identifying individual modules and their properties, and situating them relative to human cognitive modules suggests a bolder, less reductionist way of conducting such studies. I also propose a radical vision for the longer-term future, relying on computational definitions of consciousness, that aims to measure (hypothetical) consciousness in LLMs as an emergent property of the complex network itself.

Dr Andreea Bobu (Boston Dynamics AI Institute)

Aligning Robot and Human Representations

To perform tasks that humans want in the world, robots rely on a representation of salient task features; for example, to hand me a cup of coffee, the robot considers features like efficiency and cup orientation in its behavior. Prior methods try to learn both a representation and a downstream task jointly from data sets of human behavior, but this unfortunately picks up on spurious correlations and results in behaviors that do not generalize. In my view, what’s holding us back from successful human-robot interaction is that human and robot representations are often misaligned: for example, our lab’s assistive robot moved a cup inches away from my face -- which is technically collision-free behavior -- because it lacked an understanding of personal space. Instead of treating people as static data sources, my key insight is that robots must engage with humans in an interactive process for finding a shared representation for more efficient, transparent, and seamless downstream learning.

In this talk, I focus on a divide-and-conquer approach: explicitly direct human input toward teaching robots good representations before using them for learning downstream tasks. This means that instead of relying on inputs designed to teach the representation implicitly, we have the opportunity to design human input that is explicitly targeted at teaching the representation, and to do so efficiently. I introduce a new type of representation-specific input that lets the human teach new features, I enable robots to reason about the uncertainty in their current representation and automatically detect misalignment, and I propose a novel human behavior model to learn robust behaviors on top of human-aligned representations. By explicitly tackling representation alignment, I believe we can ultimately achieve seamless interaction with humans where each agent truly grasps why the other behaves the way they do.
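As a loose, hedged sketch of the general idea of detecting representational misalignment (a generic illustration, not the speaker's method), the toy code below checks whether any weighting of the robot's known features can explain the human's feedback; a poor fit suggests a missing feature, such as personal space, and that the robot should ask the human to teach it. All functions, feature names, and numbers are hypothetical.

```python
# Toy misalignment check: if the robot's known features cannot explain the
# human's preferences, its representation is probably missing a feature.
import numpy as np

rng = np.random.default_rng(2)

def robot_features(state):
    # Hypothetical features the robot knows about: efficiency and cup tilt.
    efficiency, cup_tilt, _ = state
    return np.array([efficiency, cup_tilt])

def human_preference_score(state):
    # The human also cares about distance to the face (personal space),
    # which the robot's representation omits -- the source of misalignment.
    efficiency, cup_tilt, dist_to_face = state
    return 1.0 * efficiency - 2.0 * cup_tilt + 3.0 * dist_to_face

# States the human has given feedback on (e.g. via corrections), and the best
# linear fit of that feedback using only the robot's known features.
states = rng.uniform(size=(30, 3))
X = np.array([robot_features(s) for s in states])
y = np.array([human_preference_score(s) for s in states])
weights, residual, *_ = np.linalg.lstsq(X, y, rcond=None)

# A large unexplained residual signals misalignment -> query for a new feature.
explained = 1 - residual[0] / np.sum((y - y.mean()) ** 2)
print(f"Fraction of human preference explained by robot features: {explained:.2f}")
if explained < 0.9:
    print("Representation likely misaligned: ask the human to teach a new feature.")
```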