Re-Engineering the Concept of Understanding for AI
With Pierre Beckmann.
Argues that the concept of understanding needs to be re-engineered for artificial cognition, in a way that is empirically informed by mechanistic interpretability research and theoretically informed by an analysis of the concept's functions.
AI, conceptual engineering, mechanistic interpretability, understanding, conceptual change, functions
PDF coming soon
Mechanistic Indicators of Understanding in Large Language Models
Philosophical Studies. With Pierre Beckmann. doi:10.48550/arXiv.2507.08017
Draws on detailed technical evidence from research on mechanistic interpretability (MI) to argue that while LLMs differ profoundly from human cognition, they do more than tally up word co-occurrences: they form internal structures that can fruitfully be compared to different forms of human understanding, such as conceptual, factual, and principled understanding. We synthesize MI's most relevant findings to date and embed them within an integrative theoretical framework for thinking about understanding in LLMs. As the phenomenon of "parallel mechanisms" shows, however, the differences between LLMs and human cognition are as philosophically fruitful to consider as the similarities.
explainable AI, LLM, mechanistic interpretability, philosophy of AI, understanding, conceptual change
Download PDF