Tracing Knowledge: From Metrics to Meaning
Takeaways from the TU Berlin Workshop on LLMs for the History, Philosophy, and Sociology of Science (April 2025)
Understanding the evolution of concepts and language has been a long-standing problem for researchers in the humanities. How do new concepts emerge? What factors drive their emergence? Is it kickstarted by new knowledge, perhaps a scientific discovery? Answering these questions requires defining what a concept is in the first place, and, contrary to popular belief, a concept is not a word.
The main approaches to answering these questions come from philosophy and linguistics. From Saussure’s dualism of signifier and signified to Hegel’s developmental conception of the concept, it is clear that concepts are not object representations alone. Concepts are both processes and their product.
But what happens when there are too many interacting products, and the processes are so complex that they would take lifetimes to unravel by hand? Studying the evolution of concepts amid today’s exploding number of scientific publications is challenging but necessary for a deeper understanding of the dynamics of science, namely how knowledge develops and why paradigm shifts occur.
Interested in addressing these challenges with Large Language Models (LLMs), scientists from different backgrounds joined the LLMs for the History, Philosophy, and Sociology of Science workshop at TU Berlin.
Linguistics is not enough
A key topic of the workshop was that modeling semantics from text data alone is insufficient. The word embeddings produced by LLMs are based on probability distributions that consider word frequencies in the context of their neighbouring words. That is, given the word United, what is the likelihood of it being preceded or followed by other words, such as Nations or States? This is also known as distributional semantics.
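To make the idea concrete, here is a minimal sketch of distributional semantics built from raw co-occurrence counts. The tiny corpus, window size, and word pair are invented for illustration; actual LLM embeddings are learned by neural networks over vastly larger corpora.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus; real distributional models are trained on billions of tokens.
corpus = [
    "the united nations met in geneva",
    "the united states signed the treaty",
    "the united nations passed a resolution",
]

window = 2  # how many neighbouring words count as "context"
vectors = defaultdict(Counter)

for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                vectors[word][tokens[j]] += 1

def cosine(u, v):
    """Similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Words that appear in similar contexts end up with similar vectors.
print(cosine(vectors["nations"], vectors["states"]))
```

The point of the toy example is only that similarity here is entirely a matter of shared neighbouring words, which is exactly the reductionism discussed next.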
Although useful, current frameworks are arguably reductionist, since “real-world meaning” is not based on distributional semantics. Meaning is built of, by, and for objects, their interpretations, and their representations, and this dynamism is overlooked by approaches based only on text data.
The real challenge then lies in modeling the process of inquiry: understanding how abstract ideas evolve into well-defined concepts. In his talk, Prof. Gerd Graßhoff suggested that achieving true conceptual understanding is only a matter of time. His vision involves teaching computational models philosophy by integrating additional components responsible for generating reasons, constructing arguments, and evaluating evidence for or against the truth of a proposition or action. He called this “computational epistemology.”
Blind train (leader)boarding
Another key point raised during the workshop was how difficult it is to choose a model. There are plenty of benchmarks, and without sufficient AI literacy one can hardly understand how a model’s performance is being compared, let alone decide which model to use for a problem in the History, Philosophy, and Sociology of Science (HPSS).
Moreover, despite their high rankings on NLP leaderboards, researchers in HPSS may decide not to use LLMs because their results are often not interpretable. The social implications of blindly applying top-performing LLMs to HPSS problems were heavily discussed. For instance, models that reflect biases toward certain communities or perspectives risk not only reinforcing those biases in subsequent research, but also leading to flawed conclusions about society, historical events, and related subjects.
Imagine a train that people blindly board, hoping it will take them where they want to go because it has the best ratings. The train (the LLM) is moving, but we do not know how it is navigating. Interacting with LLMs in this way is much like boarding a train without any knowledge of the route, the track conditions, or the decisions made to reach the destination.
Unable to interpret the train’s journey, one can hardly understand why it made the stops it did.
Some talks to clear the way
To improve interpretability, our discussions in CASCADE have explored the roles of Explainable AI (XAI) and temporal awareness. In what follows, I highlight two workshop talks, each corresponding to one of these key areas.
In his talk Interpretability for LLMs: Scientific Insights, Transparency, and Applications in the Humanities, Oliver Eberle addressed the broader issue of interpretability in neural networks, not just in LLMs. He outlined several approaches that provide more nuanced explanations than traditional attention-based methods, such as identifying relevant walks in graph neural networks. Overall, the talk offered concrete methods for enhancing the interpretability of neural networks in HPSS-related research. I highly recommend watching the recording, once available, to note the methods that might be relevant to your own research.
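To give a flavour of what attribution-style explanations look like, the sketch below applies a simple gradient-times-input attribution to a toy classifier. This is not one of the methods presented in the talk, only a common baseline that more nuanced relevance-based approaches refine; the model and features are invented for illustration.

```python
import torch
import torch.nn as nn

# Toy "document classifier": a small feed-forward model over 5 features.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(5, 8), nn.ReLU(), nn.Linear(8, 2))

x = torch.rand(1, 5, requires_grad=True)   # one input with 5 features
score = model(x)[0, 1]                     # logit of the class we want to explain
score.backward()

# Gradient x input: how much each feature pushed the score up or down.
relevance = (x.grad * x).detach().squeeze()
for i, r in enumerate(relevance.tolist()):
    print(f"feature {i}: relevance {r:+.3f}")
```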
While knowing XAI tools is important for ensuring interpretability when using LLMs, so is time awareness. The outputs of LLMs were never intended to be time-aware, since the architecture was originally designed for machine translation rather than diachronic text analysis. For instance, it was not relevant to know that “oxygène” was adopted from French into English as “oxygen” after Antoine-Laurent Lavoisier’s discovery in 1774; what mattered was only to identify the equivalent entities in the right order to build the most reliable translations.
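As a rough illustration of what time awareness could add, the sketch below slices an invented, period-tagged corpus and compares the contexts in which a term appears in each slice. The periods and snippets are made up for demonstration, and the architecture proposed in the talk is of course far more sophisticated than this.

```python
from collections import Counter

# Invented, period-tagged snippets; a real study would use dated full texts.
corpus = {
    "1770s": ["dephlogisticated air supports combustion",
              "the new air was called oxygene by lavoisier"],
    "1800s": ["oxygen combines with hydrogen to form water",
              "oxygen is essential for respiration"],
}

def context_counts(sentences, target):
    """Count words co-occurring with `target` in each sentence."""
    counts = Counter()
    for s in sentences:
        tokens = s.split()
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts

# The term's surroundings shift between periods, which a single
# time-agnostic model would simply average away.
for period, texts in corpus.items():
    print(period, context_counts(texts, "oxygen") or context_counts(texts, "oxygene"))
```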
In his talk Time-Aware Language Models: Towards a Novel Architecture for Historical Analysis, Jochen Büttner addressed this limitation and proposed an approach for integrating temporal sensitivity into LLMs. His abstract, along with those of the other participants, is on the workshop’s program page. As the presentations were recorded, I expect the talks to be published soon.
In conclusion, there is a need to develop benchmarks, datasets, tools, and related resources. This workshop not only offered approaches for conducting interpretable studies using LLMs, but also broadly encouraged researchers to frame HPSS problems as NLP tasks.
On that note, those interested in contributing—whether through a traditional research paper or reflective piece—are invited to respond to the open call for contributions associated with the workshop. For further information, please contact Dr. Arno Simons [arno.simons@tu-berlin.de].