The depth and diversity of Cultural Heritage (CH) collections are recognised as invaluable for enriching lives and fostering social and cultural cohesion, and as a valuable economic resource. Yet making full use of those collections and the individual records within them remains hampered by a series of interrelated problems: (1) digital catalogue metadata tends to exist for only a small proportion of CH collections; (2) where it exists, it is often sparse and unstructured, and contains varying forms of bias; (3) where structured, it is often not aligned with external authorities. As a result, it is currently difficult to discover individual items and almost impossible to link them to other records within the same collection, let alone between different resources.
To address these issues, guidelines have been produced to improve the Findability, Accessibility, Interoperability and Reusability (FAIR) of digital assets through machine-actionable methods. Based on FAIR principles, Linked Open Data (LOD) has proven an effective mechanism for identifying, disambiguating and linking key entities, such as places, people, objects and events, but implementing LOD tends to require a massive investment of time, resources and expertise. More recently, transformer-based Large Language Models (LLMs) have demonstrated a remarkable capacity to interpret and contextualise natural language. However, while LLMs are far more intuitive to use, their probabilistic and variable outputs make data enrichment unstable and unpredictable: they can simply return too many errors to make their use worthwhile for data curation.
The particular scenario set out here uses a combination of LOD and LLM technologies to enable digital assets to be enriched through the processes of Named Entity Recognition, Named Entity Disambiguation, and Relationship Extraction.
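As an illustration only, the combination of the three processes might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described here: the `fake_llm` function stands in for a real LLM call and returns canned output, the `AUTHORITY` table stands in for an external LOD authority, and all URIs, predicates and function names are hypothetical. The point it shows is that LLM suggestions are kept only when they resolve against the authority, so that unverifiable entities and relationships are dropped rather than stored.

```python
import json

# Hypothetical stand-in for an external LOD authority: label -> (URI, type).
# The example.org URIs are placeholders, not real identifiers.
AUTHORITY = {
    "london": ("https://example.org/place/1", "Place"),
    "j. m. w. turner": ("https://example.org/person/7", "Person"),
}


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call (assumption); returns canned JSON output."""
    return json.dumps({
        "entities": [
            {"text": "J. M. W. Turner", "type": "Person"},
            {"text": "London", "type": "Place"},
            {"text": "Atlantis", "type": "Place"},  # not in the authority
        ],
        "relations": [
            {"subject": "J. M. W. Turner", "predicate": "bornIn",
             "object": "London"},
            {"subject": "J. M. W. Turner", "predicate": "visited",
             "object": "Atlantis"},
        ],
    })


def disambiguate(mention, expected_type):
    """Named Entity Disambiguation: map a mention to an authority URI."""
    entry = AUTHORITY.get(mention.strip().lower())
    if entry is None or entry[1] != expected_type:
        return None  # unknown label or type mismatch: reject the mention
    return entry[0]


def enrich(record_text, llm=fake_llm):
    """Run recognition, disambiguation and relationship extraction."""
    # Named Entity Recognition and Relationship Extraction: LLM proposals.
    raw = json.loads(llm("Extract entities and relations as JSON:\n" + record_text))

    # Keep only the entities that resolve against the authority.
    linked = {}
    for ent in raw.get("entities", []):
        uri = disambiguate(ent["text"], ent["type"])
        if uri is not None:
            linked[ent["text"]] = uri

    # Keep only the relations whose subject and object were both linked.
    triples = [
        (linked[r["subject"]], r["predicate"], linked[r["object"]])
        for r in raw.get("relations", [])
        if r["subject"] in linked and r["object"] in linked
    ]
    return {"entities": linked, "triples": triples}


result = enrich("Watercolour by J. M. W. Turner, who was born in London.")
```

In this sketch the unverifiable mention "Atlantis" and the relation that depends on it are discarded, while the remaining relation is stored as a triple of authority URIs. Exact label matching is used purely for brevity; a real workflow would need a more tolerant lookup.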