2024:Program/Empowering Wikidata editors and content with the Wikidata Quality Toolkit

Session title: Empowering Wikidata editors and content with the Wikidata Quality Toolkit

Session type: Workshop

Track: Technology

Language: en

This workshop presents the tools in the Wikidata Quality Toolkit (WQT) and shows how to use them. The WQT consists of tools, built on recently published research, that assist Wikidata editors in three of their daily tasks: recommending items for editors to work on (based on their expertise and edit history); detecting item references that do not support their claims well (and ways of improving them); and automatically generating EntitySchemas to find items with missing information. We will demonstrate these tools, help editors use them, and gather feedback for further improvement.

Description

The Wikidata Quality Toolkit (WQT, https://king-s-knowledge-graph-lab.github.io/WikidataQualityToolkit/) is a set of tools that researchers, developers, and open knowledge activists and enthusiasts are developing to improve the quality of content in Wikidata and the workflows of editors in their everyday tasks. These tools are built on successful, recently published research about Wikidata (including a paper that received the Wikimedia Foundation Research of the Year Award in 2022). Our aim is to help transition these tools from the lab into the community and the world.

The WQT contains three tools that cover the spectrum of Wikidata content quality and editor workflow improvement. They address three tasks: verifying how well references support claims, automatically recommending items based on editor expertise, and generating schemas for item completion.

Reference Quality Verification (RQV). RQV provides an automated pipeline that verifies whether knowledge graph triples are supported by their documented sources. It involves text extraction, triple verbalization, sentence selection, and claim verification, using rule-based methods and machine learning models. Users can verify the reference quality of a specific document or Wikidata item with this tool. Furthermore, the tool supports automatic verification of a batch of documents and Wikidata items.
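
To make the pipeline stages concrete, here is a minimal sketch of triple verbalization, sentence selection, and claim verification. It is not the RQV implementation: the function names, the label strings, the token-overlap heuristic, and the 0.6 threshold are all assumptions chosen for illustration, and the overlap rule stands in for the machine learning models the tool uses. Text extraction is skipped; the reference text is given directly.

```python
import re

def verbalize(subject, predicate, obj):
    """Turn a (subject, predicate, object) triple into a plain sentence."""
    return f"{subject} {predicate} {obj}"

def tokens(text):
    """Lowercased word tokens, used for a crude overlap measure."""
    return set(re.findall(r"\w+", text.lower()))

def select_sentences(claim, document, top_k=2):
    """Rank the document's sentences by token overlap with the claim."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    claim_tokens = tokens(claim)
    ranked = sorted(sentences, key=lambda s: len(claim_tokens & tokens(s)), reverse=True)
    return ranked[:top_k]

def verify(claim, evidence, threshold=0.6):
    """Hypothetical rule-based stand-in for a claim verification model."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return "NOT ENOUGH INFO"
    best = max((len(claim_tokens & tokens(s)) / len(claim_tokens) for s in evidence),
               default=0.0)
    return "SUPPORTS" if best >= threshold else "NOT ENOUGH INFO"

# Reference text as it might look after extraction from the cited source.
document = ("Douglas Adams was an English author. He was born in Cambridge in 1952. "
            "He wrote The Hitchhiker's Guide to the Galaxy.")

claim = verbalize("Douglas Adams", "was born in", "Cambridge")
evidence = select_sentences(claim, document)
label = verify(claim, evidence)
print(claim, "->", label)
```

A batch run, as mentioned above, would simply loop these steps over many (triple, reference) pairs.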

Wikidata Game+. Wikidata Game+ builds upon the Wikidata Game by incorporating a novel recommendation system that provides editors with personalised item recommendations, relying on both item features and previous item-editor interactions. It builds representations of users, item content, and item relations using matrix factorization, ELMo, and TransR embedding techniques.
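
As an illustration of the matrix factorization component only, the sketch below learns latent vectors for editors and items from a toy interaction list and ranks unseen items for one editor. It is an assumption-laden example, not the Wikidata Game+ code: the data, hyperparameters, and function names are invented, and the ELMo and TransR representations are left out entirely.

```python
import random

def factorize(interactions, n_editors, n_items, k=2, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Learn editor factors P and item factors Q with plain SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_editors)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in interactions:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def score(P, Q, u, i):
    """Predicted affinity of editor u for item i (dot product of factors)."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def mse(P, Q, interactions):
    """Mean squared error of the predictions on the observed interactions."""
    return sum((r - score(P, Q, u, i)) ** 2 for u, i, r in interactions) / len(interactions)

def recommend(P, Q, u, seen, top_n=1):
    """Rank items the editor has not interacted with by predicted affinity."""
    candidates = [i for i in range(len(Q)) if i not in seen]
    return sorted(candidates, key=lambda i: score(P, Q, u, i), reverse=True)[:top_n]

# Toy data: (editor index, item index, 1.0 = edited, 0.0 = skipped).
interactions = [
    (0, 0, 1.0), (0, 1, 1.0),
    (1, 0, 1.0), (1, 1, 1.0), (1, 2, 1.0), (1, 3, 0.0),
    (2, 3, 1.0), (2, 0, 0.0), (2, 2, 0.0),
]
P, Q = factorize(interactions, n_editors=3, n_items=4)
seen = {0, 1}  # items editor 0 has already interacted with
suggestions = recommend(P, Q, 0, seen)
print("suggested item indices for editor 0:", suggestions)
```

In a hybrid system of the kind described above, this collaborative score would be combined with content and relation embeddings of the items rather than used on its own.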

EntitySchema Generator. There are numerous issues with Wikidata modeling and data quality, and inconsistent modeling of EntitySchemas is one of the most significant challenges today. The EntitySchema Generator addresses this by using Large Language Models (LLMs) to generate reference patterns of EntitySchemas for specific topics of entities. By training on both good and bad examples, it can generate reference patterns and evaluate the quality of EntitySchemas. Additionally, it can modify inconsistent EntitySchemas based on the generated best patterns, and provide explanations and additional comments by leveraging the capabilities of LLMs.
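
Once a reference pattern exists, it can be used to find items with missing information. The sketch below shows that idea with a deliberately simplified pattern: a plain dictionary of Wikidata property IDs marked as required or optional. This dictionary is a hypothetical stand-in for a real EntitySchema (which is written in ShEx), and the item data and the `missing_properties` helper are invented for illustration; no LLM generation is shown.

```python
def missing_properties(item_claims, pattern):
    """Return required properties from the pattern that the item lacks.

    item_claims maps property IDs to lists of values; an absent key or an
    empty list both count as missing.
    """
    return [
        prop
        for prop, constraint in pattern.items()
        if constraint["required"] and not item_claims.get(prop)
    ]

# Hypothetical reference pattern for items about people.
person_pattern = {
    "P31": {"label": "instance of", "required": True},
    "P569": {"label": "date of birth", "required": True},
    "P19": {"label": "place of birth", "required": True},
    "P106": {"label": "occupation", "required": True},
    "P570": {"label": "date of death", "required": False},
}

# Made-up claims for an incomplete item.
item_claims = {
    "P31": ["Q5"],
    "P569": ["1952-03-11"],
    "P106": [],
}

gaps = missing_properties(item_claims, person_pattern)
for prop in gaps:
    print(f"missing {prop} ({person_pattern[prop]['label']})")
```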

Session recording: https://www.youtube.com/watch?v=BbGrkYK8FEk&list=PLhV3K_DS5YfJ1xyY0LNDNX3RKyRQEXOdB&t=22855

How does your session relate to the event theme, Collaboration of the Open?

The session reinforces "Collaboration of the Open" by leveraging synergies and building bridges between three different open ecosystems: academia, which aims to improve scientific methods and provide innovation openly for the benefit of all society; open source developers, who can follow up on the outputs of academia (papers, early software) and turn them into scalable, globally accessible, and usable tools; and the open knowledge community of the Wikidata/Wikimedia movement, who are the final users of such tools and through whom we hope to improve the quality of open knowledge resources.

What is the experience level needed for the audience for your session?

Everyone can participate in this session

Etherpad link

https://etherpad.wikimedia.org/p/WM2024_Day3_Ochrid_-_Room_9

Resources

https://docs.google.com/presentation/d/1A3D9aFo6bOI6AiwMcDyOLSXoXFAw7y33YXD11akiBc8/edit?usp=sharing

Speakers

Albert Meroño

I am an Assistant Professor (Lecturer) in Computer Science at King’s College London, United Kingdom. My research revolves around culturally-informed Artificial Intelligence, in particular multimodal knowledge graphs, Web data APIs, music semantics, and knowledge representation and reasoning for digital humanities and cultural heritage.