Kolloquium "Phänomenologie der Digital Humanities" #8 (7.2.2025)
Lisa Poggel, Chair of Digital Humanities, FU Berlin
Introduction
The Project
Art history: a record of the history of ownership of a piece of art
Archival science/archaeology: the context in which objects and records are found and created, documented to aid in their interpretation; the materials are in turn taken as evidence of that context.
Sources: Glavic (2021: 3f.), Anderson (2024)
Source: W3C (2010)
(or lineage, pedigree, parentage, genealogy, ...)
Databases: record of how the data item was derived from other data items by a set of transformations; explains how the result of an operation was derived from its inputs
Experimental sciences: metadata record of the process of experiment workflows, annotations, and notes about experiments; ensures reproducibility and trustworthiness
Data science: record that describes the origins and processing of data; enables responsible (i.e. fair, accountable, transparent and explainable) AI
Research Data Management: degree to which a data set and the data elements and data values it contains are equipped with verifiable information about its origin. Documentation of where data material comes from and with which processes and methods it was produced; answers the questions of why and how the data were produced, where, when, and by whom.
Sources: Glavic (2021: 3f.), Simmhan, Plale and Gannon (2005: 1), Data Provenance Initiative (2024), Werder, Ramesh and Zhang (2022), Stein and Taentzer (2023), eResearch Alliance Universität Göttingen (2025)
Provenance: any information describing the production process of an end product, which can be anything from a piece of digital data to a physical object.
Provenance meta-data: meta-data describing an arbitrary production process using an arbitrary data model and model of computation
Information system provenance: meta-data collected for an information-disseminating process that can be computed based on the input, the output, and the parameters of the process. (= processes producing digital data within information systems)
Workflow provenance: specializes information system provenance by further restricting the type of production processes to so-called workflows. (= directed graph where nodes represent arbitrary functions or modules [...] with some input, output, and parameters)
Data provenance: allows to track the processing of individual data items (e.g., tuples) at the “highest resolution,” i.e., the provenance itself is at the level of individual data items (and the operations they undergo). Collecting data provenance typically applies on structured data models and declarative query languages with clearly defined semantics of individual operators.
Sources: highlighted terms from Herschel, Diestelkämper and Lahmar (2017: 882f.), Simmhan, Plale and Gannon (2005), IBM (2024), among others
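The idea of data provenance "at the level of individual data items" can be illustrated with a small sketch. It is not taken from any of the cited sources: the `Row` class, the `table:index` identifiers, and the sample tables are hypothetical. Each row carries the set of source rows it was derived from; a selection leaves that set untouched, and a join unions the sets of both inputs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Row:
    """A data item plus the IDs of the source rows it was derived from."""
    values: tuple
    lineage: frozenset = field(default_factory=frozenset)


def source(table_name, rows):
    """Wrap raw tuples; each row's lineage is its own (hypothetical) ID."""
    return [
        Row(values, frozenset({f"{table_name}:{i}"}))
        for i, values in enumerate(rows)
    ]


def select(rows, predicate):
    """Selection filters rows and keeps their lineage unchanged."""
    return [row for row in rows if predicate(row.values)]


def join(left, right, left_key, right_key):
    """Equi-join: an output row depends on both inputs, so lineages are unioned."""
    return [
        Row(l.values + r.values, l.lineage | r.lineage)
        for l in left
        for r in right
        if l.values[left_key] == r.values[right_key]
    ]


# Invented sample data, for illustration only.
persons = source("persons", [("p1", "Person A"), ("p2", "Person B")])
genders = source("genders", [("p1", "female"), ("p2", "unknown")])

result = select(join(persons, genders, 0, 0), lambda v: v[3] != "unknown")
for row in result:
    print(row.values, sorted(row.lineage))
```

Every output row can then be traced back to the exact input rows that produced it, which is the kind of fine-grained tracking that the definition of data provenance describes.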
The FAIR data consortium subsumes data provenance under the reusability aspect and describes data provenance as a precondition for reusability:
"For others to reuse your data, they should know where the data came from (i.e., clear story of origin/history, see R1), who to cite and/or how you wish to be acknowledged. Include a description of the workflow that led to your data: Who generated or collected it? How has it been processed? Has it been published before? Does it contain data from someone else that you may have transformed or completed? Ideally, this workflow is described in a machine-readable format."
Source: FAIR Data Consortium
Tasks of Sektion (Meta)daten, Terminologien, Provenienz according to Sektionskonzept (2021):
In the subject area of provenance, the section deals with legal, technical, and cultural aspects of the context in which (meta)data originate (e.g., in experiments, lab notebooks, digitization processes, etc.) and drafts proposals for uniform and traceable documentation procedures that answer the what, where, when, who, how, and why of data creation and data processing. In doing so, the section develops recommendations for representing provenance in a possible NFDI core metadata format.
Source: Koepler et al. (2021)
From the Charta of the Cookbooks, Guidance and Best Practices Working Group within the Sektion (Meta)daten, Terminologien, Provenienz:
A common understanding of (meta)data, terminology, provenance and related sub-concepts is core in data-driven research to foster the provision of FAIR data. However, knowledge and implementation of metadata standards, data repositories, terminologies as well as provenance concepts differ within and across disciplines. In order to create or reuse subject- and application-specific metadata that is at the same time semantically rich, machine-actionable and interoperable, and to interlink data (i.e. FAIR data), a common understanding of quality parameters for metadata is required.
Source: Arndt et al. (2022)
Provenance meta-data: e.g. Historical Context Ontology (HiCO)
Information system provenance: e.g. commercial data historians like Clarify for gathering process manufacturing data; tools for recording the computing environment like R E2ETools
Workflow provenance: e.g. TaDiRAH; tools like VisTrails or LabelFlow
Data provenance: e.g. Wikidata references, ProvSQL
Source: Massari, Peroni, Tomasi and Heibi (2023), see also Sikos (2021), Sikos and Philp (2020)
When it comes to managing gender data, common challenges and beliefs seem to be:
Certainly within the community of researchers working with cultural data, the desire to compare and aggregate diverse sources held together by a thin red thread of potential narrative cohesion, is only increasing. The KPLEX project (kplex-project.eu) is investigating these barriers to meaning-making. Our team has adopted a comparative, multidisciplinary, and multi-sectoral approach to this problem, focussing on key challenges to the knowledge creation capacity of cultural data such as the terms we use to speak about data in a cultural context, the manner in which data that are not digitised or shared become “hidden” from aggregation systems, the fact that data lacks the objectivity often ascribed to the term and the subtle ways in which data that are complex almost always become simplified before they can be aggregated.
Source: Edmond and Folan (2017)
Sources: [1] Andrews et al. (2024), [2] Flanders (2021)
The Project
Digital humanities projects working with prosopographical data face a dilemma: historical gender data is inaccurate, messy, and mostly binary, but leaving it out means rendering gender as a social category of difference invisible. Approaches to tracking and representing the provenance of historical gender data vary and standardization is needed to improve interoperability and interpretability.
Python Scrapy used to scrape ADHO conference abstracts 2013–2023 from conference websites and repositories
XML-TEI: 2013–2016, 2018, 2020, 2022, and 2023
Plaintext: 2017
PDF to plaintext conversion: 2019
Code available at: https://github.com/lipogg/dh-projects-scraper
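Since most abstracts are available as XML-TEI, one step in such a pipeline is reducing a TEI document to a title and plain text. The following is a minimal sketch using only the Python standard library; it is not the code from the repository above. The sample document and the function name `tei_to_plaintext` are hypothetical, and real conference TEI files may be structured differently.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the prefix "tei" is chosen here for the XPath lookups.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample document, for illustration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical abstract title</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First paragraph of the abstract.</p>
      <p>Second paragraph with <hi>markup</hi> inside.</p>
    </body>
  </text>
</TEI>"""


def tei_to_plaintext(xml_string):
    """Return (title, body text) for a TEI document given as a string."""
    root = ET.fromstring(xml_string)
    title = root.findtext(
        ".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS
    )
    # itertext() flattens inline markup; split/join normalizes whitespace.
    paragraphs = [
        " ".join("".join(p.itertext()).split())
        for p in root.iterfind(".//tei:body//tei:p", TEI_NS)
    ]
    return title.strip(), "\n".join(paragraphs)


title, text = tei_to_plaintext(SAMPLE_TEI)
print(title)
print(text)
```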
(Mayring, 2000)
| Variable | Fleiss’ Kappa | Strength of agreement |
| --- | --- | --- |
| Online | 0.76 | Substantial |
| Digital Humanities | 0.58 | Moderate |
| Dataset | 0.32 | Fair |
| Single dataset | 0.35 | Fair |
| Unrestricted | 0.43 | Moderate |
| Personal Data | 0.46 | Moderate |
| Gender Data | 0.59 | Moderate |

| Variable | Fleiss’ Kappa | Strength of agreement |
| --- | --- | --- |
| Online | 0.64 | Substantial |
| Digital Humanities | 0.63 | Substantial |
| Dataset | 0.68 | Substantial |
| Single dataset | 0.77 | Substantial |
| Unrestricted | 0.68 | Substantial |
| Personal Data | 0.48 | Moderate |
| Gender Data | 0.63 | Substantial |
Result interpretation based on Landis and Koch (1977)
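For readers unfamiliar with the statistic, the sketch below shows how Fleiss' kappa is computed from a table of rating counts and how a value maps onto the Landis and Koch (1977) labels used above. The rating matrix is invented for illustration (three raters, two categories); the number of raters and items in the actual study is not stated here, and this is not the code used to produce the tables.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i][j] = number of raters assigning item i to category j."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(ratings[0])

    # Share of all assignments that fall into each category.
    p_j = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance (undefined if all ratings share one category).
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Strength-of-agreement label following Landis and Koch (1977)."""
    if kappa < 0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"


# Hypothetical ratings: 4 items, 3 raters, 2 categories (e.g. yes / no).
example = [[3, 0], [0, 3], [3, 0], [2, 1]]
kappa = fleiss_kappa(example)
print(round(kappa, 3), landis_koch(kappa))  # 0.625 Substantial
```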
Uncertainty: Undetermined, Indeterminate, Contested
Inapplicability: Not applicable
Unavailability: Not provided, ?, Unknown
The “other” category: used for “other” genders, unknown values, animals, and organizations
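One way to make these distinctions explicit when harmonizing datasets is to map each placeholder label onto the type of missingness it expresses. The sketch below is a minimal illustration built only from the labels listed above; the names `NULL_CATEGORIES` and `classify_null_label` are hypothetical. The "other" label is deliberately left unmapped, because it is used with several different meanings.

```python
# Grouping taken from the list above; keys and names are illustrative choices.
NULL_CATEGORIES = {
    "uncertainty": {"undetermined", "indeterminate", "contested"},
    "inapplicability": {"not applicable"},
    "unavailability": {"not provided", "?", "unknown"},
}


def classify_null_label(label):
    """Return the missingness category for a placeholder label, or None."""
    # Lower-case and collapse whitespace so "Not  Applicable" still matches.
    normalized = " ".join(label.strip().lower().split())
    for category, labels in NULL_CATEGORIES.items():
        if normalized in labels:
            return category
    # Ambiguous or unlisted labels (e.g. "other") need manual review.
    return None


for raw in ["Unknown", "Contested", " not  applicable ", "other"]:
    print(repr(raw), "->", classify_null_label(raw))
```

Returning `None` instead of guessing keeps ambiguous values visible rather than silently folding them into one of the three categories.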
XML-TEI <sex>: four times, with remarks: sex refers to “performed gender”, “female/male named individuals”
CIDOC-CRM
Wikidata
FOAF:gender
rdaGr2:gender
See e.g. Siddiqui (2023)
Take stock ①
Identify issues ②
Conduct interviews ③
Create repository ④
Publish repository ④
Write thesis