Digital Humanities 2024
Lisa Poggel Freie Universität Berlin
Viktor J. Illmer Freie Universität Berlin
Aleksandr Lange University of Manchester
Pauline Junginger Philipps-Universität Marburg
Zeynep Ecem Pulas Max Planck Institute for the History of Science
What is the provenance of gender data?
Which categories are used, and why?
How is it organised and does it follow a standard?
repeat
ADHO conference abstracts 2013–2023 scraped from conference websites and repositories with Scrapy (Python)
XML-TEI: 2013–2016, 2018, 2020, 2022, and 2023
Plaintext: 2017
PDF to plaintext conversion: 2019
Code available at: https://github.com/lipogg/dh-projects-scraper
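To illustrate how an abstract in XML-TEI can be reduced to plain text for coding, here is a minimal sketch using only the Python standard library. It is not taken from the linked repository: the sample document, the function name tei_to_plaintext, and the choice of elements (title and body paragraphs) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is standard; everything else below is illustrative.
TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}

# Hypothetical, heavily shortened TEI abstract (not real conference data).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>An example abstract</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First paragraph of the abstract.</p>
      <p>Second paragraph with a <hi>highlighted</hi> phrase.</p>
    </body>
  </text>
</TEI>"""


def tei_to_plaintext(xml_string):
    """Return the title and the body paragraphs of a TEI document as plain text."""
    root = ET.fromstring(xml_string)

    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""

    body = root.find(".//tei:body", TEI_NS)
    paragraphs = []
    if body is not None:
        for p in body.iter(f"{{{TEI_URI}}}p"):
            # itertext() also collects text inside inline markup such as <hi>;
            # split/join collapses line breaks and repeated whitespace.
            paragraphs.append(" ".join("".join(p.itertext()).split()))

    return {"title": title, "text": "\n".join(paragraphs)}


record = tei_to_plaintext(SAMPLE_TEI)
```

The years delivered as plaintext or PDF would need a different route; this sketch only covers the XML-TEI case.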
(Mayring, 2000)
Variable | Fleiss’ Kappa | Strength of agreement |
Online | 0.76 | Substantial |
Digital Humanities | 0.58 | Moderate |
Dataset | 0.32 | Fair |
Single dataset | 0.35 | Fair |
Unrestricted | 0.43 | Moderate |
Personal Data | 0.46 | Moderate |
Gender Data | 0.59 | Moderate |
Variable | Fleiss’ Kappa | Strength of agreement |
Online | 0.64 | Substantial |
Digital Humanities | 0.63 | Substantial |
Dataset | 0.68 | Substantial |
Single dataset | 0.77 | Substantial |
Unrestricted | 0.68 | Substantial |
Personal Data | 0.48 | Moderate |
Gender Data | 0.63 | Substantial |
Result interpretation based on Landis and Koch (1977)
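For readers who want to reproduce this kind of agreement figure, the following sketch computes Fleiss’ Kappa for one variable and maps the result to the Landis and Koch (1977) labels used in the tables. The function names and the toy ratings are assumptions for illustration, not the project's data or code, and the sketch assumes every item was coded by the same number of raters.

```python
def fleiss_kappa(ratings):
    """Fleiss' Kappa for a list of items, each a list with one label per rater.

    Assumes the same number of raters (at least two) for every item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    # counts[i][j]: number of raters who assigned category j to item i
    counts = [[item.count(c) for c in categories] for item in ratings]

    # Share of all assignments that fall into each category
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    # Observed agreement per item, then averaged over items
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("Kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Strength-of-agreement label following Landis and Koch (1977)."""
    if kappa < 0:
        return "Poor"
    for upper, label in [
        (0.20, "Slight"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Substantial"),
    ]:
        if kappa <= upper:
            return label
    return "Almost perfect"


# Toy example: three hypothetical raters coding two abstracts (1 = yes, 0 = no).
toy_ratings = [[1, 1, 0], [0, 0, 1]]
toy_kappa = fleiss_kappa(toy_ratings)
toy_label = landis_koch(toy_kappa)
```

Applied to the values in the tables, landis_koch(0.76) yields "Substantial" and landis_koch(0.32) yields "Fair", matching the labels shown.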
Uncertainty: Undetermined, Indeterminate, Contested
Inapplicability: Not applicable
Unavailability: Not provided, ?, Unknown
The “other” category: used for “other” genders, unknown values, animals and organizations
XML-TEI <sex>: used four times, with remarks that sex refers to “performed gender” or to “female/male named individuals”
CIDOC-CRM
Wikidata
FOAF:gender
rdaGr2:gender
See e.g. Siddiqui (2023)
Common labels such as “other”, “woman”, or “gender” itself imply equivalence where varying provenance entails qualitative difference
Without provenance information, gender data is neither interoperable nor interpretable
Provenance of gender data should always be made explicit and qualified as e.g. “assumed gender”, “self-identified gender”, etc.
Provenance information should be directly accessible and visible alongside the data
Abolish the “other” category
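One way to read these recommendations as a data model is sketched below: every gender value carries an explicit provenance qualifier, and a missing value states why it is missing instead of falling into an “other” bucket. All class and field names are hypothetical; the provenance qualifiers and the reasons for missing values are taken from the examples on this poster, and a real schema would likely need more of both.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    # Qualifiers named on the poster; a real project would extend this list.
    SELF_IDENTIFIED = "self-identified gender"
    ASSUMED = "assumed gender"


class MissingReason(Enum):
    # Kept apart instead of being merged into a single "other" value.
    UNCERTAIN = "uncertain"
    NOT_APPLICABLE = "not applicable"
    NOT_PROVIDED = "not provided"


@dataclass(frozen=True)
class GenderStatement:
    value: Optional[str] = None
    provenance: Optional[Provenance] = None
    missing: Optional[MissingReason] = None
    source: str = ""  # hypothetical free-text note on where the value came from

    def __post_init__(self):
        # Exactly one of `value` and `missing` must be given.
        if (self.value is None) == (self.missing is None):
            raise ValueError("give either a value or a reason why it is missing")
        # A value without provenance is rejected outright.
        if self.value is not None and self.provenance is None:
            raise ValueError("a gender value needs explicit provenance")

    def label(self):
        """Human-readable label that always shows the provenance with the value."""
        if self.value is None:
            return f"gender: {self.missing.value}"
        return f"{self.value} ({self.provenance.value})"


assumed = GenderStatement(value="woman", provenance=Provenance.ASSUMED)
organisation = GenderStatement(missing=MissingReason.NOT_APPLICABLE)
```

Because label() always renders the qualifier next to the value, two records that both say “woman” remain distinguishable when one is self-identified and the other assumed.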
The burning of the knight Richard Puller von Hohenburg with his servant before the walls of Zürich, for sodomy, 1482, reimagined (Original: Wikimedia Commons 2019)
Lisa Poggel l.poggel@fu-berlin.de
Viktor J. Illmer v.illmer@fu-berlin.de
Aleksandr Lange aleksandr.lange@postgrad.manchester.ac.uk
Pauline Junginger pauline.junginger@uni-marburg.de
Zeynep Ecem Pulas zecempulas@mpiwg-berlin.mpg.de
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy in the context of the Cluster of Excellence Temporal Communities: Doing Literature in a Global Perspective – EXC 2020 – Project ID 390608380.