QR Code Presentation

Digital Humanities 2024


Decoding Gender in Digital Humanities Datasets

A Critical Quantitative Perspective

Lisa Poggel, Freie Universität Berlin

Viktor J. Illmer, Freie Universität Berlin

Aleksandr Lange, University of Manchester

Pauline Junginger, Philipps-Universität Marburg

Zeynep Ecem Pulas, Max Planck Institute for the History of Science

https://lipogg.github.io/dh2024-slides

What is the provenance of gender data?

Which categories are used, and why?

How is it organised and does it follow a standard?

Why do we need a critical quantitative perspective?

Method

Overview

  1. Collect data
    Extract URLs from the Books of Abstracts of ADHO conferences
  2. Clean and process data
    Filter for URLs likely to point to DH datasets
  3. Conduct content analysis
    Identify DH datasets and evaluate them with respect to a set of variables
    • Develop annotation variables and coding scheme
    • Annotate content
    • Evaluate agreement between annotators with ICR scores
    • Repeat the cycle
  4. Evaluate results
Scrapy Logo

Data collection

Python Scrapy was used to scrape ADHO conference abstracts (2013–2023) from conference websites and repositories

Source formats:
  • XML-TEI: 2013–2016, 2018, 2020, 2022, and 2023
  • Plaintext: 2017
  • PDF converted to plaintext: 2019

Code available at: https://github.com/lipogg/dh-projects-scraper
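The slide names Scrapy for crawling; the extraction code itself lives in the linked repository. As a rough illustration of step 1 of the workflow, the sketch below pulls URLs out of a TEI abstract and out of plain text using only the Python standard library. It is not the project's scraper: the function names, the simplified regular expression, and the assumption that links sit in `@target` on `<ref>`/`<ptr>` elements are all illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Simplified pattern (an assumption): http(s) URLs up to whitespace, a quote or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def find_urls(text: str) -> list[str]:
    """Return URLs written out in running text, minus trailing sentence punctuation."""
    return [url.rstrip(".,;:") for url in URL_PATTERN.findall(text)]


def urls_from_tei(xml_text: str) -> list[str]:
    """Collect URLs from @target on <ref>/<ptr> and from the abstract's running text."""
    root = ET.fromstring(xml_text)
    urls = []
    for tag in ("ref", "ptr"):
        for element in root.iter(f"{{{TEI_NS}}}{tag}"):
            target = element.get("target", "")
            if target.startswith(("http://", "https://")):
                urls.append(target)
    # Join text nodes with spaces so adjacent elements cannot fuse into one "URL".
    urls.extend(find_urls(" ".join(root.itertext())))
    return urls
```

For the 2017 plaintext and the 2019 PDF conversion, `find_urls` would be applied to the text directly; URLs that a PDF conversion has broken across lines would be missed by this simple pattern. Duplicates (e.g. a `<ref>` whose text repeats its target) are left for the cleaning step.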


Data cleaning and preprocessing

  • Deduplication
    • remove duplicate URLs within the same conference year
  • Validation
    • validate URLs; remove e-mail addresses and URL-like strings
  • Pre-filtering
    • filter out very frequent domains:
      • sciencedirect
      • culturalanalytics
      • orcid.org
      • twitter.com
      • reddit.com
      • doi.org
      • zenodo.org
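The three cleaning steps could be chained roughly as follows. This is a minimal sketch, not the project's code: the e-mail pattern and the validity test (http(s) scheme plus a dotted host) are assumptions, and because the slide lists some domains without a TLD, frequent domains are matched as substrings of the host name, which may over-match in rare cases.

```python
import re
from urllib.parse import urlparse

# Frequent domains listed on the slide. Some appear without a TLD,
# so they are matched as substrings of the host name.
FREQUENT_DOMAINS = ("sciencedirect", "culturalanalytics", "orcid.org",
                    "twitter.com", "reddit.com", "doi.org", "zenodo.org")

# Rough e-mail check (an assumption, not a full RFC 5322 parser).
EMAIL_PATTERN = re.compile(r"^(mailto:)?[^@\s/]+@[^@\s/]+\.[a-z]{2,}$", re.IGNORECASE)


def is_valid_url(candidate: str) -> bool:
    """Reject e-mail addresses and URL-like strings without an http(s) scheme and host."""
    if EMAIL_PATTERN.match(candidate):
        return False
    parsed = urlparse(candidate)
    return parsed.scheme in ("http", "https") and "." in parsed.netloc


def is_frequent_domain(url: str) -> bool:
    host = urlparse(url).netloc.lower()
    return any(domain in host for domain in FREQUENT_DOMAINS)


def clean(urls_by_year: dict[int, list[str]]) -> dict[int, list[str]]:
    cleaned = {}
    for year, urls in urls_by_year.items():
        # Drop duplicates within one conference year, keeping first-seen order.
        unique = list(dict.fromkeys(urls))
        cleaned[year] = [u for u in unique
                         if is_valid_url(u) and not is_frequent_domain(u)]
    return cleaned
```

Deduplicating per year rather than globally keeps a URL that reappears in a later conference, so recurring projects remain countable across years.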
Google Sheets Logo

Annotation

  • Qualitative content analysis
  • Mixed deductive-inductive category system development
  • Selection variable funnel and content variables
  • Quantitative evaluation

(Mayring, 2000)

Selection Variable Funnel

  • Is the URL …
    • online?
    • a Digital Humanities resource?
    • a dataset?
    • a single dataset?
    • unrestricted?
  • And does it …
    • contain personal data?
    • contain gender data?
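Read as a funnel, each question only applies to URLs that passed all previous ones. The sketch below encodes that reading to find where a URL drops out and how many URLs stop at each stage. The variable keys and the `Annotation` record are hypothetical, and treating an unanswered question as "no" is an assumption.

```python
from dataclasses import dataclass

# Selection variables in the order shown on the slide (keys are hypothetical).
FUNNEL = ("online", "dh_resource", "dataset", "single_dataset",
          "unrestricted", "personal_data", "gender_data")


@dataclass
class Annotation:
    url: str
    answers: dict[str, bool]


def funnel_stage(annotation: Annotation) -> str:
    """Return the first selection variable answered 'no', or 'passed' if none."""
    for variable in FUNNEL:
        # Assumption: a missing answer counts as 'no'.
        if not annotation.answers.get(variable, False):
            return variable
    return "passed"


def count_by_stage(annotations: list[Annotation]) -> dict[str, int]:
    """Number of URLs leaving the funnel at each stage."""
    counts = {variable: 0 for variable in FUNNEL + ("passed",)}
    for annotation in annotations:
        counts[funnel_stage(annotation)] += 1
    return counts
```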

Inter-Coder Reliability Scores

First annotation round (5 coders)

Variable              Fleiss’ Kappa   Strength of agreement
Online                0.76            Substantial
Digital Humanities    0.58            Moderate
Dataset               0.32            Fair
Single dataset        0.35            Fair
Unrestricted          0.43            Moderate
Personal Data         0.46            Moderate
Gender Data           0.59            Moderate

Second annotation round (3 coders)

Variable              Fleiss’ Kappa   Strength of agreement
Online                0.64            Substantial
Digital Humanities    0.63            Substantial
Dataset               0.68            Substantial
Single dataset        0.77            Substantial
Unrestricted          0.68            Substantial
Personal Data         0.48            Moderate
Gender Data           0.63            Substantial

Result interpretation based on Landis and Koch (1977)
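The slides do not say which tool computed the scores (libraries such as statsmodels offer an implementation). As a minimal, standard-library sketch, Fleiss' kappa and the Landis and Koch (1977) bands used in the tables above can be computed as follows; the toy ratings in the usage lines are invented for illustration and are not project data.

```python
def to_counts(ratings: list[list[str]], categories: list[str]) -> list[list[int]]:
    """Turn per-item coder labels into per-item category counts."""
    return [[labels.count(category) for category in categories] for labels in ratings]


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for counts[i][j] = number of coders assigning item i to category j."""
    n_items = len(counts)
    n_coders = sum(counts[0])
    if n_coders < 2 or any(sum(row) != n_coders for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Proportion of all ratings that fall into each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_coders)
           for j in range(n_categories)]
    # Observed agreement per item: share of agreeing coder pairs.
    p_i = [(sum(c * c for c in row) - n_coders) / (n_coders * (n_coders - 1))
           for row in counts]
    p_observed = sum(p_i) / n_items
    p_expected = sum(p * p for p in p_j)
    if p_expected == 1:
        raise ValueError("kappa is undefined when all ratings fall into one category")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa: float) -> str:
    """Strength of agreement after Landis and Koch (1977)."""
    if kappa < 0:
        return "Poor"
    for upper, label in ((0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
                         (0.80, "Substantial")):
        if kappa <= upper:
            return label
    return "Almost perfect"


# Toy example: three URLs, three coders, yes/no on one selection variable.
ratings = [["yes", "yes", "yes"], ["no", "no", "no"], ["yes", "yes", "no"]]
kappa = fleiss_kappa(to_counts(ratings, ["yes", "no"]))
print(round(kappa, 2), landis_koch(kappa))  # 0.55 Moderate
```

Kappa corrects raw agreement for agreement expected by chance from the overall category shares, which is why a variable with very skewed answers (e.g. almost all URLs online) can show modest kappa despite high raw agreement.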

Results and Discussion

Selection Variables

Gender expressions


Common non-gender expressions

Uncertainty: Undetermined, Indeterminate, Contested

Inapplicability: Not applicable

Unavailability: Not provided, ?, Unknown

The “other” category: used for “other” genders, unknown values, animals, and organizations
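To tally such values in a dataset, one might map raw field values onto the groups above. This is a hypothetical sketch: only the listed expressions are matched (case-insensitively), "other" is kept as its own bucket because it conflates several kinds of values, and classifying everything else as a gender expression is an assumption.

```python
from collections import Counter

# Non-gender expressions grouped as on the slide, lower-cased for matching.
NON_GENDER_EXPRESSIONS = {
    "uncertainty": {"undetermined", "indeterminate", "contested"},
    "inapplicability": {"not applicable"},
    "unavailability": {"not provided", "?", "unknown"},
}


def classify_value(value: str) -> str:
    normalized = value.strip().lower()
    for group, expressions in NON_GENDER_EXPRESSIONS.items():
        if normalized in expressions:
            return group
    if normalized == "other":
        # Kept apart: "other" mixes genders, unknown values, animals and organizations.
        return "other"
    # Assumption: anything else is treated as a gender expression.
    return "gender expression"


def tally(values: list[str]) -> Counter:
    return Counter(classify_value(value) for value in values)
```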

Gender expressions by structuredness of gender data


Standards used

XML-TEI <sex>: used four times, with remarks stating that sex refers to “performed gender” or to “female/male named individuals”

CIDOC-CRM

Wikidata

FOAF:gender

rdaGr2:gender

Gender expressions by availability of provenance information


Provenance of gender data


Gender expressions by provenance


Limitations

  • Global North overrepresented in DH conferences and conference abstracts
  • Established projects favoured over small datasets
  • Gender categories beyond the binary less easily enumerated for unstructured data

See e.g. Siddiqui (2023)

Outlook

Common labels such as “other”, “woman”, or “gender” itself imply equivalence where varying provenance entails qualitative difference

Without provenance information, gender data is neither interoperable nor interpretable

Provenance of gender data should always be made explicit and qualified as e.g. “assumed gender”, “self-identified gender”, etc.

Provenance information should be directly accessible and visible alongside the data

Abolish the “other” category
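One way to act on these recommendations in a data model is to store provenance with every gender value, so that it travels with the data and shows up wherever the value is displayed. The sketch below is hypothetical: the class names and the `source` field are illustrative, and the two provenance qualifiers are only the examples given on the slide, not a complete vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GenderProvenance(Enum):
    # The two qualifiers named on the slide; a real vocabulary would need more.
    SELF_IDENTIFIED = "self-identified gender"
    ASSUMED = "assumed gender"


@dataclass(frozen=True)
class GenderStatement:
    value: str                        # value as given, no fixed "other" bucket
    provenance: GenderProvenance      # how the value came about
    source: Optional[str] = None      # hypothetical: who or what made the statement

    def label(self) -> str:
        # Keep the provenance visible wherever the value is shown.
        return f"{self.value} ({self.provenance.value})"
```

Because two statements with the same value but different provenance do not compare equal, merging datasets cannot silently treat an assumed “woman” and a self-identified “woman” as equivalent.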

The burning of the knight Richard Puller von Hohenburg with his servant before the walls of Zürich, for sodomy, 1482, reimagined (Original: Wikimedia Commons 2019)

Future plans

  • Workflow and list of DH datasets will be published and may be repurposed for other meta-analyses
  • What other annotation variables would be interesting? e.g. language, project stage (onset, maintenance, archived …)
  • Interviews may help identify common pitfalls, their causes, and pragmatic routes to standardization

Thank you! Questions, suggestions, criticism, …?

Literature

Albina, B., Nelson, E. and Uhl, R. (eds) (2024). Inclusive Cataloging: Histories, Context, and Reparative Approaches. Chicago: ALA Editions.
Billey, A., Drabinski, E. and Roberto, K. R. (2014). What’s Gender Got to Do with It? A Critique of RDA 9.7. Cataloging & Classification Quarterly, 52(4), pp. 412–21. 10.1080/01639374.2014.882465.
Bode, K. (2020). Why You Can’t Model Away Bias. Modern Language Quarterly, 81(1), pp. 95–124. 10.1215/00267929-7933102.
Brown, S. (2018). Delivery Service: Gender and the Political Unconscious of Digital Humanities. In Losh, E. and Wernimont, J. (eds), Bodies of Information: Intersectional Feminism and the Digital Humanities. Minneapolis: University of Minnesota Press, pp. 261–86. 10.5749/j.ctv9hj9r9.
Canning, E. et al. (2022). The Power to Structure: Making Meaning from Metadata Through Ontologies. KULA: Knowledge Creation, Dissemination, and Preservation Studies, 6(3), pp. 1–15. 10.18357/kula.169.
Craig, H., Estill, L. and May, K. L. (2023). A Rationale of Trans-Inclusive Bibliography. Textual Cultures, 16(2), pp. 1–28. 10.14434/tc.v16i2.36763.
D’Ignazio, C. and Klein, L. F. (2023). Data Feminism. Cambridge, Massachusetts: The MIT Press.
Hall, M. (2020). Opportunities and Risks in Digital Humanities Research. In Carius, H., Prell, M. and Smolarski, R. (eds), Kooperationen in den digitalen Geisteswissenschaften gestalten. Göttingen: V&R unipress, pp. 47–66. 10.14220/9783737011778.47.
Kim, D. (2018). Building Pleasure and the Digital Archive. In Losh, E. and Wernimont, J. (eds), Bodies of Information: Intersectional Feminism and the Digital Humanities. Minneapolis: University of Minnesota Press, pp. 230–60. 10.5749/j.ctv9hj9r9.
Kim, D. and Stommel, J. (eds) (2018). Disrupting the Digital Humanities. Santa Barbara: Punctum Books. 10.2307/j.ctv19cwdqv.
Landis, J. R. and Koch, G. G. (1977). The Measurement of Observer Agreement for Categorical Data. Biometrics, 33(1), pp. 159–74. 10.2307/2529310.
Losh, E. and Wernimont, J. (eds) (2018). Bodies of Information: Intersectional Feminism and the Digital Humanities. Minneapolis: University of Minnesota Press. 10.5749/j.ctv9hj9r9.
Mandell, L. (2019). Gender and Cultural Analytics: Finding or Making Stereotypes? In Gold, M. K. and Klein, L. F. (eds), Debates in the Digital Humanities 2019. Minneapolis: University of Minnesota Press, pp. 3–26. 10.5749/j.ctvg251hk.
Mayring, P. (2000). Qualitative Content Analysis. Forum: Qualitative Social Research, 1(2). 10.17169/FQS-1.2.1089.
Ortolja-Baird, A. and Nyhan, J. (2022). Encoding the Haunting of an Object Catalogue: On the Potential of Digital Technologies to Perpetuate or Subvert the Silence and Bias of the Early-Modern Archive. Digital Scholarship in the Humanities, 37(3), pp. 844–67. 10.1093/llc/fqab065.
Posner, M. (2016). What’s Next: The Radical, Unrealized Potential of Digital Humanities. In Gold, M. K. and Klein, L. F. (eds), Debates in the Digital Humanities 2016. Minneapolis: University of Minnesota Press, pp. 32–41. 10.5749/j.ctt1cn6thb.
Robinson, P. (2019). Gender, Feminism, Textual Scholarship, and Digital Humanities. In Bordalejo, B. and Risam, R. (eds), Intersectionality in Digital Humanities. Amsterdam: Amsterdam University Press, pp. 89–108. 10.1017/9781641890519.008.
Schwartz, M. and Crompton, C. (2018). Remaking History: Lesbian Feminist Historical Methods in the Digital Humanities. In Losh, E. and Wernimont, J. (eds), Bodies of Information: Intersectional Feminism and the Digital Humanities. Minneapolis: University of Minnesota Press, pp. 131–56. 10.5749/j.ctv9hj9r9.
Siddiqui, N. (2023). An Undue Burden: Race, Gender, and Mobility in Digital Humanities Conferences. In DH2023. Graz. https://dh-abstracts.library.virginia.edu/works/12417.
Vecoli, L. (2015). The Tretter Collection: What We Have, What’s Missing, and the Challenges of Trans History. TSQ: Transgender Studies Quarterly, 2(4), pp. 607–13. 10.1215/23289252-3151529.

Contact

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy in the context of the Cluster of Excellence Temporal Communities: Doing Literature in a Global Perspective – EXC 2020 – Project ID 390608380.