Open datasets and software
- Early Slavic language models
- Ancient Greek language models
- Replication data for: A quantitative and typological study of Early Slavic participle clauses and their competition (University of Oxford, DPhil Thesis)
- Code and data for 'The Semantic Map of When and its Typological Parallels'
- OldSlavNet (Early Slavic dependency parser)
- Introduction to Text Mining: Jupyter notebooks (used as teaching material for the MSc in Digital Scholarship at the University of Oxford, October 2023)
- Decade-level Word2Vec models from automatically transcribed 19th-century newspapers digitised by the British Library (1800-1919)
- Diachronic and diatopic word embeddings from newspapers digitised by the British Library (1830-1889): North and South England
- Python scripts to train and explore syntactic (graph-based, Node2Vec) word embeddings for Ancient Greek
- Scripts for impact analysis of publication metrics for the Journal of Open Humanities Data and the Research Data Journal for the Humanities and Social Sciences
- Data from 'One question, different annotation depths: A case study in Early Slavic'
- Data from 'Exploiting cross-dialectal gold syntax for low-resource historical languages: towards a generic parser for pre-modern Slavic'
- Word-alignment models for Bible translations in 100+ historical and contemporary languages and scripts to train them
- Mixed drafts, scripts, and data useful for NLP tasks on Pre-Modern Slavic