Open datasets and software
- [HuggingFace] BERTislav: a BERT-based fill-mask Early Slavic language model
- [HuggingFace] OldSlavNet: model and data
- [HuggingFace] Binary RoBERTa-based classifier fine-tuned on historical British newspaper articles reporting on suicide, to distinguish between (confirmed or speculated) suicide cases, investigations, or court cases
- [Figshare] Replication data and code for: Mapping ‘when’-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology
- [Zenodo] Early Slavic language models
- [Zenodo] Ancient Greek language models
- [Figshare] Replication data for: A quantitative and typological study of Early Slavic participle clauses and their competition (University of Oxford, DPhil Thesis)
- [Figshare] Code and data for 'The Semantic Map of When and its Typological Parallels'
- [GitHub] DiachronicEmb-BigHistData: Tools to train and explore diachronic word embeddings from Big Historical Data
- [GitHub] OldSlavNet (Early Slavic dependency parser)
- [GitHub] Introduction to Text Mining: Jupyter notebooks (used as teaching material for the MSc in Digital Scholarship at the University of Oxford, October 2023)
- [Zenodo] Decade-level Word2Vec models from automatically transcribed 19th-century newspapers digitised by the British Library (1800-1919)
- [Zenodo] Diachronic and diatopic word embeddings from newspapers digitised by the British Library (1830-1889): North and South England
- [GitHub] Python scripts to train and explore syntactic (graph-based, Node2Vec) word embeddings for Ancient Greek
- [Zenodo] DataPapersAnalysis: Scripts to carry out impact analysis on the publication metrics related to the Journal of Open Humanities Data and the Research Data Journal for the Humanities and Social Sciences
- [Figshare] Data from 'One question, different annotation depths: A case study in Early Slavic'
- [Figshare] Data from 'Exploiting cross-dialectal gold syntax for low-resource historical languages: towards a generic parser for pre-modern Slavic'
- [GitHub] Word-alignment models for Bible translations in 100+ historical and contemporary languages and scripts to train them
- [GitHub] Miscellaneous drafts, scripts, and data useful for NLP tasks on pre-modern Slavic