My linguistics training had a very strong historical component, with a focus on Slavic and wider Indo-European. I like experimental and computational approaches and get bored quickly when doing interpretative, close-reading work (which is my limit: I look up to traditional philologists).
The main areas I work on are:
- semantics of non-finite clauses
- typology of temporal subordination
- discourse representation theory
- temporal semantics across sentences
My doctoral project looks (quantitatively, through treebank data) into the competition between finite and non-finite temporal subordinates in Early Slavic, and their position within the typology of when-clauses in 1400+ languages of the world.
It’s hard to set Digital Humanities apart from my main areas of research, and that’s often true for digital humanists at large: we tend to approach DH to answer questions in our own research areas, and may then find ourselves wondering about DH tools and techniques as such. The following are some of the areas I work in most frequently:
- application of computational and/or digital methods from outside the humanities to humanities research (geostatistics, genetics, biology, you name it)
- visualization of (small and big) parallel language data
- application of NLProc methods to answer humanities research questions (word embeddings, causal analysis, topic modelling, automatic linguistic annotation)
- corpus building, treebanking
- digitization, handwritten text recognition, TEI encoding
Some of my recent contributions include:
- Diachronic word embeddings from Big Historical Data (19th century English). See here. A case study on the language of mechanization will soon appear in the ACL Anthology.
- Early Slavic dependency parsing (see OldSlavNet and Publications).
Open Scholarship in the Humanities
I am Editorial Assistant for the Journal of Open Humanities Data (JOHD) and Fellow at RROx, the Oxford ‘branch’ of the UK Reproducibility Network (UKRN).
I’m interested in the specific challenges faced by the Humanities in making research reproducible (also: I get a bit angry when I am not given the steps followed by another researcher to get to an interpretation or a result).
Some of my contributions to the discussion:
- The Open Humanities Seminar Series (OHSS), a monthly event I organized and ran from January to April 2022, dedicated to different aspects of Open Humanities.
- Deep Impact: A study on the impact of data papers and datasets in the humanities and social sciences (2022) with Barbara McGillivray, Marton Ribary, Mandy Wigdorowitz and Eleonora Zordan, presented at SciDatCon-IDW Seoul 2022 and currently under peer review.
- Le Journal of Open Humanities Data (JOHD): enjeux et défis dans la publication de data papers pour les sciences humaines (2021), a paper written with Paola Marongiu, Marton Ribary and Barbara McGillivray, and presented at DHNord (soon to be published by Presses Universitaires du Septentrion).
Selected past projects
In 2020 I spent two months as a Research Assistant at the ReadOxford research group, based at the Department of Experimental Psychology of the University of Oxford. The group investigates a range of questions related to child literacy development. I mainly dealt with data processing, developing R scripts to make corpus data reproducible and analysable for morphological complexity and lexical variation. My major contribution was writing an R script to automatically calculate the Average Reduced Frequency (ARF) of combined lemmata/parts of speech in the Oxford Children Corpus and the CHILDES treebank.
Starting in mid-2020, I worked first as a Research Assistant and then as a collaborator with the International Multimodal Communication Centre (IMCC), based within the Oxford School of Global and Area Studies (OSGA) at the University of Oxford. I carried out annotation and correlation analyses of speech-gesture co-occurrences in Russian and American media, largely within the project Depictions of Post-COVID-19 Futures in Russian International Media: Multimodal Viewpoint Analysis.
In 2015, I spent two months as a trainee Assistant Curator-Cataloguer for the Slavonic Collections at The British Library. During that time, I enhanced the online catalogue of all Slavonic early-printed Cyrillic books held at the British Library (and fuelled my interest in all things data and pre-modern Slavic). Check out two posts I wrote for the British Library’s European Studies Blog:
- Fairytales on trial: the Good and the Beautiful in early-Soviet children’s literature (26 March 2019)
- A reluctantly modern voice from the 17th-century Russian storm: Archpriest Avvakum and the Life written by himself (10 June 2015)