To train your own parser, you will need:

- `train.conllu`: the training data. Put this file inside the folder `./training_data/new/`.
- `dev.conllu`: the development data. Put this file inside the folder `./training_data/new/`.
- the gold-standard test data (one or more `.conllu` files): put these inside `./test_data/gold/`, and a copy of each of them inside `./test_data/tobeannotated/`.

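
If your files live elsewhere, a small shell helper can stage them for you. The sketch below is not part of the repository: `stage_data` is a hypothetical name, and it assumes bash with the `ROOT` directory as the working directory. It renames your training and development files to `train.conllu` and `dev.conllu`, and copies each gold test file into both test folders.

```bash
# Hypothetical helper (not shipped with OldSlavNet): stage input files
# into the folders described above. Run from the ROOT directory.
# Usage: stage_data TRAIN DEV GOLD_TEST [GOLD_TEST...]
stage_data() {
  if [ "$#" -lt 3 ]; then
    echo "usage: stage_data TRAIN DEV GOLD_TEST..." >&2
    return 1
  fi
  local train="$1" dev="$2"
  shift 2
  mkdir -p training_data/new test_data/gold test_data/tobeannotated || return 1
  # The training script expects these exact file names.
  cp "$train" training_data/new/train.conllu || return 1
  cp "$dev" training_data/new/dev.conllu || return 1
  # Each gold file goes to gold/ and, unchanged, to tobeannotated/.
  local f
  for f in "$@"; do
    cp "$f" test_data/gold/ || return 1
    cp "$f" test_data/tobeannotated/ || return 1
  done
}
```

For example: `stage_data ~/corpus/my-train.conllu ~/corpus/my-dev.conllu ~/corpus/text1.conllu ~/corpus/text2.conllu` (the source paths are placeholders).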
After adding these files to the right directories, the project directory should look like the following (note the placement of `train.conllu`, `dev.conllu`, and the gold test data, here named `text1.conllu` and `text2.conllu`):
```
📦ROOT
┣ 📂models
┃ ┗ 📂OldSlavNet
┃   ┣ 📜model
┃   ┗ 📜model.params
┣ 📂oldslavnet-venv
┣ 📂scripts
┣ 📂test_data
┃ ┣ 📂annotated
┃ ┣ 📂gold
┃ ┃ ┣ 📜text1.conllu
┃ ┃ ┗ 📜text2.conllu
┃ ┗ 📂tobeannotated
┃   ┣ 📜text1.conllu
┃   ┗ 📜text2.conllu
┣ 📂training_data
┃ ┣ 📂new
┃ ┃ ┣ 📜dev.conllu
┃ ┃ ┗ 📜train.conllu
┃ ┗ 📂past
┃   ┗ 📂OldSlavNet
┃     ┣ 📜dev.conllu
┃     ┗ 📜train.conllu
┣ 📜LICENSE
┣ 📜Makefile
┣ 📜README.md
┣ 📜requirements.txt
┣ 📜tag.sh
┗ 📜train.sh
```
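
Before training, you may want to confirm that your layout matches the tree above. The bash function below is a hypothetical pre-flight check, not something provided by the project: it reports a missing `train.conllu` or `dev.conllu`, and any file in `./test_data/tobeannotated/` that has no gold counterpart of the same name in `./test_data/gold/`.

```bash
# Hypothetical pre-flight check (not part of OldSlavNet). Run from ROOT.
# Returns 0 if the layout looks complete, 1 otherwise.
check_layout() {
  local ok=0 f name
  for f in training_data/new/train.conllu training_data/new/dev.conllu; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      ok=1
    fi
  done
  # Every file to be annotated needs a gold file with the same name.
  for f in test_data/tobeannotated/*.conllu; do
    # An unmatched glob leaves the pattern itself in $f.
    [ -e "$f" ] || { echo "no files in test_data/tobeannotated/" >&2; ok=1; break; }
    name="$(basename "$f")"
    if [ ! -f "test_data/gold/$name" ]; then
      echo "no gold counterpart for $name" >&2
      ok=1
    fi
  done
  return "$ok"
}
```

You could then chain it with the training script: `check_layout && ./train.sh`.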
From the `ROOT` directory, run:

```bash
./train.sh
```
You will be prompted for some input, including a name for your model.
This will:

- train the parser and create a new folder under `./models/` named after the name you entered for your model, where the trained model itself (the `model` and `model.params` files) will be saved;
- move `train.conllu` and `dev.conllu` from `./training_data/new/` to a new folder under `./training_data/past/` named after the name you entered for your model;
- annotate the files under `./test_data/tobeannotated/`, compare them with the files of the same name under `./test_data/gold/`, and generate a text file with performance metrics for each of them under `./models/yourmodelname/validation-output/`.

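
The data-archiving step above amounts to a pair of file moves. As an illustration only (this is not the code in `train.sh`, and `archive_training_data` is a hypothetical name), the same movement could be reproduced by hand in bash from the `ROOT` directory:

```bash
# Illustrative sketch, NOT the implementation used by train.sh:
# move the current train/dev files into training_data/past/MODEL_NAME/.
archive_training_data() {
  local name="${1:-}"
  if [ -z "$name" ]; then
    echo "usage: archive_training_data MODEL_NAME" >&2
    return 1
  fi
  local dest="training_data/past/$name" f
  # Do not clobber data archived by an earlier run with the same name.
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi
  for f in training_data/new/train.conllu training_data/new/dev.conllu; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
  done
  mkdir -p "$dest" || return 1
  mv training_data/new/train.conllu training_data/new/dev.conllu "$dest/"
}
```

Refusing to overwrite an existing folder is a design choice of this sketch, so that reusing a model name cannot silently replace earlier training data.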
After the model has been trained, the project directory should look like the following:
```
📦ROOT
┣ 📂models
┃ ┣ 📂yourmodelname
┃ ┃ ┣ 📜model
┃ ┃ ┣ 📜model.params
┃ ┃ ┗ 📂validation-output
┃ ┃   ┣ 📜text1-validated.txt
┃ ┃   ┗ 📜text2-validated.txt
┃ ┗ 📂OldSlavNet
┃   ┣ 📜model
┃   ┗ 📜model.params
┣ 📂oldslavnet-venv
┣ 📂scripts
┣ 📂test_data
┣ 📂training_data
┃ ┣ 📂new
┃ ┗ 📂past
┃   ┣ 📂yourmodelname
┃   ┃ ┣ 📜dev.conllu
┃   ┃ ┗ 📜train.conllu
┃   ┗ 📂OldSlavNet
┃     ┣ 📜dev.conllu
┃     ┗ 📜train.conllu
┣ 📜LICENSE
┣ 📜Makefile
┣ 📜README.md
┣ 📜requirements.txt
┣ 📜tag.sh
┗ 📜train.sh
```
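
To check that a training run produced everything shown in the tree above, a hypothetical helper like the one below (not part of the project) looks for the model files, the archived training data, and one metrics file per gold test file. The `<name>-validated.txt` naming pattern is inferred from the example tree, so adjust it if your output differs.

```bash
# Hypothetical post-training check (not part of OldSlavNet). Run from ROOT.
# Usage: check_trained_model MODEL_NAME
check_trained_model() {
  local name="${1:-}" ok=0 f base
  if [ -z "$name" ]; then
    echo "usage: check_trained_model MODEL_NAME" >&2
    return 1
  fi
  for f in "models/$name/model" "models/$name/model.params" \
           "training_data/past/$name/train.conllu" \
           "training_data/past/$name/dev.conllu"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      ok=1
    fi
  done
  # Assumed naming: text1.conllu -> validation-output/text1-validated.txt
  for f in test_data/gold/*.conllu; do
    [ -e "$f" ] || continue
    base="$(basename "$f" .conllu)"
    if [ ! -f "models/$name/validation-output/$base-validated.txt" ]; then
      echo "missing metrics for $base" >&2
      ok=1
    fi
  done
  return "$ok"
}
```

For example, `check_trained_model yourmodelname` prints nothing and returns 0 when the layout matches the tree above.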