Multilingual:
WiLI Dataset https://arxiv.org/pdf/1801.07779.pdf\
https://zenodo.org/record/841984#.XxYLQXUzaWg
Russian:
Taiga (Lenta.ru) Dataset https://github.com/TatianaShavrina/taiga_site
Training Jupyter Notebook:
https://colab.research.google.com/drive/10xdNNp-sbTY_M8gCAwWarQqIUd89-BOw?usp=sharing
Russian texts make up about 15% of all samples
Results on test set:
precision: 0.998
recall: 0.990
f1 score: 0.994
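
As a quick sanity check, the F1 score reported above can be recomputed from the precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the helper name `f1_from_precision_recall` is hypothetical and is not taken from the training notebook.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported on the test set above.
precision = 0.998
recall = 0.990

f1 = f1_from_precision_recall(precision, recall)
print(f"f1 score: {f1:.3f}")  # prints "f1 score: 0.994"
```

The recomputed value agrees with the reported F1 score to three decimal places.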