This directory contains the files to evaluate MT evaluation metrics,
it's not production standards data, neither will it be helpful in
shared task participation but it provides a good testbed for new metrics
implementation and comparison against metrics already available in
nltk.translate.*_score.py to validate the numbers.
It includes the first 100 sentences from the newstest 2015 development set
for the English-Russian language part, made available at Workshop for Machine
Translation 2016 (WMT16) and the Google Translate of the English source sentencs.
[Plaintext]
- newstest-2015-100sents.en-ru.src.en
- newstest-2015-100sents.en-ru.ref.ru
- newstest-2015-100sents.en-ru.google.ru
[SGM]
- newstest2015-100sents-enru-ref.ru.sgm
- newstest2015-100sents-enru-src.en.sgm
- newstest2015-100sents-enru-google.ru.sgm
And the original ,sgm files from WMT16:
- newstest2015-enru-ref.ru.sgm
- newstest2015-enru-src.en.sgm
The plaintext are converted from the .sgm files from the development sets in WMT with
the following command:
sed -e 's/<[^>]*>//g; /^\s*$/d' newstest-2015.enru.src.en.sgm | head -n100 > newstest-2015-100sents.en-ru.src.en
sed -e 's/<[^>]*>//g; /^\s*$/d' newstest-2015.enru.ref.ru.sgm | head -n100 > newstest-2015-100sents.en-ru.ref.en
The tokenized versions of the natural text files above are processed using Moses tokenizer.perl:
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ru < newstest-2015-100sents.en-ru.ref.ru > ref.ru
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ru < newstest-2015-100sents.en-ru.google.ru > google.ru
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < newstest-2015-100sents.en-ru.src.en > src.en
The Google translate outputs are created on 25 Oct 2016 10am. using the English source sentences.
The newstest2015-100sents-enru-google.ru.sgm is created using the wrap-xml.perl tool in Moses:
~/mosesdecoder/scripts/ems/support/wrap-xml.perl ru newstest2015-100sents-enru-src.en.sgm Google < google.ru > newstest2015-100sents-enru-google.ru.sgm
The BLEU scores output from multi-bleu.perl is as such:
~/mosesdecoder/scripts/generic/multi-bleu.perl ref.ru < google.ru
BLEU = 23.17, 53.8/29.6/17.6/10.3 (BP=1.000, ratio=1.074, hyp_len=1989, ref_len=1852)
The mteval-13a.output file is produced using the mteval-v13a.pl
~/mosesdecoder/scripts/generic/mteval-v13a.pl -r newstest2015-100sents-enru-ref.ru.sgm -s newstest2015-100sents-enru-src.en.sgm -t newstest2015-100sents-enru-google.ru.sgm > mteval-13a.output