This directory contains files for testing MT evaluation metrics.
The data is not production-quality, and it will not help with
shared task participation, but it provides a good testbed for implementing
new metrics and for validating their scores against the metrics already
available in nltk.translate.*_score.py.

It includes the first 100 sentences from the newstest 2015 development set
for the English-Russian language pair, made available at Workshop for Machine
Translation 2016 (WMT16), and the Google Translate output for the English
source sentences.

[Plaintext]

 - newstest-2015-100sents.en-ru.src.en
 - newstest-2015-100sents.en-ru.ref.ru
 - newstest-2015-100sents.en-ru.google.ru
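
Because the three plaintext files are meant to be parallel, a quick sanity check is to confirm that each has the same number of lines. The sketch below assumes one sentence per line and 100 lines per file; the helper names `count_lines` and `check_parallel` are illustrative and not part of NLTK or Moses.

```python
def count_lines(path):
    """Return the number of lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_parallel(paths, expected=100):
    """Raise ValueError if any file does not have `expected` lines.

    Assumption: one sentence per line, so equal line counts mean the
    files can be read side by side as source/reference/hypothesis.
    """
    counts = {path: count_lines(path) for path in paths}
    bad = {path: n for path, n in counts.items() if n != expected}
    if bad:
        raise ValueError(f"unexpected line counts: {bad}")
    return counts


# Example call (run from this directory):
# check_parallel([
#     "newstest-2015-100sents.en-ru.src.en",
#     "newstest-2015-100sents.en-ru.ref.ru",
#     "newstest-2015-100sents.en-ru.google.ru",
# ])
```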

[SGM]

 - newstest2015-100sents-enru-ref.ru.sgm 
 - newstest2015-100sents-enru-src.en.sgm
 - newstest2015-100sents-enru-google.ru.sgm

And the original .sgm files from WMT16:

 - newstest2015-enru-ref.ru.sgm 
 - newstest2015-enru-src.en.sgm 


The plaintext files are converted from the WMT development set .sgm files with
the following commands:

    sed -e 's/<[^>]*>//g; /^\s*$/d' newstest2015-enru-src.en.sgm | head -n100 > newstest-2015-100sents.en-ru.src.en
    sed -e 's/<[^>]*>//g; /^\s*$/d' newstest2015-enru-ref.ru.sgm | head -n100 > newstest-2015-100sents.en-ru.ref.ru
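
For readers without sed, the same extraction can be sketched in Python. This is a rough equivalent, assuming every segment sits on its own line in the .sgm file; the function name `sgm_to_plaintext` is illustrative and not part of NLTK or Moses.

```python
import re
from itertools import islice

# Same pattern as the sed expression above: anything between < and >.
TAG = re.compile(r"<[^>]*>")


def sgm_to_plaintext(lines, limit=100):
    """Strip SGML tags, drop blank lines, and keep the first `limit` lines."""
    stripped = (TAG.sub("", line).rstrip("\n") for line in lines)
    non_blank = (line for line in stripped if line.strip())
    return list(islice(non_blank, limit))


# Example call (run from this directory):
# with open("newstest2015-enru-src.en.sgm", encoding="utf-8") as f:
#     sentences = sgm_to_plaintext(f)
```

Like the sed pipeline, this removes markup only; it does not tokenize or otherwise normalize the text.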

The tokenized versions of the plaintext files above are produced with the Moses tokenizer.perl:

    ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ru < newstest-2015-100sents.en-ru.ref.ru > ref.ru
    ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ru < newstest-2015-100sents.en-ru.google.ru > google.ru
    ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < newstest-2015-100sents.en-ru.src.en > src.en

The Google Translate outputs were created on 25 Oct 2016 at 10am from the English source sentences.

The newstest2015-100sents-enru-google.ru.sgm file is created with the wrap-xml.perl tool in Moses:

    ~/mosesdecoder/scripts/ems/support/wrap-xml.perl ru newstest2015-100sents-enru-src.en.sgm Google < google.ru > newstest2015-100sents-enru-google.ru.sgm


The BLEU score output from multi-bleu.perl is as follows:

    ~/mosesdecoder/scripts/generic/multi-bleu.perl ref.ru < google.ru 
    BLEU = 23.17, 53.8/29.6/17.6/10.3 (BP=1.000, ratio=1.074, hyp_len=1989, ref_len=1852)
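
When validating a new implementation it can help to see how the numbers on that summary line fit together. The sketch below parses the line and recomputes BLEU as the brevity penalty times the geometric mean of the four n-gram precisions, using the standard BLEU definition. The function names `parse_multi_bleu` and `recompute_bleu` are illustrative, and the regular expression assumes exactly the output layout shown above. Because the printed precisions are rounded to one decimal place, the recomputed score should only agree with the reported 23.17 to within a few hundredths.

```python
import math
import re


def parse_multi_bleu(line):
    """Parse one multi-bleu.perl summary line into its components."""
    m = re.search(
        r"BLEU = ([\d.]+), ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) "
        r"\(BP=([\d.]+), ratio=([\d.]+), hyp_len=(\d+), ref_len=(\d+)\)",
        line,
    )
    if m is None:
        raise ValueError("not a multi-bleu.perl summary line")
    return {
        "bleu": float(m.group(1)),
        "precisions": [float(m.group(i)) for i in range(2, 6)],
        "bp": float(m.group(6)),
        "ratio": float(m.group(7)),
        "hyp_len": int(m.group(8)),
        "ref_len": int(m.group(9)),
    }


def recompute_bleu(precisions, hyp_len, ref_len):
    """Return BLEU in percent from n-gram precisions given in percent.

    Assumes every precision is non-zero and hyp_len > 0; no smoothing
    is applied.
    """
    # Brevity penalty: 1 when the hypothesis is longer than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    log_mean = sum(math.log(p / 100.0) for p in precisions) / len(precisions)
    return 100.0 * bp * math.exp(log_mean)


# Example call with the line printed above:
# stats = parse_multi_bleu(
#     "BLEU = 23.17, 53.8/29.6/17.6/10.3 "
#     "(BP=1.000, ratio=1.074, hyp_len=1989, ref_len=1852)"
# )
# recompute_bleu(stats["precisions"], stats["hyp_len"], stats["ref_len"])
```

The ratio field is hyp_len divided by ref_len, so a value above 1 means the hypothesis is longer than the reference and the brevity penalty stays at 1.000.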

The mteval-13a.output file is produced using mteval-v13a.pl:

    ~/mosesdecoder/scripts/generic/mteval-v13a.pl -r newstest2015-100sents-enru-ref.ru.sgm -s newstest2015-100sents-enru-src.en.sgm -t newstest2015-100sents-enru-google.ru.sgm  > mteval-13a.output