The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Taleghani, Marziyeh; Pazouki, Ehsan; Ghahraman, Vahid

Manuscript ID: 668978 Pages: 43-55

20.1001.1.20088590.2019.9.3.4.6

Article Type: Original Research

Subject Areas : All areas of language and translation

Marziyeh Taleghani ¹ , Ehsan Pazouki ² , Vahid Ghahraman ³

1 - MA in Translation Studies, Faculty of Persian Literature and Foreign Languages, South Tehran Branch of Azad University, Iran
2 - Assistant Professor of Artificial Intelligence, Faculty of Computer Engineering, Shahid Rajaei Teacher Training University, Tehran, Iran
3 - Assistant Professor of TESOL, Iran Encyclopedia Compiling Foundation, Tehran, Iran

Received: 2019-11-10 Accepted: 2019-11-10 Published: 2019-07-01

Keywords:

Abstract :

References:

Agarwal, A., & Lavie, A. (2008). METEOR, M-BLEU and M-TER: Evaluation metrics for high-correlation with human rankings of machine translation output (pp. 115–118). Presented at the Third Workshop on Statistical Machine Translation, Columbus.

Ansari, E., Sadreddini, M. H., Tabebordbar, A., & Wallace, R. (2014). Extracting Persian-English parallel sentences from document level aligned comparable corpus using bi-directional translation. Advances in Computer Science: An International Journal, 3(5), 59–65.

Banerjee, S., & Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization (pp. 65–72). Michigan.

Bouamor, H., Alshikhabobakr, H., Mohit, B., & Oflazer, K. (2014). A human judgment corpus and a metric for Arabic MT evaluation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 207–213). Doha, Qatar.

Callison-Burch, C., Fordyce, C., Koehn, P., Monz, C., & Schroeder, J. (2007). (Meta-) Evaluation of Machine Translation. In Proceedings of the Second Workshop on Statistical Machine Translation (pp. 136–158). Stroudsburg, PA, USA: Association for Computational Linguistics.

Callison-Burch, C., Osborne, M., & Koehn, P. (2006). Re-evaluating the role of BLEU in machine translation research. In Proceedings of EACL-2006.

Doddington, G. (2002). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In HLT ’02 Proceedings of the second international conference on Human Language Technology Research (pp. 138–145). Morgan Kaufmann Publishers Inc.

Dreyer, M., & Marcu, D. (2012). HyTER: Meaning-Equivalent Semantics for Translation Evaluation (pp. 162–171). Presented at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Farzi, S., & Faili, H. (2015). A swarm-inspired re-ranker system for statistical machine translation. Computer Speech & Language, 29(1), 45–62.

Giménez, J., & Màrquez, L. (2010). Asiya: An Open Toolkit for Automatic Machine Translation (Meta-)Evaluation. The Prague Bulletin of Mathematical Linguistics, 94, 77–86.

Kalyani, A., Kumud, H., Singh, S. P., & Kumar, A. (2014). Assessing the Quality of MT Systems for Hindi to English Translation. International Journal of Computer Applications, 89(15), 41–45.

Machine translation. (2015, September 6). In Wikipedia. Retrieved from https://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=679685135

MATLAB. (2017, January 17). In Wikipedia. Retrieved from https://en.wikipedia.org/w/index.php?title=MATLAB&oldid=760467403

Nießen, S., Och, F. J., Leusch, G., & Ney, H. (2000). An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC-2000).

Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 311–318). Philadelphia.

Pilevar, M. T., & Faili, H. (2010). Persian SMT: A first attempt to English-Persian statistical machine translation. In JADT 2010: 10th international conference on statistical analysis of textual data.

Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006). A Study of Translation Edit Rate with Targeted Human Annotation. In Proceedings of Association for Machine Translation in the Americas (pp. 223–231).

Sun, Y. (2010). Mining the Correlation between Human and Automatic Evaluation at Sentence Level. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, … D. Tapias (Eds.), Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010, Valletta, Malta. European Language Resources Association.

Tillmann, C., Vogel, S., Ney, H., Zubiaga, A., & Sawaf, H. (1997). Accelerated DP based search for statistical translation. In Proceedings of European Conference on Speech Communication and Technology. Rhodes, Greece.

Turian, J., Shen, L., & Melamed, I. D. (2003). Evaluation of Machine Translation and its Evaluation. In Proceedings of MT Summit IX (pp. 386–393).
