Mukayese: Turkish NLP Strikes Back

Ali Safaya, Emirhan Kurtuluş, Arda Göktoğan, and Deniz Yuret

In Findings of the Association for Computational Linguistics: ACL 2022, pages 846-863, Dublin, Ireland.

Turkish natural language processing (NLP) has fallen behind the state of the art because it lacks organized benchmarks and baselines. We fill this gap with Mukayese (Turkish for "comparison/benchmarking"), an extensive set of datasets and benchmarks for several Turkish NLP tasks.

MukayeseLLM Leaderboard (New!)
The MukayeseLLM Leaderboard ranks and evaluates the performance of various LLMs on Turkish tasks.

Mukayese NLP Benchmarks

Language Modeling
Language modeling is the task of assigning a probability to sentences in a language.
Machine Translation
Machine translation is the task of translating text from one language to another.
Named Entity Recognition
Named entity recognition is the task of identifying named entities in text.
Sentence Segmentation
Sentence segmentation is the task of dividing text into sentences.
Spellchecking and Correction
Spellchecking and correction is the task of detecting and correcting spelling mistakes in text.
Summarization
Summarization is the task of summarizing a document or article into a shorter text.
Text Classification
Text classification is the task of assigning categories from a predefined set to open-ended text.

📄 Publication

@inproceedings{safaya-etal-2022-mukayese,
    title = "Mukayese: {T}urkish {NLP} Strikes Back",
    author = "Safaya, Ali  and
      Kurtulu{\c{s}}, Emirhan  and
      G{\"o}kto{\u{g}}an, Arda  and
      Yuret, Deniz",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-acl.69",
    doi = "10.18653/v1/2022.findings-acl.69",
    pages = "846--863",
}