Sumerian and Akkadian were written in cuneiform, a logo-syllabic script formed mostly by impressing wedge-shaped marks into clay tablets. Cuneiform was also inscribed on other durable media, such as stone monuments and metal objects.

Sumerian is the earliest recorded written language. It was spoken from around the fourth millennium BCE (perhaps earlier) until the early second millennium BCE. After it ceased to be spoken, it remained a scholarly language, studied, copied, and taught as part of the cuneiform writing tradition, which died out around the first two centuries CE. Sumerian is a language isolate: it cannot be assigned to any known language family.

Akkadian is the earliest attested Semitic language and belongs to the East Semitic branch. It is attested in writing from c. 2700 BCE until the end of cuneiform writing. It was spoken for most of this period, until it was gradually replaced by Aramaic during the first millennium BCE. During the second millennium BCE, Akkadian was the lingua franca of the ancient Near East, used from Iran to Anatolia and from Syria to Egypt. As a result, a significant number of cuneiform tablets from this period, written in Akkadian and Sumerian, have been discovered outside Mesopotamia, within the context of the cuneiform traditions of other languages.

Machine translation (MT) of Akkadian and Sumerian documents is crucial for advancing our understanding of the events described in the tablets, as well as the scholarly field as a whole. However, because both are low-resource languages, the existing text corpora are relatively small. Research on multilingual models has shown only minor improvements on language modeling (LM) and masked language modeling (MLM) tasks, and MT remains even harder to address. We therefore aim to build a platform that facilitates reading and interpreting Akkadian and Sumerian texts by leveraging high-resource languages such as English and, to some extent, Hebrew (which is not considered high-resource, but has significantly larger datasets than Akkadian, and these continue to grow).

EvaCun 2023 consists of three machine translation tasks: Akkadian (cuneiform) to English, Akkadian (transcription) to English, and Sumerian (transcription) to English, based on the corpora of royal, administrative, and financial texts we provide. Shared data and several scorers are provided to participants. The organizers rely on the honesty of all participants who may have prior knowledge of part of the evaluation data; unfair use of such knowledge is not permitted in the shared task.

The shared task is designed with the aim of answering three main questions:

  • How can machine translation techniques achieve the best performance on Akkadian in both the cuneiform and the transcription modalities?
  • How can we promote the development of resources and language technologies for Akkadian and Sumerian?
  • How can we foster collaboration among scholars working on Akkadian and Sumerian and attract researchers from different disciplines?

EvaCun 2023 is co-organized with ALT 2023: “Ancient Language Translation Workshop”, Macau SAR, China on Sep 4, 2023. As a co-located event with MT-SUMMIT2023, this workshop will provide an opportunity to learn about the challenges and latest developments in the field of machine translation for ancient languages. EvaCun 2023 is organized by the Digital Pasts lab at Ariel University and the TAD Center for AI and Data Science at Tel-Aviv University, Israel.
