Contents

Short Description

How do we turn an ancient text into data? How do we apply data science techniques to historical, cultural, and linguistic questions? What are the ramifications of such transformations when confronted with classical approaches to ancient texts? The Ancient Language Processing course focuses on answering these questions for ancient languages and scripts, from the emergence of writing in Mesopotamia and Egypt to the rest of the world up to 800 CE. The course introduces students of ancient history, ancient Near Eastern languages, and computer science to the computational processing of ancient texts. Students will engage with inscribed artefacts at every stage, from dataset pre-processing to computational analysis via text parsing, vector space models (VSMs), statistical approaches, and graph theory.

Course Objectives

Ancient languages carry a wealth of human history and culture. Considerable progress has been made in applying language technologies to ancient languages such as Sumerian, Akkadian, Latin, Ancient Greek, and Ancient Chinese, especially in the construction of digital language resources and of resources that support automatic analysis. For example, the Universal Dependencies (UD) project has made treebanks available for a number of ancient languages. The objective of this course is to engage computationally with ancient datasets of inscribed artefacts, mostly texts, from data exploration to the publication of computational analyses. We will analyze classical studies and consider emerging research questions in ancient Near Eastern studies in order to address them computationally using ancient language processing.

Learning Outcomes

Students will discuss and contrast the shared epigraphical challenges in ancient language processing, such as Latin, non-Latin, and non-alphabetic scripts, right-to-left writing direction, transliteration conventions, and fragmentary texts. In particular, they will examine the multilingual framework for representing morphology, syntax, and semantics, as well as machine translation models. They will perform hands-on digital philology with novel methods, code, and techniques. The main outcome of the course will be a collaborative computational research paper on an ancient dataset.

Workload: Assignments and Active Participation

  1. We meet weekly for two hours (synchronous mode), which adds up to c. 30 hours over the course of the semester (equivalent to 1 credit point). See the schedule for further information on the meetings.
  2. We will build a shared glossary of terms and concepts that are important in Ancient Language Processing. Students will work in groups of two on the terms assigned to them, providing a short definition and links to more extensive explanations. Writing will be done on the GitHub collaborative platform and will be accessible through the course homepage. (c. 30 hours = 1 credit point)
  3. Students will develop their own projects based on data provided by the instructors or data that they contribute themselves. The project plan requires students to (a) formulate a humanistic research question, (b) operationalize it (i.e., develop a workflow to tackle the question computationally), and (c) write a term paper describing their data set, methods, analysis, interpretation, and results. The structure of the paper will follow a template of a typical computational humanities paper. (c. 120 hours = 4 credit points)

Schedule

Materials and Texts

For more, see the designated course Zotero library (still under construction): https://www.zotero.org/groups/4809611/ancient_language_processing

Assessment