Schedule

Unit 1: What is Ancient Language Processing?

Date Topic Slides Assignment Required Reading Materials
15/04/2026 A Taste of ALP and Data Modeling 📊 Slides

Choose one ancient language dataset. Familiarize yourself with its data model and consider how it fits with the datasets presented in class, then remodel your data accordingly. Do not forget to collect metadata as defined in the class examples.

Tutorials: How to define a function | List comprehensions | Load, parse and extract data from ORACC json files | RegEx101.com

22/04/2026 No Class (Independence Day)
29/04/2026 Doing Computational Philology: the (ideal) pipeline

Explore your dataset: upload the data into Voyant (or a similar program) or into an LLM and describe it using distant reading techniques (i.e. quantification). Following this analysis, decide where close reading would be necessary.

Rockwell & Sinclair (2016) | Sommerschield et al. (2023)

Datasets: Resources in Egyptian; General ANE resources; Online DANES resources | Tools: Voyant Tools; AntConc

Unit 2: Digitization and Annotation

Date Topic Slides Assignment Required Reading Materials
06/05/2026 Pipeline I: Optical / Handwritten Character Recognition

Gordin, Alper et al. (2024) | Lincke (2021)

Chauhan's blog post (2022) | Tutorials: Playing with Neural Networks

13/05/2026 Pipeline II: Annotation (Lemmatization, PoS-Tagging, Treebanking)

Riemenschneider (2025) | Gordin et al. (2025)

Ong & Gordin (2024) | Sahala & Lincke (2024) | Farsi et al. (2025) | Jurafsky & Martin (2024)

Unit 3: Operationalization and the Vector Space

Date Topic Slides Assignment Required Reading Materials
20/05/2026 Pipeline III: Operationalization of words I - the vector space

Gavin (2020) | Jurafsky & Martin (2024)

27/05/2026 Pipeline IV: Operationalization of words II - Measuring distances

Process your data and measure distances. A data model using cuneiform data from ORACC, with metadata, can be found in the ALP 2024 course repository.

Schweitzer (2013); Schweitzer (2023) - Egyptian | Gordin, Romach & Yavasan (2025) - Hittite | Jurafsky & Martin (2024)

Tutorials: Coding the past: understand TF-IDF in python | Programming Historian: analyzing documents with TF-IDF | Karsdorp, Kestemont & Riddell (2021)

03/06/2026 Pipeline V: Operationalization of words III - Word vectors

Bennett & Sahala (2023)

BERT

Unit 4: Network Analysis and Visualization

Date Topic Slides Assignment Required Reading Materials
10/06/2026 Pipeline VI: Visualization I - Graph theory
17/06/2026 Pipeline VII: Visualization II - Network metrics
24/06/2026 Presentation of class projects and feedback
01/07/2026 Presentation of class projects and feedback