MarkTechPost@AI, 20 hours ago
Google DeepMind Introduces Aeneas: AI-Powered Contextualization and Restoration of Ancient Latin Inscriptions

 

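Google DeepMind has developed Aeneas, a transformer-based generative neural network designed to address several challenges in the study of ancient Latin inscriptions. The tool restores damaged inscription text, estimates date and geographic origin, and supplies context by retrieving related inscriptions. Aeneas performs well on fragmentary inscriptions, on a very large corpus, and across geographic and linguistic variation. By combining text and image information, it markedly improves accuracy and efficiency in text restoration, geographic attribution, and dating. It gives historians a powerful research aid and has been integrated into educational and research workflows.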

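🏛️ Aeneas is an AI tool developed by Google DeepMind for the study of ancient Latin inscriptions. It restores missing or damaged passages, even when their length is unknown, and offers multiple restoration candidates. The model uses a transformer architecture and combines text with optional image information to improve accuracy.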

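🌍 Aeneas performs well at geographic attribution, assigning an inscription to one of 62 Roman provinces with 72% accuracy. For dating, it estimates an inscription's date with an average error of about 13 years, markedly better than traditional methods. These capabilities provide key support for understanding where an inscription came from and in what setting it was made.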

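🔗 Beyond restoration and attribution, Aeneas retrieves "parallel inscriptions" that are related to the target inscription in language, epigraphic features, and culture, helping researchers understand its historical context. This contextualization raised historians' confidence by 44% on average and greatly shortened the time needed to find relevant material.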

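🎓 Aeneas is not only a research tool; it is also designed for education. It was released together with the Latin Epigraphic Dataset (LED), which contains more than 176,000 Latin inscriptions, and with a teaching curriculum intended to bridge artificial intelligence and classical studies and to build students' digital literacy.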

The discipline of epigraphy, focused on studying texts inscribed on durable materials like stone and metal, provides critical firsthand evidence for understanding the Roman world. The field faces numerous challenges including fragmentary inscriptions, uncertain dating, diverse geographical provenance, widespread use of abbreviations, and a large and rapidly growing corpus of over 176,000 Latin inscriptions, with approximately 1,500 new inscriptions added annually.

To address these challenges, Google DeepMind developed Aeneas: a transformer-based generative neural network that performs restoration of damaged text segments, chronological dating, geographic attribution, and contextualization through retrieval of relevant epigraphic parallels.

Challenges in Latin Epigraphy

Latin inscriptions span roughly fifteen centuries, from about the 7th century BCE to the 8th century CE, across the vast Roman Empire comprising over sixty provinces. These inscriptions vary from imperial decrees and legal documents to tombstones and votive altars. Epigraphers traditionally restore partially lost or illegible texts using detailed knowledge of language, formulae, and cultural context, and attribute inscriptions to certain timeframes and locations by comparing linguistic and material evidence.

However, many inscriptions suffer from physical damage with missing segments of uncertain lengths. The wide geographic dispersion and diachronic linguistic changes make dating and provenance attribution complex, especially when combined with the sheer corpus size. Manual identification of epigraphic parallels is labor-intensive and often limited by specialized expertise localized to certain regions or periods.

Latin Epigraphic Dataset (LED)

Aeneas is trained on the Latin Epigraphic Dataset (LED), an integrated and harmonized corpus of 176,861 Latin inscriptions aggregating records from three major databases. The dataset includes approximately 16 million characters and covers inscriptions from the seventh century BCE to the eighth century CE. About 5% of these inscriptions have associated grayscale images.

The dataset uses character-level transcriptions with two special placeholder tokens: "-" marks missing text of a known length, while "#" denotes a missing segment of unknown length. Metadata includes province-level provenance across 62 Roman provinces and dating by decade.
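The placeholder convention can be illustrated with a small sketch. It assumes that one "-" stands for each lost character when the gap length is known, and that a single "#" stands for a whole gap of unknown length. The helper names (mask_known, mask_unknown, count_gaps) are hypothetical and are not part of the released dataset tooling.

```python
KNOWN_GAP = "-"    # assumption: one marker per missing character
UNKNOWN_GAP = "#"  # a single marker for a gap of unknown length


def mask_known(text: str, start: int, end: int) -> str:
    """Replace text[start:end] with one '-' per lost character."""
    return text[:start] + KNOWN_GAP * (end - start) + text[end:]


def mask_unknown(text: str, start: int, end: int) -> str:
    """Replace text[start:end] with a single '#', hiding the gap length."""
    return text[:start] + UNKNOWN_GAP + text[end:]


def count_gaps(text: str) -> dict:
    """Count known-length missing characters and unknown-length gaps."""
    return {
        "known_chars": text.count(KNOWN_GAP),
        "unknown_gaps": text.count(UNKNOWN_GAP),
    }


line = "dis manibus sacrum"  # a common funerary formula, used as toy input
print(mask_known(line, 4, 11))    # dis ------- sacrum
print(mask_unknown(line, 4, 11))  # dis # sacrum
```

The difference matters for restoration: with "-" the model knows how many characters to supply, while with "#" it must also decide how long the restored text should be.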

Model Architecture and Input Modalities

Aeneas’s core is a deep, narrow transformer decoder based on the T5 architecture, adapted with rotary positional embeddings for effective local and contextual character processing. The textual input is processed alongside optional inscription images (when available) through a shallow convolutional network (ResNet-8), which feeds image embeddings to the geographical attribution head only.

The model includes specialized task heads for text restoration, geographic attribution, and chronological dating.

Additionally, the model generates a unified historically enriched embedding by combining outputs from the core and task heads. This embedding enables retrieval of ranked epigraphic parallels using cosine similarity, incorporating linguistic, epigraphic, and broader cultural analogies beyond exact textual matches.
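Retrieval by cosine similarity can be sketched in a few lines. The three-dimensional vectors and inscription names below are invented toy data, and rank_parallels is a hypothetical helper; the real embeddings come from the model and are far larger.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_parallels(query, corpus, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values, for illustration only).
corpus = {
    "altar_a": [0.9, 0.1, 0.0],
    "decree_b": [0.0, 1.0, 0.2],
    "tombstone_c": [0.5, 0.5, 0.5],
}
query = [1.0, 0.0, 0.0]

for name, score in rank_parallels(query, corpus, top_k=2):
    print(f"{name}: {score:.3f}")
```

Because cosine similarity compares direction rather than magnitude, two inscriptions can rank as close parallels without sharing any exact wording, which matches the article's point about analogies beyond textual matches.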

Training Setup and Data Augmentation

Training occurs on TPU v5e hardware with batch sizes up to 1024 text-image pairs. Losses for each task are combined with optimized weighting. The data is augmented by random text masking (up to 75% of characters), text clipping, word deletions, punctuation dropping, image augmentations (zoom, rotation, brightness/contrast adjustments), dropout, and label smoothing to improve generalization.
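As a rough illustration of the random text masking step, the sketch below hides a random share of characters, capped at 75%. Drawing the share uniformly and choosing positions independently are assumptions made for this example; the article does not say how the masked spans are sampled, and random_mask is a hypothetical name.

```python
import random


def random_mask(text, max_ratio=0.75, mask_char="-", rng=None):
    """Replace a random subset of characters (at most max_ratio) with mask_char."""
    rng = rng or random.Random()
    ratio = rng.uniform(0.0, max_ratio)   # assumption: uniform masking ratio
    n_mask = int(len(text) * ratio)
    positions = set(rng.sample(range(len(text)), n_mask))
    return "".join(mask_char if i in positions else ch for i, ch in enumerate(text))


rng = random.Random(0)  # fixed seed so the toy run is repeatable
print(random_mask("imperator caesar augustus", rng=rng))
```

The unmasked original serves as the training target, so each pass over the data presents the model with a different pattern of damage.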

Prediction uses beam search with specialized non-sequential logic for unknown-length text restoration, ensuring multiple restoration candidates ranked by joint probability and length.
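To make the idea of ranking candidates by joint probability concrete, here is a plain left-to-right beam search over a toy next-character table. It is not the non-sequential procedure the article describes; the end-of-gap symbol, the probability table, and the function names are all assumptions made for illustration.

```python
import math

END = "$"  # hypothetical end-of-gap symbol


def toy_next_char_probs(prefix):
    """Stand-in for a model: a fixed table of next-character probabilities."""
    table = {
        "": {"a": 0.6, "e": 0.4},
        "a": {"u": 0.7, END: 0.3},
        "e": {"t": 0.9, END: 0.1},
        "au": {"g": 0.8, END: 0.2},
    }
    return table.get(prefix, {END: 1.0})


def beam_search(next_probs, beam_width=3, max_len=5):
    """Return finished candidates as (text, joint log-probability), best first."""
    beams = [("", 0.0)]
    finished = []
    for _ in range(max_len + 1):
        candidates = []
        for prefix, logp in beams:
            for ch, p in next_probs(prefix).items():
                if p <= 0.0:
                    continue
                score = logp + math.log(p)
                if ch == END:
                    finished.append((prefix, score))    # candidate is complete
                else:
                    candidates.append((prefix + ch, score))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]                 # keep only the best prefixes
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


for text, logp in beam_search(toy_next_char_probs):
    print(f"{text!r}: p={math.exp(logp):.3f}")
```

Since every candidate decides for itself when to stop, the output mixes restorations of different lengths, which is what lets a single ranked list cover a gap whose size is unknown.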

Performance and Evaluation

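Evaluated on the LED test set and through a human-AI collaboration study with 23 epigraphers, Aeneas demonstrates marked improvements: it attributes inscriptions to one of 62 Roman provinces with about 72% accuracy and dates them with an average error of about 13 years. Its contextualization raised historians' confidence by an average of 44% and shortened the search for relevant parallels.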

These improvements are statistically significant and highlight the model’s utility as an augmentation to expert scholarship.

Case Studies

Res Gestae Divi Augusti:
Aeneas’s analysis of this monumental inscription reveals bimodal dating distributions reflecting scholarly debates about its compositional layers and stages (late first century BCE and early first century CE). Saliency maps highlight date-sensitive linguistic forms, archaic orthography, institutional titles, and personal names, mirroring expert epigraphic knowledge. Parallels retrieved predominantly include imperial legal decrees and official senatorial texts sharing formulaic and ideological features.

Votive Altar from Mainz (CIL XIII, 6665):
Dedicated in 211 CE by a military official, this inscription was accurately dated and geographically attributed to Germania Superior and related provinces. Saliency maps identify key consular dating formulas and cultic references. Aeneas retrieved highly related parallels including a 197 CE altar sharing rare textual formulas and iconography, revealing historically meaningful connections beyond direct text overlap or spatial metadata.

Integration in Research Workflows and Education

Aeneas operates as a cooperative tool, not a replacement for historians. It accelerates searching for epigraphic parallels, aids restoration, and refines attribution, freeing scholars to focus on higher-level interpretation. The tool and dataset are openly available via the Predicting the Past platform under permissive licenses. An educational curriculum has been co-developed targeting high school students and educators, promoting interdisciplinary digital literacy by bridging AI and classical studies.


FAQ 1: What is Aeneas and what tasks does it perform?

Aeneas is a generative multimodal neural network developed by Google DeepMind for Latin epigraphy. It assists historians by restoring damaged or missing text in ancient Latin inscriptions, estimating their date within about 13 years, attributing their geographical origin with around 72% accuracy, and retrieving historically relevant parallel inscriptions for contextual analysis.

FAQ 2: How does Aeneas handle incomplete or damaged inscriptions?

Aeneas can predict missing text segments even when the length of the gap is unknown, a capability known as arbitrary-length restoration. It uses a transformer-based architecture and specialized neural network heads to generate multiple plausible restoration hypotheses, ranked by likelihood, facilitating expert evaluation and further research.

FAQ 3: How is Aeneas integrated into historian workflows?

Aeneas provides historians with ranked lists of epigraphic parallels and predictive hypotheses for restoration, dating, and provenance. These outputs boost historians’ confidence and accuracy, reduce research time by quickly suggesting relevant texts, and support collaborative human-AI analysis. The model and datasets are openly accessible via the Predicting the Past platform.



The post Google DeepMind Introduces Aeneas: AI-Powered Contextualization and Restoration of Ancient Latin Inscriptions appeared first on MarkTechPost.
