ArchXAI explores how AI can help archives digitize, index, search, and safely review records across languages. This site shares what we are testing, what works well, and where human review still matters.

At a glance

This page is the current public-facing view of ArchXAI's benchmarking and tool evaluation work.

What ArchXAI does

The project develops and tests AI-assisted tools that can help archives improve access to records and respond to information requests faster.

Who it is for

Archivists, archive users, researchers, and the wider public who benefit from better indexing, search, and multilingual access.

What is published here

Short benchmark notes, explanations of how we compare tools, and practical conclusions about what is ready for real workflows.

Why it stays current

AI tools move fast, so this site is updated continuously instead of waiting for a single static report at the end of the process.

Project resources

Follow the wider project, open models, and code outside this site.

Project partners

The ArchXAI consortium brings together applied research institutions and national archival institutions from Finland, Estonia, and Latvia.

Recent benchmark notes

Short updates about what has been tested and what the current results suggest.

🔎 Embedding 1️⃣ Preliminary

Similarity and Semantic Search

Embedding benchmarks already point to a usable multilingual default for semantic search, but Latvian remains a narrower case because it has so far been evaluated only on FLORES.
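
As a rough illustration of what semantic search means in this context, the sketch below ranks documents by cosine similarity between embedding vectors. The three-dimensional vectors, the document names, and the `rank_by_similarity` helper are hypothetical stand-ins chosen for readability. They are not output from any model benchmarked by ArchXAI; a real setup would obtain much longer vectors from a multilingual embedding model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Sort (doc_id, score) pairs from most to least similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real ones come from an embedding model.
toy_docs = {
    "fi_parish_register": [0.9, 0.1, 0.0],
    "et_land_deed": [0.1, 0.9, 0.1],
    "lv_court_record": [0.2, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(toy_query, toy_docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because the comparison happens between vectors rather than words, a query in one language can match a record in another, provided the embedding model places both in the same vector space.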

🗣 Tone 1️⃣ Preliminary

Tone and Sentiment Analysis

Tone and sentiment analysis may help classify document style or attitude, but the current cross-language results need careful interpretation.

🔐 PII 1️⃣ Preliminary

Personally Identifiable Information

Presidio is the most practical default integration layer for PII detection, while MAPA remains valuable when anonymization and visual review matter most.