ArchXAI explores how AI can help archives digitize, index, search, and safely review records across languages. This site shares what we are testing, what works well, and where human review still matters.
At a glance
The current public-facing view of ArchXAI's benchmarking and tool evaluation work.
What ArchXAI does
The project develops and tests AI-assisted tools that can help archives improve access to records and respond to information requests faster.
Who it is for
Archivists, archive users, researchers, and the wider public who benefit from better indexing, search, and multilingual access.
What is published here
Short benchmark notes, explanations of how we compare tools, and practical conclusions about what is ready for real workflows.
Why it stays current
AI tools move fast, so we update this site continuously rather than waiting to publish a single static report at the end of the project.
Project resources
Follow the wider project and find its open models and code outside this site.
Project partners
The ArchXAI consortium brings together applied research and national archival institutions from Finland, Estonia, and Latvia.
Recent benchmark notes
Short updates about what has been tested and what the current results suggest.
Similarity and semantic search
Embedding benchmarks already show a usable multilingual default for semantic search, but the evidence for Latvian is narrower because it has so far been evaluated only on FLORES.
Tone and sentiment analysis
Tone and sentiment analysis may help classify document style or attitude, but the current cross-language results need careful interpretation.
Personally identifiable information
Presidio is the most practical default integration layer for PII detection, while MAPA remains valuable when anonymization and visual review matter most.