ArchXAI Technology Updates

ArchXAI is a cross-border project exploring how AI can improve archive access, indexing, search, and related public services. This site is the public web version of the project's benchmarking and technology-comparison work.

Project partners

The ArchXAI consortium brings together applied research and national archival institutions from Finland, Estonia, and Latvia.

Project summary

The common challenge ArchXAI addresses is keeping access to archive collections timely while both archival volumes and public information requests continue to grow. The project's objective is to improve public services and archival access through jointly developed AI-based solutions that make cataloguing, indexing, and information request handling faster and more usable across borders.

The project outputs described in the application are an open source AI tool for handwritten text recognition (HTR), an open source AI tool for optical character recognition (OCR), a tool for enhanced cataloguing and indexing, and an AI-assisted toolset for information requests. The beneficiaries are archivists, archive users, researchers, and the broader public.

Internally, the underlying material comes from the project's technology-comparison deliverable. For external readers, the purpose is simpler: we test tools, explain what they are good at, and publish useful conclusions as the evidence becomes solid enough to share.

The current publication emphasizes practical questions:

Which model families are accurate enough for multilingual archival tasks?
Which approaches are fast enough for large-scale indexing?
Which tools are realistic to operate inside institutional archive environments?
Which solutions are so far strong enough only for triage and review support, not for autonomous decisions?

Latest news

🔎 Embedding 2️⃣ Secondary

Embedding Search Meets Archive RAG

Follow-up RAG-style tests show that dense embeddings are useful for semantic paraphrase search, but archival retrieval needs lexical, structured, and hybrid search as well.

30 May 2026

🔎 Embedding 1️⃣ Preliminary

Similarity and Semantic Search

Similarity and semantic search use embedding models to turn words, sentences, or passages into vectors so that related texts land close together in search. In archives, this mat...

30 April 2026
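As a rough illustration of the idea in the teaser above, the sketch below ranks a few texts by cosine similarity to a query vector. Cosine similarity is a common choice for comparing embeddings, but the teaser does not say which measure or model the project uses, so treat that as an assumption. The three-dimensional vectors, the example texts, and the function names are all hypothetical and invented for this example; real embedding models produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", made up by hand for illustration only.
# A real system would obtain these from an embedding model.
TOY_VECTORS = {
    "land ownership register": [0.9, 0.1, 0.0],
    "property deed records": [0.8, 0.2, 0.1],
    "parish baptism book": [0.1, 0.9, 0.2],
}


def rank_by_similarity(query_vector, vectors):
    """Return the texts ordered from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in vectors.items()
    ]
    return [text for _, text in sorted(scored, reverse=True)]


# Hypothetical query vector that sits close to the two property-related texts.
ranking = rank_by_similarity([0.85, 0.15, 0.05], TOY_VECTORS)
print(ranking)
```

With these invented vectors, the two property-related texts rank above the parish record even though they share no words, which is the behaviour the teaser describes as related texts landing close together.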

🗣 Tone 1️⃣ Preliminary

Tone and sentiment analysis

Tone and sentiment analysis may help classify document style or attitude, but the current cross-language results need careful interpretation.

31 March 2026

More information

Follow the wider project, open models, and code outside this site.