ArchXAI is a cross-border project exploring how AI can improve archive access, indexing, search, and related public services. This site is the public web version of the project's benchmarking and technology-comparison work.
Project partners
The ArchXAI consortium brings together applied research and national archival institutions from Finland, Estonia, and Latvia.
Project summary
The common challenge addressed by ArchXAI is improving timely access to archive collections while both archival volumes and public information requests continue to grow. The project objective is to improve public services and archival access through jointly developed AI-based solutions that make cataloguing, indexing, and information request handling faster and more usable across borders.
The project outputs described in the application are an open-source AI tool for handwritten text recognition (HTR), an open-source AI tool for optical character recognition (OCR), a tool for enhanced cataloguing and indexing, and an AI-assisted toolset for handling information requests. The beneficiaries are archivists, archive users, researchers, and the broader public.
Internally, the underlying material comes from the project's technology-comparison deliverable. For external readers, the purpose is simpler: we test tools, explain what they are good at, and publish useful conclusions as the evidence becomes solid enough to share.
The current publication emphasizes practical questions:
- Which model families are accurate enough for multilingual archival tasks?
- Which approaches are fast enough for large-scale indexing?
- Which tools are realistic to operate inside institutional archive environments?
- Which solutions are currently reliable enough only for triage and review support, not for autonomous decisions?
Latest news
Embedding Search Meets Archive RAG
Follow-up RAG-style tests show that dense embeddings are useful for semantic paraphrase search, but archival retrieval needs lexical, structured, and hybrid search as well.
Similarity and Semantic Search
Similarity and semantic search use embedding models to turn words, sentences, or passages into vectors so that related texts land close together in vector space. In archives, this mat...
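To illustrate what "related texts land close together" means, here is a minimal sketch of ranking by cosine similarity. It is not the project's code: the three-dimensional vectors are hand-written stand-ins for the output of an embedding model, and the names `toy_vectors` and `rank_by_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


# Assumption: hand-written toy vectors, NOT real embedding model output.
# A real model would produce hundreds of dimensions per text.
toy_vectors = {
    "land ownership deed": [0.9, 0.1, 0.0],
    "property title record": [0.8, 0.2, 0.1],
    "parish birth register": [0.1, 0.9, 0.2],
}


def rank_by_similarity(query_vector, vectors):
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in vectors.items()
    ]
    return sorted(scored, reverse=True)


# Assumption: a toy query vector placed near the two property-related texts.
query = [0.85, 0.15, 0.05]
ranking = rank_by_similarity(query, toy_vectors)
for score, text in ranking:
    print(f"{score:.3f}  {text}")
```

In this toy setup the two property-related texts score close to each other and well above the birth register, even though they share no words. That is the paraphrase-matching behaviour dense embeddings are valued for.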
Tone and sentiment analysis
Tone and sentiment analysis may help classify document style or attitude, but the current cross-language results need careful interpretation.
More information
Follow the wider project, open models, and code outside this site.