Technology landscape
Current and planned topics
This site publishes the parts of the ArchXAI comparison work that already have useful evidence behind them, while keeping the next planned topics visible.
Published
📇 Named Entity Recognition
Current best evidence on dedicated transformer models and LLMs for entity extraction from Estonian, Finnish, Latvian, and Russian archival-adjacent text.
Published
🔐 PII Detection and Anonymization
Comparison of Presidio and MAPA, with emphasis on supporting release-review workflows for legal-domain documents.
Preliminary
🗣 Tone and Sentiment Analysis
Preliminary cross-language sentiment results are published, with clear caveats around Estonian underperformance and a suspiciously strong Finnish score.
Published
🔎 Similarity and Semantic Search
Embedding benchmarks for multilingual similarity, paraphrase retrieval, cross-lingual search, and vector-based archive access experiments.
Planned
Image Classification
Reserved for visual archival classification tasks once the benchmark material is curated.
Preliminary
🤖 Large Language and Multimodal Models
Current LLM evidence appears in the secondary NER evaluation; broader multimodal archive assistance remains planned.
Publication principle
A track should move from planned to published only when it has enough evidence to support a defensible engineering recommendation. This keeps the site credible and prevents the deliverable from becoming a placeholder-heavy document with weak conclusions.