Publication stream
Benchmark blog
This page collects the current ArchXAI benchmark notes in a predictable format. Newest entries appear first, and each note starts with a short explanation before moving to the test scope, result table, interpretation, and next update.
All posts
Listed in reverse chronological order.
20 April 2026
Similarity and Semantic Search
Embedding benchmarks already point to a usable multilingual default for semantic search, but the evidence for Latvian is narrower because it is currently evaluated on FLORES only.
31 March 2026
Tone and sentiment analysis
Tone and sentiment analysis may help classify document style or attitude, but the current cross-language results need careful interpretation.
27 February 2026
Personally Identifiable Information
Presidio is the most practical default integration layer for PII detection, while MAPA remains valuable when anonymization and visual review matter most.
30 January 2026
Large Language Models for Entities?
LLM-based NER is useful as a fallback and experimentation path, but current results show it to be slower and usually less accurate than dedicated NER models.
31 December 2025
Named Entity Recognition
Dedicated NER models remain the strongest default for high-throughput archive indexing, especially when they are routed by language and text domain.