🗓️ Development Blog

Publication stream

This page is the evidence archive behind the topic pages. Newest entries appear first, and each note keeps the same predictable structure: explanation, test scope, headline result, interpretation, and next update.


Archive

Listed in reverse chronological order.


🔎 Embedding 2️⃣ Secondary

30 May 2026

Embedding Search Meets Archive RAG

Follow-up RAG-style tests show that dense embeddings are useful for semantic paraphrase search, but archival retrieval needs lexical, structured, and hybrid search as well.
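
Hybrid search typically merges a lexical ranking and a dense (embedding) ranking into a single result list. The entry does not say which fusion method the tests used. The sketch below assumes reciprocal rank fusion (RRF) with the commonly used constant k = 60. The document IDs and both result lists are invented for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first ranked lists of document IDs into one ranking.

    Assumption: RRF is used as the fusion method. A document's fused score is
    the sum of 1 / (k + rank) over every list it appears in (rank starts at 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stability.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a lexical retriever and a dense retriever.
lexical = ["letter_1897", "deed_1902", "census_1911"]
dense = ["diary_1899", "letter_1897", "census_1911"]
print(reciprocal_rank_fusion([lexical, dense]))
# → ['letter_1897', 'census_1911', 'diary_1899', 'deed_1902']
```

Documents found by both retrievers rise to the top, while a hit found by only one retriever (such as an exact-keyword match the dense model missed) still stays in the merged list.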

🔎 Embedding 1️⃣ Preliminary

30 April 2026

Similarity and Semantic Search

Similarity and semantic search use embedding models to turn words, sentences, or passages into vectors so that related texts land close together in search. In archives, this matters when exact keyword matching is too brittle and users ne...
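
"Landing close together" is usually measured with cosine similarity between vectors. This is a minimal sketch under that assumption. The three-dimensional vectors are toy stand-ins for real embedding-model output, and the document names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_search(query_vec, doc_vecs, top_k=2):
    """Return the top_k document IDs ranked by cosine similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy vectors; a real system would obtain these from an embedding model.
docs = {
    "ship_manifest": [0.9, 0.1, 0.0],
    "cargo_list": [0.8, 0.2, 0.1],
    "baptism_record": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]  # imagine an embedded query such as "vessel inventory"
print(semantic_search(query, docs))
# → ['ship_manifest', 'cargo_list']
```

Neither top result has to share a keyword with the query. They only need vectors that point in a similar direction, which is why this approach helps when exact keyword matching is too brittle.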

🗣 Tone 1️⃣ Preliminary

31 March 2026

Tone and sentiment analysis

Tone and sentiment analysis may help classify document style or attitude, but the current cross-language results need careful interpretation.

🔐 PII 1️⃣ Preliminary

27 February 2026

Personally Identifiable Information

Presidio is the most practical default integration layer for PII detection, while MAPA remains valuable when anonymization and visual review matter most.

📇 NER 🤖 LLM 2️⃣ Secondary

30 January 2026

Large Language Models for Entities?

LLM-based NER is useful as a fallback and experimentation path, but current results make it slower and usually less accurate than dedicated NER models.

📇 NER 1️⃣ Preliminary

31 December 2025

Named Entity Recognition

Dedicated NER models remain the strongest default for high-throughput archive indexing, especially when they are routed by language and text domain.
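
Routing by language and text domain can be pictured as a lookup that tries the most specific match first. The sketch below assumes a simple three-level order: an exact (language, domain) model, then a per-language default, then a generic fallback. All model identifiers are hypothetical placeholders, not the models evaluated in the post.

```python
# Hypothetical registry: (language, domain) -> NER model identifier.
NER_ROUTES = {
    ("en", "historical"): "ner-en-historical",
    ("en", "modern"): "ner-en-modern",
    ("de", "historical"): "ner-de-historical",
}
# Hypothetical per-language defaults used when no domain-specific model exists.
LANGUAGE_DEFAULTS = {"en": "ner-en-modern", "de": "ner-de-historical"}
# Hypothetical catch-all for languages with no dedicated model at all.
FALLBACK_MODEL = "generic-ner-fallback"


def route_ner_model(language, domain):
    """Pick the most specific NER model available for a document."""
    if (language, domain) in NER_ROUTES:
        return NER_ROUTES[(language, domain)]
    if language in LANGUAGE_DEFAULTS:
        return LANGUAGE_DEFAULTS[language]
    return FALLBACK_MODEL


print(route_ner_model("en", "historical"))  # → ner-en-historical
print(route_ner_model("de", "modern"))      # → ner-de-historical
print(route_ner_model("fi", "historical"))  # → generic-ner-fallback
```

Keeping the routing table as plain data means a new language or domain model can be added without changing the selection logic.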