🗓️ Development Blog
Publication stream
This page is the evidence archive behind the topic pages. Newest entries appear first, and each note keeps the same predictable structure: explanation, test scope, headline result, interpretation, and next update.
Archive
Listed in reverse chronological order.
30 May 2026
Embedding Search Meets Archive RAG
Follow-up RAG-style tests show that dense embeddings are useful for semantic paraphrase search, but archival retrieval needs lexical, structured, and hybrid search as well.
30 April 2026
Similarity and Semantic Search
Similarity and semantic search use embedding models to turn words, sentences, or passages into vectors so that related texts land close together in search. In archives, this matters when exact keyword matching is too brittle and users ne...
31 March 2026
Tone and sentiment analysis
Tone and Sentiment Analysis
27 February 2026
Personally Identifiable Information
Presidio is the most practical default integration layer for PII detection, while MAPA remains valuable when anonymization and visual review matter most.
30 January 2026
Large Language Models for Entities?
LLM-based NER is useful as a fallback and an experimentation path, but current results show it is slower and usually less accurate than dedicated NER models.
31 December 2025
Named Entity Recognition
Dedicated NER models remain the strongest default for high-throughput archive indexing, especially when they are routed by language and text domain.