Named Entity Recognition (NER) is used to automatically identify key entities in text, especially people, organisations, and locations. In archival work, these are some of the most useful entry points for search, indexing, and access services.

The first dedicated-model benchmark in ArchXAI asked a practical question: which Hugging Face NER models are the strongest current candidates for multilingual archive indexing when labels are mapped to a shared PER / ORG / LOC scheme?

What we tested

This note focuses on dedicated transformer-based NER models. The benchmark covered multilingual, Estonian, Finnish, Latvian, and Russian model candidates that could realistically be integrated into AI-based cataloguing and indexing workflows.

The headline score is F1, which balances two practical error types: marking spans that are not entities (precision errors) and missing entities that should have been found (recall errors). Higher is better.
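
To make the trade-off concrete, entity-level F1 can be computed with the seqeval library. A minimal sketch follows; the gold and predicted tag sequences are invented for illustration.

```python
# Entity-level precision, recall and F1 with seqeval (pip install seqeval).
# The gold and predicted tag sequences here are invented illustrations.
from seqeval.metrics import precision_score, recall_score, f1_score

gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]  # the LOC entity was missed

print(precision_score(gold, pred))  # 1.0   -> nothing spurious was marked
print(recall_score(gold, pred))     # 0.5   -> one of two entities found
print(f1_score(gold, pred))         # ~0.667 -> harmonic mean of the two
```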

Because the source models and datasets do not all use the same label inventory, the benchmark maps results to a shared PER / LOC / ORG scheme before scoring.
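
The mapping itself can be a small lookup table. The sketch below is illustrative only: the source label names and the treatment of out-of-scheme labels are assumptions, not the benchmark's actual code.

```python
# Illustrative label normalisation; source inventories vary by model,
# so this mapping table is an assumption, not the benchmark's actual code.
LABEL_MAP = {
    "PER": "PER", "PERSON": "PER",
    "ORG": "ORG", "ORGANIZATION": "ORG", "ORGANISATION": "ORG",
    "LOC": "LOC", "LOCATION": "LOC", "GPE": "LOC",
}

def normalise(label: str) -> str | None:
    """Map a model-specific tag like 'B-PERSON' to the shared scheme."""
    core = label.removeprefix("B-").removeprefix("I-").upper()
    # Labels outside PER/ORG/LOC (e.g. MISC, DATE) fall outside the shared
    # scheme; returning None for them is an assumed convention.
    return LABEL_MAP.get(core)
```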

Headline results

| Language | Best dedicated model | Dataset | F1 (%) |
| --- | --- | --- | --- |
| 🇪🇪 Estonian | 51la5/roberta-large-NER | et_modern | 75.7 |
| 🇫🇮 Finnish | Kansallisarkisto/finbert-ner | fi_old | 75.2 |
| 🇱🇻 Latvian | 51la5/roberta-large-NER | lv_diverse | 84.1 |
| 🪆 Russian | pierre-tassel/rapido-ner-entity | ru_modern | 91.2 |

What the output looks like

[Figure: Example of named entity recognition output highlighting people, organisations, and locations across several languages.]
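
This kind of output can be reproduced with the Hugging Face transformers pipeline. The sketch below uses one of the benchmarked models; the example sentence is invented.

```python
# Sketch: NER with the transformers pipeline (pip install transformers torch).
# The model is one of the benchmarked candidates; the sentence is invented.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="51la5/roberta-large-NER",
    aggregation_strategy="simple",  # merge sub-word pieces into whole entities
)

for entity in ner("Anna Bērziņa worked at the National Archives in Riga."):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
```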

Early interpretation

The main engineering conclusion is still straightforward: dedicated NER models are the default choice for large-scale indexing. They are fast enough for bulk processing, and the best-performing model can be selected by language and collection type.
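
In code, "select by language" can be a simple lookup keyed on the headline table above; the caching and loading details in this sketch are assumptions.

```python
# Sketch: route each document to the strongest model for its language.
# Model choices come from the headline table; everything else is assumed.
from transformers import pipeline

BEST_MODEL = {
    "et": "51la5/roberta-large-NER",
    "fi": "Kansallisarkisto/finbert-ner",
    "lv": "51la5/roberta-large-NER",
    "ru": "pierre-tassel/rapido-ner-entity",
}

_pipelines = {}

def ner_for(lang: str):
    """Lazily build and cache one NER pipeline per language."""
    if lang not in _pipelines:
        _pipelines[lang] = pipeline(
            "token-classification",
            model=BEST_MODEL[lang],
            aggregation_strategy="simple",
        )
    return _pipelines[lang]
```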

The language-level results also show that a single multilingual model is not always the best answer. In Estonian and Latvian, 51la5/roberta-large-NER performed especially well, while Finnish favoured Kansallisarkisto/finbert-ner and Russian favoured pierre-tassel/rapido-ner-entity.

Operationally, transformer NER still has the clearest advantage for routine archive processing: in local runs, these models process text in milliseconds per sentence, while LLM-based extraction takes seconds per sentence and is therefore better suited to targeted enrichment or fallback use.
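
The latency claim is easy to sanity-check locally. The sketch below times a single pipeline; absolute numbers will vary with hardware, model, and batching.

```python
# Rough per-sentence latency check; numbers depend entirely on hardware.
import time
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="51la5/roberta-large-NER",
    aggregation_strategy="simple",
)

sentences = ["Anna Bērziņa worked at the National Archives in Riga."] * 100
ner(sentences[0])  # warm-up so model loading is not timed

start = time.perf_counter()
for s in sentences:
    ner(s)
elapsed = time.perf_counter() - start
print(f"{1000 * elapsed / len(sentences):.1f} ms per sentence")
```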

What to update next

The next useful update is to add more dataset-specific detail below the language summary, especially for legal and historical material where the archive use case is hardest. The strongest dedicated models should then be fine-tuned on project-specific archival data to test how much additional recall and robustness can be gained.
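
As a starting point, such a fine-tuning run could look roughly like the sketch below. The dataset file, label inventory, and hyperparameters are all hypothetical placeholders, not project decisions.

```python
# Sketch of fine-tuning on project-specific archival data. The dataset file,
# label set, and hyperparameters are hypothetical placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

# add_prefix_space is needed by BPE-based RoBERTa tokenizers for pre-split input.
tokenizer = AutoTokenizer.from_pretrained("51la5/roberta-large-NER", add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    "51la5/roberta-large-NER",
    num_labels=len(LABELS),
    ignore_mismatched_sizes=True,  # classification head is re-initialised
)

def tokenize_and_align(batch):
    # Tokenise pre-split words and align word-level tag ids to sub-tokens,
    # labelling special tokens and continuation pieces with -100 so the
    # loss function ignores them.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, labels = None, []
        for wid in enc.word_ids(batch_index=i):
            labels.append(-100 if wid is None or wid == prev else tags[wid])
            prev = wid
        enc["labels"].append(labels)
    return enc

# Hypothetical JSONL file with "tokens" (words) and "ner_tags" (label ids).
ds = load_dataset("json", data_files="archive_ner.jsonl")
ds = ds.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("archxai-ner-ft", num_train_epochs=3),
    train_dataset=ds["train"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```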