L2 dataset

Notable People - Cross-verified Dataset

This cross-verified dataset contains 2.2 million individuals and can be used for research purposes. It is linked to a paper, which should be cited directly instead of the data itself.

Publisher: Sciences Po Paris
Access: File download
License: CC-BY-SA
Updated: 2026-06-07
Views: 1
Free to use: No account needed

Topics

birth date, people, history, wikipedia

Details

Access & cost

Pricing
Free, open access

Legal & licence

Access Rights
Public
Legal Risk Notes
In this paper, we introduced a multi-language database of notable individuals, using 7 language editions of Wikipedia and Wikidata to assemble a list of 4,678,040 individuals. This significantly reduced the Anglo-Saxon bias but did not eliminate it. Two main drawbacks remain. First, we did not exploit the non-Western language editions to cross-verify information on individual characteristics. Second, we did not collect the number of words beyond these 7 language editions: word counts enter the notability index, so this index cannot be considered global, which results in a Western-world bias in notability measures. This is, however, partly compensated by our aggregate notability measure, which uses the total number of hits for all Wikipedia editions and not only these 7. Because the accuracy of Wikipedia is not perfect [7,8], our data is only as good as the source data, but our approach adds new possibilities: cross-checking across different language editions and reducing errors when possible.
Conforms To
DCAT-3, SourceCommons-SCF-0.1

Coverage

Spatial
world
Wikidata Main Topics
public figure (Q662729), notability (Q4993710)

Identifiers & provenance

Wikidata Id
Q662729
Prefill Status
Not checked

Evaluations

Quality rating: 5.0 (1 evaluation)