Comparison and benchmark of name-to-gender inference services

The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with inform...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:PeerJ. Computer science Ročník 4; s. e156
Hlavní autori: Santamaría, Lucía, Mihaljević, Helena
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: United States PeerJ, Inc 16.07.2018
PeerJ Inc
Predmet:
ISSN:2376-5992, 2376-5992
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized.
Bibliografia:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:2376-5992
2376-5992
DOI:10.7717/peerj-cs.156