A note on using the F-measure for evaluating record linkage algorithms


Bibliographic details
Published in: Statistics and Computing, Vol. 28, No. 3, pp. 539-547
Main authors: Hand, David; Christen, Peter
Format: Journal Article
Language: English
Publication details: New York, Springer US, 2018-05-01
Springer Nature B.V
ISSN: 0960-3174, 1573-1375
Abstract Record linkage is the process of identifying and linking records about the same entities from one or more databases. Record linkage can be viewed as a classification problem where the aim is to decide whether a pair of records is a match (i.e. two records refer to the same real-world entity) or a non-match (two records refer to two different entities). Various classification techniques—including supervised, unsupervised, semi-supervised and active learning based—have been employed for record linkage. If ground truth data in the form of known true matches and non-matches are available, the quality of classified links can be evaluated. Due to the generally high class imbalance in record linkage problems, standard accuracy or misclassification rate are not meaningful for assessing the quality of a set of linked records. Instead, precision and recall, as commonly used in information retrieval and machine learning, are used. These are often combined into the popular F-measure, which is the harmonic mean of precision and recall. We show that the F-measure can also be expressed as a weighted sum of precision and recall, with weights which depend on the linkage method being used. This reformulation reveals that the F-measure has a major conceptual weakness: the relative importance assigned to precision and recall should be an aspect of the problem and the researcher or user, but not of the particular linkage method being used. We suggest alternative measures which do not suffer from this fundamental flaw.
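The reformulation described in the abstract can be checked numerically. The sketch below is illustrative only: the `Confusion` container, the function names, and the two sets of counts are hypothetical and not taken from the paper. It assumes each linkage method finds at least one true match, so that no denominator is zero. Writing TP, FP and FN for true matches found, false matches and missed matches, the harmonic mean 2PR/(P+R) equals wP + (1-w)R with w = (TP+FP)/(2TP+FP+FN), so the weight placed on precision is set by the method's own output rather than by the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Record-pair counts from one linkage run (hypothetical container)."""
    tp: int  # true matches classified as matches
    fp: int  # true non-matches classified as matches
    fn: int  # true matches classified as non-matches
    tn: int  # true non-matches classified as non-matches


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f_measure(c: Confusion) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


def precision_weight(c: Confusion) -> float:
    """Weight on precision when F is written as a weighted arithmetic mean."""
    return (c.tp + c.fp) / (2 * c.tp + c.fp + c.fn)


def f_as_weighted_sum(c: Confusion) -> float:
    w = precision_weight(c)
    return w * precision(c) + (1 - w) * recall(c)


# Two made-up methods on the same one million record pairs, 200 of which
# are true matches (a strong class imbalance).
method_a = Confusion(tp=80, fp=20, fn=120, tn=999_780)
method_b = Confusion(tp=160, fp=240, fn=40, tn=999_560)

for name, c in (("A", method_a), ("B", method_b)):
    print(
        f"method {name}: accuracy={accuracy(c):.5f} "
        f"precision={precision(c):.3f} recall={recall(c):.3f} "
        f"F={f_measure(c):.4f} weighted={f_as_weighted_sum(c):.4f} "
        f"weight_on_precision={precision_weight(c):.3f}"
    )
```

With these invented counts both methods receive the same F value although they weight precision very differently, and both have accuracy above 0.999 despite missing or mislabelling many matches, which illustrates why accuracy is uninformative under heavy class imbalance.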
Author Hand, David
Christen, Peter
Author_xml – sequence: 1
  givenname: David
  surname: Hand
  fullname: Hand, David
  organization: Imperial College, Winton Group Limited
– sequence: 2
  givenname: Peter
  orcidid: 0000-0003-3435-2015
  surname: Christen
  fullname: Christen, Peter
  email: peter.christen@anu.edu.au
  organization: The Australian National University
ContentType Journal Article
Copyright Springer Science+Business Media New York 2017
Copyright Springer Science & Business Media 2018
DOI 10.1007/s11222-017-9746-6
Discipline Statistics
Mathematics
Computer Science
EISSN 1573-1375
EndPage 547
GrantInformation_xml – fundername: Engineering and Physical Sciences Research Council
  grantid: EP/K032208/1
  funderid: http://dx.doi.org/10.13039/501100000266
ISSN 0960-3174
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords Recall
Precision
Class imbalance
Entity resolution
Classification
Data linkage
Language English
ORCID 0000-0003-3435-2015
PublicationCentury 2000
PublicationDate 2018-05-01
PublicationDateYYYYMMDD 2018-05-01
PublicationDate_xml – month: 05
  year: 2018
  text: 2018-05-01
  day: 01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
– name: Dordrecht
PublicationTitle Statistics and computing
PublicationTitleAbbrev Stat Comput
PublicationYear 2018
Publisher Springer US
Springer Nature B.V
Publisher_xml – name: Springer US
– name: Springer Nature B.V
StartPage 539
SubjectTerms Artificial Intelligence
Classification
Ground truth
Information retrieval
Machine learning
Mathematics and Statistics
Probability and Statistics in Computer Science
Quality assessment
Recall
Statistical Theory and Methods
Statistics
Statistics and Computing/Statistics Programs
Title A note on using the F-measure for evaluating record linkage algorithms
URI https://link.springer.com/article/10.1007/s11222-017-9746-6
https://www.proquest.com/docview/2000304322
Volume 28