A note on using the F-measure for evaluating record linkage algorithms
| Published in: | Statistics and computing, Vol. 28, No. 3, pp. 539-547 |
|---|---|
| Main authors: | Hand, David; Christen, Peter |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York: Springer US; Springer Nature B.V., 2018-05-01 |
| ISSN: | 0960-3174, 1573-1375 |
| Abstract | Record linkage is the process of identifying and linking records about the same entities from one or more databases. Record linkage can be viewed as a classification problem where the aim is to decide whether a pair of records is a match (i.e. two records refer to the same real-world entity) or a non-match (two records refer to two different entities). Various classification techniques—including supervised, unsupervised, semi-supervised and active learning based—have been employed for record linkage. If ground truth data in the form of known true matches and non-matches are available, the quality of classified links can be evaluated. Due to the generally high class imbalance in record linkage problems, standard accuracy or misclassification rate are not meaningful for assessing the quality of a set of linked records. Instead, precision and recall, as commonly used in information retrieval and machine learning, are used. These are often combined into the popular F-measure, which is the harmonic mean of precision and recall. We show that the F-measure can also be expressed as a weighted sum of precision and recall, with weights which depend on the linkage method being used. This reformulation reveals that the F-measure has a major conceptual weakness: the relative importance assigned to precision and recall should be an aspect of the problem and the researcher or user, but not of the particular linkage method being used. We suggest alternative measures which do not suffer from this fundamental flaw. |
|---|---|
| Author | Hand, David; Christen, Peter |
| Author_xml | 1. Hand, David (organization: Imperial College, Winton Group Limited); 2. Christen, Peter (organization: The Australian National University; ORCID: 0000-0003-3435-2015; email: peter.christen@anu.edu.au) |
| ContentType | Journal Article |
| Copyright | Springer Science+Business Media New York 2017 Copyright Springer Science & Business Media 2018 |
| DOI | 10.1007/s11222-017-9746-6 |
| Discipline | Statistics; Mathematics; Computer Science |
| EISSN | 1573-1375 |
| EndPage | 547 |
| GrantInformation_xml | Funder: Engineering and Physical Sciences Research Council; grant ID: EP/K032208/1; funder ID: http://dx.doi.org/10.13039/501100000266 |
| ISICitedReferencesCount | 178 |
| ISSN | 0960-3174 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Keywords | Recall; Precision; Class imbalance; Entity resolution; Classification; Data linkage |
| Language | English |
| ORCID | 0000-0003-3435-2015 |
| PageCount | 9 |
| PublicationCentury | 2000 |
| PublicationDate | 2018-05-01 |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | New York; Dordrecht |
| PublicationTitle | Statistics and computing |
| PublicationTitleAbbrev | Stat Comput |
| PublicationYear | 2018 |
| Publisher | Springer US; Springer Nature B.V |
| StartPage | 539 |
| SubjectTerms | Artificial Intelligence; Classification; Ground truth; Information retrieval; Machine learning; Mathematics and Statistics; Probability and Statistics in Computer Science; Quality assessment; Recall; Statistical Theory and Methods; Statistics; Statistics and Computing/Statistics Programs |
| Title | A note on using the F-measure for evaluating record linkage algorithms |
| URI | https://link.springer.com/article/10.1007/s11222-017-9746-6 https://www.proquest.com/docview/2000304322 |
| Volume | 28 |
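
The abstract's central claim, that the F-measure is both the harmonic mean of precision and recall and a weighted sum of the two with method-dependent weights, can be checked numerically. The sketch below is not taken from the article; it derives one such weighted-sum form directly from the standard definitions, writing the recall weight as (TP + FN) / (2TP + FP + FN). The function names and the counts for the two linkage methods are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def precision_recall(tp, fp, fn):
    """Precision and recall from pair counts (assumes tp > 0)."""
    precision = Fraction(tp, tp + fp)
    recall = Fraction(tp, tp + fn)
    return precision, recall


def f_measure(tp, fp, fn):
    """F-measure as the harmonic mean of precision and recall."""
    precision, recall = precision_recall(tp, fp, fn)
    return 2 * precision * recall / (precision + recall)


def recall_weight(tp, fp, fn):
    """Weight on recall in the weighted-sum form (derived here, see lead-in).

    It depends on tp + fp, the number of pairs the method classifies as
    matches, so it changes from one linkage method to another.
    """
    return Fraction(tp + fn, 2 * tp + fp + fn)


def f_as_weighted_sum(tp, fp, fn):
    """F-measure written as w * recall + (1 - w) * precision."""
    precision, recall = precision_recall(tp, fp, fn)
    w = recall_weight(tp, fp, fn)
    return w * recall + (1 - w) * precision


# Hypothetical counts: both methods face the same 100 true matches.
method_a = dict(tp=80, fp=20, fn=20)
method_b = dict(tp=60, fp=5, fn=40)

for name, counts in (("A", method_a), ("B", method_b)):
    print(name, f_measure(**counts), f_as_weighted_sum(**counts),
          recall_weight(**counts))
```

Under these assumed counts the two forms agree exactly for each method, while the recall weight differs (1/2 for method A, 20/33 for method B) even though the data set is the same. That is the weakness the abstract describes: the relative importance of precision and recall is set by the method's output rather than by the problem or the user.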