Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques

Human classification of diagnoses is a labor intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes. We have developed an automated coding system designed to assign codes to clinical...

Bibliographic details
Published in: Journal of the American Medical Informatics Association : JAMIA, Vol. 13; No. 5; p. 516
Main authors: Pakhomov, Serguei V S; Buntrock, James D; Chute, Christopher G
Format: Journal Article
Language: English
Published: England, 1 September 2006
ISSN: 1067-5027
Abstract
OBJECTIVE: Human classification of diagnoses is a labor-intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes.
METHODS: We have developed an automated coding system designed to assign codes to clinical diagnoses. The system uses the notion of certainty to recommend subsequent processing. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries. These code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to examples that were previously coded infrequently. The least certain codes are generated by a naïve Bayes classifier. The latter two types of codes are subsequently reviewed manually.
MEASUREMENTS: Standard information retrieval accuracy measurements of precision, recall, and f-measure were used. Micro- and macro-averaged results were computed.
RESULTS: At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall, and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall, and 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%.
CONCLUSION: Over two-thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, which resulted in a reduction of staff engaged in manual coding from thirty-four coders to seven verifiers.
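The three certainty levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and field names (TieredCoder, Assignment), the frequency cutoff FREQUENT_THRESHOLD, the whitespace and lowercase normalization, and the add-one smoothing in the naïve Bayes fallback are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple, Tuple

# Hypothetical cutoff separating "frequent" from "infrequent" examples;
# the abstract does not state the value used in the real system.
FREQUENT_THRESHOLD = 3


class Assignment(NamedTuple):
    code: str
    certainty: str      # "high", "medium", or "low"
    needs_review: bool  # True when a human verifier should check the code


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


class TieredCoder:
    def __init__(self, coded_examples: Iterable[Tuple[str, str]]):
        self.example_counts = defaultdict(Counter)  # text -> code -> count
        self.code_counts = Counter()                # code -> number of entries
        self.word_counts = defaultdict(Counter)     # code -> word -> count
        self.vocab = set()
        for text, code in coded_examples:
            key = normalize(text)
            self.example_counts[key][code] += 1
            self.code_counts[code] += 1
            for word in key.split():
                self.word_counts[code][word] += 1
                self.vocab.add(word)

    def assign(self, text: str) -> Assignment:
        key = normalize(text)
        seen = self.example_counts.get(key)
        if seen:
            code, count = seen.most_common(1)[0]
            if count >= FREQUENT_THRESHOLD:
                # Frequent exact match: highest certainty, no manual review.
                return Assignment(code, "high", False)
            # Infrequent exact match: lower certainty, reviewed manually.
            return Assignment(code, "medium", True)
        # No stored example matches: fall back to the classifier.
        return Assignment(self._naive_bayes(key), "low", True)

    def _naive_bayes(self, key: str) -> str:
        """Multinomial naive Bayes with add-one smoothing over known words."""
        total = sum(self.code_counts.values())
        words = [w for w in key.split() if w in self.vocab]
        best_code, best_score = None, -math.inf
        for code, n in self.code_counts.items():
            denom = sum(self.word_counts[code].values()) + len(self.vocab)
            score = math.log(n / total)
            for word in words:
                score += math.log((self.word_counts[code][word] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code
```

In this sketch only "high" assignments skip review, which mirrors the abstract's statement that the latter two types of codes are manually reviewed.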
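The micro- and macro-averaged measurements named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used in the study: it assumes exactly one gold code and one predicted code per entry, and it computes the macro-averaged f-score as the mean of per-code f-scores, which is one common convention. The function names prf and micro_macro are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, f-score)


def prf(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall, and f-score from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def micro_macro(gold: Sequence[str], predicted: Sequence[str]) -> Dict[str, Scores]:
    """Micro- and macro-averaged scores, assuming one code per entry."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted code gains a false positive
            fn[g] += 1  # the gold code gains a false negative
    codes = sorted(set(gold) | set(predicted))
    per_code = [prf(tp[c], fp[c], fn[c]) for c in codes]
    # Macro: average the per-code scores, so every code weighs the same.
    macro = tuple(sum(s[i] for s in per_code) / len(per_code) for i in range(3))
    # Micro: pool the counts first, so frequent codes dominate.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return {"micro": micro, "macro": macro}
```

Macro averaging gives rare codes the same weight as common ones, whereas micro averaging reflects performance over all entries; reporting both shows whether accuracy holds up on infrequent codes.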
Author details:
1. Pakhomov, Serguei V S (pakhomov.serguei@mayo.edu), Division of Biomedical Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
2. Buntrock, James D
3. Chute, Christopher G
PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/16799125
DOI: 10.1197/jamia.M2077
Discipline: Medicine
Open access: yes (not via the DOI)
Peer reviewed: yes
Scholarly: yes
Open-access full text: https://academic.oup.com/jamia/article-pdf/13/5/516/2187623/13-5-516.pdf
PMID: 16799125
Subject terms:
Abstracting and Indexing as Topic - methods
Artificial Intelligence
Disease - classification
Forms and Records Control - methods
Humans
International Classification of Diseases
Medical Records Systems, Computerized
Natural Language Processing
Pilot Projects
User-Computer Interface