Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques
| Published in: | Journal of the American Medical Informatics Association : JAMIA, Vol. 13, No. 5, p. 516 |
|---|---|
| Main authors: | Pakhomov, Serguei V S; Buntrock, James D; Chute, Christopher G |
| Format: | Journal Article |
| Language: | English |
| Published: | England, 2006-09-01 |
| ISSN: | 1067-5027 |
| Abstract | OBJECTIVE: Human classification of diagnoses is a labor-intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes.
METHODS: We have developed an automated coding system that assigns codes to clinical diagnoses. The system uses the notion of certainty to recommend subsequent processing. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries; these code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to examples that were previously coded infrequently. The least certain codes are generated by a naïve Bayes classifier. The latter two types of codes are subsequently reviewed manually.
MEASUREMENTS: Standard information retrieval accuracy measurements of precision, recall and f-measure were used. Micro- and macro-averaged results were computed.
RESULTS: At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall and 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%.
CONCLUSION: Over two-thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, which resulted in a reduction of staff engaged in manual coding from thirty-four coders to seven verifiers. |
|---|---|
| Author | Pakhomov, Serguei V S; Buntrock, James D; Chute, Christopher G |
| Author_xml | – sequence: 1 givenname: Serguei V S surname: Pakhomov fullname: Pakhomov, Serguei V S email: pakhomov.serguei@mayo.edu organization: Division of Biomedical Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA. pakhomov.serguei@mayo.edu – sequence: 2 givenname: James D surname: Buntrock fullname: Buntrock, James D – sequence: 3 givenname: Christopher G surname: Chute fullname: Chute, Christopher G |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/16799125$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1016_j_ijmedinf_2008_08_004 crossref_primary_10_1016_j_ijmedinf_2019_05_015 crossref_primary_10_1109_ACCESS_2024_3460976 crossref_primary_10_3389_frai_2022_1000283 crossref_primary_10_1016_j_neucom_2020_05_115 crossref_primary_10_1016_j_hlpt_2014_05_002 crossref_primary_10_1093_jamia_ocx138 crossref_primary_10_1260_2040_2295_1_4_595 crossref_primary_10_1016_j_ijmedinf_2023_105212 crossref_primary_10_1197_jamia_M2435 crossref_primary_10_1038_s41578_021_00339_3 crossref_primary_10_1093_jamia_ocac046 crossref_primary_10_1016_j_jcrc_2014_10_001 crossref_primary_10_1080_20476965_2020_1729666 crossref_primary_10_1109_ACCESS_2021_3080085 crossref_primary_10_3389_fbioe_2020_00867 crossref_primary_10_1016_j_compbiomed_2017_04_011 crossref_primary_10_1016_j_jbi_2017_09_004 crossref_primary_10_1109_TBDATA_2020_3021389 crossref_primary_10_1093_jamia_ocv201 crossref_primary_10_1177_1833358317741354 crossref_primary_10_1186_cc6969 crossref_primary_10_1038_s41746_021_00404_9 crossref_primary_10_1136_jamia_2009_001024 crossref_primary_10_1197_jamia_M3097 crossref_primary_10_1016_j_cmpb_2013_07_018 crossref_primary_10_1016_j_ijmedinf_2017_02_006 crossref_primary_10_1177_21925682211062831 crossref_primary_10_1186_s12911_016_0269_4 crossref_primary_10_1136_amiajnl_2013_002190 crossref_primary_10_1111_j_1742_481X_2008_00542_x crossref_primary_10_1016_j_artmed_2025_103187 crossref_primary_10_1136_amiajnl_2013_002159 crossref_primary_10_1016_j_cmpb_2019_05_024 crossref_primary_10_1111_pme_12713 crossref_primary_10_1007_s10729_021_09554_4 crossref_primary_10_1007_s42979_023_02466_w crossref_primary_10_1109_TKDE_2014_2330813 crossref_primary_10_1017_S1138741600001815 crossref_primary_10_1016_j_jbi_2013_11_008 crossref_primary_10_1007_s00261_025_04810_5 crossref_primary_10_1371_journal_pone_0173410 crossref_primary_10_1016_j_ijmedinf_2012_11_013 crossref_primary_10_3389_frai_2024_1481581 crossref_primary_10_1109_TKDE_2016_2605687 
crossref_primary_10_1016_j_jss_2016_09_058 crossref_primary_10_1109_TCBB_2018_2817488 crossref_primary_10_1016_j_jinf_2016_02_009 crossref_primary_10_1016_j_jbi_2016_12_004 crossref_primary_10_1186_1472_6947_14_94 crossref_primary_10_1016_j_neucom_2018_04_081 crossref_primary_10_1287_ijoc_2015_0655 crossref_primary_10_1016_j_ijmedinf_2020_104135 crossref_primary_10_1186_s12911_019_0788_x crossref_primary_10_1016_j_ijmedinf_2024_105506 crossref_primary_10_1002_cpt_951 crossref_primary_10_3414_ME11_02_0024 crossref_primary_10_1016_j_future_2021_01_013 crossref_primary_10_1177_15589447241295328 crossref_primary_10_1097_MLR_0b013e31828d1210 crossref_primary_10_1016_j_landurbplan_2019_05_012 crossref_primary_10_1093_bib_bbw001 crossref_primary_10_1016_j_jbi_2007_08_009 crossref_primary_10_1016_j_jvsvi_2024_100111 crossref_primary_10_4137_BII_S11634 |
| ContentType | Journal Article |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1197/jamia.M2077 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Medicine |
| ExternalDocumentID | 16799125 |
| Genre | Journal Article |
| ID | FETCH-LOGICAL-c483t-fef691823c0887bc3f989d4c15be2577a79dba8be2f75dcecc0bba7f2253ccdf2 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 104 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000240607400008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1067-5027 |
| IngestDate | Thu Oct 02 13:06:26 EDT 2025 Wed Feb 19 01:45:47 EST 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c483t-fef691823c0887bc3f989d4c15be2577a79dba8be2f75dcecc0bba7f2253ccdf2 |
| OpenAccessLink | https://academic.oup.com/jamia/article-pdf/13/5/516/2187623/13-5-516.pdf |
| PMID | 16799125 |
| PQID | 68836853 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_68836853 pubmed_primary_16799125 |
| PublicationCentury | 2000 |
| PublicationDate | 2006-09-01 |
| PublicationDateYYYYMMDD | 2006-09-01 |
| PublicationDate_xml | – month: 09 year: 2006 text: 2006-09-01 day: 01 |
| PublicationDecade | 2000 |
| PublicationPlace | England |
| PublicationPlace_xml | – name: England |
| PublicationTitle | Journal of the American Medical Informatics Association : JAMIA |
| PublicationTitleAlternate | J Am Med Inform Assoc |
| PublicationYear | 2006 |
| SSID | ssj0016235 |
| Score | 2.2326016 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 516 |
| SubjectTerms | Abstracting and Indexing as Topic - methods; Artificial Intelligence; Disease - classification; Forms and Records Control - methods; Humans; International Classification of Diseases; Medical Records Systems, Computerized; Natural Language Processing; Pilot Projects; User-Computer Interface |
| Title | Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/16799125 https://www.proquest.com/docview/68836853 |
| Volume | 13 |
| WOSCitedRecordID | wos000240607400008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Automating+the+assignment+of+diagnosis+codes+to+patient+encounters+using+example-based+and+machine+learning+techniques&rft.jtitle=Journal+of+the+American+Medical+Informatics+Association+%3A+JAMIA&rft.au=Pakhomov%2C+Serguei+V+S&rft.au=Buntrock%2C+James+D&rft.au=Chute%2C+Christopher+G&rft.date=2006-09-01&rft.issn=1067-5027&rft.volume=13&rft.issue=5&rft.spage=516&rft_id=info:doi/10.1197%2Fjamia.M2077&rft_id=info%3Apmid%2F16799125&rft_id=info%3Apmid%2F16799125&rft.externalDocID=16799125 |
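
The three certainty tiers described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TieredCoder`, the `frequent_threshold` cutoff, the whitespace-and-case normalization, the add-one smoothing, and the code labels `CODE_DM` and `CODE_HTN` are all assumptions made for illustration, since the record does not specify them. The sketch only shows the control flow: an exact match to a frequently coded example is accepted without review, an exact match to an infrequently coded example is flagged for review, and anything else falls through to a naïve Bayes classifier and is also flagged.

```python
import math
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


class TieredCoder:
    """Hypothetical three-tier coder: frequent example, rare example, naive Bayes."""

    def __init__(self, coded_examples, frequent_threshold=2):
        # frequent_threshold is an assumed cutoff; the abstract gives no value.
        self.frequent_threshold = frequent_threshold
        self.examples = defaultdict(Counter)     # normalized text -> code counts
        self.code_counts = Counter()             # code -> number of examples
        self.word_counts = defaultdict(Counter)  # code -> word counts
        self.vocabulary = set()
        for text, code in coded_examples:
            key = normalize(text)
            self.examples[key][code] += 1
            self.code_counts[code] += 1
            for word in key.split():
                self.word_counts[code][word] += 1
                self.vocabulary.add(word)

    def assign(self, text):
        """Return (code, certainty tier, needs_manual_review)."""
        key = normalize(text)
        if key in self.examples:
            code, count = self.examples[key].most_common(1)[0]
            if count >= self.frequent_threshold:
                return code, "high", False   # frequent example: no review
            return code, "medium", True      # infrequent example: review
        return self._naive_bayes(key.split()), "low", True

    def _naive_bayes(self, words):
        """Multinomial naive Bayes with add-one smoothing over known codes."""
        total = sum(self.code_counts.values())
        best_code, best_score = None, -math.inf
        for code, n in self.code_counts.items():
            denom = sum(self.word_counts[code].values()) + len(self.vocabulary)
            score = math.log(n / total)
            for word in words:
                score += math.log((self.word_counts[code][word] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Toy data with made-up code labels, standing in for the manually coded database.
examples = [("Type 2 diabetes", "CODE_DM")] * 3 + [
    ("diabetes mellitus", "CODE_DM"),
    ("essential hypertension", "CODE_HTN"),
]
coder = TieredCoder(examples)
print(coder.assign("type 2  DIABETES"))
print(coder.assign("Essential hypertension"))
print(coder.assign("hypertension essential benign"))
```

In this toy setup the third element of each result plays the role of the routing decision in the abstract: only results from the "high" tier bypass the human verifiers.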