University of California, Irvine-Pathology Extraction Pipeline: the pathology extraction pipeline for information extraction from pathology reports
| Published in: | Health Informatics Journal, Vol. 20, No. 4, p. 288 |
|---|---|
| Main authors: | Ashish, Naveen; Dahm, Lisa; Boicey, Charles |
| Format: | Journal Article |
| Language: | English |
| Published: | England, 1 December 2014 |
| ISSN: | 1741-2811 |
| Abstract | We describe the Pathology Extraction Pipeline (PEP), a new Open Health Natural Language Processing pipeline that we have developed for information extraction from pathology reports, with the goal of populating a research data warehouse with the extracted data. Specifically, we have built upon the Medical Knowledge Analysis Tool pipeline (MedKATp), an extraction framework focused on pathology reports. Our particular contributions include additional customization and development of MedKATp to extract data elements and relationships from cancer pathology reports in richer detail than is currently possible, an abstraction layer that makes configuring MedKATp for extraction tasks significantly easier, and a machine-learning-based approach that makes extraction more resilient to deviations from the common reporting format in a corpus of pathology reports. We present experimental results demonstrating the effectiveness of our pipeline for information extraction in a real-world task, the performance improvement from our approach for increasing extractor resilience to format deviation, and the scalability of the pipeline across pathology reports for different cancer types. |
|---|---|
| Author | Ashish, Naveen; Dahm, Lisa; Boicey, Charles |
| Author_xml | 1. Ashish, Naveen (University of California, Irvine, USA; nashish@loni.usc.edu); 2. Dahm, Lisa (University of California, Irvine, USA); 3. Boicey, Charles (University of California, Irvine, USA) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/25155030 (view this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1186_s12911_018_0609_7 crossref_primary_10_1136_bmjhci_2025_101521 crossref_primary_10_1016_j_jbi_2017_07_012 crossref_primary_10_2196_12239 crossref_primary_10_2196_70257 crossref_primary_10_1136_jclinpath_2016_203872 crossref_primary_10_2196_42477 crossref_primary_10_1145_3490234 crossref_primary_10_2196_13331 crossref_primary_10_1186_s12911_019_0783_2 crossref_primary_10_4103_jpi_jpi_55_17 crossref_primary_10_1186_s12874_022_01583_z |
| ContentType | Journal Article |
| Copyright | The Author(s) 2014. |
| DOI | 10.1177/1460458213494032 |
| DatabaseName | MEDLINE; MEDLINE (Ovid); PubMed; MEDLINE - Academic |
| DatabaseTitle | MEDLINE; Medline Complete; MEDLINE with Full Text; PubMed; MEDLINE (Ovid); MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic; MEDLINE |
| Database_xml | 1. PubMed (dbid: NPM; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed; index database); 2. MEDLINE - Academic (dbid: 7X8; https://search.proquest.com/medline; aggregation database) |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Medicine; Nursing |
| EISSN | 1741-2811 |
| ExternalDocumentID | 25155030 |
| Genre | Journal Article |
| GeographicLocations | California |
| ISICitedReferencesCount | 15 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000345339000005&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1741-2811 |
| IngestDate | Thu Sep 04 17:51:48 EDT 2025 Thu Apr 03 07:08:33 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Keywords | evidence-based practice; decision-support systems; databases and data mining; information and knowledge management; clinical decision-making |
| Language | English |
| License | The Author(s) 2014. |
| LinkModel | DirectLink |
| PMID | 25155030 |
| PQID | 1627078010 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_1627078010 pubmed_primary_25155030 |
| PublicationCentury | 2000 |
| PublicationDate | 2014-12-01 |
| PublicationDecade | 2010 |
| PublicationPlace | England |
| PublicationTitle | Health informatics journal |
| PublicationTitleAlternate | Health Informatics J |
| PublicationYear | 2014 |
| SSID | ssj0003053 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 288 |
| SubjectTerms | Academic Medical Centers; California; Data Mining - methods; Decision Support Systems, Clinical; Electronic Health Records - utilization; Evidence-Based Practice; Female; Hospitals, University; Humans; Information Storage and Retrieval - methods; Male; Natural Language Processing; Neoplasms - pathology; Pathology, Clinical - methods; Systems Integration |
| Title | University of California, Irvine-Pathology Extraction Pipeline: the pathology extraction pipeline for information extraction from pathology reports |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/25155030; https://www.proquest.com/docview/1627078010 |
| Volume | 20 |
| WOSCitedRecordID | wos000345339000005 |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=University+of+California%2C+Irvine-Pathology+Extraction+Pipeline%3A+the+pathology+extraction+pipeline+for+information+extraction+from+pathology+reports&rft.jtitle=Health+informatics+journal&rft.au=Ashish%2C+Naveen&rft.au=Dahm%2C+Lisa&rft.au=Boicey%2C+Charles&rft.date=2014-12-01&rft.eissn=1741-2811&rft.volume=20&rft.issue=4&rft.spage=288&rft_id=info:doi/10.1177%2F1460458213494032&rft_id=info%3Apmid%2F25155030&rft_id=info%3Apmid%2F25155030&rft.externalDocID=25155030 |