University of California, Irvine-Pathology Extraction Pipeline: the pathology extraction pipeline for information extraction from pathology reports

Published in: Health informatics journal, Volume 20, Issue 4, p. 288
Main authors: Ashish, Naveen; Dahm, Lisa; Boicey, Charles
Format: Journal Article
Language: English
Published: England, 2014-12-01
ISSN: 1741-2811
Abstract We describe the Pathology Extraction Pipeline (PEP), a new Open Health Natural Language Processing pipeline that we have developed for information extraction from pathology reports, with the goal of populating a research data warehouse with the extracted data. Specifically, we have built upon the Medical Knowledge Analysis Tool pipeline (MedKATp), an extraction framework focused on pathology reports. Our particular contributions include additional customization and development on MedKATp to extract data elements and relationships from cancer pathology reports in richer detail than at present, an abstraction layer that provides significantly easier configuration of MedKATp for extraction tasks, and a machine-learning-based approach that makes the extraction more resilient to deviations from the common reporting format in a corpus of pathology reports. We present experimental results that demonstrate the effectiveness of our pipeline for information extraction in a real-world task, the performance improvement due to our approach for increasing extractor resilience to format deviation, and the scalability of the pipeline across pathology reports for different cancer types.
Author Dahm, Lisa
Ashish, Naveen
Boicey, Charles
Author_xml – sequence: 1
  givenname: Naveen
  surname: Ashish
  fullname: Ashish, Naveen
  email: nashish@loni.usc.edu
  organization: University of California, Irvine, USA
– sequence: 2
  givenname: Lisa
  surname: Dahm
  fullname: Dahm, Lisa
  organization: University of California, Irvine, USA
– sequence: 3
  givenname: Charles
  surname: Boicey
  fullname: Boicey, Charles
  organization: University of California, Irvine, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/25155030 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1186_s12911_018_0609_7
crossref_primary_10_1136_bmjhci_2025_101521
crossref_primary_10_1016_j_jbi_2017_07_012
crossref_primary_10_2196_12239
crossref_primary_10_2196_70257
crossref_primary_10_1136_jclinpath_2016_203872
crossref_primary_10_2196_42477
crossref_primary_10_1145_3490234
crossref_primary_10_2196_13331
crossref_primary_10_1186_s12911_019_0783_2
crossref_primary_10_4103_jpi_jpi_55_17
crossref_primary_10_1186_s12874_022_01583_z
ContentType Journal Article
Copyright The Author(s) 2014.
Copyright_xml – notice: The Author(s) 2014.
DOI 10.1177/1460458213494032
DatabaseName MEDLINE
MEDLINE (Ovid)
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Medicine
Nursing
EISSN 1741-2811
ExternalDocumentID 25155030
Genre Journal Article
GeographicLocations California
GeographicLocations_xml – name: California
ISICitedReferencesCount 15
ISSN 1741-2811
IngestDate Thu Sep 04 17:51:48 EDT 2025
Thu Apr 03 07:08:33 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 4
Keywords evidence-based practice
decision-support systems
databases and data mining
information and knowledge management
Clinical decision-making
Language English
License The Author(s) 2014.
PMID 25155030
PQID 1627078010
PQPubID 23479
ParticipantIDs proquest_miscellaneous_1627078010
pubmed_primary_25155030
PublicationCentury 2000
PublicationDate 2014-12-01
PublicationDateYYYYMMDD 2014-12-01
PublicationDate_xml – month: 12
  year: 2014
  text: 2014-12-01
  day: 01
PublicationDecade 2010
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Health informatics journal
PublicationTitleAlternate Health Informatics J
PublicationYear 2014
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 288
SubjectTerms Academic Medical Centers
California
Data Mining - methods
Decision Support Systems, Clinical
Electronic Health Records - utilization
Evidence-Based Practice
Female
Hospitals, University
Humans
Information Storage and Retrieval - methods
Male
Natural Language Processing
Neoplasms - pathology
Pathology, Clinical - methods
Systems Integration
Title University of California, Irvine-Pathology Extraction Pipeline: the pathology extraction pipeline for information extraction from pathology reports
URI https://www.ncbi.nlm.nih.gov/pubmed/25155030
https://www.proquest.com/docview/1627078010
Volume 20