Semiautomated Extraction of Research Topics and Trends From National Cancer Institute Funding in Radiological Sciences From 2000 to 2020

Bibliographic Details
Published in: International journal of radiation oncology, biology, physics Vol. 122; no. 2; p. 458
Main Authors: Nguyen, Mark H, Beidler, Peter G, Tsai, Joseph, Anderson, August, Chen, Daniel, Kinahan, Paul E, Kang, John
Format: Journal Article
Language: English
Published: United States 01.06.2025
ISSN: 1879-355X
Abstract Investigators and funding organizations desire knowledge on topics and trends in publicly funded research but current efforts for manual categorization have been limited in breadth and depth of understanding. We present a semiautomated analysis of 21 years of R-type National Cancer Institute (NCI) grants to departments of radiation oncology and radiology using natural language processing. We selected all noneducation R-type NCI grants from 2000 to 2020 awarded to departments of radiation oncology/radiology with affiliated schools of medicine. We used pretrained word embedding vectors to represent each grant abstract. A sequential clustering algorithm assigned each grant to 1 of 60 clusters representing research topics; we repeated the same workflow for 15 clusters for comparison. Each cluster was then manually named using the top words and closest documents to each cluster centroid. The interpretability of document embeddings was evaluated by projecting them onto 2 dimensions. Changes in clusters over time were used to examine temporal funding trends. We included 5874 grants totaling 1.9 billion dollars of NCI funding over 21 years. The human-model agreement was similar to the human interrater agreement. Two-dimensional projections of grant clusters showed 2 dominant axes: physics-biology and therapeutic-diagnostic. Therapeutic and physics clusters have grown faster over time than diagnostic and biology clusters. The 3 topics with largest funding increase were imaging biomarkers, informatics, and radiopharmaceuticals, which all had a mean annual growth of >$218,000. The 3 topics with largest funding decrease were cellular stress response, advanced imaging hardware technology, and improving performance of breast cancer computer-aided detection, which all had a mean decrease of >$110,000. We developed a semiautomated natural language processing approach to analyze research topics and funding trends. We applied this approach to NCI funding in the radiological sciences to extract both domains of research being funded and temporal trends.
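The workflow the abstract describes (embed each grant abstract, cluster the embeddings, then inspect the documents closest to each centroid) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the 2-dimensional word vectors and the toy abstracts are invented, the averaging in `embed` is one common way to turn word vectors into a document vector, and plain k-means stands in for the "sequential clustering algorithm", whose details the record does not give.

```python
import math

# Hypothetical 2-D "pretrained" word vectors, invented for illustration only.
WORD_VECTORS = {
    "dose": [1.0, 0.0], "beam": [0.9, 0.1], "proton": [0.8, 0.2],
    "gene": [0.0, 1.0], "cell": [0.1, 0.9], "repair": [0.2, 0.8],
}
DIM = 2

def embed(text):
    """Document vector = mean of the vectors of its known words (assumption)."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def mean_point(points):
    return [sum(col) / len(points) for col in zip(*points)]

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, k, iters=10):
    """Plain k-means, used here as a stand-in for the paper's clustering step."""
    # Deterministic farthest-point initialisation, so the sketch is reproducible.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids

def closest_documents(points, labels, centroids, cluster, n=1):
    """Indices of the n documents in `cluster` nearest to its centroid."""
    idx = [i for i, lab in enumerate(labels) if lab == cluster]
    return sorted(idx, key=lambda i: math.dist(points[i], centroids[cluster]))[:n]

# Toy abstracts (hypothetical); words without a vector are skipped.
ABSTRACTS = [
    "Proton beam dose optimization",
    "dose dose beam",
    "proton",
    "gene cell repair pathways",
    "gene gene cell",
    "repair",
]
points = [embed(t) for t in ABSTRACTS]
labels, centroids = kmeans(points, k=2)
```

In this toy setting, `closest_documents(points, labels, centroids, labels[0])` returns the document a human reviewer would read first when naming the cluster that contains the first abstract. The study itself used 60 clusters (and 15 for comparison) over 5874 grants.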
AbstractList PURPOSE: Investigators and funding organizations desire knowledge on topics and trends in publicly funded research but current efforts for manual categorization have been limited in breadth and depth of understanding. We present a semiautomated analysis of 21 years of R-type National Cancer Institute (NCI) grants to departments of radiation oncology and radiology using natural language processing.
METHODS AND MATERIALS: We selected all noneducation R-type NCI grants from 2000 to 2020 awarded to departments of radiation oncology/radiology with affiliated schools of medicine. We used pretrained word embedding vectors to represent each grant abstract. A sequential clustering algorithm assigned each grant to 1 of 60 clusters representing research topics; we repeated the same workflow for 15 clusters for comparison. Each cluster was then manually named using the top words and closest documents to each cluster centroid. The interpretability of document embeddings was evaluated by projecting them onto 2 dimensions. Changes in clusters over time were used to examine temporal funding trends.
RESULTS: We included 5874 grants totaling 1.9 billion dollars of NCI funding over 21 years. The human-model agreement was similar to the human interrater agreement. Two-dimensional projections of grant clusters showed 2 dominant axes: physics-biology and therapeutic-diagnostic. Therapeutic and physics clusters have grown faster over time than diagnostic and biology clusters. The 3 topics with largest funding increase were imaging biomarkers, informatics, and radiopharmaceuticals, which all had a mean annual growth of >$218,000. The 3 topics with largest funding decrease were cellular stress response, advanced imaging hardware technology, and improving performance of breast cancer computer-aided detection, which all had a mean decrease of >$110,000.
CONCLUSIONS: We developed a semiautomated natural language processing approach to analyze research topics and funding trends. We applied this approach to NCI funding in the radiological sciences to extract both domains of research being funded and temporal trends.
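The record does not define how the "mean annual growth" and "mean decrease" figures were computed. As one plausible reading, the sketch below takes the mean of year-over-year differences in a topic's total funding and ranks topics by it. The function names, topic labels, and dollar amounts are hypothetical and are not taken from the study.

```python
def mean_annual_change(funding_by_year):
    """Mean year-over-year change in funding (assumed definition).

    `funding_by_year` maps each fiscal year to a topic's total funding and is
    assumed to contain one entry per consecutive year.
    """
    years = sorted(funding_by_year)
    if len(years) < 2:
        raise ValueError("need at least two years of funding data")
    diffs = [funding_by_year[b] - funding_by_year[a] for a, b in zip(years, years[1:])]
    return sum(diffs) / len(diffs)

def rank_topics(funding_by_topic):
    """Topics sorted from largest mean annual increase to largest decrease."""
    changes = {t: mean_annual_change(f) for t, f in funding_by_topic.items()}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-topic totals in dollars, invented for illustration.
FUNDING = {
    "topic A": {2000: 1_000_000, 2001: 1_300_000, 2002: 1_500_000, 2003: 1_900_000},
    "topic B": {2000: 900_000, 2001: 800_000, 2002: 600_000},
}
ranking = rank_topics(FUNDING)
```

A least-squares slope over the yearly totals would be a reasonable alternative definition; it is less sensitive to the first and last years than the mean of differences, which depends only on those two endpoints.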
Author Anderson, August
Tsai, Joseph
Chen, Daniel
Kinahan, Paul E
Nguyen, Mark H
Beidler, Peter G
Kang, John
Author_xml – sequence: 1
  givenname: Mark H
  surname: Nguyen
  fullname: Nguyen, Mark H
  organization: University of Washington School of Medicine
– sequence: 2
  givenname: Peter G
  surname: Beidler
  fullname: Beidler, Peter G
  organization: University of Washington School of Medicine
– sequence: 3
  givenname: Joseph
  surname: Tsai
  fullname: Tsai, Joseph
  organization: Department of Radiation Oncology, University of Washington
– sequence: 4
  givenname: August
  surname: Anderson
  fullname: Anderson, August
  organization: Department of Radiation Oncology, University of Washington
– sequence: 5
  givenname: Daniel
  surname: Chen
  fullname: Chen, Daniel
  organization: Detroit Medical Center, Detroit, Michigan, USA
– sequence: 6
  givenname: Paul E
  surname: Kinahan
  fullname: Kinahan, Paul E
  organization: Department of Radiation Oncology, University of Washington; Department of Radiology, University of Washington, Seattle, Washington
– sequence: 7
  givenname: John
  surname: Kang
  fullname: Kang, John
  email: jkang3@uw.edu
  organization: Department of Radiation Oncology, University of Washington
BackLink https://www.ncbi.nlm.nih.gov/pubmed/39870216$$D View this record in MEDLINE/PubMed
ContentType Journal Article
Copyright Copyright © 2025 Elsevier Inc. All rights reserved.
Copyright_xml – notice: Copyright © 2025 Elsevier Inc. All rights reserved.
DBID CGR
CUY
CVF
ECM
EIF
NPM
7X8
DOI 10.1016/j.ijrobp.2025.01.009
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Medicine
EISSN 1879-355X
ExternalDocumentID 39870216
Genre Journal Article
GeographicLocations United States
GeographicLocations_xml – name: United States
ID FETCH-LOGICAL-c364t-c03fc6ac9bd24a9cf338b4da1d2a8120d034216b15a1c084956c53e69c0b23042
IEDL.DBID 7X8
ISICitedReferencesCount 0
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001502780900029&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1879-355X
IngestDate Sat Sep 27 16:38:27 EDT 2025
Mon Jul 21 05:50:20 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
License Copyright © 2025 Elsevier Inc. All rights reserved.
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c364t-c03fc6ac9bd24a9cf338b4da1d2a8120d034216b15a1c084956c53e69c0b23042
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
PMID 39870216
PQID 3160464869
PQPubID 23479
ParticipantIDs proquest_miscellaneous_3160464869
pubmed_primary_39870216
PublicationCentury 2000
PublicationDate 2025-06-01
PublicationDateYYYYMMDD 2025-06-01
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-06-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle International journal of radiation oncology, biology, physics
PublicationTitleAlternate Int J Radiat Oncol Biol Phys
PublicationYear 2025
SSID ssj0001174
Score 2.4793696
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 458
SubjectTerms Algorithms
Biomedical Research - economics
Biomedical Research - trends
Humans
National Cancer Institute (U.S.) - economics
Natural Language Processing
Radiation Oncology - economics
Radiation Oncology - trends
Radiology - economics
Radiology - trends
Research Support as Topic - economics
Research Support as Topic - statistics & numerical data
Research Support as Topic - trends
United States
Title Semiautomated Extraction of Research Topics and Trends From National Cancer Institute Funding in Radiological Sciences From 2000 to 2020
URI https://www.ncbi.nlm.nih.gov/pubmed/39870216
https://www.proquest.com/docview/3160464869
Volume 122
WOSCitedRecordID wos001502780900029&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
linkProvider ProQuest