Semiautomated Extraction of Research Topics and Trends From National Cancer Institute Funding in Radiological Sciences From 2000 to 2020
| Published in: | International journal of radiation oncology, biology, physics Vol. 122; no. 2; p. 458 |
|---|---|
| Main Authors: | Nguyen, Mark H; Beidler, Peter G; Tsai, Joseph; Anderson, August; Chen, Daniel; Kinahan, Paul E; Kang, John |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 1 June 2025 |
| ISSN: | 1879-355X |
| Abstract | Purpose: Investigators and funding organizations desire knowledge on topics and trends in publicly funded research but current efforts for manual categorization have been limited in breadth and depth of understanding. We present a semiautomated analysis of 21 years of R-type National Cancer Institute (NCI) grants to departments of radiation oncology and radiology using natural language processing.
Methods and Materials: We selected all noneducation R-type NCI grants from 2000 to 2020 awarded to departments of radiation oncology/radiology with affiliated schools of medicine. We used pretrained word embedding vectors to represent each grant abstract. A sequential clustering algorithm assigned each grant to 1 of 60 clusters representing research topics; we repeated the same workflow for 15 clusters for comparison. Each cluster was then manually named using the top words and closest documents to each cluster centroid. The interpretability of document embeddings was evaluated by projecting them onto 2 dimensions. Changes in clusters over time were used to examine temporal funding trends.
Results: We included 5874 grants totaling 1.9 billion dollars of NCI funding over 21 years. The human-model agreement was similar to the human interrater agreement. Two-dimensional projections of grant clusters showed 2 dominant axes: physics-biology and therapeutic-diagnostic. Therapeutic and physics clusters have grown faster over time than diagnostic and biology clusters. The 3 topics with largest funding increase were imaging biomarkers, informatics, and radiopharmaceuticals, which all had a mean annual growth of >$218,000. The 3 topics with largest funding decrease were cellular stress response, advanced imaging hardware technology, and improving performance of breast cancer computer-aided detection, which all had a mean decrease of >$110,000.
Conclusions: We developed a semiautomated natural language processing approach to analyze research topics and funding trends. We applied this approach to NCI funding in the radiological sciences to extract both domains of research being funded and temporal trends. |
|---|---|
| Author | (1) Nguyen, Mark H: University of Washington School of Medicine. (2) Beidler, Peter G: University of Washington School of Medicine. (3) Tsai, Joseph: Department of Radiation Oncology, University of Washington. (4) Anderson, August: Department of Radiation Oncology, University of Washington. (5) Chen, Daniel: Detroit Medical Center, Detroit, Michigan, USA. (6) Kinahan, Paul E: Department of Radiation Oncology, University of Washington; Department of Radiology, University of Washington, Seattle, Washington. (7) Kang, John: Department of Radiation Oncology, University of Washington. Electronic address: jkang3@uw.edu |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39870216 (MEDLINE/PubMed record) |
| ContentType | Journal Article |
| Copyright | Copyright © 2025 Elsevier Inc. All rights reserved. |
| DOI | 10.1016/j.ijrobp.2025.01.009 |
| Discipline | Medicine |
| EISSN | 1879-355X |
| GeographicLocations | United States |
| ISSN | 1879-355X |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| PMID | 39870216 |
| PublicationDate | 2025-06-01 |
| PublicationPlace | United States |
| PublicationTitle | International journal of radiation oncology, biology, physics |
| PublicationTitleAlternate | Int J Radiat Oncol Biol Phys |
| PublicationYear | 2025 |
| StartPage | 458 |
| SubjectTerms | Algorithms; Biomedical Research - economics; Biomedical Research - trends; Humans; National Cancer Institute (U.S.) - economics; Natural Language Processing; Radiation Oncology - economics; Radiation Oncology - trends; Radiology - economics; Radiology - trends; Research Support as Topic - economics; Research Support as Topic - statistics & numerical data; Research Support as Topic - trends; United States |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/39870216 https://www.proquest.com/docview/3160464869 |
| Volume | 122 |
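
The workflow summarized in the abstract (represent each grant abstract with word vectors, cluster the documents, inspect the documents closest to each centroid, project the embeddings onto 2 dimensions, and estimate a funding trend per topic) can be illustrated with a minimal sketch. This is not the authors' code, and every name in it (`TOY_VECTORS`, `embed`, `kmeans`, `assign`, `closest_docs`, `project_2d`, `annual_trend`) is hypothetical. It rests on several assumptions the record does not confirm: the 3-dimensional word vectors are made up, plain k-means stands in for the unspecified "sequential clustering algorithm", a principal-component projection stands in for the unspecified 2-dimensional projection, and a least-squares slope stands in for however the mean annual growth was actually computed.

```python
import numpy as np

# Hypothetical 3-d "pretrained" word vectors (made-up values for illustration).
TOY_VECTORS = {
    "dose": [1.0, 0.0, 0.0], "beam": [0.9, 0.1, 0.0], "proton": [0.8, 0.0, 0.2],
    "gene": [0.0, 1.0, 0.0], "cell": [0.1, 0.9, 0.0],
    "mri": [0.0, 0.0, 1.0], "pet": [0.0, 0.1, 0.9],
}

def embed(text, vectors=TOY_VECTORS):
    """Represent a text as the mean of the vectors of its known words."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    dim = len(next(iter(vectors.values())))
    return np.mean(known, axis=0) if known else np.zeros(dim)

def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=20):
    """Plain k-means (an assumed stand-in) with farthest-first initialization."""
    centroids = [X[0]]
    while len(centroids) < k:
        # Distance from each point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return assign(X, centroids), centroids

def closest_docs(X, centroid, n=2):
    """Indices of the n documents nearest to a centroid (used to name a cluster)."""
    return np.argsort(np.linalg.norm(X - centroid, axis=1))[:n]

def project_2d(X):
    """Project embeddings onto their first 2 principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T

def annual_trend(years, amounts):
    """Least-squares slope of funding against year (assumed trend measure)."""
    years = np.asarray(years, dtype=float)
    slope, _ = np.polyfit(years - years.mean(), amounts, 1)
    return slope

# Made-up miniature "abstracts"; the study used 5874 grants and 60 (or 15) clusters.
abstracts = ["proton beam dose", "beam dose", "gene cell",
             "cell gene gene", "mri pet", "pet mri mri"]
X = np.array([embed(a) for a in abstracts])
labels, centroids = kmeans(X, k=3)
coords = project_2d(X)

print(labels)
print(closest_docs(X, centroids[labels[0]]))
print(annual_trend([2000, 2001, 2002, 2003], [10.0, 20.0, 30.0, 40.0]))
```

In this sketch the cluster labels remain anonymous integers; naming each topic from its closest documents and top words is the manual step that makes the approach semiautomated.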