A survey of word embeddings for clinical text

Detailed bibliography
Published in: Journal of biomedical informatics, Vol. 100, article 100057
Main authors: Khattak, Faiza Khan, Jeblee, Serena, Pou-Prom, Chloé, Abdalla, Mohamed, Meaney, Christopher, Rudzicz, Frank
Format: Journal Article
Language: English
Publisher: Elsevier Inc, 01.01.2019
ISSN: 1532-0464, 1532-0480
Abstract
Highlights:
• We survey methods of representing clinical text using neural networks.
• We provide a “how-to” guide for training these representations on clinical text.
• We describe word models, corpora, evaluation methods, and applications.

Representing words as numerical vectors based on the contexts in which they appear has become the de facto method of analyzing text with machine learning. In this paper, we provide a guide for training these representations on clinical text data, using a survey of relevant research. Specifically, we discuss different types of word representations, clinical text corpora, available pre-trained clinical word vector embeddings, intrinsic and extrinsic evaluation, applications, and limitations of these approaches. This work can be used as a blueprint for clinicians and healthcare workers who may want to incorporate clinical text features in their own models and applications.
ArticleNumber 100057
Author_xml – sequence: 1
  givenname: Faiza Khan
  surname: Khattak
  fullname: Khattak, Faiza Khan
  email: faizakk@cs.toronto.edu
  organization: Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
– sequence: 2
  givenname: Serena
  surname: Jeblee
  fullname: Jeblee, Serena
  email: sjeblee@cs.toronto.edu
  organization: Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
– sequence: 3
  givenname: Chloé
  surname: Pou-Prom
  fullname: Pou-Prom, Chloé
  email: poupromc@smh.ca
  organization: Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
– sequence: 4
  givenname: Mohamed
  surname: Abdalla
  fullname: Abdalla, Mohamed
  email: mohamed.abdalla@mail.utoronto.ca
  organization: Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
– sequence: 5
  givenname: Christopher
  surname: Meaney
  fullname: Meaney, Christopher
  email: christopher.meaney@utoronto.ca
  organization: Department of Biostatistics, University of Toronto, Toronto, Ontario, Canada
– sequence: 6
  givenname: Frank
  orcidid: 0000-0002-9902-0583
  surname: Rudzicz
  fullname: Rudzicz, Frank
  email: frank@cs.toronto.edu
  organization: Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
CitedBy_id crossref_primary_10_1016_j_jbi_2021_103982
crossref_primary_10_3390_info12120491
crossref_primary_10_1109_ACCESS_2024_3460976
crossref_primary_10_1016_j_engappai_2025_110827
crossref_primary_10_1007_s00521_020_05211_z
crossref_primary_10_3390_computers13090236
crossref_primary_10_1109_TCSS_2023_3322002
crossref_primary_10_1371_journal_pone_0283800
crossref_primary_10_1016_j_neucom_2025_129638
crossref_primary_10_1055_a_2521_4372
crossref_primary_10_1146_annurev_biodatasci_030421_030931
crossref_primary_10_3390_su13179775
crossref_primary_10_1007_s42979_021_00656_y
crossref_primary_10_1109_TR_2024_3513834
crossref_primary_10_1016_j_engappai_2025_110142
crossref_primary_10_1515_cllt_2024_0070
crossref_primary_10_1016_j_neucom_2024_128263
crossref_primary_10_1016_j_neucom_2025_130575
crossref_primary_10_1007_s41870_022_01123_4
crossref_primary_10_2196_43014
crossref_primary_10_1109_ACCESS_2023_3326757
crossref_primary_10_3390_app131910725
crossref_primary_10_1093_jamia_ocab236
crossref_primary_10_1186_s40537_021_00429_7
crossref_primary_10_3390_ijerph17197054
crossref_primary_10_1016_j_jbi_2023_104403
crossref_primary_10_1038_s41598_025_04651_8
crossref_primary_10_1007_s41666_023_00125_6
crossref_primary_10_1007_s41870_023_01338_z
crossref_primary_10_3389_fdgth_2021_778305
crossref_primary_10_2196_45171
crossref_primary_10_3390_app12042179
crossref_primary_10_1007_s40593_023_00375_w
crossref_primary_10_1515_bams_2021_0117
crossref_primary_10_1093_comjnl_bxae004
crossref_primary_10_1145_3524887
crossref_primary_10_1038_s41598_021_93018_w
crossref_primary_10_1145_3626523
crossref_primary_10_3389_fgene_2021_569120
crossref_primary_10_1093_jssam_smad015
crossref_primary_10_1155_2022_3524090
crossref_primary_10_7717_peerj_cs_1985
crossref_primary_10_1016_j_jbi_2021_103902
crossref_primary_10_1186_s12911_022_01850_5
crossref_primary_10_1371_journal_pone_0248663
crossref_primary_10_1007_s11277_022_09646_6
crossref_primary_10_1109_ACCESS_2021_3115617
crossref_primary_10_1109_TETC_2020_2983404
crossref_primary_10_1109_ACCESS_2023_3335196
crossref_primary_10_1080_00051144_2021_1922150
crossref_primary_10_24054_rcta_v2i44_3018
crossref_primary_10_1007_s11192_023_04689_3
crossref_primary_10_3390_bdcc7010046
crossref_primary_10_1016_j_inffus_2025_103503
crossref_primary_10_1109_ACCESS_2023_3268165
crossref_primary_10_1016_j_patrec_2020_12_013
crossref_primary_10_1109_ACCESS_2024_3521279
crossref_primary_10_1109_ACCESS_2025_3532397
crossref_primary_10_1016_j_rser_2024_114705
crossref_primary_10_2478_ijssis_2022_0002
crossref_primary_10_1371_journal_pone_0276539
crossref_primary_10_1186_s12911_022_01924_4
crossref_primary_10_1016_j_procs_2021_05_078
crossref_primary_10_2196_22651
crossref_primary_10_1007_s10579_022_09620_5
crossref_primary_10_1007_s41666_024_00159_4
crossref_primary_10_1177_00031348221117039
crossref_primary_10_1016_j_compbiomed_2021_104433
crossref_primary_10_1038_s41598_024_75331_2
crossref_primary_10_3390_app11052045
crossref_primary_10_1007_s11042_022_14043_z
crossref_primary_10_3389_fpsyg_2024_1401084
crossref_primary_10_1001_jamanetworkopen_2025_26339
crossref_primary_10_3390_app10217557
crossref_primary_10_1016_j_jbi_2021_103971
crossref_primary_10_1145_3611651
crossref_primary_10_2196_31063
crossref_primary_10_1016_j_surg_2024_03_006
crossref_primary_10_3390_su15054216
crossref_primary_10_1007_s13042_025_02627_8
crossref_primary_10_1016_j_jbi_2021_103898
crossref_primary_10_1186_s13040_024_00373_1
crossref_primary_10_1177_20552076231212296
crossref_primary_10_1016_j_eswa_2022_118034
crossref_primary_10_1007_s42979_020_00164_5
crossref_primary_10_1177_20563051231186368
crossref_primary_10_1051_e3sconf_201914003006
crossref_primary_10_2196_24020
crossref_primary_10_1080_08839514_2024_2423326
crossref_primary_10_2196_21679
crossref_primary_10_1016_j_nlp_2023_100026
crossref_primary_10_1109_ACCESS_2024_3409818
crossref_primary_10_1038_s41531_022_00422_8
crossref_primary_10_7717_peerj_cs_1163
crossref_primary_10_1093_jamia_ocac216
Cites_doi 10.3115/v1/P14-1023
10.3115/v1/P14-2050
10.18653/v1/P18-1031
10.1109/JBHI.2016.2633963
10.1016/j.jbi.2006.06.004
10.1109/TASLP.2018.2837384
10.2139/ssrn.3064761
10.1006/jbin.2001.1029
10.1016/j.jbi.2010.10.004
10.1038/sdata.2016.35
10.1371/journal.pone.0192360
10.1136/amiajnl-2011-000203
10.1162/COLI_a_00237
10.1073/pnas.1516047113
10.1186/gb-2008-9-s2-s2
10.1197/jamia.M2408
10.1016/j.jbi.2015.07.010
10.1186/s12911-017-0498-1
10.3115/1620754.1620758
10.1109/GRC.2006.1635880
ContentType Journal Article
Copyright 2019 The Author(s)
Copyright © 2019 The Author(s). Published by Elsevier Inc. All rights reserved.
DOI 10.1016/j.yjbinx.2019.100057
DatabaseName ScienceDirect Open Access Titles
Elsevier:ScienceDirect:Open Access
CrossRef
MEDLINE - Academic
Discipline Medicine
Engineering
Public Health
EISSN 1532-0480
ExternalDocumentID 10_1016_j_yjbinx_2019_100057
S2590177X19300563
ISSN 1532-0464
1532-0480
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Clinical data
Natural language processing
Word embeddings
Language English
License This is an open access article under the CC BY-NC-ND license.
ORCID 0000-0002-9902-0583
OpenAccessLink https://dx.doi.org/10.1016/j.yjbinx.2019.100057
PublicationDate 20190101
PublicationTitle Journal of biomedical informatics
PublicationYear 2019
Publisher Elsevier Inc
References Chiu, Korhonen, Pyysalo (b0330) 2016
Pakhomov, Pedersen, McInnes, Melton, Ruggieri, Chute (b0380) 2011; 44
D. Nelson, C. McEvoy, T. Schreiber, The University of South Florida word association, rhyme, and word fragment norms.
G. Lample, A. Conneau, Cross-lingual language model pretraining, arXiv preprint arXiv:1901.07291.
Maaten, Hinton (b0140) 2008; 9
Nickel, Kiela (b0105) 2017
Tsvetkov, Faruqui, Ling, Lample, Dyer (b0335) 2015
H. Nguyen, H. Al-Mubaid, New ontology-based semantic similarity measure for the biomedical domain, 2006, pp. 623–628.
Y. Si, J. Wang, H. Xu, K. Roberts, Enhancing Clinical Concept Extraction with Contextual Embedding, JAMIA (in press) arXiv:1902.08691.
Zhu, Kiros, Zemel, Salakhutdinov, Urtasun, Torralba, Fidler (b0070) 2015
Uzuner, Goldstein, Luo, Kohane (b0235) 2008; 15
Levy, Goldberg (b0460) 2014
Huang, Xu, Vydiswaran (b0230) 2016
Hill, Reichart, Korhonen (b0315) 2015; 41
J. Howard, S. Ruder, Universal language model fine-tuning for text classification, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2018, pp. 328–339.
Radford, Wu, Child, Luan, Amodei, Sutskever (b0120) 2019; 1
L. De Vine, M. Kholghi, G. Zuccon, L. Sitbon, A. Nguyen, Analysis of word embeddings and sequence features for clinical information extraction, 2015.
M. Baroni, G. Dinu, G. Kruszewski, Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2014, pp. 238–247.
Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al., Google’s neural machine translation system: Bridging the gap between human and machine translation, arXiv preprint arXiv:1609.08144.
Pham, Tran, Phung, Venkatesh (b0195) 2016
X. Rong, word2vec parameter learning explained, arXiv preprint arXiv:1411.2738.
McDonald, Ramscar (b0010) 2001; vol. 23
F. Doshi-Velez, M. Kortz, R. Budish, C. Bavitz, S.J. Gershman, D. O’Brien, S. Shieber, J. Waldo, D. Weinberger, A. Wood, Accountability of AI Under the Law: The Role of Explanation, 2017. arXiv:1711.01134, doi:10.2139/ssrn.3064761.
L.K. Şenel, İhsan Utlu, V. Yücesoy, A. Koç, T. Çukur, Semantic structure and interpretability of word embeddings, IEEE/ACM Trans. Audio Speech Language Process. (2018).
Pakhomov, McInnes, Adams, Liu, Pedersen, Melton (b0300) 2010
Szarvas, Vincze, Farkas, Csirik (b0220) 2008
Smith, Tanabe, nee Ando, Kuo, Chung, Hsu, Lin, Klinger, Friedrich, Ganchev (b0365) 2008; 9
Chapman, Bridewell, Hanbury, Cooper, Buchanan (b0225) 2001; 34
A.L. Beam, B. Kompa, I. Fried, N. Palmer, X. Shi, T. Cai, I.S. Kohane, Clinical Concept Embeddings Learned from Massive Sources of Medical Data, arXiv, 2018, pp. 1–27 arXiv:1804.01486.
Pedersen, Pakhomov, Patwardhan, Chute (b0255) 2007; 40
A. Hliaoutakis, Semantic similarity measures in MeSH ontology and their application to information retrieval on MEDLINE, Master’s thesis, 2005.
Nguyen, Tran, Wickramasinghe, Venkatesh (b0190) 2017; 21
Zhu, Yan, Wang (b0210) 2017; 17
Mikolov, Sutskever, Chen, Corrado, Dean (b0020) 2013
Shin, Lu, Kim, Seff, Yao, Summers (b0155) 2015
Y. Peng, S. Yan, Z. Lu, Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets, arXiv preprint arXiv:1906.05474.
Yu, Cohen, Bernstam, Johnson, Wallace (b0260) 2016
De Vries, Nayak, Kutty, Geva, Tagarelli (b0390) 2010
Chiu, Crichton, Korhonen, Pyysalo (b0360) 2016
Miller, Leacock, Tengi, Bunker (b0340) 1993
Gehrmann, Dernoncourt, Li, Carlson, Wu, Welt, Foote, Moseley, Grant, Tyler (b0180) 2018; 13
Hoffman, Trawalter, Axt, Oliver (b0425) 2016; 113
J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C.H. So, J. Kang, BioBERT: pre-trained biomedical language representation model for biomedical text mining, arXiv preprint arXiv:1901.08746.
T. Bolukbasi, K.-W. Chang, J.Y. Zou, V. Saligrama, A.T. Kalai, Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, in: Advances in Neural Information Processing Systems, 2016, pp. 4349–4357.
Peters, Neumann, Iyyer, Gardner, Clark, Lee, Zettlemoyer (b0055) 2018
Nam, Mencía, Fürnkranz (b0295) 2016
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin (b0065) 2017
Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, Q.V. Le, XLNet: Generalized autoregressive pretraining for language understanding, arXiv preprint arXiv:1906.08237.
C. Culnane, B.I.P. Rubinstein, V. Teague, Health data in an open world, CoRR abs/1712.05627. arXiv:1712.05627.
Dwork, McSherry, Nissim, Smith (b0450) 2006
Voorhees, Hersh (b0240) 2012
Y. Wang, S. Liu, N. Afzal, M. Rastegar-Mojarad, L. Wang, F. Shen, H. Liu, A comparison of word embeddings for the biomedical natural language processing, arXiv preprint arXiv:1802.00400.
W. Boag, H. Kané, AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus arXiv:1712.01460.
E. Craig, C. Arias, D. Gillman, Predicting readmission risk from doctors’ notes, arXiv preprint arXiv:1711.10663.
Uzuner, South, Shen, DuVall (b0405) 2011; 18
Zhao, Masino, Yang (b0215) 2018
Bruni, Tran, Marco (b0325) 2013; 49
A.C. Kozlowski, M. Taddy, J.A. Evans, The geometry of culture: Analyzing meaning through word embeddings, arXiv preprint arXiv:1803.09288.
Rogers, Bodenreider (b0310) 2008
Arthur, Vassilvitskii (b0385) 2007
B. Athiwaratkun, A.G. Wilson, A. Anandkumar, Probabilistic FastText for multi-sense word embeddings, arXiv preprint arXiv:1806.02901.
S. Dubois, N. Romano, Learning effective embeddings from medical notes, arXiv preprint arXiv:1705.07025.
Faruqui, Dodge, Jauhar, Dyer, Hovy, Smith (b0265) 2015
O. Levy, Y. Goldberg, Dependency-based word embeddings, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2014, pp. 302–308.
I. Beltagy, A. Cohan, K. Lo, SciBERT: Pretrained contextualized embeddings for scientific text, arXiv preprint arXiv:1903.10676.
K. Huang, J. Altosaar, R. Ranganath, ClinicalBERT: Modeling clinical notes and predicting hospital readmission, arXiv preprint arXiv:1904.05342.
Kholghi, De Vine, Sitbon, Zuccon, Nguyen (b0170) 2016
Moen, Ananiadou (b0205) 2013
B.T. McInnes, T. Pedersen, S.V.S. Pakhomov, UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity, vol. 2009, American Medical Informatics Association, 2009, pp. 431–435.
Y. Choi, C.Y.-I. Chiu, D. Sontag, Learning Low-Dimensional Representations of Medical Concepts, vol. 2016, American Medical Informatics Association, 2016, pp. 41.
Alsentzer, Murphy, Boag, Weng, Jindi, Naumann, McDermott (b0080) 2019
Johnson, Pollard, Shen, Lehman, Feng, Ghassemi, Moody, Szolovits, Celi, Mark (b0090) 2016; 3
Mikolov, Yih, Zweig (b0025) 2013
Leaman, Khare, Lu (b0005) 2015; 57
Patel, Patel, Golakiya, Bhattacharyya, Birari (b0175) 2017; 2017
H. Zhu, I.C. Paschalidis, A. Tahmasebi, Clinical concept extraction with contextual word embedding, arXiv preprint arXiv:1810.10566.
J.-B. Escudié, A. Saade, A. Coucke, M. Lelarge, Deep representation for patient visits from electronic health records, arXiv preprint arXiv:1803.09533.
S. Pradhan, N. Elhadad, B.R. South, D. Martinez, L.M. Christensen, A. Vogel, H. Suominen, W.W. Chapman, G.K. Savova, Task 1: ShARe/CLEF eHealth evaluation lab 2013, in: CLEF (Working Notes), 2013.
T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781.
Pennington, Socher, Manning (b0040) 2014
Kim, Ohta, Tsuruoka, Tateisi, Collier (b0370) 2004
Fellbaum (b0345) 1998
Y. Sun, S. Wang, Y. Li, S. Feng, X. Chen, H. Zhang, X. Tian, D. Zhu, H. Tian, H. Wu, ERNIE: Enhanced representation through knowledge integration, arXiv preprint arXiv:1904.09223.
Bhattacharyya (b0440) 1943; 35
Le, Mikolov (b0035) 2014
P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, arXiv preprint arXiv:1607.04606.
J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805.
Finlayson, LePendu, Shah (b0285) 2014; 1
E.L. Mencia, G. de Melo, J. Nam, Medical Concept Embeddings via Labeled Background Corpora, 2016, pp. 4629–4636.
W. Ammar, D. Groeneveld, C. Bhagavatula, I. Beltagy, M. Crawford, D. Downey, J. Dunkelberger, A. Elgohary, S. Feldman, V. Ha, et al., Construction of the literature graph in semantic scholar, arXiv preprint arXiv:1805.02262.
Socher, Perelygin, Wu, Chuang, Manning, Ng, Potts (b0350) 2013
E. Agirre, E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, A. Soroa, A study on similarity and relatedness using distributional and wordnet-based approaches, in: Proceedings of NAACL-HLT 2009, (2009).
A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language understanding with unsupervised learning, Technical Report, OpenAI, 2018.
References_xml – reference: H. Zhu, I.C. Paschalidis, A. Tahmasebi, Clinical concept extraction with contextual word embedding, arXiv preprint arXiv:1810.10566.
– reference: A. Hliaoutakis, Semantic similarity measures in mesh ontology and their application to information retrieval on medline, Master’s thesis, 2005.
– volume: 44
  start-page: 251
  year: 2011
  end-page: 265
  ident: b0380
  article-title: Towards a framework for developing semantic relatedness reference standards
  publication-title: J. Biomed. Informat.
– reference: Y. Choi, C.Y.-I. Chiu, D. Sontag, Learning Low-Dimensional Representations of Medical Concepts, vol. 2016, American Medical Informatics Association, 2016, pp. 41.
– start-page: 1188
  year: 2014
  end-page: 1196
  ident: b0035
  article-title: Distributed representations of sentences and documents
  publication-title: International Conference on Machine Learning
– reference: W. Boag, H. Kané, AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus arXiv:1712.01460.
– reference: E. Agirre, E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, A. Soroa, A study on similarity and relatedness using distributional and wordnet-based approaches, in: Proceedings of NAACL-HLT 2009, (2009).
– reference: C. Culnane, B.I.P. Rubinstein, V. Teague, Health data in an open world, CoRR abs/1712.05627. arXiv:1712.05627.
– reference: T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781.
– volume: 2017
  start-page: 302
  year: 2017
  end-page: 306
  ident: b0175
  article-title: Adapting pre-trained word embeddings for use in medical coding
  publication-title: BioNLP
– reference: J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C.H. So, J. Kang, BioBERT: pre-trained biomedical language representation model for biomedical text mining, arXiv preprint arXiv:1901.08746.
– reference: E. Craig, C. Arias, D. Gillman, Predicting readmission risk from doctors’ notes, arXiv preprint arXiv:1711.10663.
– volume: 40
  start-page: 288
  year: 2007
  end-page: 299
  ident: b0255
  article-title: Measures of semantic similarity and relatedness in the biomedical domain
  publication-title: J. Biomed. Informat.
– reference: Y. Si, J. Wang, H. Xu, K. Roberts, Enhancing Clinical Concept Extraction with Contextual Embedding, JAMIA (in press) arXiv:1902.08691.
– volume: 41
  start-page: 665
  year: 2015
  end-page: 695
  ident: b0315
  article-title: SimLex-999: Evaluating semantic models with (genuine) similarity estimation
  publication-title: Comput. Linguist.
– year: 2015
  ident: b0265
  article-title: Retrofitting word vectors to semantic lexicons
  publication-title: Proceedings of NAACL-HLT
– volume: 1
  year: 2019
  ident: b0120
  article-title: Language models are unsupervised multitask learners
  publication-title: OpenAI Blog
– reference: Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al., Google’s neural machine translation system: Bridging the gap between human and machine translation, arXiv preprint arXiv:1609.08144.
– reference: Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, Q.V. Le, XLNet: Generalized autoregressive pretraining for language understanding, arXiv preprint arXiv:1906.08237.
– volume: 49
  start-page: 1
  year: 2013
  end-page: 47
  ident: b0325
  article-title: Multimodal distributional semantics
  publication-title: J. Artif. Intell. Res.
– start-page: 746
  year: 2013
  end-page: 751
  ident: b0025
  article-title: Linguistic regularities in continuous space word representations
  publication-title: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
– volume: 34
  start-page: 301
  year: 2001
  end-page: 310
  ident: b0225
  article-title: A simple algorithm for identifying negated findings and diseases in discharge summaries
  publication-title: J. Biomed. Informat.
– reference: B.T. McInnes, T. Pedersen, S.V.S. Pakhomov, UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity, vol. 2009, American Medical Informatics Association, 2009, pp. 431–435.
– start-page: 2049
  year: 2015
  end-page: 2054
  ident: b0335
  article-title: Evaluation of Word Vector Representations by Subspace Alignment
  publication-title: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015
– reference: Y. Peng, S. Yan, Z. Lu, Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets, arXiv preprint arXiv:1906.05474.
– reference: B. Athiwaratkun, A.G. Wilson, A. Anandkumar, Probabilistic FastText for multi-sense word embeddings, arXiv preprint arXiv:1806.02901.
– start-page: 156
  year: 2018
  end-page: 160
  ident: b0215
  article-title: A framework for developing and evaluating word embeddings of drug-named entity
  publication-title: Proceedings of the BioNLP 2018
– reference: S. Dubois, N. Romano, Learning effective embeddings from medical notes, arXiv preprint arXiv:1705.07025.
– reference: S. Pradhan, N. Elhadad, B.R. South, D. Martinez, L.M. Christensen, A. Vogel, H. Suominen, W.W. Chapman, G.K. Savova, Task 1: ShARe/CLEF eHealth evaluation lab 2013, in: CLEF (Working Notes), 2013.
– volume: 1
  start-page: 1
  year: 2014
  end-page: 9
  ident: b0285
  article-title: Building the graph of medicine from millions of clinical narratives
  publication-title: Sci. Data
– reference: L.K. Şenel, İ. Utlu, V. Yücesoy, A. Koç, T. Çukur, Semantic structure and interpretability of word embeddings, IEEE/ACM Trans. Audio Speech Language Process. (2018), doi:10.1109/TASLP.2018.2837384.
– reference: Y. Wang, S. Liu, N. Afzal, M. Rastegar-Mojarad, L. Wang, F. Shen, H. Liu, A comparison of word embeddings for the biomedical natural language processing, arXiv preprint arXiv:1802.00400.
– year: 1998
  ident: b0345
  article-title: WordNet: An Electronic Lexical Database
– start-page: 1027
  year: 2007
  end-page: 1035
  ident: b0385
  article-title: k-means++: The advantages of careful seeding
  publication-title: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
– reference: L. De Vine, M. Kholghi, G. Zuccon, L. Sitbon, A. Nguyen, Analysis of word embeddings and sequence features for clinical information extraction, 2015.
– start-page: 38
  year: 2008
  end-page: 45
  ident: b0220
  article-title: The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts
  publication-title: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
– reference: O. Levy, Y. Goldberg, Dependency-based word embeddings, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2014, pp. 302–308, doi:10.3115/v1/P14-2050.
– reference: K. Huang, J. Altosaar, R. Ranganath, ClinicalBERT: Modeling clinical notes and predicting hospital readmission, arXiv preprint arXiv:1904.05342.
– volume: 15
  start-page: 14
  year: 2008
  end-page: 24
  ident: b0235
  article-title: Identifying patient smoking status from medical discharge records
  publication-title: J. Am. Med. Inform. Assoc.
  issue: 1
  doi: 10.1197/jamia.M2408
– start-page: 1090
  year: 2015
  end-page: 1099
  ident: b0155
  article-title: Interleaved text/image deep mining on a very large-scale radiology database
  publication-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
– start-page: 265
  year: 2006
  end-page: 284
  ident: b0450
  article-title: Calibrating noise to sensitivity in private data analysis
  publication-title: Theory Cryptography Conference
– volume: 21
  start-page: 22
  year: 2017
  end-page: 30
  ident: b0190
  article-title: Deepr: A convolutional net for medical records
  publication-title: IEEE J. Biomed. Health Informat.
  issue: 1
  doi: 10.1109/JBHI.2016.2633963
– reference: W. Ammar, D. Groeneveld, C. Bhagavatula, I. Beltagy, M. Crawford, D. Downey, J. Dunkelberger, A. Elgohary, S. Feldman, V. Ha, et al., Construction of the literature graph in semantic scholar, arXiv preprint arXiv:1805.02262.
– reference: H. Nguyen, H. Al-Mubaid, New ontology-based semantic similarity measure for the biomedical domain, 2006, pp. 623–628.
– start-page: 30
  year: 2008
  end-page: 36
  ident: b0310
  article-title: SNOMED CT: Browsing the browsers
  publication-title: KR-MED
– reference: Y. Sun, S. Wang, Y. Li, S. Feng, X. Chen, H. Zhang, X. Tian, D. Zhu, H. Tian, H. Wu, ERNIE: Enhanced representation through knowledge integration, arXiv preprint arXiv:1904.09223.
– start-page: 303
  year: 1993
  end-page: 308
  ident: b0340
  article-title: A semantic concordance
  publication-title: Proceedings of Human Language Technologies
– year: 2012
  ident: b0240
  article-title: Overview of the TREC 2012 medical records track
  publication-title: TREC
– reference: G. Lample, A. Conneau, Cross-lingual language model pretraining, arXiv preprint arXiv:1901.07291.
– start-page: 363
  year: 2010
  end-page: 376
  ident: b0390
  article-title: Overview of the INEX 2010 XML mining track: Clustering and classification of XML documents
  publication-title: International Workshop of the Initiative for the Evaluation of XML Retrieval
– volume: 57
  start-page: 28
  year: 2015
  end-page: 37
  ident: b0005
  article-title: Challenges in clinical natural language processing for automated disorder normalization
  publication-title: J. Biomed. Inform.
  doi: 10.1016/j.jbi.2015.07.010
– reference: J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805.
– volume: 17
  start-page: 95
  year: 2017
  ident: b0210
  article-title: Semantic relatedness and similarity of biomedical terms: examining the effects of recency, size, and section of biomedical publications on the performance of word2vec
  publication-title: BMC Med. Inform. Decis. Mak.
  issue: 1
  doi: 10.1186/s12911-017-0498-1
– reference: D. Nelson, C. McEvoy, T. Schreiber, The University of South Florida word association, rhyme, and word fragment norms.
– volume: 13
  start-page: e0192360
  year: 2018
  ident: b0180
  article-title: Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives
  publication-title: PLoS One
  issue: 2
  doi: 10.1371/journal.pone.0192360
– reference: J. Howard, S. Ruder, Universal language model fine-tuning for text classification, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2018, pp. 328–339, doi:10.18653/v1/P18-1031.
– volume: 35
  start-page: 99
  year: 1943
  end-page: 109
  ident: b0440
  article-title: On a measure of divergence between two statistical populations defined by their probability distributions
  publication-title: Bull. Calcutta Math. Soc.
– reference: X. Rong, word2vec parameter learning explained, arXiv preprint arXiv:1411.2738.
– reference: J.-B. Escudié, A. Saade, A. Coucke, M. Lelarge, Deep representation for patient visits from electronic health records, arXiv preprint arXiv:1803.09533.
– start-page: 43
  year: 2016
  end-page: 51
  ident: b0260
  article-title: Retrofitting Word Vectors of MeSH Terms to Improve Semantic Similarity Measures
– reference: M. Baroni, G. Dinu, G. Kruszewski, Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2014, pp. 238–247, doi:10.3115/v1/P14-1023.
– year: 2018
  ident: b0055
  article-title: Deep contextualized word representations
  publication-title: Proc. of NAACL
– reference: F. Doshi-Velez, M. Kortz, R. Budish, C. Bavitz, S.J. Gershman, D. O’Brien, S. Shieber, J. Waldo, D. Weinberger, A. Wood, Accountability of AI Under the Law: The Role of Explanation, 2017. arXiv:1711.01134, doi:10.2139/ssrn.3064761.
– start-page: 70
  year: 2004
  end-page: 75
  ident: b0370
  article-title: Introduction to the bio-entity recognition task at JNLPBA
  publication-title: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
– year: 2013
  ident: b0350
  article-title: Recursive deep models for semantic compositionality over a sentiment treebank
  publication-title: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2013)
– year: 2010
  ident: b0300
  article-title: Semantic similarity and relatedness between clinical terms: An experimental study
  publication-title: Proceedings of the Annual Symposium of the American Medical Informatics Association
– reference: A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language understanding with unsupervised learning, Technical Report, OpenAI, 2018.
– volume: 113
  start-page: 4296
  year: 2016
  end-page: 4301
  ident: b0425
  article-title: Racial bias in pain assessment and treatment recommendations, and false beliefs about biological differences between blacks and whites
  publication-title: Proc. Nat. Acad. Sci.
  issue: 16
  doi: 10.1073/pnas.1516047113
– start-page: 6338
  year: 2017
  end-page: 6347
  ident: b0105
  article-title: Poincaré embeddings for learning hierarchical representations
  publication-title: Adv. Neural Informat. Process. Syst.
– reference: E.L. Mencia, G. de Melo, J. Nam, Medical Concept Embeddings via Labeled Background Corpora, 2016, pp. 4629–4636.
– start-page: 2177
  year: 2014
  end-page: 2185
  ident: b0460
  article-title: Neural word embedding as implicit matrix factorization
  publication-title: Adv. Neural Informat. Process. Syst.
– reference: P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, arXiv preprint arXiv:1607.04606.
– volume: 9
  start-page: S2
  year: 2008
  ident: b0365
  article-title: Overview of BioCreative II gene mention recognition
  publication-title: Genome Biol.
  issue: 2
  doi: 10.1186/gb-2008-9-s2-s2
– start-page: 3111
  year: 2013
  end-page: 3119
  ident: b0020
  article-title: Distributed representations of words and phrases and their compositionality
  publication-title: Adv. Neural Informat. Process. Syst.
– start-page: 1
  year: 2016
  end-page: 6
  ident: b0330
  article-title: Intrinsic evaluation of word vectors fails to predict extrinsic performance
  publication-title: Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP
– start-page: 1948
  year: 2016
  end-page: 1954
  ident: b0295
  article-title: All-in Text: learning document, label, and word representations jointly
  publication-title: Thirtieth AAAI Conference on Artificial Intelligence
– volume: 9
  start-page: 2579
  year: 2008
  end-page: 2605
  ident: b0140
  article-title: Visualizing data using t-SNE
  publication-title: J. Machine Learn. Res.
– start-page: 527
  year: 2016
  end-page: 533
  ident: b0230
  article-title: Analyzing multiple medical corpora using word embedding
  publication-title: 2016 IEEE International Conference on Healthcare Informatics (ICHI)
– start-page: 1532
  year: 2014
  end-page: 1543
  ident: b0040
  article-title: GloVe: Global vectors for word representation
  publication-title: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
– volume: 23
  year: 2001
  ident: b0010
  article-title: Testing the distributional hypothesis: The influence of context on judgements of semantic similarity
  publication-title: Proceedings of the Annual Meeting of the Cognitive Science Society
– volume: 18
  start-page: 552
  year: 2011
  end-page: 556
  ident: b0405
  article-title: 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text
  publication-title: J. Am. Med. Inform. Assoc.
  issue: 5
  doi: 10.1136/amiajnl-2011-000203
– reference: A.C. Kozlowski, M. Taddy, J.A. Evans, The geometry of culture: Analyzing meaning through word embeddings, arXiv preprint arXiv:1803.09288.
– start-page: 5998
  year: 2017
  end-page: 6008
  ident: b0065
  article-title: Attention is all you need
  publication-title: Adv. Neural Informat. Process. Syst.
– reference: A.L. Beam, B. Kompa, I. Fried, N. Palmer, X. Shi, T. Cai, I.S. Kohane, Clinical Concept Embeddings Learned from Massive Sources of Medical Data, arXiv preprint arXiv:1804.01486, 2018, pp. 1–27.
– start-page: 25
  year: 2016
  end-page: 34
  ident: b0170
  article-title: The benefits of word embeddings features for active learning in clinical information extraction
  publication-title: Proceedings of the Australasian Language Technology Association Workshop 2016
– start-page: 19
  year: 2015
  end-page: 27
  ident: b0070
  article-title: Aligning books and movies: Towards story-like visual explanations by watching movies and reading books
  publication-title: Proceedings of the IEEE International Conference on Computer Vision
– start-page: 39
  year: 2013
  end-page: 43
  ident: b0205
  article-title: Distributional semantics resources for biomedical text processing
  publication-title: Proceedings of the 5th International Symposium on Languages in Biology and Medicine, Tokyo, Japan
– start-page: 166
  year: 2016
  end-page: 174
  ident: b0360
  article-title: How to train good word embeddings for biomedical NLP
  publication-title: Proceedings of the 15th Workshop on Biomedical Natural Language Processing
– reference: T. Bolukbasi, K.-W. Chang, J.Y. Zou, V. Saligrama, A.T. Kalai, Man is to computer programmer as woman is to homemaker? debiasing word embeddings, in: Advances in Neural Information Processing Systems, 2016, pp. 4349–4357.
– reference: I. Beltagy, A. Cohan, K. Lo, SciBERT: Pretrained contextualized embeddings for scientific text, arXiv preprint arXiv:1903.10676.
– start-page: 72
  year: 2019
  end-page: 78
  ident: b0080
  article-title: Publicly available clinical BERT embeddings
  publication-title: Proceedings of the 2nd Clinical Natural Language Processing Workshop
– start-page: 30
  year: 2016
  end-page: 41
  ident: b0195
  article-title: DeepCare: A deep dynamic memory model for predictive medicine
  publication-title: Pacific-Asia Conference on Knowledge Discovery and Data Mining
– volume: 3
  start-page: 160035
  year: 2016
  ident: b0090
  article-title: MIMIC-III, a freely accessible critical care database
  publication-title: Sci. Data
  doi: 10.1038/sdata.2016.35
SecondaryResourceType review_article
StartPage 100057
SubjectTerms Clinical data
Natural language processing
Word embeddings
Title A survey of word embeddings for clinical text
URI https://dx.doi.org/10.1016/j.yjbinx.2019.100057
https://www.proquest.com/docview/2561487008
Volume 100