Systems, methods, and computer readable media for visualization of semantic information and inference of temporal signals indicating salient associations between life science entities

Detailed bibliography
Title: Systems, methods, and computer readable media for visualization of semantic information and inference of temporal signals indicating salient associations between life science entities
Patent Number: 11,900,274
Issue Date: February 13, 2024
Appl. No: 17/369757
Application Filed: July 07, 2021
Abstract: Disclosed systems, methods, and computer readable media can detect an association between semantic entities and generate semantic information between entities. For example, semantic entities and associated semantic collections present in knowledge bases can be identified. A time period can be determined and divided into time slices. For each time slice, word embeddings for the identified semantic entities can be generated; a first semantic association strength between a first semantic entity input and a second semantic entity input can be determined; and a second semantic association strength between the first semantic entity input and semantic entities associated with a semantic collection that is associated with the second semantic entity can be determined. An output can be provided based on the first and second semantic association strengths.
Inventors: nference, Inc. (Cambridge, MA, US)
Assignees: nference, Inc. (Cambridge, MA, US)
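
The time-sliced procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Cosine similarity as the association measure, the mean as the way to aggregate over a collection, and every name below (`cosine`, `association_over_time`, the toy entities and vectors) are assumptions made for illustration; the record itself does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def association_over_time(embeddings_by_slice, first, second, collection):
    """Return one (time_slice, first_strength, second_strength) row per slice.

    embeddings_by_slice: {time_slice: {entity: vector}}, one embedding set per slice.
    first_strength:  similarity between `first` and `second` (assumed: cosine).
    second_strength: mean similarity between `first` and the members of
                     `collection`, the collection associated with `second`
                     (assumed aggregation: arithmetic mean).
    """
    rows = []
    for time_slice in sorted(embeddings_by_slice):
        vectors = embeddings_by_slice[time_slice]
        if first not in vectors or second not in vectors:
            continue  # an entity is missing from this slice's vocabulary
        first_strength = cosine(vectors[first], vectors[second])
        peers = [e for e in collection if e in vectors and e != first]
        if peers:
            second_strength = sum(cosine(vectors[first], vectors[e]) for e in peers) / len(peers)
        else:
            second_strength = 0.0
        rows.append((time_slice, first_strength, second_strength))
    return rows


# Hypothetical toy data: two yearly slices with 2-dimensional embeddings.
toy = {
    2015: {"drug_x": [1.0, 0.0], "gene_a": [0.0, 1.0], "gene_b": [0.0, 1.0]},
    2016: {"drug_x": [1.0, 1.0], "gene_a": [0.0, 1.0], "gene_b": [1.0, 0.0]},
}
genes = ["gene_a", "gene_b"]  # collection associated with the second entity

rows = association_over_time(toy, "drug_x", "gene_a", genes)
for row in rows:
    print(row)
```

Comparing the two strengths slice by slice is one way to see whether a pairwise association rises faster than the association with the surrounding collection, which is the kind of temporal signal the title refers to.
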
Claim: 1. A method of generating semantic information between entities, comprising: identifying a plurality of semantic entities in one or more corpora, wherein the semantic entities include one or more of single words or multi-word phrases; identifying a plurality of semantic entity types in the one or more corpora; associating one or more semantic entity types with the semantic entities of the plurality of semantic entities; generating word embeddings for the plurality of semantic entities; determining a first set of semantic association scores between semantic entities from the plurality of semantic entities based on the word embeddings; receiving a query term; determining a query term entity type associated with the query term, wherein determining the query term entity type comprises determining if the query term corresponds to at least one semantic entity that is associated with the word embeddings; generating a first list of resulting semantic entities associated with the query term based on the first set of semantic association scores and the query term entity type; generating a second list of semantic entity collections based on the semantic entity types associated with the semantic entities of the first list of resulting semantic entities, wherein each semantic entity collection from the second list is associated with a semantic entity type; and providing an output based on the second list of semantic entity collections.
Claim: 2. The method of claim 1, wherein the plurality of semantic entity types is identified based on one or more of: a structured database, a custom list of entity types, an output from a neural network, an output from supervised machine learning, or an output from unsupervised machine learning.
Claim: 3. The method of claim 2, wherein the neural network architecture is one or more of: a recurrent neural network (RNN) or a Long Short Term Memory (LSTM).
Claim: 4. The method of claim 1, wherein the method further comprises identifying semantic entities using an automatic method of identifying one or more single words or multi-word phrases as semantic entities belonging to semantic collections.
Claim: 5. The method of claim 1, wherein generating the second list comprises generating the second list of semantic entity collections as a function of a minimum semantic association score.
Claim: 6. The method of claim 1, wherein generating the second list comprises generating the second list of semantic entity collections as a function of a minimum number of occurrences of the resulting semantic entity in the one or more corpora.
Claim: 7. The method of claim 1, wherein the method further comprises generating a third list of semantic association scores.
Claim: 8. The method of claim 7, wherein the third list comprises semantic association scores between each of the resulting semantic entities from the first list and the second list.
Claim: 9. The method of claim 1, wherein the first list comprises the semantic entities associated with the same semantic entity type as the query term entity type.
Claim: 10. The method of claim 1, wherein the method further comprises generating one or more knowledge graphs as a function of the word embeddings.
Claim: 11. The method of claim 1, wherein the method further comprises generating a second set of semantic association scores as a function of the second list.
Claim: 12. A system for generating semantic information between entities, comprising: a memory that stores a module; and a processor configured to run the module stored in the memory that is configured to cause the processor to: identify a plurality of semantic entities in one or more corpora, wherein the semantic entities include one or more of single words or multi-word phrases; identify a plurality of semantic entity types in the one or more corpora; associate one or more semantic entity types with the semantic entities of the plurality of semantic entities; generate word embeddings for the plurality of semantic entities; determine a first set of semantic association scores between semantic entities from the plurality of semantic entities based on the word embeddings; receive a query term; determine a query term entity type associated with the query term, wherein determining the query term entity type comprises determining if the query term corresponds to at least one semantic entity that is associated with the word embeddings; generate a first list of resulting semantic entities associated with the query term based on the first set of semantic association scores and the query term entity type; generate a second list of semantic entity collections based on the semantic entity types associated with the semantic entities of the first list of resulting semantic entities, wherein each semantic entity collection from the second list is associated with a semantic entity type; and provide an output based on the second list of semantic entity collections.
Claim: 13. The system of claim 12, wherein the plurality of semantic entity types is identified based on one or more of: a structured database, a custom list of entity types, an output from a neural network, an output from supervised machine learning, or an output from unsupervised machine learning.
Claim: 14. The system of claim 13, wherein the neural network architecture is one or more of: a recurrent neural network (RNN) or a Long Short Term Memory (LSTM).
Claim: 15. The system of claim 12, wherein identifying semantic entities comprises an automatic method of identifying one or more single words or multi-word phrases as semantic entities belonging to semantic collections.
Claim: 16. The system of claim 12, wherein generating the second list comprises generating the second list of semantic entity collections as a function of a minimum semantic association score.
Claim: 17. The system of claim 12, wherein generating the second list comprises generating the second list of semantic entity collections as a function of a minimum number of occurrences of the resulting semantic entity in the one or more corpora.
Claim: 18. The system of claim 12, wherein the method further comprises generating a third list of semantic association scores.
Claim: 19. The method of claim 18, wherein the third list comprises semantic association scores between each of the resulting semantic entities from the first list and the second list.
Claim: 20. The system of claim 12, wherein the first list comprises the semantic entities associated with the same semantic entity type as the query term entity type.
Claim: 21. The system of claim 12, wherein the method further comprises generating one or more knowledge graphs as a function of the word embeddings.
Claim: 22. The system of claim 12, wherein the method further comprises generating a second set of semantic association scores as a function of the second list.
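
The query flow recited in claims 1, 5, 6, and 9 can be sketched roughly as follows. This is an illustrative reading, not the claimed implementation. Cosine similarity as the association score, the `top_k` cutoff, the `same_type_only` switch, the function name `query_collections`, and the toy entities, types, and counts are all assumptions introduced for the example.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_collections(query, embeddings, entity_types, counts,
                      min_score=0.0, min_count=1, top_k=10, same_type_only=False):
    """Return (query_type, first_list, collections) for a query term.

    first_list:  [(entity, score)] ranked by association score with the query.
    collections: {entity_type: [(entity, score)]}, the first list grouped by type.
    min_score / min_count mirror the thresholds of claims 5 and 6;
    same_type_only mirrors claim 9. All three are optional in this sketch.
    """
    if query not in embeddings:
        # The query term does not correspond to any embedded semantic entity.
        return None, [], {}
    query_type = entity_types.get(query)

    scored = []
    for entity, vector in embeddings.items():
        if entity == query:
            continue
        if same_type_only and entity_types.get(entity) != query_type:
            continue
        score = cosine(embeddings[query], vector)
        if score >= min_score and counts.get(entity, 0) >= min_count:
            scored.append((entity, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    first_list = scored[:top_k]

    # Group the resulting entities into collections keyed by entity type.
    collections = defaultdict(list)
    for entity, score in first_list:
        collections[entity_types.get(entity, "unknown")].append((entity, score))
    return query_type, first_list, dict(collections)


# Hypothetical toy corpus statistics and 2-dimensional embeddings.
embeddings = {
    "aspirin": [1.0, 0.0],
    "ibuprofen": [0.9, 0.1],
    "cox1": [0.6, 0.8],
    "headache": [0.0, 1.0],
}
entity_types = {"aspirin": "drug", "ibuprofen": "drug", "cox1": "gene", "headache": "disease"}
counts = {"aspirin": 50, "ibuprofen": 40, "cox1": 5, "headache": 1}

query_type, first_list, collections = query_collections(
    "aspirin", embeddings, entity_types, counts, min_score=0.5, min_count=2
)
print(query_type)
print([entity for entity, _ in first_list])
print(sorted(collections))
```

Grouping by entity type after ranking keeps the score thresholds independent of how many types exist; a real system would likely precompute the scores rather than scan every embedding per query.
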
Patent References Cited: 7542969 June 2009 Rappaport et al.
9514405 December 2016 Chen et al.
9953095 April 2018 Scott et al.
10360507 July 2019 Aravamudan et al.
11062218 July 2021 Aravamudan
11487902 November 2022 Ardhanari et al.
11545242 January 2023 Aravamudan
20040013302 January 2004 Ma et al.
20070039046 February 2007 Van Dijk et al.
20080118150 May 2008 Balakrishnan et al.
20080243825 October 2008 Staddon et al.
20090116736 May 2009 Neogi et al.
20110255788 October 2011 Duggan et al.
20110307460 December 2011 Vadlamani et al.
20130132331 May 2013 Kowalczyk et al.
20150161413 June 2015 Calem et al.
20150254555 September 2015 Williams, Jr. et al.
20160105402 April 2016 Soon-Shiong et al.
20160247307 August 2016 Stoop et al.
20170032243 February 2017 Corrado et al.
20170061326 March 2017 Talathi et al.
20170091391 March 2017 LePendu
20180060282 March 2018 Kaljurand
20190319982 October 2019 Durand et al.
20190354883 November 2019 Aravamudan et al.
20200402625 December 2020 Aravamudan et al.
20210019287 January 2021 Prasad et al.
20210224264 July 2021 Barve et al.
20210248268 August 2021 Ardhanari et al.
20220050921 February 2022 LaFever et al.
20230051067 February 2023 Ardhanari
105938495 September 2016
2019-536178 December 2019
WO-2015084759 June 2015
WO-2015149114 October 2015
WO-2018057945 March 2018
WO-2020257783 December 2020
WO-2021011776 January 2021
WO-2021146694 July 2021
WO-2021178689 September 2021
Other References: Korger, Clustering of Distributed Word Representations and its Applicability for Enterprise Search, Doctoral Thesis, Dresden University of Technology, 2016, pp. 1-116 (Year: 2016). cited by examiner
Zuccon, et al., Integrating and Evaluating Neural Word Embeddings in Information Retrieval, ADCS, 2015, pp. 1-8 (Year: 2015). cited by examiner
Freitas, Schema-Agnostic Queries for Large-Schema Databases: A Distributional Semantics Approach, Doctoral Thesis, National University of Ireland, Galway, 2015, pp. 1-396 (Year: 2015). cited by examiner
Ananiadou, et al., Event Extraction for Systems Biology by Text Mining the Literature, Trends in Biotechnology vol. 28 No. 7, 2010, pp. 381-390 (Year: 2010). cited by examiner
Köpcke, et al., Frameworks for Entity Matching: A Comparison, Data & Knowledge Engineering, 2009, pp. 1-14 (Year: 2009). cited by examiner
AMD Secure Encrypted Virtualization (SEV), https://developer.amd.com/sev/, accessed Sep. 23, 2020 (5 pages). cited by applicant
Arora, S. et al., “A Simple But Tough-To-Beat Baseline for Sentence Embeddings”, ICLR, 2017 (16 pages). cited by applicant
AWS Key Management Service (KMS), https://aws.amazon.com/kms, accessed Jan. 20, 2021 (3 pages). cited by applicant
AWS Key Management Service (KMS), https://aws.amazon.com/kms, accessed Sep. 23, 2020 (6 pages). cited by applicant
Bartunov, S. et al., “Breaking Sticks And Ambiguities With Adaptive Skip-Gram”, retrieved online from URL:<https://arxiv.org/pdf/1502.07257.pdf>, [cs.CL], Nov. 15, 2015 (15 pages). cited by applicant
Bojanowski, P. et al., “Enriching Word Vectors with Subword Information”, retrieved online from URL:<https://arxiv.org/pdf/1607.04606.pdf>, [cs.CL], Jun. 19, 2017 (12 pages). cited by applicant
Confidential Computing Consortium, “What is the Confidential Computing Consortium?”, https://confidentialcomputing.io, accessed Sep. 24, 2020 (2 pages). cited by applicant
De Guzman, C.G. et al., “Hematopoietic Stem Cell Expansion and Distinct Myeloid Developmental Abnormalities in a Murine Model of the AML1-ETO Translocation”, Molecular and Cellular Biology, 22(15):5506-5517, Aug. 2002 (12 pages). cited by applicant
Desagulier, G., “A lesson from associative learning: asymmetry and productivity in multiple-slot constructions”, Corpus Linguistics and Linguistic Theory, 12(2):173-219, 2016, submitted Aug. 13, 2015, <http://www.degruyter.com/view/j/cllt.2016.12.issue-2/cllt-2015-0012/cllt-2015-0012.XML?format=INT>, doi:10.1515/cllt-2015-0012 (32 pages). cited by applicant
Devlin, J. et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, arXiv:1810.04805v2 [cs.CL], May 24, 2019 (16 pages). cited by applicant
Divatia, A., “The Fact and Fiction of Homomorphic Encryption”, Dark Reading, www.darkreading.com/attacks-breaches/the-fact-and-fiction-of-homomorphic-encryption/a/d-id/1333691, Jan. 22, 2019 (3 pages). cited by applicant
Dwork, C., “Differential Privacy: A Survey of Results”, Lecture Notes in Computer Science, vol. 4978, pp. 1-19, 2008 (19 pages). cited by applicant
Ferraiuolo, A. et al., “Komodo: Using verification to disentangle secure-enclave hardware from software”, SOSP '17, Shanghai, China, pp. 287-305, Oct. 28, 2017 (19 pages). cited by applicant
Garten, Y. et al., “Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text”, BMC Bioinformatics, 10(Suppl. 2):S6, Feb. 5, 2009 (9 pages). cited by applicant
Genkin, D. et al., “Privacy in Decentralized Cryptocurrencies”, Communications of the ACM, 61(6):78-88, Jun. 2018 (11 pages). cited by applicant
Hageman, G.S. et al., “A common haplotype in the complement regulatory gene factor H (HF1 / CFH) predisposes individuals to age-related macular degeneration”, PNAS, 102(20):7227-7232, May 17, 2005 (6 pages). cited by applicant
Ikeda, T. et al., “Anticorresponding mutations of the KRAS and PTEN genes in human endometrial cancer”, Oncology Reports, 7: 567-570, published online May 1, 2000 (4 pages). cited by applicant
Intel, “What is Intel® SGX?”, http://www.intel.com/content/www/US/en/architecture-and-technology/software-guard-extensions.html, accessed Sep. 23, 2020 (8 pages). cited by applicant
International Search Report and Written Opinion issued by the European Patent Office as International Searching Authority in International Application PCT/US2017/053039, dated Dec. 20, 2017 (15 pages). cited by applicant
International Search Report and Written Opinion issued by the U.S. Patent and Trademark Office as International Searching Authority issued in International Application No. PCT/US21/20906, dated May 19, 2021 (10 pages). cited by applicant
International Search Report and Written Opinion issued by U.S. Patent and Trademark Office as International Searching Authority in International Application No. PCT/US20/42336, dated Sep. 30, 2020 (10 pages). cited by applicant
International Search Report and Written Opinion issued by U.S. Patent and Trademark Office as International Searching Authority, for International Application No. PCT/US20/38987, dated Nov. 9, 2020 (26 pages). cited by applicant
Joulin, A. et al., “Bag of Tricks for Efficient Text Classification”, retrieved online from URL:<https://arXiv.org/pdf/1607.01759v3.pdf>, [cs.CL], Aug. 9, 2016 (5 pages). cited by applicant
Kiros, R. et al., “Skip-Thought Vectors”, retrieved online from URL:<https://arXiv.org/abs/1506.06726v.1>, [cs.CL], Jun. 22, 2015 (11 pages). cited by applicant
Kolte, P. “Why Is Homomorphic Encryption Not Ready For Primetime?”, Baffle, https://baffle.io/blog/why-is-homomorphic-encryption-not-ready-for-primetime/, Mar. 17, 2017 (4 pages). cited by applicant
Korger, C., “Clustering of Distributed Word Representations and its Applicability for Enterprise Search”, Doctoral Thesis, Dresden University of Technology, Faculty of Computer Science, Institute of Software and Multimedia Technology, Matriculation Nr. 3703541, submitted Jul. 28, 2016 (116 pages). cited by applicant
Kutuzov, A et al., “Cross-lingual Trends Detection for Named Entities in News Texts with Dynamic Neural Embedding Models”, Proceedings of the NewsIR'16 Workshop at ECIR, Padua, Italy, Mar. 20, 2016 (6 pages). cited by applicant
Le, Q. et al., “Distributed Representations of Sentences and Documents”, Proceedings of the 31st International Conference of Machine Learning, Beijing, China, vol. 32, 2014 (9 pages). cited by applicant
Li, H. et al., “Cheaper and Better: Selecting Good Workers for Crowdsourcing,” retrieved online from URL: https://arXiv.org/abs/1502.00725v.1, pp. 1-16, Feb. 3, 2015 (16 pages). cited by applicant
Ling, W. et al., “Two/Too Simple Adaptations of Word2Vec for Syntax Problems”, retrieved online from URL:<https://cs.cmu.edu/~lingwang/papers/naacl2015.pdf>, 2015 (6 pages). cited by applicant
Maxwell, K.N. et al., “Adenoviral-mediated expression of Pcsk9 in mice results in a low-density lipoprotein receptor knockout phenotype”, PNAS, 101(18):7100-7105, May 4, 2004 (6 pages). cited by applicant
Mikolov, T. et al., “Distributed Representations of Words and Phrases and their Compositionality”, retrieved online from URL:https://arXiv.org/abs/1310.4546.v1 [cs.CL], Oct. 16, 2013 (9 pages). cited by applicant
Mikolov, T. et al., “Efficient Estimation of Word Representations in Vector Space”, retrieved online from URL: https://arXiv.org/abs/1301.3781v3 [cs.CL] Sep. 7, 2013 (12 pages). cited by applicant
Murray, K., “A Semantic Scan Statistic for Novel Disease Outbreak Detection”, Master's Thesis, Carnegie Mellon University, Aug. 16, 2013 (68 pages). cited by applicant
Neelakantan, A., et al., “Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space,” Department of Computer Science, University of Massachusetts, (2015) (11 pages). cited by applicant
Pennington, J. et al., “GloVe: Global Vectors for Word Representation”, retrieved online from URL:<https://nlp.stanford.edu/projects/glove.pdf>, 2014 (12 pages). cited by applicant
Rajagopalan, H., et al., “Tumorigenesis: RAF/RAS oncogenes and mismatch-repair status”, Nature, 418:934, Aug. 29, 2002 (1 page). cited by applicant
Rajasekharan, “Unsupervised NER using BERT”, [accessed Jun. 10, 2021], Towards Data Science, <https://towardsdatascience.com/unsupervised-ner-using-bert-2d7af5f90b8a>, Feb. 28, 2020 (27 pages). cited by applicant
Shamir, A. “How to Share a Secret”, Communications of the ACM, 22(11):612-613, Nov. 1979 (2 pages). cited by applicant
Shweta, Fnu et al., “Augmented Curation of Unstructured Clinical Notes from a Massive EHR System Reveals Specific Phenotypic Signature of Impending COVID-19 Diagnosis”, https://www.medrxiv.org/content/10.1101/2020.04.19.20067660v3.full, accessed Sep. 24, 2020 (24 pages). cited by applicant
Van Mulligen, E.M. et al., “The EU-ADR corpus: Annotated drugs, diseases, targets, and their relationships”, Journal of Biomedical Informatics, 45:879-884, published online Apr. 25, 2012 (6 pages). cited by applicant
Wieting, J. et al., “Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings”, retrieved online from URL:<https://arXiv.org/pdf/1705.00364v1.pdf>, [cs.CL], Apr. 30, 2017 (12 pages). cited by applicant
Yao, Z. et al., “Dynamic Word Embeddings for Evolving Semantic Discovery”, WSDM 2018, Marina Del Rey, CA, USA, Feb. 5-9, 2018 (9 pages). cited by applicant
Zuccon, G., et al., “Integrating and Evaluating Neural Word Embeddings in Information Retrieval”, ADCS, Parramatta, NSW, Australia, Dec. 8-9, 2015 (8 pages). cited by applicant
International Preliminary Report on Patentability issued in International Application No. PCT/US2021/020906, dated Sep. 15, 2022 (9 pages). cited by applicant
Primary Examiner: Starks, Wilbert L
Attorney, Agent or Firm: Caldwell Intellectual Property Law
Accession Number: edspgr.11900274
Database: USPTO Patent Grants