Deep Learning and Uncertainty Modeling in Visual Food Analysis

Bibliographic details
Title: Deep Learning and Uncertainty Modeling in Visual Food Analysis
Authors: Bolaños Solà, Marc
Other contributors: Radeva, Petia, Universitat de Barcelona. Departament de Matemàtiques i Informàtica
Source: Tesis Doctorals - Departament - Matemàtiques i Informàtica
Publisher information: Universitat de Barcelona
Publication year: 2021
Collection: Dipòsit Digital de la Universitat de Barcelona
Subject terms: Aprenentatge automàtic, Algorismes, Adquisició del coneixement (Sistemes experts), Percepció de les formes, Xarxes neuronals convolucionals, Machine learning, Algorithms, Knowledge acquisition (Expert systems), Form perception, Convolutional neural networks
Description: [eng] Machine Learning and Computer Vision have undergone a revolution in recent years. The emergence of Deep Learning algorithms and Convolutional Neural Networks, together with the increased processing capabilities of modern GPUs and the enormous amounts of publicly available annotated data, has boosted the field as never before. These notable improvements have led to the appearance of new fields such as Multimodal Learning, which encompasses and learns from many subfields. Additionally, new applications have taken advantage of these advances to reach high levels of performance. The large improvement in the results of currently available algorithms has not only revolutionized the academic world but also brought to market AI-based solutions that looked like science fiction barely 10 years ago. This thesis, which is written as a compendium of papers, delves deeper into the novel topic of Deep Multimodal Learning by proposing new algorithms and solutions for both existing and newly defined problems. From the applications perspective, most of the papers presented fall into two areas. On the one hand, Egocentric Vision and Storytelling, which consists of acquiring images from the daily life of a person in order to analyse their behaviour patterns, such as social interactions, activities and events, and interactions with objects. On the other hand, Food Recognition and Analysis, which consists of visually analysing and recognizing the food appearing in images in multiple contexts and at different levels of complexity, from food group recognition to nutritional analysis. In both applications, the ultimate purpose of the proposed papers is to build tools that provide information that could lead to a better quality of life for their users.
; [spa] The world of Machine Learning and Computer Vision has experienced a revolution in recent years. The appearance of ...
Publication type: doctoral or postdoctoral thesis
File description: 260 p.; application/pdf
Language: English
Relation: https://hdl.handle.net/2445/177330; http://hdl.handle.net/10803/671672
Availability: https://hdl.handle.net/2445/177330
http://hdl.handle.net/10803/671672
Rights: (c) Bolaños Solà, Marc, 2021 ; info:eu-repo/semantics/openAccess
Document code: edsbas.F8CFDDF4
Database: BASE