ABSTRACT :
This thesis explores multimodal learning algorithms, inspired by the human brain's remarkable capacity to combine a myriad of sensory inputs. We investigate three application areas: customs fraud detection, "InfraSecure", a railway construction safety project, and computer-assisted diagnosis of osteoporosis pathology. Our aim is to develop efficient multimodal learning algorithms that improve on the performance of existing unimodal solutions and advance explainability in artificial intelligence. Our research highlights the importance of robust fusion methods and effective modality encoders, and identifies key challenges in the field, including the integration of heterogeneous data sources and the need for model transparency. The study concludes with the implementation of our proposed fusion method for predicting customs classification, demonstrating the practical applicability of multimodal learning. We also identify areas requiring further research, paving the way for future advances in the field.
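The feature-level fusion idea summarized above can be illustrated with a minimal sketch: each modality is mapped to a feature vector by its own encoder, and the vectors are concatenated into a joint representation for a downstream classifier. This is a toy illustration under assumed encoders; the function names (`encode_text`, `encode_image`, `fuse`) and the tiny features are hypothetical, not the thesis implementation.

```python
def encode_text(tokens):
    # Hypothetical text encoder: bag-of-words counts over a tiny vocabulary.
    vocab = ["invoice", "steel", "bone"]
    return [tokens.count(w) for w in vocab]

def encode_image(pixels):
    # Hypothetical image encoder: mean intensity and pixel count as features.
    return [sum(pixels) / len(pixels), float(len(pixels))]

def fuse(text_feat, image_feat):
    # Concatenation fusion: the joint multimodal representation
    # that a classifier would consume.
    return text_feat + image_feat

joint = fuse(encode_text(["invoice", "steel", "steel"]),
             encode_image([0.2, 0.4, 0.6]))
```

In practice, concatenation is only one fusion strategy; attention-based or gated fusion can weight modalities adaptively, which matters when one modality is noisy or missing.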
KEYWORDS : Multimodal fusion, Representation learning, Deep learning, XAI
Electronics 12 (9), 2027, 2023
arXiv preprint arXiv:2406.04349, 2024
Proceedings Copyright 632, 639, 2024