MMAI @ IEEE ICDM 2025

Advancing Multimodal AI Research and Applications

The 5th IEEE International Workshop on Multimodal AI (MMAI) @ IEEE ICDM 2025 focuses on advancing research and applications in multimodal artificial intelligence. Continuing the successful MMAI workshop series, it brings together researchers and practitioners to explore the latest developments in multimodal learning, fusion, and applications.

The workshop will be held in person in December 2025, in conjunction with IEEE ICDM. Please see our schedule for details.

About Multimodal AI

Multimodality is the most general form of information representation and delivery in the real world. Humans naturally rely on multimodal data to make accurate perceptions and decisions. Our digital world is likewise multimodal, combining data modalities such as text, audio, images, video, touch, depth, 3D, animations, biometrics, and interactive content. Multimodal data analytics algorithms often outperform single-modal analytics on many real-world problems.

Multi-sensor data fusion has also become a topic of great interest in industry. In particular, companies working in automotive, drone vision, surveillance, and robotics have grown rapidly, and they seek to automate processes using a wide variety of control signals from diverse sources.

With the rapid development of Big Data technology and its remarkable applications in many fields, multimodal Artificial Intelligence (AI) for Big Data is a timely topic. This workshop aims to build momentum around this topic of growing interest and to encourage interdisciplinary interaction and collaboration among the Natural Language Processing (NLP), computer vision, audio processing, machine learning, multimedia, robotics, Human-Computer Interaction (HCI), social computing, cybersecurity, cloud computing, edge computing, Internet of Things (IoT), and geospatial computing communities. It serves as a forum that brings together active researchers and practitioners from academia and industry to share their recent advances in this promising area.

Topics

This is an open call for papers soliciting original contributions on recent findings in the theory, methodologies, and applications of multimodal AI and Big Data. The list of topics includes, but is not limited to:

  • Multimodal representations (language, vision, audio, touch, depth, etc.)
  • Multimodal data modeling
  • Multimodal data fusion
  • Multimodal learning
  • Cross-modal learning
  • Multimodal big data analytics and visualization
  • Multimodal scene understanding
  • Multimodal perception and interaction
  • Multimodal information tracking, retrieval and identification
  • Multimodal big data infrastructure and management
  • Multimodal benchmark datasets and evaluations
  • Multimodal AI in robotics (robotic vision, NLP in robotics, Human-Robot Interaction (HRI), etc.)
  • Multimodal object detection, classification, recognition, and segmentation
  • Multimodal AI safety (explainability, interpretability, trustworthiness, etc.)
  • Multimodal biometrics
  • Multimodal applications (autonomous driving, cybersecurity, smart cities, intelligent transportation systems, industrial inspection, medical diagnosis, healthcare, social media, arts, etc.)

Confirmed Speakers

  • Speaker 1, Affiliation
  • Speaker 2, Affiliation
  • Speaker 3, Affiliation

Important Dates

  • Paper submission deadline: TBD
  • Notification of acceptance: TBD
  • Camera-ready deadline: TBD

Check out our CFP for additional details.

Organizers

* = main correspondence