Accepted Papers

S19201
Title: TALENT: Table VQA via Augmented Language-Enhanced Natural-text Transcription
Authors: Yutong Guo, Wanying Wang, Yue Wu, Zichen Miao, and Haoyu Wang
Contact: Haoyu Wang

S19202
Title: Smart Vision-Language Reasoners
Authors: Denisa Olteanu Roberts and Lucas Roberts
Contact: Lucas Roberts

S19203
Title: BrAMA: A Data-Efficient Brain-Inspired Architecture for Semi-Supervised Multi-Modal Association
Authors: Jonathan Grienay, Marina Reyboz, Martial Mermillod, Laurent Rodriguez, and Benoit Miramond
Contact: Jonathan Grienay

S19204
Title: A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content
Authors: Lele Cao
Contact: Lele Cao

S19205
Title: OPTiCAL: An Abstract Positional Reasoning Benchmark for Vision Language Models
Authors: Christopher Driggers-Ellis, Gabriel Ayoubi, and Christan Grant
Contact: Christopher Driggers-Ellis

S19206
Title: Evaluating Open-Source Vision-Language Models for Multimodal Sarcasm Detection
Authors: Saroj Basnet, Shafkat Farabi, Tharindu Ranasinghe, Diptesh Kanojia, and Marcos Zampieri
Contact: Marcos Zampieri

S19207
Title: TTNS: Dynamic Three-Tier Negative Sampling for Scalable Multi-Modal Search Ranking in Production
Authors: Fengbin Chen, Liping Zhang, and Tracy King
Contact: Liping Zhang

DM258
Title: A Study on Multimodal Emotion Recognition Model Incorporating Edge Noise Optimization
Authors: Chen Huang, Huijie Liu, Yan Zhang, Chao Yang, and Jianhua Song
Contact: Jianhua Song

DM494
Title: Guided Manifold Alignment with Geometry-Regularized Twin Autoencoders
Authors: Jake S. Rhodes, Adam G. Rustad, Marshall S. Nielsen, Morgan McClellan, Dallan Gardner, and Dawson Hedges
Contact: Jake S. Rhodes

DM949
Title: SVDLoRA: Data-Driven Low-Rank Adaptation via Spectral Decomposition
Authors: Fanglue Zhang, Shufan Shen, Chao Bi, Li Su, Qingming Huang, and Shuhui Wang
Contact: Fanglue Zhang