S19201 | TALENT: Table VQA via Augmented Language-Enhanced Natural-text Transcription | Yutong Guo, Wanying Wang, Yue Wu, Zichen Miao, and Haoyu Wang | Haoyu Wang |
S19202 | Smart Vision-Language Reasoners | Denisa Olteanu Roberts and Lucas Roberts | Lucas Roberts |
S19203 | BrAMA: A data-efficient brain-inspired architecture for semi-supervised multi-modal association | Jonathan Grienay, Marina Reyboz, Martial Mermillod, Laurent Rodriguez, and Benoit Miramond | Jonathan Grienay |
S19204 | A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content | Lele Cao | Lele Cao |
S19205 | OPTiCAL: An Abstract Positional Reasoning Benchmark for Vision Language Models | Christopher Driggers-Ellis, Gabriel Ayoubi, and Christan Grant | Christopher Driggers-Ellis |
S19206 | Evaluating Open-Source Vision-Language Models for Multimodal Sarcasm Detection | Saroj Basnet, Shafkat Farabi, Tharindu Ranasinghe, Diptesh Kanojia, and Marcos Zampieri | Marcos Zampieri |
S19207 | TTNS: Dynamic Three-Tier Negative Sampling for Scalable Multi-Modal Search Ranking in Production | Fengbin Chen, Liping Zhang, and Tracy King | Liping Zhang |
DM258 | A Study on Multimodal Emotion Recognition Model Incorporating Edge Noise Optimization | Chen Huang, Huijie Liu, Yan Zhang, Chao Yang, and Jianhua Song | Jianhua Song |
DM494 | Guided Manifold Alignment with Geometry-Regularized Twin Autoencoders | Jake S. Rhodes, Adam G. Rustad, Marshall S. Nielsen, Morgan McClellan, Dallan Gardner, and Dawson Hedges | Jake S. Rhodes |
DM949 | SVDLoRA: Data-Driven Low-Rank Adaptation via Spectral Decomposition | Fanglue Zhang, Shufan Shen, Chao Bi, Li Su, Qingming Huang, and Shuhui Wang | Fanglue Zhang |