Video Question Answering and Reasoning: Paper Survey


Last updated: 2019.12 (first draft)


0. Preface

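The first step in learning VQA is a preliminary paper survey. This post looks at what has been published at the major conferences over the past few years to get a sense of how the field is progressing, mainly covering CVPR, ICCV, ECCV, ACM MM, and AAAI. After that, the plan is to summarize the commonly used datasets and the classic methods.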

1. ACM MM

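ACM MM is the leading international conference in the multimedia area of computer science. It focuses on integrating and processing the multi-perspective information produced by different digital media. VQA falls under the Vision and Language branch of its Understanding Multimedia Content theme.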

1.1 ACM MM 2019

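  • An incomplete count finds 5 papers (covering both Video and Visual Question Answering)
Paper title | Affiliation
Multi-interaction Network with Object Relation for VideoQA | Zhejiang University
Learnable Aggregating Net with Divergent Loss for VideoQA | University of Electronic Science and Technology of China
Question-Aware Tube-Switch Network for VideoQA | University of Science and Technology of China
CRA-Net: Composed Relation Attention Network for Visual QA | University of Electronic Science and Technology of China
Erasing-based Attention Learning for Visual QA | Institute of Automation, Chinese Academy of Sciences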

1.2 ACM MM 2018

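  • An incomplete count finds 4 papers (covering both Video and Visual Question Answering)
Paper title | Affiliation
Explore Multi-Step Reasoning in Video Question Answering | Tianjin University
Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering | Southern University of Science and Technology
Object-Difference Attention: A Simple Relational Attention for Visual Question Answering | Beijing University of Posts and Telecommunications
Enhancing Visual Question Answering Using Dropout | Institute of Automation, Chinese Academy of Sciences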

1.3 ACM MM 2017

  • An incomplete count finds 4 papers (covering both Video and Visual Question Answering)
Paper title | Affiliation
VideoQA via Hierarchical Dual-Level Attention Network Learning | Zhejiang University
VideoQA via Gradually Refined Attention over Appearance and Motion | Zhejiang University

2. CVPR

CVPR stands for Conference on Computer Vision and Pattern Recognition. It is generally held around June each year.

2.1 CVPR 2019

  • An incomplete count finds 12 papers (covering both Video and Visual Question Answering), though only one appears to be video-based
Paper title | Affiliation
Heterogeneous Memory Enhanced Multimodal Attention Model for VideoQA | JD Research
MUREL: Multimodal Relational Reasoning for Visual Question Answering
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Deep Modular Co-Attention Networks for Visual Question Answering
Visual Question Answering as Reading Comprehension
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
Cycle-Consistency for Robust Visual Question Answering
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Progressive Attention Memory Network for Movie Story Question Answering
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Explicit Bias Discovery in Visual Question Answering Models
Answer Them All! Toward Universal Visual Question Answering Models

2.2 CVPR 2018

  • An incomplete count finds 15 papers (covering both Video and Visual Question Answering), though only one appears to be video-based
Paper title | Affiliation
Motion-Appearance Co-Memory Networks for Video Question Answering
* Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge
Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Learning Answer Embeddings for Visual Question Answering
Cross-Dataset Adaptation for Visual Question Answering
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Visual Question Generation as Dual Task of Visual Question Answering
Focal Visual-Text Attention for Visual Question Answering
Visual Question Answering With Memory-Augmented Networks
Visual Question Reasoning on General Dependency Tree
Differential Attention for Visual Question Answering
Learning Visual Knowledge Memory Networks for Visual Question Answering
IVQA: Inverse Visual Question Answering
Customized Image Narrative Generation via Interactive Visual Question Generation and Answering

2.3 CVPR 2017

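  • An incomplete count finds 9 papers (covering both Video and Visual Question Answering); at first glance none looked video-based, though TGIF-QA and a couple of the other titles listed here do involve video
Paper title | Affiliation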
Graph-Structured Representations for Visual Question Answering
Knowledge Acquisition for Visual Question Answering via Iterative Querying
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering
Empirical Evaluation of Visual Question Answering for Novel Objects
Multi-Level Attention Networks for Visual Question Answering
A Dataset and Exploration of Models for Understanding Video Data Through Fill-In-The-Blank Question-Answering
Making the v in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2.4 CVPR 2016

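  • An incomplete count finds 8 papers (covering both Video and Visual Question Answering); apart from MovieQA none appear to be video-based, and the field looks like it was just getting started
Paper title | Affiliation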
Stacked Attention Networks for Image Question Answering
Image Question Answering Using Convolutional Neural Network With Dynamic Parameter Prediction
Where to Look: Focus Regions for Visual Question Answering
Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources
MovieQA: Understanding Stories in Movies Through Question-Answering
Answer-Type Prediction for Visual Question Answering
Visual7W: Grounded Question Answering in Images
Yin and Yang: Balancing and Answering Binary Visual Questions

3. ICCV

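ICCV stands for International Conference on Computer Vision. It is held every two years at venues around the world. Its acceptance rate is relatively low, so it is highly regarded in the field and is widely considered the most prestigious of the three top CV conferences.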

3.1 ICCV 2019

  • An incomplete count finds 5 papers (covering both Video and Visual Question Answering)
Paper title | Affiliation
Compact Trilinear Interaction for Visual Question Answering
Why Does a Visual Question Have Different Answers?
Scene Text Visual Question Answering
Multi-Modality Latent Interaction Network for Visual Question Answering
Relation-Aware Graph Attention Network for Visual Question Answering

3.2 ICCV 2017

  • An incomplete count finds 6 papers (covering both Video and Visual Question Answering)
Paper title | Affiliation
Learning to Reason: End-To-End Module Networks for Visual Question Answering
Structured Attentions for Visual Question Answering
Multi-Modal Factorized Bilinear Pooling With Co-Attention Learning for Visual Question Answering
An Analysis of Visual Question Answering Algorithms
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
MarioQA: Answering Questions by Watching Gameplay Videos

3.3 ICCV 2015

  • Judging by the title, this looks like the first paper on the topic
Paper title | Affiliation
VQA: Visual Question Answering

4. AAAI

Reposted from blog.csdn.net/qq_41341454/article/details/103569017