Call for Papers

The ability to craft and understand stories is a crucial cognitive tool that humans use for communication. According to computational linguists, narrative theorists, and cognitive scientists, story understanding is a good proxy for measuring a reader's intelligence. Readers can understand a story as a form of problem-solving in which, for example, they track how the main characters overcome obstacles throughout the story. Readers need to make inferences, both in prospect and in retrospect, about the causal relationships between different events in the story.

In particular, video story data such as TV shows and movies can serve as an excellent testbed for evaluating human-level AI algorithms, for two reasons. First, video data combine multiple modalities: a sequence of images, audio (including dialogue, sound effects, and background music), and text (subtitles or added comments). Second, video data show various cross-sections of everyday life. Understanding video stories can therefore be regarded as a significant challenge for current AI technology, as it involves analyzing and simulating human vision, language, thinking, and behavior.

Towards human-level video understanding, machine intelligence needs to extract meaningful information, such as events, from sequential multimodal video data, consider the causal relationships between different events, and make inferences, both in prospect and in retrospect, about which events will occur and how they could occur. A story in a video is highly abstracted information consisting of a series of events across multiple scenes in a scenario.

In this workshop, we emphasize the need for findings and insights from various research domains for video story understanding. We aim to bring together experts from a variety of related fields, including vision, language processing, computational narratology, and neuro-symbolic computing, to share their perspectives on existing research and to initiate discussion of future challenges in data-driven video understanding. Topics of interest include, but are not limited to:

  • Deep learning architecture for multi-modal video story representation
  • Question answering about video story
  • Summarization and retrieval from long video story content
  • Scene description generation for video understanding
  • Scene graph generation and relationship detection from video
  • Activity/Event recognition from video
  • Character identification & interaction modeling in video
  • Emotion recognition in video
  • Novel tasks and challenge datasets for video understanding

This workshop will feature invited talks by a selected group of leading researchers in related fields. We also encourage paper submissions as extended abstracts of up to 4 pages.

Submission Instructions

We invite paper submissions as extended abstracts of up to 4 pages, excluding references and supplementary materials. All submissions must be a single PDF file (including supplementary materials), use the templates below, and be submitted through this CMT link. The review process is single-round and double-blind; all submissions must be anonymized.

All accepted papers will be presented as posters during the workshop and listed on the website. Additionally, a small number of accepted papers will be selected to be presented as contributed talks.

Dual Submissions

Note that this workshop will not publish official proceedings, and accepted submissions will not count as publications. We therefore also encourage submissions of relevant work that has been previously published or that will be presented at the main conference.

Important Dates

Paper Submission Deadline: September 10, 2019 (GMT+9)
Notification to Authors: October 7, 2019
Paper Camera-Ready Deadline: October 18, 2019
Workshop Date: November 2, 2019

Schedule (Tentative)

Time Presentation
08:30 - 08:45 Welcome & Opening Talk
08:45 - 09:15 Invited Talk 1
09:15 - 09:45 Invited Papers
Invited Paper 1: VideoMem: Constructing, Analyzing, Predicting Short-Term and Long-Term Video Memorability (Romain Cohendet, Claire-Hélène Demarty, Ngoc Q. K. Duong, Martin Engilberge)
Invited Paper 2: Progressive Attention Memory Network for Movie Story Question Answering (Junyeong Kim, Minuk Ma, Kyungsu Kim, Sungjin Kim, Chang D. Yoo)
Invited Paper 3: HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips (Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic)
09:45 - 10:15 Spotlight Talks (5 minutes each)
(1) DIFRINT: Deep Iterative Frame Interpolation for Full-frame Video Stabilization (Jinsoo Choi, In So Kweon)
(2) Adversarial Inference for Multi-Sentence Video Description (Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach)
(3) Robust Person Re-identification via Graph Convolution Networks (Guisik Kim, Dongwook Shu, Junseok Kwon)
(4) Enhancing Performance of Character Identification on Multiparty Dialogues of Drama via Multimodality (Donghwan Kim)
(5) Dual Attention Networks for Visual Reference Resolution in Visual Dialog (Gi-Cheon Kang, Jaeseo Lim, Byoung-Tak Zhang)
(6) Event Structure Frame-Annotated WordNet for Multimodal Inferencing (Seohyun Im)
10:15 - 11:25 Coffee Break & Poster Session
11:25 - 11:55 Invited Talk 2
11:55 - 12:25 Invited Talk 3
12:25 - 12:30 Closing

Invited Speakers

Invited Speaker 1: Trevor Darrell, University of California, Berkeley (tbc)

Invited Speaker 2: Cees Snoek, University of Amsterdam

Invited Speaker 3: Leonid Sigal, University of British Columbia


Organizers

Seongho Choi, Seoul National University
Kyoung-Woon On, Seoul National University
Yu-Jung Heo, Seoul National University
Haeyong Kang, KAIST
Krishna Mohan Chalavadi, Indian Institute of Technology Hyderabad
Ting Han, National Institute of Advanced Industrial Science and Technology
Chang Dong Yoo, KAIST
Gunhee Kim, Seoul National University
Byoung-Tak Zhang, Seoul National University

Program Committee

  • Prof. Bohyung Han (Seoul National University)
  • Prof. Byung-Chull Bae (Hongik University)
  • Dr. Eun-Sol Kim (Kakao Brain)
  • Prof. In So Kweon (KAIST)
  • Prof. Ji-Hwan Kim (Sogang University)
  • Dr. Jin-Hwa Kim (SK Telecom)
  • Dr. Jung-Woo Ha (Naver)
  • Prof. Junmo Kim (KAIST)
  • Prof. Junseok Kwon (Chung-Ang University)
  • Prof. Kristen Grauman (University of Texas at Austin)
  • Dr. Kyung-Min Kim (Naver)
  • Prof. Seon Joo Kim (Yonsei University)
  • Prof. Seong-Bae Park (Kyung Hee University)
  • Prof. Tamara L. Berg (UNC Chapel Hill)


Contact us

For any questions about the workshop or submissions, please email vttws2019@gmail.com.
