Keynote Speakers
Prof. Dong Xu, Nanyang Technological University, Singapore
Dong Xu is currently an associate professor at
Nanyang Technological University in Singapore. He is leading the Visual
Computing Group working on new theories, algorithms and systems for intelligent
processing and understanding of visual data such as images and videos. He was
the coauthor of a paper that won the Best Student Paper Award at the IEEE
International Conference on Computer Vision and Pattern Recognition in 2010.
Title:
Classifying Images and Videos by Learning from Web Data
Abstract:
Increasingly rich and massive social media data are being posted to photo and
video sharing websites such as Flickr and YouTube. Keyword-based (tag-based)
search can readily be used to collect relevant and irrelevant Flickr images or
YouTube videos, which can serve as positive and negative training data for
learning classifiers of consumer images and videos. In this talk, I will first
introduce a domain adaptation method called Adaptive Multiple Kernel Learning
(A-MKL) for video event recognition, which can effectively cope with the
considerable variation in feature distributions between web videos and consumer
videos. I will also describe our approaches to text-based image retrieval, which
use multiple instance learning (MIL) to handle noise in the loose labels of
training images.
Dr. Tao Xiang, Queen Mary, University of London, UK
Dr Tao Xiang received the Ph.D. degree in electrical and computer engineering
from the National University of Singapore in 2002. He is currently a senior
lecturer (associate professor) in the School of Electronic Engineering and
Computer Science, Queen Mary University of London. His research interests
include computer vision, statistical learning, video processing, and machine
learning, with a focus on interpreting and understanding human behaviour. He has
published over 90 papers and a book "Visual Analysis of Behaviour: From Pixels
to Semantics".
Title:
Weakly Supervised Learning for Video Tagging
Abstract:
Providing methods to support semantic interaction with growing volumes of video
data is an increasingly important challenge for computer vision and data mining.
To this end, there has been some success in recognition of simple objects and
actions in video; however, most of this work requires strongly supervised
training data. The supervision cost of these approaches therefore renders them
economically non-scalable for real-world applications. This talk will focus on
the problem of learning to annotate and retrieve semantic tags of actions and
events in realistic video data, given only sparsely provided tags of
semantically salient activities. This is challenging because (1) the learning
problem is multi-label and (2) realistic videos are often dominated by
(semantically uninteresting) background activity unsupported by any tags of
interest, leading to a strong irrelevant-data problem. To address these
challenges, a new topic model based approach to video tag annotation is
introduced. The model simultaneously learns a low-dimensional representation of
the video data, which of its dimensions are semantically relevant (supported by
tags), and how to annotate videos with tags.
Technical Program
9:00-9:05
Opening Remarks: Ling Shao, Jianguo Zhang or Liang Wang

Keynote Speech 1 (Chair: Shiguang Shan)
9:05-9:55
Title: Classifying Images and Videos by Learning from Web Data
Speaker: Dr. Dong Xu, Nanyang Technological University, Singapore

9:55-10:00
Break

Keynote Speech 2 (Chair: Shiguang Shan)
10:00-10:50
Title: Weakly Supervised Learning for Video Tagging
Speaker: Dr. Tao Xiang, Queen Mary, University of London, UK

10:50-11:00
Break

Session A (Chair: Shiguang Shan)
11:00-13:00
- Atomic Action Features: A New Feature for Action Recognition
  Qiang Zhou (ADSC, Singapore), Gang Wang (NTU & ADSC, Singapore)
- Spatio-Temporal SIFT and Its Application to Human Action Classification
  Manal Alghamdi (University of Sheffield), Lei Zhang (Harbin Engineering University), Yoshihiko Gotoh (University of Sheffield)
- Statistics of Pairwise Co-occurring Local Spatio-Temporal Features for Human Action Recognition
  Piotr Bilinski (INRIA), Francois Bremond (INRIA)
- Visual Code-Sentences: A New Video Representation based on Image Descriptor Sequences
  Yusuke Mitarai (Canon Inc.), Masakazu Matsugu (Canon Inc.)

13:00-14:30
Lunch Break

Session B (Chair: Jingyu Yang)
14:30-16:10
- Action Recognition Robust to Background Clutter by using Stereo Vision
  Jordi Sanchez-Riera (INRIA), Jan Cech (INRIA), Radu Horaud (INRIA)
- Recognizing Unseen Actions Across Cameras by Exploring the Correlated Subspace
  Chun-Hao Huang (Academia Sinica), Yi-Ren Yeh (Academia Sinica), Yu-Chiang Frank Wang (Academia Sinica)
- Chinese Shadow Puppetry with an Interactive Interface Using the Kinect Sensor
  Hui Zhang (United International College), Yuhao Song (United International College), Zhuo Chen (United International College), Ji Cai (United International College), Ke Lu (United International College)
- Group Dynamics and Multimodal Interaction Modeling using a Smart Digital Signage
  Tony Tung (Kyoto University), Randy Gomez (Kyoto University), Tatsuya Kawahara (Kyoto University), Takashi Matsuyama (Kyoto University)
- Automated Textual Descriptions for a Wide Range of Video Events with 48 Human Actions
  Gertjan Burghouts (TNO), Patrick Hanckmann (TNO), Klamer Schutte (TNO)
Call for Papers
With the rapid growth of Internet capacity and speed, and the wide adoption of media technologies in people's daily lives, there is strong demand for efficiently processing and organizing video events that emerge rapidly from the Internet (e.g., YouTube), wider surveillance networks, mobile devices, smart cameras, etc. The human visual perception system can, without difficulty, interpret and recognize thousands of events in videos, despite high levels of object clutter, different types of scene context, variability of motion scales, appearance changes, occlusions and object interactions. For a computer vision system, automatic video event understanding has remained very challenging for decades. Broadly speaking, the challenges include robust detection of events under motion clutter, event interpretation in complex scenes, multi-level semantic event inference, placing events in context and across multiple cameras, event inference from object interactions, etc.
In recent years, steady progress has been made towards better models for video event categorization and recognition, e.g., from modeling events with bags of spatio-temporal features to discovering event context, from detecting events using a single camera to inferring events through a distributed camera network, and from low-level event feature extraction and description to high-level semantic event classification and recognition. However, current progress in video event analysis is still far from its promise. It is still very difficult to retrieve or categorise a specific video segment based on its content in a real multimedia system or in surveillance applications. Existing techniques are usually tested on simplified scenarios, such as the KTH dataset, whereas real-life applications are much more challenging and require special attention. To advance further, we must adapt recent or existing approaches to find new solutions for intelligent video event understanding.
The goal of this workshop is to provide a forum for recent research advances in the area of video event categorisation, tagging and retrieval. The workshop seeks original high-quality submissions from leading researchers and practitioners in academia as well as industry, dealing with theories, applications and databases of visual event recognition.
Depth sensors, such as Kinect, and real-world applications, such as event analysis and recognition on videos from the Internet, surveillance cameras, and mobile devices, will be the theme of this year's workshop. Topics include, but are not limited to, the following:
- Motion interpretation and grouping
- Human Action representation and recognition
- Abnormal event detection
- Contextual event inference
- Event recognition among a distributed camera network
- Multi-modal event recognition
- Spatial temporal features for event categorization
- Hierarchical event recognition
- Probabilistic graph models for event reasoning
- Machine learning for event recognition
- Global/local event descriptors
- Metadata construction for event recognition
- Bottom up and top down approaches for event recognition
- Event-based video segmentation and summarization
- Video event database gathering and annotation
- Efficient indexing and concepts modeling for video event retrieval
- Semantic-based video event retrieval
- On-line video event tagging
- Event recognition for depth cameras (Kinect)
- Evaluation methodologies for event-based systems
- Event-based applications (security, sports, news, etc.)
Important Dates
- Submission Deadline: 18 July 2012 (extended)
- Notification of Acceptance: 23 July 2012
- Camera-Ready Submission: 1 August 2012
- Workshop: 12 October 2012
Organizers
- Tieniu Tan, Chinese Academy of Sciences, China
- Thomas S. Huang, University of Illinois at Urbana-Champaign, USA
Program Chairs
- Ling Shao, The University of Sheffield, UK
- Jianguo Zhang, University of Dundee, UK
- Liang Wang, Chinese Academy of Sciences, China
Program Committee
- Rama Chellappa, University of Maryland, USA
- James Ferryman, University of Reading, UK
- GianLuca Foresti, University of Udine, Italy
- Shaogang Gong, Queen Mary, University of London, UK
- Ran He, Chinese Academy of Sciences, China
- Yu-Gang Jiang, Columbia University, USA
- Graeme A. Jones, Kingston University, UK
- Xuelong Li, Chinese Academy of Sciences, China
- Ram Nevatia, University of Southern California, USA
- Carlo Regazzoni, University of Genoa, Italy
- Shin'ichi Satoh, National Institute of Informatics, Japan
- Ling Shao, The University of Sheffield, UK
- Yan Song, University of Science and Technology of China
- Peter Sturm, INRIA, France
- Dacheng Tao, University of Technology, Sydney, Australia
- Liang Wang, Chinese Academy of Sciences, China
- Qi Wang, Chinese Academy of Sciences, China
- Xin-Jing Wang, Microsoft Research Asia, China
- Pingkun Yan, Chinese Academy of Sciences, China
- Tao Xiang, Queen Mary, University of London, UK
- Dong Xu, Nanyang Technological University, Singapore
- Zhang Zhang, Chinese Academy of Sciences, China
- Jianguo Zhang, University of Dundee, UK
- Lei Zhang, Microsoft Research Asia, China
Submission
Each submission will be reviewed by at least two reviewers, drawn from the
program committee and external reviewers, for originality, significance,
clarity, soundness, relevance and technical content. Accepted papers will
be published in a volume of Springer Lecture Notes in Computer Science. Authors of high-quality papers will be invited to submit to a special issue of a computer vision journal after the conference.