The 2nd International Workshop on Video Event Categorization, Tagging and Retrieval (VECTaR2010)

In Conjunction with ACCV 2010

Queenstown, New Zealand, November 9th, 2010

Technical Programme 

9:00 - 9:05

Opening Remarks: Timothy Hospedales


Keynote Speech: Surveillance Video Events: Future Prospects for Tagging and Retrieval
Speaker: Dr. James Orwell, Kingston University, UK

Session A Chair: Timothy Hospedales


Systematic Evaluation of Spatio-temporal Features on Comparative Video Challenges
Julian Stöttinger (CVL Vienna), Bogdan Goras (University of Iasi), Thomas Pönitz (CogVis Ltd.), Allan Hanbury (IRF Vienna), Nicu Sebe (University of Trento), and Theo Gevers (University of Amsterdam)


Coffee Break


Analyzing Diving: A Database for Judging Action Quality
Kamil Wnuk (UCLA) and Stefano Soatto (UCLA)

Appearance-Based Smile Intensity Estimation by Cascaded Support Vector Machines
Keiji Shimada (AIST, Japan), Tetsu Matsukawa (UTT, Japan), Yoshihiro Noguchi (AIST, Japan), and Takio Kurita (Hiroshima University, Japan)

Detecting Frequent Patterns in Video using Partly Locality Sensitive Hashing
Koichi Ogawara, Yasufumi Tanabe, Ryo Kurazume, and Tsutomu Hasegawa (Kyushu University)

12:00 - 13:30

Lunch Break

Session B Chair: Timothy Hospedales


Foot Contact Detection for Sprint Training
Robert Harle, Jonathan Cameron, and Joan Lasenby (University of Cambridge)

Interpreting Dynamic Meaning by Integrating Gesture and Posture Recognition System
Omer Rashid, Ayoub Al-Hamadi, and Bernd Michaelis (Institute for Electronics, Signal Processing and Communications)

Learning from Mistakes: Object Movement Classification by the Boosted Features
Shigeyuki Odashima, Tomomasa Sato, and Taketoshi Mori (The University of Tokyo)


Coffee Break

Modeling Multi-Object Activities in Phase Space
Ricky J. Sethi and Amit K. Roy-Chowdhury (UC Riverside)

Sparse Motion Segmentation Using Multiple Six-Point Consistency
Vasileios Zografos, Klas Nordberg, and Liam Ellis (Linköping University)

Two-Probabilistic Latent Semantic Model for Image Annotation and Retrieval
Nattachai Watcharapinchai (Chulalongkorn University), Supavadee Aramvith (Chulalongkorn University), and Supakorn Siddhichai (National Electronics and Computer Technology Center (NECTEC))

Using Conditional Random Field For Crowd Behavior Analysis
Saira Saleem Pathan, Ayoub Al-Hamadi, and Bernd Michaelis (Institute for Electronics, Signal Processing and Communications)

Keynote Speech

Title: Surveillance Video Events: Future Prospects for Tagging and Retrieval

Abstract: For the video surveillance communities, the classification and retrieval of significant events is one of the principal activities. This applies both to control room practitioners, for whom it is a daily occurrence, and to signal processing engineers, who aim to automate the various components of this process. This talk will begin with a survey of current methodologies for both communities, and go on to discuss the key challenges. These include the automated categorization of more complex and more subtle events; the establishment of a suitable ontology (and format) to represent them; and the creation of standards and frameworks to allow interoperability of systems. There are also surveillance-specific issues, such as the role of ‘anomalous’ events and the protection of personnel privacy.

Looking to the future, several factors are significant: the expanding number and resolution of surveillance sensors, and the diversity of their origin; the increasing public expectation of efficient utilisation of these resources; the convergence with broadcast and internet technology; and the growing maturity of signal processing techniques. The impact of these factors on future prospects is discussed.

Speaker: Dr. James Orwell, Kingston University, UK

Dr. James Orwell is a Reader in the Faculty of Computing, Information Systems, and Mathematics at Kingston University where he teaches programming to undergraduates and works with postgraduates on digital imaging research projects. A member of the Digital Imaging Research Centre (DIRC), his research interests include detection and tracking algorithms for visual surveillance and sports applications and the representation of extracted visual semantics.

Dr Orwell studied Physics and Philosophy at Oxford University before completing his PhD on image processing within the department of Physics at King’s College London. He has worked on numerous projects relating to image processing, including projects for the Defence Evaluation and Research Agency (at King's College), research contracts in vehicle tracking and recognition (at Kingston University), and as a Short Term Research Fellow at BTExact.

As leader of the Visual Surveillance Research Group within DIRC, Dr Orwell is responsible for maintaining the international leadership DIRC has established, both in tackling the key research issues and in encouraging their deployment within industry. He was principal investigator for the EU INMOVE project (2002-2004), which developed a software toolkit for building intelligent audio-visual applications for mobile phone networks, and for the EU CARETAKER project (2005-2008), which developed a monitoring system for town centres, railway stations and other public spaces using video and audio devices. Under the Grand Challenge programme, he was funded by the Ministry of Defence to evaluate DIRC visual surveillance technology for the protection of armed forces in hostile environments (2007). He has led two EPSRC-funded Industrial CASE Awards, with BAE Systems Ltd and Overview Ltd, and two DBERR Knowledge Transfer Partnerships, with Pharos Ltd and Infoshare Ltd.

He is an active member of the IST 37 committee and has contributed to MPEG standardization activities, in particular MPEG-A Part 10 (Visual Surveillance Application Format). He has given numerous media interviews on the topic of visual surveillance, including to the Guardian and BBC Radio 4.

Call for Papers

One of the remarkable capabilities of the human visual perception system is to interpret and recognize thousands of events in video, despite high levels of object clutter, varied scene context, variability in motion scale, appearance changes, occlusions and object interactions. As an ultimate goal of computer vision, the interpretation and recognition of visual events is one of the most challenging problems and has attracted growing interest for decades. The task remains exceedingly difficult for several reasons: 1) large ambiguities remain in the definition of different levels of events; 2) a computer model must capture the meaningful structure of a specific event, while its representation (or recognition process) remains robust under challenging video conditions; and 3) a computer model must understand the context of a video scene to interpret an event meaningfully. Despite these difficulties, steady progress has been made in recent years towards better models for video event categorisation and recognition, e.g., from modelling events with bags of spatio-temporal features to discovering event context, from detecting events with a single camera to inferring events through a distributed camera network, and from low-level event feature extraction and description to high-level semantic event classification and recognition.
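As a concrete illustration of the bag-of-spatio-temporal-features modelling mentioned above, the following minimal sketch (all data, names and values hypothetical) quantizes a clip's local descriptors against a codebook, represents the clip as a normalized histogram of codeword counts, and assigns an event label by nearest-centroid matching. Real systems would learn the codebook (e.g., k-means over interest-point descriptors) and use a discriminative classifier such as an SVM.

```python
# Minimal bag-of-features sketch with hypothetical toy data.
from math import dist

def quantize(descriptor, codebook):
    """Index of the nearest codeword (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))

def bag_of_features(descriptors, codebook):
    """Normalized histogram of codeword occurrences for one clip."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def classify(hist, class_histograms):
    """Label whose training histogram is closest to this clip's histogram."""
    return min(class_histograms, key=lambda label: dist(hist, class_histograms[label]))

# Toy example: 2-D descriptors, a 2-word codebook, two event classes.
codebook = [(0.0, 0.0), (1.0, 1.0)]
train = {
    "walking": [0.9, 0.1],   # mostly codeword 0
    "running": [0.1, 0.9],   # mostly codeword 1
}
clip = [(0.1, 0.0), (0.0, 0.2), (0.9, 1.1)]  # three local descriptors
h = bag_of_features(clip, codebook)
print(classify(h, train))  # -> walking
```

The histogram representation discards the spatio-temporal layout of the descriptors, which is exactly the limitation that the context-modelling and structured approaches listed below aim to address.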

The goal of this workshop is to provide a forum for recent research advances in the area of video event categorisation, tagging and retrieval. The workshop seeks original high-quality submissions from leading researchers and practitioners in academia as well as industry, dealing with theories, applications and databases of visual event recognition. Topics of interest include, but are not limited to:

  • Motion interpretation and grouping
  • Human action representation and recognition
  • Abnormal event detection
  • Contextual event inference
  • Event recognition among a distributed camera network
  • Multimodal event recognition
  • Spatio-temporal features for event categorisation
  • Hierarchical event recognition
  • Probabilistic graph models for event reasoning
  • Machine learning for event recognition
  • Global/local event descriptors
  • Metadata construction for event recognition
  • Bottom up and top down approaches for event recognition
  • Event-based video segmentation and summarization
  • Video event database gathering and annotation
  • Efficient indexing and concepts modelling for video event retrieval
  • Semantic-based video event retrieval
  • Online video event tagging
  • Evaluation methodologies for event-based systems
  • Event-based applications (security, sports, news, etc.)

Important Dates

  • Submission deadline: August 8th, 2010 (extended)
  • Notification of acceptance: September 15th, 2010
  • Camera-ready papers: September 30th, 2010
  • Workshop: November 9th, 2010

Workshop Chairs

  • Ling Shao, The University of Sheffield, UK
  • Jianguo Zhang, Queen's University Belfast, UK
  • Tieniu Tan, Chinese Academy of Sciences, China
  • Thomas S. Huang, University of Illinois at Urbana-Champaign, USA

Paper Submission

  • When submitting manuscripts to this workshop, the authors acknowledge that manuscripts substantially similar in content have NOT been submitted to another conference, workshop, or journal. However, dual submission to the ACCV main conference and VECTaR is allowed.

  • The format of a paper submission is the same as for the ACCV main conference, except that the page limit is 10 pages. Please follow the instructions on the ACCV 2010 website.

  • For paper submission, please go to the Submission Website.


Each submission will be reviewed by at least three reviewers, drawn from the program committee and external reviewers, for originality, significance, clarity, soundness, relevance and technical content. Accepted papers will be published together with the proceedings of ACCV 2010 by Springer. Authors of high-quality papers will be invited to submit extended versions to an edited book or a special issue of a leading computer vision journal after the conference.

Program Committee

    • Faisal Bashir, Heartland Robotics, USA
    • Xu Chen, University of Michigan, USA
    • Ling-Yu Duan, Peking University, China 
    • GianLuca Foresti, University of Udine, Italy 
    • Kaiqi Huang, Chinese Academy of Sciences, China
    • Thomas S. Huang, University of Illinois at Urbana-Champaign, USA
    • Yu-Gang Jiang, City University of Hong Kong, China
    • Graeme A. Jones, Kingston University, UK
    • Ivan Laptev, INRIA, France
    • Jianmin Li, Tsinghua University, China
    • Xuelong Li, Chinese Academy of Sciences, China
    • Zhu Li, Hong Kong Polytechnic University, China
    • Xiang Ma, IntuVision, USA
    • Paul Miller, Queen's University Belfast, UK  
    • Shin'ichi Satoh, National Institute of Informatics, Japan
    • Ling Shao, The University of Sheffield, UK
    • Peter Sturm, INRIA, France
    • Tieniu Tan, Chinese Academy of Sciences, China  
    • Xin-Jing Wang, Microsoft Research Asia, China
    • Tao Xiang, Queen Mary University London, UK
    • Jian Zhang, Chinese Academy of Sciences, China
    • Jianguo Zhang, Queen's University Belfast, UK