Title: Surveillance Video Events: Future Prospects for Tagging and Retrieval
Abstract: For the video surveillance communities, the classification and retrieval of significant events is one of the principal activities. This applies both to control room practitioners, for whom it is a daily occurrence, and to signal processing engineers, who aim to automate the various components of the process. This talk will begin with a survey of current methodologies for both communities, and go on to discuss the key challenges. These include the automated categorisation of more complex and more subtle events; the establishment of a suitable ontology (and format) to represent them; and the creation of standards and frameworks to allow interoperability of systems. There are also surveillance-specific issues, such as the role of ‘anomalous’ events and the protection of personnel privacy. Looking to the future, there are several significant factors: the expanding number and resolution of surveillance sensors, and the diversity of their origins; the increasing public expectation of efficient utilisation of these resources; the convergence with broadcast and internet technology; and the growing maturity of signal processing techniques. The impact of these factors on future prospects is discussed.
Speaker: Dr. James Orwell, Kingston University, UK
Dr. James Orwell is a Reader in the Faculty of Computing, Information Systems, and Mathematics at Kingston University, where he teaches programming to undergraduates and works with postgraduates on digital imaging research projects. A member of the Digital Imaging Research Centre (DIRC), his research interests include detection and tracking algorithms for visual surveillance and sports applications, and the representation of extracted visual semantics.
Dr Orwell studied Physics and Philosophy at Oxford University before completing his PhD on image processing in the Department of Physics at King's College London. He has worked on numerous projects relating to image processing, including projects for the Defence Evaluation and Research Agency (at King's College), research contracts in vehicle tracking and recognition (at Kingston University), and a Short Term Research Fellowship at BTExact.
As leader of the Visual Surveillance Research Group within DIRC, Dr Orwell is responsible for maintaining the international leadership DIRC has established, both in tackling the key research issues and in encouraging their deployment within industry. He was principal investigator for the EU INMOVE project (2002-2004), which developed a software toolkit for building intelligent audio-visual applications for mobile phone networks, and for the EU CARETAKER project (2005-2008), which developed a monitoring system for town centres, railway stations and other public spaces using video and audio devices. Under the Grand Challenge programme, he was funded by the Ministry of Defence to evaluate DIRC visual surveillance technology for the protection of armed forces in hostile environments (2007). He has led two EPSRC-funded Industrial CASE Awards, with BAe Systems Ltd and Overview Ltd, and two DBERR Knowledge Transfer Partnerships, with Pharos Ltd and Infoshare Ltd. He is an active member of the IST 37 committee and has contributed to MPEG standardisation activities, in particular MPEG-A Part 10 (Video Surveillance Application Format). He has given numerous media interviews on the topic of visual surveillance, including for the Guardian and BBC Radio 4.
One of the remarkable capabilities of the human visual perception system is to interpret and recognise thousands of events in videos, despite high levels of object clutter, varied scene contexts, variability in motion scale, appearance changes, occlusions and object interactions. As an ultimate goal of computer vision, the interpretation and recognition of visual events is one of the most challenging problems, and one that has attracted increasing attention for decades. The task remains exceedingly difficult for several reasons: 1) large ambiguities remain in the definition of the different levels of events; 2) a computer model should capture the meaningful structure of a specific event, while the representation (or recognition process) must remain robust under challenging video conditions; 3) a computer model should understand the context of video scenes in order to produce a meaningful interpretation of a video event. Despite these difficulties, steady progress has been made in recent years towards better models for video event categorisation and recognition, e.g., from modelling events with bags of spatio-temporal features to discovering event context, from detecting events using a single camera to inferring events through a distributed camera network, and from low-level event feature extraction and description to high-level semantic event classification and recognition.
The goal of this workshop is to provide a forum for recent research advances in the area of video event categorisation, tagging and retrieval. The workshop seeks original, high-quality submissions from leading researchers and practitioners in academia and industry, dealing with theories, applications and databases of visual event recognition. Topics of interest include, but are not limited to:
Each submission will be reviewed by at least three reviewers, drawn from the program committee and external reviewers, for originality, significance, clarity, soundness, relevance and technical content. Accepted papers will be published by Springer together with the proceedings of ACCV 2010. Authors of high-quality papers will be invited to submit extended versions to an edited book or a special issue of a leading computer vision journal after the conference.