Internet Video Search

Organizers: Cees G.M. Snoek (Univ. of Amsterdam, NL), Arnold W.M. Smeulders (CWI, NL)
Duration: half day
Abstract: In this half‐day tutorial we focus on the computer vision challenges in internet video search, present methods that achieve state‐of‐the‐art performance while maintaining efficient execution, and indicate how to obtain spatiotemporal improvements in the near future. We also give an overview of the latest developments and future trends in the field, drawing on TRECVID, the leading competition for video search engines, run by NIST. We have consistently ranked in the top two in this competition over the years, including the 2008, 2009, 2010 and 2011 editions. This half‐day tutorial is especially meant for researchers and practitioners who are new to the field of video search (introductory), people who have started in this direction (intermediate), and people who are interested in a summary of the state of the art in this exciting area (general interest).

Outline:

Introduction

Problem statement: social, business, and scientific
Course organization: fundamentals, semantics, search, evaluation.

Fundamentals

Invariance: the sensory and semantic gap
Local shape: Gaussians, Gabors, and Loweans
Texture: natural image statistics, gradients, Weibulls
Color: light source, reflection, and representation
Motion: optic flow, tracking
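
To make the invariance theme concrete, here is a minimal sketch of one classic photometric invariant, normalized rgb (chromaticity). It assumes a linear RGB image and a white light source, and the tutorial's own color models may differ. Dividing each channel by the pixel intensity R + G + B cancels a uniform scaling of the pixel values, such as a brighter light or a shading change on a matte surface. The function name and the test image are illustrative only.

```python
import numpy as np

def normalized_rgb(image, eps=1e-12):
    """Map an H x W x 3 RGB image to chromaticity coordinates (r, g, b).

    Each channel is divided by the pixel intensity R + G + B, so multiplying
    all three channels by the same factor leaves the result unchanged.
    eps guards against division by zero on black pixels.
    """
    image = np.asarray(image, dtype=np.float64)
    intensity = image.sum(axis=-1, keepdims=True)
    return image / np.maximum(intensity, eps)

# Hypothetical 4 x 4 test image and the same scene under twice the light.
rng = np.random.default_rng(0)
img = rng.uniform(0.1, 1.0, size=(4, 4, 3))
brighter = 2.0 * img
print(np.allclose(normalized_rgb(img), normalized_rgb(brighter)))
```

The invariance is bought by discarding intensity, which is exactly the trade-off between discriminative power and invariance that the sensory gap raises.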

Semantics

Descriptors: SIFT, SURF, Daisy, HOG3D, STIP, ColorSIFT
Words: hard assignment, soft assignment, difference coding, geometry
Similarities: nearest neighbor, histogram intersection, χ²
Classifiers: support vector machines, random forests
Localized objects: the visual extent of an object, selective search for candidate object locations
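
The word-assignment and similarity items can be illustrated with a small NumPy sketch. It assumes descriptors and a codebook are already available as arrays, for instance SIFT vectors and k-means centers. Hard assignment lets each descriptor vote for its nearest codeword. Soft assignment, shown here in a kernel-codebook style with a hypothetical smoothing parameter `sigma`, spreads the vote over all codewords. The χ² function follows one common convention with a factor 1/2; some papers omit it. The resulting histograms are what a support vector machine with a histogram intersection or χ²-based kernel would consume.

```python
import numpy as np

def _sq_dists(descriptors, codebook):
    # Squared Euclidean distance between every descriptor and every codeword,
    # shape (n_descriptors, n_codewords).
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=-1)

def hard_assignment(descriptors, codebook):
    """Each descriptor votes for its single nearest codeword."""
    nearest = _sq_dists(descriptors, codebook).argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def soft_assignment(descriptors, codebook, sigma=1.0):
    """Each descriptor spreads one vote over all codewords with Gaussian weights."""
    d2 = _sq_dists(descriptors, codebook)
    # Subtracting the row minimum avoids underflow; it cancels after normalization.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    hist = w.sum(axis=0)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity: sum of bin-wise minima (1.0 for identical L1-normalized histograms)."""
    return float(np.minimum(h1, h2).sum())

def chi2_distance(h1, h2, eps=1e-12):
    """Distance: 0.5 * sum((h1 - h2)^2 / (h1 + h2)); eps guards empty bins."""
    return float(0.5 * (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum())

# Hypothetical toy data: three 2-D descriptors and a two-word codebook.
descriptors = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
codebook = np.array([[0.0, 0.0], [5.0, 5.0]])
h_hard = hard_assignment(descriptors, codebook)
h_soft = soft_assignment(descriptors, codebook, sigma=1.0)
print(h_hard, h_soft, chi2_distance(h_hard, h_soft))
```

The brute-force distance matrix keeps the sketch short; with realistic codebook sizes one would use an approximate nearest-neighbor structure instead.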

Search

Concept and event detection: annotation efforts, crowdsourcing, detector performance
Translating queries to detectors: textual, visual, semantic, and their combination
Interacting with the user through the interface gap: browsing and learning
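
As one simple way to combine textual, visual, and semantic query translations, the sketch below applies weighted linear late fusion to min-max normalized per-shot scores. The score lists, weights, and function names are hypothetical, and this is not necessarily the combination scheme used in the tutorial or in MediaMill.

```python
import numpy as np

def minmax(scores):
    """Rescale one modality's scores to [0, 1] so modalities are comparable."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def fuse_and_rank(score_lists, weights):
    """Weighted linear late fusion; returns shot indices from best to worst
    together with the fused scores."""
    fused = sum(w * minmax(s) for w, s in zip(weights, score_lists))
    return np.argsort(-fused, kind="stable"), fused

# Hypothetical scores for three video shots from three query translations.
textual = [0.2, 0.9, 0.1]
visual = [0.8, 0.4, 0.3]
semantic = [0.7, 0.6, 0.0]
order, fused = fuse_and_rank([textual, visual, semantic], weights=[0.2, 0.4, 0.4])
print(order, fused)
```

Linear fusion is easy to tune per query class, but it assumes the normalized scores are roughly comparable across modalities, which rank-based normalization can make more robust.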

Evaluation

NIST TRECVID Benchmark: data, tasks, and results
Benchmark criticism: broad‐domain applicability, repeatability, VideOlympics showcase
Resources: annotations, baselines, and software
Demonstration of the MediaMill Semantic video search engine
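
TRECVID reports concept detection and search results in terms of (inferred) average precision. A minimal sketch of plain non-interpolated average precision over a fully judged ranked list may help when reading those results. Inferred AP, which estimates AP from sampled relevance judgments, is not implemented here, and the function names are illustrative.

```python
def average_precision(ranked_relevance, num_relevant=None):
    """Non-interpolated average precision of one ranked result list.

    ranked_relevance: 0/1 relevance judgments in rank order.
    num_relevant: total number of relevant items in the collection; if
    omitted, only the relevant items found in the list are counted.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    total = num_relevant if num_relevant is not None else hits
    return precision_sum / total if total else 0.0

def mean_average_precision(runs):
    """Mean of average precision over several queries or concepts."""
    return sum(average_precision(r) for r in runs) / len(runs)

print(average_precision([1, 0, 1, 0]))  # (1 + 2/3) / 2
```

Passing `num_relevant` penalizes relevant shots that never appear in the list, which is how missed items lower a run's score.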

Conclusion

Concluding remarks: achievements and discussion
Future work: challenges and opportunities for the vision community.

Material:

We will provide a web page about our short course on Internet Video Search, which will include the lecture slides, pointers to data sets and software, and videos of all the demonstrators.

All lecture material will be accessible from: http://tutorials.ceessnoek.info/