Internet Video Search

Organizers: Cees G.M. Snoek (Univ. of Amsterdam, NL), Arnold W.M. Smeulders (CWI, NL)
Duration: half day
Abstract: In this half‐day tutorial we focus on the computer vision challenges in internet video search, present methods that achieve state‐of‐the‐art performance while maintaining efficient execution, and indicate how to obtain spatiotemporal improvements in the near future. Moreover, we give an overview of the latest developments and future trends in the field on the basis of the TRECVID competition, the leading competition for video search engines, run by NIST, where we have achieved consistent top‐2 performance over the years, including the 2008, 2009, 2010, and 2011 editions. The tutorial is especially meant for researchers and practitioners who are new to the field of video search (introductory), people who have started in this direction (intermediate), or people who are interested in a summary of the state of the art in this exciting area (general interest).


  • Introduction
    • Problem statement: social, business, and scientific
    • Course organization: fundamentals, semantics, search, evaluation
  • Fundamentals
    • Invariance: the sensory and semantic gap
    • Local shape: Gaussians, Gabors, and Loweans
    • Texture: natural image statistics, gradients, Weibulls
    • Color: light source, reflection, and representation
    • Motion: optic flow, tracking
  • Semantics
    • Descriptors: SIFT, SURF, Daisy, HOG3D, STIP, ColorSIFT
    • Words: hard assignment, soft‐assignment, difference coding, geometry
    • Similarities: nearest neighbor, histogram intersection, χ²
    • Classifiers: support vector machines, random forests
    • Localized objects: the visual extent of an object, selective search for where the object might be
  • Search
    • Concept and event detection: annotation efforts, crowdsourcing, detector performance
    • Translating queries to detectors: textual, visual, semantic, and their combination
    • Interacting with the user through the interface gap: browsing and learning
  • Evaluation
    • NIST TRECVID Benchmark: data, tasks, and results
    • Benchmark criticism: broad‐domain applicability, repeatability, VideOlympics showcase
    • Resources: annotations, baselines, and software
    • Demonstration of the MediaMill semantic video search engine
  • Conclusion
    • Concluding remarks: achievements and discussion
    • Future work: challenges and opportunities for the vision community



We will provide a web page for our short course on Internet Video Search, which will include the lecture slides, pointers to data sets and software, and all videos of the demonstrators.

All lecture material will be accessible from:
