Category Archives: Research

CBMI 2017 – Call For Papers

CBMI aims at bringing together the various communities involved in all aspects of content-based multimedia indexing for retrieval, browsing, management, visualization and analytics.

Authors are encouraged to submit previously unpublished research papers in the broad field of content-based multimedia indexing and applications. In addition to multimedia and social media search and retrieval, we wish to highlight related and equally important issues that build on content-based indexing, such as multimedia content management, user interaction and visualization, media analytics, etc.


  • Full/short paper submission deadline: February 28, 2017
  • Demo paper submission deadline: February 28, 2017
  • Special Session proposals submission: November 16, 2016


CrowdMM 2014 Deadline Extended!

The power of crowds – leveraging a large number of human contributors and the capabilities of human computation – has enormous potential to address key challenges in the area of multimedia research. Crowdsourcing offers a time- and resource-efficient method for collecting large volumes of input for system design and evaluation, making it possible to optimize multimedia systems more rapidly and to address human factors more effectively. At present, crowdsourcing remains notoriously difficult to exploit effectively in multimedia settings: the challenge arises from the fact that a community of users or workers is a complex and dynamic system highly sensitive to changes in the form and the parameterization of their activities.

The submission deadline has been extended to July 15, 2014.

The third CrowdMM workshop takes place in Orlando, FL, alongside ACM Multimedia 2014. For more information, topics and important dates visit:

CfP: ACM MMSys 2014 Dataset Track

The ACM Multimedia Systems conference provides a forum for researchers, engineers, and scientists to present and share their latest research findings in multimedia systems. While research about specific aspects of multimedia systems is regularly published in the various proceedings and transactions of the networking, operating system, real-time system, and database communities, MMSys aims to cut across these domains in the context of multimedia data types. This provides a unique opportunity to view the intersections and interplay of the various approaches and solutions developed across these domains to deal with multimedia data types. Furthermore, MMSys provides an avenue for communicating research that addresses multimedia systems holistically.

As an integral part of the conference since 2012, the Dataset Track provides an opportunity for researchers and practitioners to make their work available (and citable) to the multimedia community. MMSys encourages and recognizes dataset sharing, and seeks contributions in all areas of multimedia (not limited to MM systems). Authors publishing datasets will benefit by increasing the public awareness of their effort in collecting the datasets.

In particular, authors of datasets accepted for publication will receive:

  • Dataset hosting from MMSys for at least 5 years
  • Citable publication of the dataset description in the proceedings published by ACM
  • 15 minutes of oral presentation time at the MMSys 2014 Dataset Track

All submissions will be peer-reviewed by at least two members of the technical program committee of MMSys 2014. Datasets will be evaluated by the committee on the basis of the collection methodology and the value of the dataset as a resource for the research community.

Submission Guidelines 

Authors interested in submitting a dataset should

(A) Make their data available by providing a public URL for download

(B) Write a short paper describing:

  1. motivation for data collection and intended use of the data set,
  2. the format of the data collected, 
  3. the methodology used to collect the dataset, and 
  4. basic characterizing statistics from the dataset.

Papers should be at most 6 pages long (in PDF format) prepared in the ACM style and written in English.

Important dates

  • Data set paper submission deadline: November 11, 2013
  • Notification: December 20, 2013
  • MMSys conference: March 19 – 21, 2014

MMSys Datasets

Previous accepted datasets can be accessed at


For further queries and extra information, please contact us at Most recent information can be found on

2013-07-07 (ml): Updated URLs and “2011”

Large image data sets with LIRE – some new numbers

People lately asked whether LIRE can do more than linear search and I always answered: Yes, it should … but you know, I never tried. But: I finally got around to indexing the MIR-FLICKR data set and some of my Flickr-crawled photos and ended up with an index of 1,443,613 images. I used CEDD as the main feature and a hashing algorithm to put multiple hashes per image into Lucene, to be interpreted as words. By tuning similarity, employing a Boolean query, and adding a re-rank step I ended up with a pretty decent approximate retrieval scheme, which is much faster and does not lose too many images on the way, which means the method has an acceptable recall. The image below shows the numbers along with a sample query. Linear search took more than a minute, while the hashing-based approach did (nearly) the same thing in less than a second. Note that this is just a sequential, straightforward approach, so no optimization has been done to the performance. Also, the hashing approach has not yet been investigated in detail, i.e. there are some parameters that still need some tuning … but let’s say it’s a step in the right direction.
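To make the idea more concrete, here is a minimal sketch of hash-based candidate selection followed by a re-rank step. It is not LIRE's actual code: the random-hyperplane hashing, the in-memory dictionary standing in for the Lucene index, the toy random vectors standing in for CEDD features, and all function and parameter names are assumptions made for illustration only.

```python
import random


def make_hash_functions(dim, num_tables, bits_per_hash, seed=42):
    """Draw random hyperplanes; each table produces one bits_per_hash-bit hash."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(bits_per_hash)]
            for _ in range(num_tables)]


def hash_terms(vector, tables):
    """Turn a feature vector into one 'word' per table, e.g. 't3_01011100'."""
    terms = []
    for t, planes in enumerate(tables):
        bits = "".join(
            "1" if sum(p * v for p, v in zip(plane, vector)) >= 0 else "0"
            for plane in planes)
        terms.append(f"t{t}_{bits}")
    return terms


def build_index(features, tables):
    """Inverted index: hash term -> set of image ids (stand-in for Lucene)."""
    index = {}
    for image_id, vector in features.items():
        for term in hash_terms(vector, tables):
            index.setdefault(term, set()).add(image_id)
    return index


def l2(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def search(query, features, index, tables, num_candidates=50, top_k=5):
    # Step 1: Boolean OR over the query's hash terms, counting matches per image.
    votes = {}
    for term in hash_terms(query, tables):
        for image_id in index.get(term, ()):
            votes[image_id] = votes.get(image_id, 0) + 1
    # Step 2: keep only the images sharing the most hash terms with the query.
    candidates = sorted(votes, key=lambda i: (-votes[i], i))[:num_candidates]
    # Step 3: re-rank this short list by the true feature distance.
    return sorted(candidates, key=lambda i: (l2(query, features[i]), i))[:top_k]


# Toy data: 200 random 16-dimensional vectors stand in for real image features.
rng = random.Random(0)
features = {i: [rng.gauss(0.0, 1.0) for _ in range(16)] for i in range(200)}
tables = make_hash_functions(dim=16, num_tables=10, bits_per_hash=8)
index = build_index(features, tables)
print(search(features[7], features, index, tables))
```

The speed-up comes from computing the expensive distance only for the short candidate list instead of for every indexed image. The price is recall: an image that shares no hash term with the query is never considered at all.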


Why did you record this video? – An Exploratory Study on User Intentions for Video Production.

Why do people record videos and share them? While the question seems to be simple, user intentions have not yet been investigated for video production and sharing. A general taxonomy would lead to adapted information systems and multimedia interfaces tailored to the users’ intentions. We contribute (1) an exploratory user study with 20 participants, examining the various facets of user intentions for video production and sharing in detail and (2) a novel set of user intention clusters for video production, grounded empirically in our study results. We further reflect existing work in specialized domains (i.e. video blogging and mobile phone cameras) and show that prevailing models used in other multimedia fields (e.g. photography) cannot be used as-is to reason about video recording and sharing intentions.

This paper has been published and presented at WIAMIS 2012.

Authors: Mathias Lux & Jochen Huber


Difference in face detection with default training sets

Recently I posted binaries and packaged libraries for face detection based on OpenCV and OpenIMAJ here and here. Basically, both employ similar algorithms to detect faces in photos. As this is based on supervised classification, not only the algorithm but also the employed training set has a strong influence on the actual precision (and recall) of the results. So out of interest I took a look at how well the results of both libraries are correlated:

          imaj_20   imaj_40   opencv_
imaj_20   1.000     0.933     0.695
imaj_40   0.933     1.000     0.706
opencv_   0.695     0.706     1.000

The table above shows the Pearson correlation of the face detection algorithm with the default models of OpenIMAJ (with a minimum face size of 20 and 40 pixels) and OpenCV. As can be seen, the results correlate, but are not the same. The conclusion is: make sure that you check which one to use for your application and possibly train one yourself (as actually recommended by the documentation of both libraries).

This experiment has been done on just 171 images, but experiments with larger data sets have shown similar results.
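For readers who want to run a similar comparison, the following is a minimal sketch of how such a correlation matrix can be computed. It assumes that each detector is summarized by the number of faces it reports per image, which the post does not state explicitly. The counts below are made-up placeholder values, not the data from the experiment, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long number sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


def correlation_matrix(series):
    """Pairwise Pearson correlation for a dict of name -> per-image values."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


# Placeholder face counts per image (assumed, NOT the measured results).
counts = {
    "imaj_20": [0, 1, 2, 1, 3, 0],
    "imaj_40": [0, 1, 2, 1, 2, 0],
    "opencv":  [1, 1, 2, 0, 3, 0],
}

matrix = correlation_matrix(counts)
for name, row in matrix.items():
    print(name, " ".join(f"{value:.3f}" for value in row.values()))
```

A high coefficient only says that the detectors tend to agree on which images contain more or fewer faces. It does not say that either of them is correct, which is why checking against a labeled ground truth remains necessary.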

Two new research papers published

Two research contributions by my colleagues and me finally made their way online. The paper Adaptive Visual Information Retrieval by Changing Visual Vocabulary Sizes in Context of User Intentions by Marian Kogler, Oge Marques and me investigates how the size and generation process of visual word vocabularies influence retrieval for different degrees of intentionality, namely a clear search intent, a surfing intent and a browsing intent. The paper Which Video Do You Want to Watch Now? Development of a Prototypical Intention-based Interface for Video Retrieval by Christoph Lagger, Oge Marques and me presents selected results of a large-scale study on the motivations of video consumption on the internet.

Social Media, Tagging and Images Semantics

Recently there was quite a buzz around the whole social media topic. Many researchers saw indications that the willingness of people to share and annotate content might lead to new ways of indexing, searching and consuming multimedia. The biggest problem with the buzz is … that it’s BIG 🙂 Many research groups produced even more papers, and with the rising number of papers the scientific impact got smaller and smaller. However, Neela Sawant, Jia Li and James Z. Wang took a close look at more than 200 papers and provide a survey on part of the topic with the journal article “Automatic image semantic interpretation using social action and tagging data” in the Multimedia Tools and Applications journal.


Contribution @ I-KNOW 09 accepted!

The contribution by Christoph Kofler and me with the title “An exploratory study on the explicitness of user intentions in digital photo retrieval” has been accepted for publication and presentation at I-KNOW ’09. Here is the abstract (the full paper will follow as soon as we have prepared the camera-ready version):

Search queries are typically interpreted as specification of information need of a user. Typically the search query is either interpreted as is or based on the context of a user, being for instance a user profile, his/her previously undertaken searches or any other background information. The actual intent of the user – the goal s/he wants to achieve with information retrieval – is an important part of a user’s context. In this paper we present the results of an exploratory study on the interplay between the goals of users and their search behavior in multimedia retrieval.

This work has been supported by the SOMA project.