People have lately asked whether LIRE can do more than linear search, and I always answered: yes, it should … but you know, I never tried. Finally I got around to indexing the MIR-FLICKR data set along with some of my Flickr-crawled photos and ended up with an index of 1,443,613 images. I used CEDD as the main feature and a hashing algorithm to put multiple hashes per image into Lucene, to be interpreted as words. By tuning similarity, employing a Boolean query, and adding a re-rank step I ended up with a pretty decent approximate retrieval scheme, which is much faster and does not lose too many images on the way, i.e. the method has an acceptable recall. The image below shows the numbers along with a sample query. Linear search took more than a minute, while the hashing-based approach did (nearly) the same thing in less than a second. Note that this is just a sequential, straightforward approach, so no performance optimization has been done. Also, the hashing approach has not yet been investigated in detail, i.e. there are some parameters that still need tuning … but let's say it's a step in the right direction.
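The general idea can be sketched roughly like this (a minimal, self-contained Python sketch, not LIRE's actual code: random-projection hashing stands in for the hashing scheme, and all names and parameter values are made up for illustration):

```python
import random
import math

random.seed(42)
DIM, BITS, TABLES = 8, 4, 6  # illustrative sizes, not LIRE's parameters

# One set of random hyperplanes per hash table; each table yields one hash "word".
planes = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]
          for _ in range(TABLES)]

def hash_terms(vec):
    """Map a feature vector to TABLES hash terms (the 'words' put into the index)."""
    terms = set()
    for t, table in enumerate(planes):
        bits = ''.join('1' if sum(p * v for p, v in zip(plane, vec)) > 0 else '0'
                       for plane in table)
        terms.add(f"t{t}_{bits}")
    return terms

def search(query_vec, index, top_k=3):
    """Boolean-OR candidate filter on shared hash terms, then re-rank by L2 distance."""
    q_terms = hash_terms(query_vec)
    candidates = [(name, vec) for name, (vec, terms) in index.items()
                  if q_terms & terms]          # at least one hash term in common
    candidates.sort(key=lambda nv: math.dist(query_vec, nv[1]))
    return [name for name, _ in candidates[:top_k]]

# Tiny toy index standing in for the 1.4M-image Lucene index.
images = {f"img{i}": [random.random() for _ in range(DIM)] for i in range(100)}
index = {name: (vec, hash_terms(vec)) for name, vec in images.items()}

print(search(images["img0"], index))  # img0 should rank first
```

The re-rank step is what keeps precision up: only the small candidate set surviving the Boolean hash-term query gets the expensive exact distance computation.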
Why do people record videos and share them? While the question seems simple, user intentions have not yet been investigated for video production and sharing. A general taxonomy would lead to adapted information systems and multimedia interfaces tailored to the users’ intentions. We contribute (1) an exploratory user study with 20 participants, examining the various facets of user intentions for video production and sharing in detail, and (2) a novel set of user intention clusters for video production, grounded empirically in our study results. We further reflect on existing work in specialized domains (i.e. video blogging and mobile phone cameras) and show that prevailing models used in other multimedia fields (e.g. photography) cannot be used as-is to reason about video recording and sharing intentions.
This paper has been published and presented at WIAMIS 2012.
Authors: Mathias Lux & Jochen Huber
Recently I posted binaries and packaged libraries for face detection based on OpenCV and OpenIMAJ here and here. Basically both employ similar algorithms to detect faces in photos. As this is based on supervised classification, not only the algorithm but also the employed training set has a strong influence on the actual precision (and recall) of results. So out of interest I took a look at how well the results of both libraries are correlated:
            imaj_20  imaj_40  opencv_
imaj_20      1.000    0.933    0.695
imaj_40      0.933    1.000    0.706
opencv_      0.695    0.706    1.000
The table above shows the Pearson correlation of the face detection algorithms with the default models of OpenIMAJ (with a minimum face size of 20 and 40 pixels) and OpenCV. As can be seen, the results correlate, but are not the same. The conclusion: make sure you check which one to use for your application and possibly train a model yourself (as actually recommended by the documentation of both libraries).
This experiment has been done on just 171 images, but experiments with larger data sets have shown similar results.
Two research contributions from my colleagues and me finally made their way online. The paper Adaptive Visual Information Retrieval by Changing Visual Vocabulary Sizes in Context of User Intentions by Marian Kogler, Oge Marques and me investigates how the size and generation process of visual word vocabularies influences retrieval for different degrees of intentionality, namely a clear search intent, a surfing intent and a browsing intent. The paper Which Video Do You Want to Watch Now? Development of a Prototypical Intention-based Interface for Video Retrieval by Christoph Lagger, Oge Marques and me presents selected results of a large scale study on the motivations of video consumption on the internet.
Recently there was quite a buzz around the whole social media topic. Many researchers saw indications that the willingness of people to share and annotate content might lead to new ways of indexing, searching and consuming multimedia. The biggest problem with the buzz is … that it’s BIG. Many research groups produced more and more papers, and with the rising number of papers the scientific impact got smaller and smaller. However, Neela Sawant, Jia Li and James Z. Wang took a close look at more than 200 papers and provide a survey on part of the topic with the journal article “Automatic image semantic interpretation using social action and tagging data” in the Multimedia Tools and Applications journal.
The contribution of Christoph Kofler and me with the title “An exploratory study on the explicitness of user intentions in digital photo retrieval” has been accepted for publication and presentation at the I-Know ’09. Here is the abstract (the full paper will follow as soon as we have prepared the camera ready version):
Search queries are typically interpreted as a specification of a user’s information need. The query is either interpreted as is or based on the context of the user, for instance a user profile, his/her previously undertaken searches or any other background information. The actual intent of the user – the goal s/he wants to achieve with information retrieval – is an important part of a user’s context. In this paper we present the results of an exploratory study on the interplay between the goals of users and their search behavior in multimedia retrieval.
This work has been supported by the SOMA project.
Although it’s been quite some time since I got the acceptance mail, I forgot to blog the good news: Lire (Lucene Image Retrieval) has been accepted to be presented at ACM Multimedia within the Open Source Contest track. As it is a contest, I assume we have a chance to win something?
I recently submitted Lire and LireDemo to the ACM Multimedia Open Source Software Competition 2008. As I’d really like to go there, I hope it will be judged a relevant contribution and a demo at ACM Multimedia is requested. Note that I’ve integrated a new feature into LireDemo for the ACM Multimedia submission: it’s now easier to test Lire by just indexing random photos from Flickr. Just hit the “Index” button without giving a directory of images and the download will start automatically.
- Submission files (~800 kB)
In the morning I was listening to the talk of Horst Bischof on robust people detection in surveillance scenarios. In my opinion he gave a great talk: he managed to show the nature and results of their research and to motivate the usefulness and significance of the results based on context and related work. He pointed out what the achievements are, what hasn’t been touched by his research group, and why. He also visualized his results using videos, which was appreciated by the audience. If I find out where the videos can be found, I’ll blog the link.
I’m happy to announce that there will be a special session of the Multimedia Metadata Community at the TRIPLE-I / I-Media conference in Graz in September.
CfP: I-Media Special Session on Multimedia Metadata
Studies show that sales of digital capture devices like video cameras, digital photo cameras, or mobile phones with digital cameras are still rising. Therefore it can be expected that the amount of digital multimedia content created will rise dramatically in the future. Multimedia metadata is currently the only way to cope with problems like semantics-based retrieval or organization of content, and it provides means to specify adaptation and delivery constraints and rules. Within this special session the importance of metadata for media technologies is discussed. We encourage the submission of high quality scientific work as well as application papers. Topics include but are not limited to:
- Multimedia technologies and metadata in the social web
- Multimedia semantics
- Annotation of multimedia content
- Metadata in pervasive multimedia computing
- Metadata for new media
- Studies and surveys in context of multimedia metadata and new media
The special session is organized by the Multimedia Metadata Community, which has already organized a series of 8 successful workshops in Aachen, Berlin, Graz and Klagenfurt. The community aims to extend its active and successful network with new members.
Submissions for the special session are handled via the submission system of the TRIPLE-I/I-Media and are reviewed by the TRIPLE-I/I-Media program committee.
14 April 2008: Submission of the full papers (4-8 pages)
31 May 2008: Notification of acceptance
30 June 2008: Camera ready version (8 pages)
3-5 September 2008: TRIPLE-I Conference
Special Session Chairs
- Michael Granitzer, Know-Center Graz
- Mathias Lux, ITEC / Klagenfurt University
Contact: mlux <at> itec <dot> uni-klu.ac.at