Category Archives: Multimedia

LIRe presentation and poster at ACM MM 2011

Just finished my presentation at ACM MM’s open source competition in 2011. Many interested researchers and developers came by to discuss ideas and developments. I’m looking forward to turning many of those ideas into code 😉

For those of you interested in the poster I uploaded it here.

I also uploaded the presentation to slideshare.

Lire and Lire Demo v 0.9 released

I just released Lire and Lire Demo in version 0.9. Basically, it’s the alpha version with additional speed and stability enhancements for bag of visual words (BoVW) indexing. While this has already been possible in earlier versions, I refurbished vocabulary creation (k-means clustering) and indexing to support up to 4 CPU cores. I also integrated a function to add documents to BoVW indexes incrementally. So the list of major changes since Lire 0.8 includes:

  • Major speed-up due to a rewrite of the indexing strategies for local features
  • Auto color correlogram and color histogram features improved
  • Re-ranking filter based on global features and LSA
  • Parallel bag of visual words indexing and search supporting SURF and SIFT, including incremental index updates (see also the wiki)
  • Added functionality to Lire Demo including support for new Lire features and a new result list view
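The BoVW indexing mentioned above can be sketched roughly as follows: cluster local feature vectors (SURF or SIFT in Lire) into a visual vocabulary with k-means, then describe each image as a histogram of visual-word occurrences. The sketch below is a self-contained toy illustration with synthetic data, not Lire’s actual implementation (which is parallelized and Lucene-based):

```python
# Toy sketch of bag-of-visual-words (BoVW) indexing: build a visual
# vocabulary by clustering local feature vectors with k-means, then
# describe an image as a histogram of visual-word occurrences.
# All data here is synthetic; real features would come from SURF/SIFT.
import random


def dist2(a, b):
    # squared Euclidean distance between two feature vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=42):
    # plain Lloyd's algorithm; Lire's real clustering is multi-threaded
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster went empty
                centers[i] = [sum(x) / len(cl) for x in zip(*cl)]
    return centers


def bovw_histogram(features, vocabulary):
    # quantize each local feature to its nearest visual word
    hist = [0] * len(vocabulary)
    for f in features:
        i = min(range(len(vocabulary)), key=lambda c: dist2(f, vocabulary[c]))
        hist[i] += 1
    return hist


if __name__ == "__main__":
    rng = random.Random(0)
    # synthetic "local features" pooled over the whole collection
    all_features = [[rng.gauss(c, 0.3) for _ in range(8)]
                    for c in (0, 1, 2) for _ in range(50)]
    vocab = kmeans(all_features, k=3)
    # indexing one "image": its 10 local features become a 3-bin histogram
    image_features = rng.sample(all_features, 10)
    print(bovw_histogram(image_features, vocab))
```

Incremental index updates then simply mean quantizing the features of a new image against the existing vocabulary and appending its histogram, without re-clustering.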

Download and try:

Lire Demo 0.9 alpha 2 just released

Finally I found some time to go through Lire and fix several of the — for me — most annoying bugs. While this is still work in progress, I have uploaded a preview of the demo. New features are:

  • Auto Color Correlogram and Color Histogram features improved
  • Re-ranking based on different features supported
  • Enhanced results view
  • Much faster indexing (parallel, use -server switch for your JVM)
  • Much faster search (re-write of the search code in Lire)
  • New developer menu for faster switching of search features
  • Re-ranking of results based on latent semantic analysis

You can find the updated Lire Demo along with a Windows launcher here. Mac and Linux users please run it using “java -jar …” or a double click (if your window manager supports actions like that 🙂)

The source is — of course — GPL and available in the SVN.

Final Call for Papers: Special Issue on Searching Speech

ACM Transactions on Information Systems is soliciting contributions to a special issue on the topic of “Searching Speech”. The special issue will be devoted to algorithms and systems that use speech recognition and other types of spoken audio processing techniques to retrieve information, and, in particular, to provide access to spoken audio content or multimedia content with a speech track.

Submission Deadline: 1 March 2011

The field of spoken content indexing and retrieval has a long history dating back to the development of the first broadcast news retrieval systems in the 1990s. More recently, however, work on searching speech has been moving towards spoken audio that is produced spontaneously and in conversational settings. In contrast to the planned speech that is typical for the broadcast news domain, spontaneous, conversational speech is characterized by high variability and the lack of inherent structure. Domains in which researchers face such challenges include: lectures, meetings, interviews, debates, conversational broadcast (e.g., talk-shows), podcasts, call center recordings, cultural heritage archives, social video on the Web, spoken natural language queries and the Spoken Web.
We invite the submission of papers that describe research in the following areas:

  • Integration of information retrieval algorithms with speech recognition and audio analysis techniques
  • Interfaces and techniques to improve user interaction with speech collections
  • Indexing diverse, large scale collections
  • Search effectiveness and efficiency, including exploitation of additional information sources

For more information see

ACM Multimedia Call for Volunteers out …

ACM Multimedia – taking place in Firenze, Italy, in the last week of Oct. – is still in need of student volunteers. I personally think this is a great opportunity to learn about big conferences and the community. If you are interested, make sure to send your bid by Oct. 7th, 2010. Find the call here.


Converting video for flash video players to H.264/AAC

Have you ever tried to put a video online? Well, actually it is quite easy if you use YouTube. No matter what codec you use, you have a good chance of getting a decent result. If you want to host the video yourself, you basically need a flash video player (assuming that flash is the most widely spread tool on multiple platforms) like the JW FLVPlayer. Finally, you’ll need to get your video file into a format flash can play using progressive download (which means you can watch it while downloading, just like on YouTube).

Since Adobe Flash Player 9 Update 3, Flash can play back MP4 files with H.264 video and AAC audio streams [see here], so we can just focus on this format. The first step is to get an ffmpeg version compiled with libx264 and libfaac. You can check this on the command line; just execute ffmpeg without parameters:

FFmpeg version SVN-r16573, Copyright (c) 2000-2009 Fabrice Bellard, et al.
configuration: […] --enable-libfaac --enable-libgsm --enable-libx264 […]

The flags --enable-libfaac and --enable-libx264 should be there to support the needed codecs. I used FFmpeg revision 16537 from this page, which works fine.

If the libraries are there you can proceed to the next step:

ffmpeg -i <inputfile> -b 1024k -vcodec libx264 \
  -acodec libfaac -ab 160k <output.mp4>

This converts your input file to the needed MP4 file. You can also change the size of the video with the switch “-s”, for instance “-s 320×240”. Take a close look at the switches “-b” and “-ab”, which define the video and audio bitrate. If the sum of both bitrates is too high for the user’s connection, the video will not play smoothly.

One might think s/he’s finished, but no … unfortunately progressive download doesn’t work with too many MP4 files. The moov atom (an atom is an “MP4 metadata unit”) containing the file index (the description of where the video and audio streams are located in the file and how they are stored) is at the end of the MP4 file. So the flash player has to download the whole file before starting playback, ka-ching!

Fortunately there is an ffmpeg tool called qt-faststart (Linux users will find it in the tools folder of ffmpeg) that moves the index from the end to the start. For Windows users a precompiled binary can be found here. Use it to move the metadata:

qt-faststart <infile.mp4> <outfile.mp4>
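To see what qt-faststart actually changes, you can inspect the order of the file’s top-level atoms. The minimal parser below is my own illustration (not part of ffmpeg); it lists the atoms and checks whether moov precedes mdat, which is what progressive download needs:

```python
# Minimal sketch: list the top-level atoms ("boxes") of an MP4 file in
# order and check whether the 'moov' index atom precedes the 'mdat'
# media data atom. Each top-level box starts with a 4-byte big-endian
# size followed by a 4-byte type code. 64-bit extended sizes (size == 1)
# are ignored for simplicity.
import struct


def top_level_atoms(data):
    atoms, pos = [], 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, pos)
        if size < 8:  # malformed box or unsupported 64-bit size: stop
            break
        atoms.append(kind.decode("latin-1"))
        pos += size
    return atoms


def fast_start_ready(data):
    atoms = top_level_atoms(data)
    return ("moov" in atoms and "mdat" in atoms
            and atoms.index("moov") < atoms.index("mdat"))
```

Running fast_start_ready on the file before and after qt-faststart should flip the result from False to True.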

Now you are done with the file. Use for instance the JW FLVPlayer setup wizard to create an HTML snippet. Note that you have to add 19 pixels to the height of your video dimensions, as this is the height of the player’s control bar 😀

Music portal reduced to max: Grooveshark

Lately it has been getting more and more challenging to listen to a song you want online. YouTube filters based on Geo-IP, and samples in online stores get shorter, if they are available at all. But I was pointed to a straightforward portal: Grooveshark. You just search for the song you’d like to hear, press play, and there you are. If you want to listen to multiple songs, there’s a queue usable without registration. Nice!


Contribution @ I-KNOW 09 accepted!

The contribution by Christoph Kofler and me with the title “An exploratory study on the explicitness of user intentions in digital photo retrieval” has been accepted for publication and presentation at I-Know ’09. Here is the abstract (the full paper will follow as soon as we have prepared the camera-ready version):

Search queries are typically interpreted as a specification of a user’s information need. Typically the query is interpreted either as is or based on the context of the user, for instance a user profile, his/her previously undertaken searches or any other background information. The actual intent of the user – the goal s/he wants to achieve with information retrieval – is an important part of a user’s context. In this paper we present the results of an exploratory study on the interplay between the goals of users and their search behavior in multimedia retrieval.

This work has been supported by the SOMA project.


Self Organizing Multimedia Architecture (SOMA) – The Project

Currently we are working a lot on a research project funded by a Carinthian agency. The project runs for 3 years (2.5 years left) and has enough funds to pay 4 PhD students and 1 postdoc researcher. Here is the description and the link to the blog:

The project Self-organizing Multimedia Architecture (SOMA) aims to capture the whole life-cycle of multimedia content in a single architecture for large distributed multimedia information systems. In SOMA we focus on scenarios where events, which we understand as “limited time periods of special importance”, are a central concept. Examples for such scenarios are sports events stretching over time, where start, finish or critical parts of a race are possible events, or traffic monitoring, where events like traffic jams or accidents have to be reported and investigated.

via Project Description | Self Organizing Multimedia Architecture.

Video Summaries for YouTube Videos?

Applying old things to new platforms has become common in recent times; here’s my contribution. I recently developed a video summary tool based on FFMPEG and Lire for a friend … just to test if common approaches are usable in a specific domain. Video summarization – especially of short videos – is a rather easy thing. You just need to find a number of frames with maximized pairwise difference, so they cover a maximal visual range of the video. I applied my tool to YouTube and got the following summaries for the “hippo bathing” video:

Based on the CEDD descriptor the most important keyframe is really chosen well – just watch the video to know what I mean 🙂

With the auto color correlogram feature the dog is not explicitly part of the picture. However, the first frame chosen (the big one) gives a good impression of the “bathing” part.

With the Gabor texture feature the dog gets prominent in the first place. Note that the result is quite the same as the result with the Tamura texture feature, not shown here.

With the simplest feature (RGB color histogram with L2 distance) the summary also looks appealing. There is a frame featuring the dog, one showing the whole scene, and one for the hippo.

All in all I think the results are quite appealing. The runtime of my implementation is a fraction of the actual video play time. Perhaps I’ll find some time to present the whole thing tomorrow at the barcamp 😉
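The frame-selection idea described above (pick frames with maximized pairwise difference) can be sketched as a greedy farthest-point selection over per-frame feature vectors. This is a toy stand-in for illustration only; the actual tool extracts frames with FFMPEG and computes descriptors (CEDD, correlogram, Gabor, …) with Lire:

```python
# Greedy keyframe selection: repeatedly pick the frame whose minimum
# distance to the already-selected frames is largest, so the summary
# covers a maximal visual range. Frames are represented here by plain
# feature vectors (stand-ins for colour histograms or CEDD descriptors).

def l2(a, b):
    # Euclidean distance between two feature vectors
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def summarize(frames, n):
    # seed with the frame farthest from the collection mean
    mean = [sum(x) / len(frames) for x in zip(*frames)]
    picked = [max(range(len(frames)), key=lambda i: l2(frames[i], mean))]
    while len(picked) < n:
        # farthest-point step: maximize the min distance to the picked set
        cand = max((i for i in range(len(frames)) if i not in picked),
                   key=lambda i: min(l2(frames[i], frames[j]) for j in picked))
        picked.append(cand)
    return sorted(picked)  # indices of the selected keyframes, in order
```

This runs in O(n · #frames) distance computations after feature extraction, which is why the summarizer finishes in a fraction of the video’s play time.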