ACM Multimedia Presentation & LIRE Solr

Today I gave a talk on LIRE in the Open Source Software Competition at the ACM Multimedia conference, currently taking place in Barcelona. It gave me the opportunity to present a local installation of the LIRE Solr plugin and what it can do. The slides of the talk are on SlideShare: LIRE presentation at the ACM Multimedia Open Source Software Competition 2013

The Solr plugin itself is fully functional for Solr 4.4, and its source code is publicly available. A markdown document explains what can be done with the plugin and how to install it. In short, the plugin supports content-based search, content-based re-ranking of text search results, a custom field implementation, and sub-linear search based on hashing.
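To illustrate what content-based re-ranking of text search results means, here is a minimal Java sketch. It is not the plugin's code: the `Hit` record, the plain `double[]` feature vectors and the L1 distance are assumptions for illustration. The text hits are simply re-sorted by their visual distance to the query image.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of content-based re-ranking of text hits (hypothetical types, not the Solr plugin). */
public class RerankSketch {
    /** A text search hit with an image feature vector attached (hypothetical). */
    record Hit(String id, double[] feature) {}

    /** L1 (Manhattan) distance between two feature vectors of equal length. */
    static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    /** Returns the text hits re-ordered by visual similarity to the query, closest first. */
    static List<Hit> rerank(List<Hit> textHits, double[] queryFeature) {
        List<Hit> result = new ArrayList<>(textHits);   // keep the input list untouched
        result.sort(Comparator.comparingDouble(h -> l1(h.feature(), queryFeature)));
        return result;
    }
}
```

In a search server one would presumably re-rank only the top N text hits, so the extra cost grows with N rather than with the index size.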

Lire 0.9.4 beta 2 released

The beta update features (i) improvements in local feature handling, i.e. stronger quantization of local feature histograms and several bug fixes, (ii) critical bug fixes for CEDD and JCD, which were not thread-safe, and (iii) improvements to the ParallelExtractor and Indexor classes as well as to the intermediate binary format.


LIRE 0.9.4 beta released

I’ve just uploaded LIRE 0.9.4 beta to the Google Code downloads page. This is an intermediate release that reflects several changes in the SVN trunk. Basically I put it online because many, many bugs have been fixed in this one and it performs much, much faster than the 0.9.3 release. If you want the latest version, I’d recommend sticking with the SVN. However, I’m currently changing a lot of feature serialization methods, so there’s no guarantee that an index created with 0.9.4 beta will work with any newer version. Note also that the release does not work with older indexes ;)

Major changes include, but are not limited to:

  • New features: PHOG, local binary patterns and binary patterns pyramid
  • Parallel indexing: a producer-consumer based indexing application that makes heavy use of available CPU cores. On a current Intel Core i7 or a reasonably large Intel Xeon system, it reduces feature extraction to a marginal overhead on top of disk I/O.
  • Intermediate byte[] based feature data files: a new way to extract features in a distributed way
  • In-memory cached ImageSearcher: as long as there is enough memory, all linear searching is done in memory without much disk I/O (cf. class GenericFastImageSearcher and set caching to true)
  • Approximate indexing based on hashing: tests with 1.5 million images led to search times below 300 ms (cf. GenericDocumentBuilder with hashing set to true and BitSamplingImageSearcher)
  • Footprint of many global descriptors has been significantly reduced. Examples: EdgeHistogram 40 bytes, ColorLayout 504 bytes, FCTH 96 bytes, …
  • New unit test for benchmarking features on the UCID data set.
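The general idea behind hashing-based approximate search can be sketched as follows. This sketch uses locality-sensitive hashing with random hyperplanes as a stand-in; it is not the implementation behind BitSamplingImageSearcher, and the class name, the parameters `numTables` and `bitsPerTable` and the L1 re-ranking step are assumptions. Each feature vector gets one hash key per table, and at query time only vectors that share a bucket with the query are compared exactly.

```java
import java.util.*;

/** Sketch of LSH with random hyperplanes (a stand-in, NOT LIRE's bit sampling code). */
public class HyperplaneLshSketch {
    private final double[][][] planes;                         // [table][bit][dimension]
    private final List<Map<Integer, List<Integer>>> tables = new ArrayList<>();
    private final List<double[]> features = new ArrayList<>();

    /** bitsPerTable should be at most 32 so that a key fits into an int. */
    public HyperplaneLshSketch(int dim, int numTables, int bitsPerTable, long seed) {
        Random rnd = new Random(seed);
        planes = new double[numTables][bitsPerTable][dim];
        for (double[][] table : planes)
            for (double[] plane : table)
                for (int d = 0; d < dim; d++) plane[d] = rnd.nextGaussian();
        for (int t = 0; t < numTables; t++) tables.add(new HashMap<>());
    }

    /** One bit per hyperplane: 1 if the vector lies on the plane's positive side. */
    private int hash(int table, double[] f) {
        int key = 0;
        for (double[] plane : planes[table]) {
            double dot = 0;
            for (int d = 0; d < f.length; d++) dot += plane[d] * f[d];
            key = (key << 1) | (dot >= 0 ? 1 : 0);
        }
        return key;
    }

    /** Stores a feature vector in one bucket per table and returns its id. */
    public int add(double[] f) {
        int id = features.size();
        features.add(f);
        for (int t = 0; t < planes.length; t++)
            tables.get(t).computeIfAbsent(hash(t, f), key -> new ArrayList<>()).add(id);
        return id;
    }

    /** Collects candidates from the query's buckets, then ranks only those by L1 distance. */
    public List<Integer> search(double[] q, int k) {
        Set<Integer> candidates = new HashSet<>();
        for (int t = 0; t < planes.length; t++)
            candidates.addAll(tables.get(t).getOrDefault(hash(t, q), List.of()));
        List<Integer> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble(id -> l1(features.get(id), q)));
        return ranked.subList(0, Math.min(k, ranked.size()));
    }

    private static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }
}
```

More tables tend to raise recall, while more bits per table make buckets smaller and reduce the number of exact distance computations. The timing quoted above refers to LIRE's own implementation, not to this sketch.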

All changes can be found in the CHANGES.txt file.

CfP: ACM MMSys 2014 Dataset Track

The ACM Multimedia Systems conference provides a forum for researchers, engineers, and scientists to present and share their latest research findings in multimedia systems. While research about specific aspects of multimedia systems is regularly published in the various proceedings and transactions of the networking, operating system, real-time system, and database communities, MMSys aims to cut across these domains in the context of multimedia data types. This provides a unique opportunity to view the intersections and interplay of the various approaches and solutions developed across these domains to deal with multimedia data types. Furthermore, MMSys provides an avenue for communicating research that addresses multimedia systems holistically.

As an integral part of the conference since 2011 2012, the Dataset Track provides an opportunity for researchers and practitioners to make their work available (and citable) to the multimedia community. MMSys encourages and recognizes dataset sharing, and seeks contributions in all areas of multimedia (not limited to MM systems). Authors publishing datasets will benefit by increasing the public awareness of their effort in collecting the datasets.

In particular, authors of datasets accepted for publication will receive:

  • Dataset hosting from MMSys for at least 5 years
  • Citable publication of the dataset description in the proceedings published by ACM
  • 15 minutes oral presentation time at the MMSys 2014 Dataset Track

All submissions will be peer-reviewed by at least two members of the technical program committee of MMSys 2014. Datasets will be evaluated by the committee on the basis of the collection methodology and the value of the dataset as a resource for the research community.

Submission Guidelines 

Authors interested in submitting a dataset should

(A) Make their data available by providing a public URL for download

(B) Write a short paper describing:

  1. motivation for data collection and intended use of the data set,
  2. the format of the data collected, 
  3. the methodology used to collect the dataset, and 
  4. basic characterizing statistics from the dataset.

Papers should be at most 6 pages long (in PDF format) prepared in the ACM style and written in English.

Important dates

  • Dataset paper submission deadline: November 11, 2013
  • Notification: December 20, 2013
  • MMSys conference: March 19 – 21, 2014

MMSys Datasets

Previously accepted datasets can be accessed at


For further queries and extra information, please contact us at Most recent information can be found on

2013-07-07 (ml): Updated URLs and “2011”

LIRE Lucene 4.x Performance

While I knew that performance did not exactly skyrocket with Lucene 4.0, I finally got around to finding out why. Unfortunately, the field compression technique applied in Lucene 4.x compresses each and every stored field … and decompresses it upon access. This adds considerable overhead when reading the index linearly, which is exactly one of LIRE’s main search methods.

A screenshot of the CPU sampler in VisualVM shows that 58.7% of the CPU time goes to the LZ4 decompression routine. That’s quite a lot and makes a huge difference for search. If anyone has a workaround of some sort, I’d be happy :)

Update (2013-07-03): With the great help of the people from the lucene-user list I found at least a speed-up. In the current SVN version, there is a new LireCustomCodec for stored fields, which speeds up decompression a lot. Moreover, there is now an in-memory caching approach implemented in the GenericFastImageSearcher class. It is turned off by default, but it speeds up search (at the cost of memory and initialization time) by holding image features in memory. It has been tested with up to 1.5M images.
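The in-memory caching idea can be illustrated with a small sketch: features are loaded into a list once (the memory and initialization cost mentioned above), and each query is a linear scan that keeps the k best hits in a bounded max-heap. The class name, the `double[]` representation and the L1 distance are assumptions for illustration, not the actual code of GenericFastImageSearcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: linear search over features cached in memory (not GenericFastImageSearcher). */
public class CachedLinearSearchSketch {
    record Scored(int id, double distance) {}

    // Filled once at init time; this is the memory and start-up cost of caching.
    private final List<double[]> cache = new ArrayList<>();

    public void add(double[] feature) {
        cache.add(feature);
    }

    /** Returns the ids of the k nearest cached features by L1 distance, closest first. */
    public List<Integer> search(double[] query, int k) {
        List<Integer> result = new ArrayList<>();
        if (k <= 0) return result;
        // Max-heap on distance: its head is the worst hit among the current top k.
        PriorityQueue<Scored> heap =
            new PriorityQueue<>(Comparator.comparingDouble(Scored::distance).reversed());
        for (int i = 0; i < cache.size(); i++) {
            double d = l1(cache.get(i), query);
            if (heap.size() < k) {
                heap.add(new Scored(i, d));
            } else if (d < heap.peek().distance()) {
                heap.poll();                      // drop the current worst hit
                heap.add(new Scored(i, d));
            }
        }
        while (!heap.isEmpty()) result.add(heap.poll().id());
        Collections.reverse(result);              // the heap yields the worst hit first
        return result;
    }

    private static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }
}
```

No disk access happens during a query here, which is the whole point of the trade-off: search cost becomes pure CPU work proportional to the number of cached images.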

LireDemo 0.9.4 beta released

The current LireDemo 0.9.4 beta release features a new indexing routine, which is much faster than the old one. It’s based on the producer-consumer principle and makes, hopefully, optimal use of I/O and of up to 8 cores of a system. Moreover, the new PHOG feature implementation is included, so you can give it a try. Furthermore, JCD, FCTH and CEDD got a more compact representation of their descriptors and now use much less storage space. Smaller changes include parameter tuning for several descriptors. All changes are documented in the CHANGES.txt file in the SVN.
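A minimal sketch of the producer-consumer principle, assuming the images are already available as byte arrays and using a trivial, hypothetical `extractFeature` (the mean byte value) in place of a real descriptor. One producer fills a bounded queue, several consumers take jobs and extract features, and one "poison pill" per consumer signals the end of the input. This is an illustration, not the LireDemo indexing code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Sketch of producer-consumer feature extraction (not LireDemo's actual code). */
public class ProducerConsumerSketch {
    /** One unit of work: an image id and its raw bytes. */
    record Job(int id, byte[] data) {}

    /** Sentinel telling a consumer to stop; compared by reference. */
    private static final Job POISON = new Job(-1, new byte[0]);

    /** Hypothetical stand-in for a real descriptor: the mean unsigned byte value. */
    static double extractFeature(byte[] imageBytes) {
        double sum = 0;
        for (byte b : imageBytes) sum += b & 0xFF;
        return sum / Math.max(1, imageBytes.length);
    }

    public static Map<Integer, Double> index(List<byte[]> images, int workers) {
        // Bounded queue: the producer blocks when the consumers fall behind.
        BlockingQueue<Job> queue = new ArrayBlockingQueue<>(100);
        Map<Integer, Double> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {   // consumer: extract features until the poison pill arrives
                try {
                    for (Job job = queue.take(); job != POISON; job = queue.take()) {
                        results.put(job.id(), extractFeature(job.data()));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            // Producer: a real indexer would read image files from disk here.
            for (int i = 0; i < images.size(); i++) queue.put(new Job(i, images.get(i)));
            for (int w = 0; w < workers; w++) queue.put(POISON);   // one pill per consumer
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("indexing interrupted", e);
        }
        return results;
    }
}
```

The bounded queue is the main design choice: if extraction is the bottleneck, the producer blocks instead of filling up memory; if disk I/O is the bottleneck, the consumers simply wait in `take()`.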


Open Source Software Competition of the ACM MM 2013

The ACM Multimedia Open-Source Software Competition celebrates the invaluable contribution of researchers and software developers who advance the field by providing the community with implementations of codecs, middleware, frameworks, toolkits, libraries, applications, and other multimedia software. This year will be the sixth year in running the competition as part of the ACM Multimedia program.

To qualify, software must be provided with source code and licensed in such a manner that it can be used free of charge in academic and research settings. For the competition, the software will be built from the sources. All source code, license, installation instructions and other documentation must be available on a public web page. Dependencies on non-open source third-party software are discouraged (with the exception of operating systems and commonly found commercial packages available free of charge). To encourage more diverse participation, previous years’ non-winning entries are welcome to re-submit for the 2013 competition. Student-led efforts are particularly encouraged.

Authors are highly encouraged to prepare as much documentation as possible, including examples of how the provided software might be used, download statistics or other public usage information, etc. Entries will be peer-reviewed to select entries for inclusion in the conference program as well as an overall winning entry, to be recognized formally at ACM Multimedia 2013. The criteria for judging all submissions include broad applicability and potential impact, novelty, technical depth, demo suitability, and other miscellaneous factors (e.g., maturity, popularity, student-led, no dependence on closed source, etc.).

Authors of the winning entry, and possibly additional selected entries, will be invited to demonstrate their software as part of the conference program. In addition, accepted overview papers will be included in the conference proceedings.

more information …

Important Dates

  • Open Source Software Submission Deadline: May 13, 2013
  • Notification of Acceptance: June 30, 2013

Serialization of LIRE global features updated

In the current SVN version, three global features have been revisited in terms of serialization. This was necessary because the index of the web demo with 300k images had already exceeded 1.5 GB.

Feature          Prior          Now
EdgeHistogram    320 bytes      40 bytes
JCD              1344 bytes     84 bytes
ColorLayout      14336 bytes    504 bytes

This significant reduction in space leads to (i) smaller indexes, (ii) reduced I/O time, and (iii) therefore faster search.

How was this done? Basically it’s a clever organization of bytes. In the case of JCD, the histogram has 168 entries, each of which fits into half a byte (4 bits). Therefore, you can stuff 2 of these values into one byte, which is how 168 entries end up in 84 bytes. You have to take care, though, that Java promotes bytes to int for bit-wise operations and that bytes are signed. So the trick is to create an integer in [0, 255] and then subtract 128 to get it into the byte range. The inverse is done for reading. The rest is common bit shifting.
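A minimal sketch of this packing scheme, assuming the histogram values have already been quantized to integers in [0, 15]. LIRE’s JCD code works on its own internal representation, so the class and method names here are hypothetical and this is a simplified illustration rather than the SVN code.

```java
/** Sketch: packs two 4-bit values per byte (hypothetical helper, not LIRE's JCD code). */
public class NibblePackingSketch {
    /** Packs values in [0, 15] two per byte; assumes an even number of values. */
    static byte[] pack(int[] values) {
        byte[] out = new byte[values.length / 2];
        for (int i = 0; i < out.length; i++) {
            // High nibble from the even entry, low nibble from the odd entry: range [0, 255].
            int tmp = (values[2 * i] << 4) | values[2 * i + 1];
            out[i] = (byte) (tmp - 128);   // shift into Java's signed byte range [-128, 127]
        }
        return out;
    }

    /** Inverse of pack: restores the original values in [0, 15]. */
    static int[] unpack(byte[] packed) {
        int[] values = new int[packed.length * 2];
        for (int i = 0; i < packed.length; i++) {
            int tmp = packed[i] + 128;         // back to [0, 255]
            values[2 * i] = tmp >> 4;          // high nibble
            values[2 * i + 1] = tmp & 0x0F;    // low nibble
        }
        return values;
    }
}
```

With 168 input values, `pack` returns 84 bytes, which matches the JCD size given in the table above.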

The code can be seen either in the file in the SVN, or in the snippet at for your convenience.