Monthly Archives: October 2011

Searching with Lire in big datasets

Having received several complaints about the slowness of Lire when searching in 100k+ documents I took my time to write a small how to to explain approaches for search in big (relatively) data sets.

Lire has the ability to create indexes with lots of different features (descriptors, like RGB color histograms or CEDD). While this opens the opportunity to flexibility at search time as we can select the feature at the time we create a query, the index tends to get bigger and bigger and searcher take longer and longer.

With a data set of 121,379 images the index created with the features selected for default in Lire Demo has a size of 14,3 GB on the disk. In contrast to that an index just storing the CEDD feature along with the image identifier has a size of 29 MB.

Due to the size of the index also linear search tends to get slower. While for the index stripped down to the CEDD feature and the identifier searching takes (on a AMD Quad-Core computer with 4GB RAM and Java 1.7) roughly 0.33 seconds, searching the big index takes 7 minutes and 3 seconds.

So if you want to index and search big data sets (> 100.000 images for instance) I recommend to

  • select which features you need,
  • create the index with a minimum set of features, and
  • eventually split the index per feature and select the index on the fly instead of the feature
  • also you can load the index into RAM

For more on loading the index to RAM and the option to use local features read on in the developer wiki.

Lire publication in the top 10 downloads of ACM SIGMM

As to be found in this month’s SIGMM record, which is the electronic SIGMM newsletter, a publication about Lire is in the top 10 downloads of the ACM special interest group on multimedia for September 2011.

I co-authored the paper with Savvas Chatzichristofis:

Mathias Lux, Savvas A. Chatzichristofis. Lire: lucene image retrieval: an extensible java CBIR library. In ACM Multimedia 2008

It’s also the paper I recommend to include in references if Lire is used within a scientific publication, so my thanks also go to the authors citing and therefore pointing to our work!

Lire and Lire Demo v 0.9 released

I just released Lire and Lire Demo in version 0.9 on sourceforge.net. Basically it’s the alpha version with additional speed and stability enhancements for bag of visual words (BoVW) indexing. While this has already been possible in earlier versions I re-furbished vocabulary creation (k-means clustering) and indexing to support up to 4 CPU cores. I also integrated a function to add documents to BoVW indexes incrementally. So a list of major changes since Lire 0.8 includes

  • Major speed-up due to change and re-write of indexing strategies for local features
  • Auto color correlation and color histogram features improved
  • Re-ranking filter based on global features and LSA
  • Parallel bag of visual words indexing and search supporting SURF and SIFT including incremental index updates (see also in the wiki)
  • Added functionality to Lire Demo including support for new Lire features and a new result list view

Download and try: