
LIRE Lucene 4.x Performance

While I knew that performance did not skyrocket with Lucene 4.0, I finally got around to finding out why. Unfortunately, the field compression technique applied in Lucene 4.x compresses each and every stored field … and decompresses it upon access. This adds considerable overhead when reading the index linearly, which is exactly what one of LIRE's main search methods does.

The image (CompressedFields) shows a screenshot of the CPU sampler in VisualVM. 58.7% of the CPU time goes to the LZ4 decompression routine. That’s quite a lot and makes a huge difference for search. If anyone has a workaround of sorts, I’d be happy 🙂

Update (2013-07-03): With the great help of the people from the lucene-user list I found at least a speed-up. In the current SVN version there is a novel LireCustomCodec for stored fields, which speeds up decompression a lot. Moreover, the GenericFastImageSearcher class now implements an in-memory caching approach. It is turned off by default, but it speeds up search (at the cost of memory and init time) by holding the image features in memory. It has been tested with up to 1.5M images.
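To give an idea of what the in-memory caching buys, here is a minimal sketch in plain Java. It is not the actual LIRE code: the class InMemoryFeatureCache, its methods and the use of a simple L1 distance are all made up for illustration. The point is just that the features are decoded once at init time, so each search is a linear scan over arrays in memory and no stored field has to be decompressed again.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of an in-memory feature cache; not LIRE's implementation. */
class InMemoryFeatureCache {
    /** One cached entry: a document id plus its already decoded feature vector. */
    static final class Entry {
        final int docId;
        final double[] feature;
        Entry(int docId, double[] feature) { this.docId = docId; this.feature = feature; }
    }

    /** A search hit: document id and its distance to the query. */
    static final class Hit {
        final int docId;
        final double distance;
        Hit(int docId, double distance) { this.docId = docId; this.distance = distance; }
    }

    private final List<Entry> cache = new ArrayList<>();

    /** Called once per document at init time; this is where the memory cost is paid. */
    void add(int docId, double[] feature) {
        cache.add(new Entry(docId, feature.clone()));
    }

    /** Linear scan over the cached features, keeping the k nearest ones. */
    List<Hit> search(double[] query, int k) {
        List<Hit> result = new ArrayList<>();
        if (k <= 0) return result;
        // Max-heap on distance: the worst of the current best k sits on top.
        PriorityQueue<Hit> best = new PriorityQueue<>(
                Comparator.comparingDouble((Hit h) -> h.distance).reversed());
        for (Entry e : cache) {
            double d = l1(query, e.feature);
            if (best.size() < k) {
                best.add(new Hit(e.docId, d));
            } else if (d < best.peek().distance) {
                best.poll();
                best.add(new Hit(e.docId, d));
            }
        }
        result.addAll(best);
        result.sort(Comparator.comparingDouble((Hit h) -> h.distance));
        return result;
    }

    /** L1 (city block) distance, used here only as a stand-in for a real descriptor distance. */
    static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }
}
```

Filling the cache means reading every document once, which is the init time mentioned above; after that, search no longer touches the index.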

Search through a million images in less than a second

We reached a milestone here! Sebastian Kielmann reported that he used LIRe to index a million images. While that’s actually no problem, he further managed to search through the images in < 1 second! Whoot!

Sebastian used the metric index, which implements the ideas of Giuseppe Amato. The approach is simple, but works really well. Currently CEDD is the standard descriptor, but others are integrated easily. However, using the metric index is not trivial and requires some knowledge of the process. Also, the results are approximate and might differ from the results obtained by linear search.
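For those wondering why the results are only approximate, here is a toy sketch of permutation-based indexing, which is how I would summarize the general idea. It is not the metric index code from LIRE, and the class PermutationIndexSketch, its methods and the L1 distance are assumptions made up for this example. Each object is described by the order in which it "sees" a handful of reference objects, and two objects count as similar if these orderings are similar (measured here with the Spearman footrule).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical toy version of permutation-based indexing; not LIRE's metric index. */
class PermutationIndexSketch {
    private final double[][] references;
    // ranks.get(id)[r] = position of reference r in the ordering seen from object id
    private final List<int[]> ranks = new ArrayList<>();

    PermutationIndexSketch(double[][] references) {
        this.references = references;
    }

    /** Indexes one feature vector and returns its id (insertion order, starting at 0). */
    int add(double[] feature) {
        ranks.add(rankOf(feature));
        return ranks.size() - 1;
    }

    /** rank[r] = position of reference r when references are sorted by distance to the feature. */
    int[] rankOf(double[] feature) {
        Integer[] order = new Integer[references.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer r) -> l1(feature, references[r])));
        int[] rank = new int[order.length];
        for (int pos = 0; pos < order.length; pos++) rank[order[pos]] = pos;
        return rank;
    }

    /** Spearman footrule: sum of the position differences of each reference object. */
    static int footrule(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    /** Returns the ids of the k objects whose orderings are closest to the query's ordering. */
    List<Integer> search(double[] query, int k) {
        int[] q = rankOf(query);
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < ranks.size(); i++) ids.add(i);
        ids.sort(Comparator.comparingInt((Integer id) -> footrule(q, ranks.get(id))));
        int n = Math.max(0, Math.min(k, ids.size()));
        return new ArrayList<>(ids.subList(0, n));
    }

    /** L1 distance as a stand-in for the real descriptor distance. */
    static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }
}
```

Note that the sketch still compares the query against every stored ordering; a real index would look the orderings up in an inverted file instead, and that part is left out here. Since only the orderings are compared and not the descriptors themselves, two images with the same ordering look identical to the index, which is where the approximation comes from.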