I knew that performance did not skyrocket with Lucene 4.0, and I finally got around to finding out why. Unfortunately the field compression technique applied in Lucene 4.x compresses each and every stored field … and decompresses it upon access. This adds considerable overhead when reading the index linearly, which is exactly one of the main methods of LIRE.
The image shows a screenshot of the CPU sampler in VisualVM. 58.7% of the CPU time goes to the LZ4 decompression routine. That’s quite a lot and makes a huge difference for search. If anyone has a workaround of some sort, I’d be happy to hear about it.
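To make the cost concrete, here is a minimal sketch of a store that compresses every field on write and decompresses it on every read. It is not Lucene code: the class CompressedFieldStore and its methods are made up for illustration, and DEFLATE from java.util.zip stands in for LZ4, which the JDK does not ship.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch, not Lucene code: each field is compressed when added
// and decompressed again on every single read. DEFLATE stands in for LZ4.
final class CompressedFieldStore {
    private final List<byte[]> compressed = new ArrayList<>();
    private final List<Integer> rawLengths = new ArrayList<>();

    // Compresses one field and stores the result plus the original length.
    void add(byte[] field) {
        Deflater deflater = new Deflater();
        deflater.setInput(field);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        compressed.add(out.toByteArray());
        rawLengths.add(field.length);
    }

    // Decompresses the field at the given index; nothing is cached, so a
    // linear scan over all fields pays this cost once per field.
    byte[] read(int index) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed.get(index));
        byte[] field = new byte[rawLengths.get(index)];
        try {
            int offset = 0;
            while (offset < field.length) {
                int n = inflater.inflate(field, offset, field.length - offset);
                if (n == 0) {
                    break; // no progress: stream finished or input is damaged
                }
                offset += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed field", e);
        } finally {
            inflater.end();
        }
        return field;
    }

    int size() {
        return compressed.size();
    }
}
```

A linear search over such a store calls read(i) for every i from 0 to size() - 1, so the decompression work grows with the number of fields rather than with the number of hits.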
Update (2013-07-03): With the great help of the people from the lucene-user list I found at least a speed-up. In the current SVN version there is a novel LireCustomCodec for stored fields, which speeds up decompression a lot. Moreover, there is now an in-memory caching approach implemented in the GenericFastImageSearcher class. It is turned off by default, but it speeds up search (at the cost of memory and init time) by holding image features in memory. It has been tested with up to 1.5M images.
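The idea behind the in-memory caching can be sketched roughly as follows. This is not the code from GenericFastImageSearcher: the class InMemoryFeatureCache, its nested types, and the L1 distance are assumptions made for illustration. Features are loaded once, and every search is then a linear scan over plain arrays that never touches stored fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not LIRE code: feature vectors are held in memory so a
// linear search does not have to read (and decompress) stored fields.
final class InMemoryFeatureCache {
    // One cached entry: a document id plus its feature vector.
    static final class Entry {
        final int docId;
        final double[] feature;
        Entry(int docId, double[] feature) {
            this.docId = docId;
            this.feature = feature;
        }
    }

    // One search hit: a document id and its distance to the query.
    static final class Hit {
        final int docId;
        final double distance;
        Hit(int docId, double distance) {
            this.docId = docId;
            this.distance = distance;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Called once per document at init time; this is the memory and init cost.
    void add(int docId, double[] feature) {
        entries.add(new Entry(docId, feature.clone()));
    }

    // L1 distance, used here as a stand-in for a feature-specific metric.
    static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Linear scan over the cache; a max-heap keeps the k nearest entries.
    List<Hit> search(double[] query, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<Hit> heap =
                new PriorityQueue<>((a, b) -> Double.compare(b.distance, a.distance));
        for (Entry e : entries) {
            double d = l1(query, e.feature);
            if (heap.size() < k) {
                heap.add(new Hit(e.docId, d));
            } else if (d < heap.peek().distance) {
                heap.poll();
                heap.add(new Hit(e.docId, d));
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Double.compare(a.distance, b.distance));
        return result;
    }
}
```

The trade-off is the one described above: every vector has to be read once up front and stays on the heap, in exchange for searches that only do arithmetic.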
I just submitted my code to the SVN and created a download for Lire 0.9.3_alpha. This version features support for Lucene 4.0, which changed quite a bit in its API. I did not have the time to test the Lucene 3.6 version against the new one, so I actually don’t know which one is faster. I hope the new one, but I fear the old one 😉
This is a pre-release for Lire for Lucene 4.0
Global features (like CEDD, FCTH, ColorLayout, AutoColorCorrelogram and the like) have been tested and are considered working. Filters such as the ReRankFilter and the LSAFilter also work. The image shows a search for 10 images with ColorLayout and the results of re-ranking the result list with (i) CEDD and (ii) LSA. Visual words (local features), metric indexes and hashing have not been touched yet, besides making them compile, so I strongly recommend not using them. However, due to a new weighting approach, I assume that the visual word implementation based on Lucene 4.0 will, as soon as it is done, be much better in terms of retrieval performance.
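Re-ranking a result list with a second feature boils down to re-sorting the first-pass hits by a different distance. The sketch below shows only that step. It does not use the ReRankFilter API: the class ReRankSketch, the method reRank, and the L1 distance are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the LIRE ReRankFilter: re-orders the hits of a
// first-pass search by their distance to the query under a second feature.
final class ReRankSketch {
    // L1 distance, used here as a stand-in for a feature-specific metric.
    static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // firstPass: document ids returned by the first search, in any order.
    // secondFeature: second feature vector for each of those ids.
    // query: the query image's vector for the second feature.
    static List<Integer> reRank(List<Integer> firstPass,
                                Map<Integer, double[]> secondFeature,
                                double[] query) {
        List<Integer> result = new ArrayList<>(firstPass);
        result.sort((a, b) -> Double.compare(
                l1(query, secondFeature.get(a)),
                l1(query, secondFeature.get(b))));
        return result;
    }
}
```

Only the hits from the first pass are touched, so the second feature is compared against 10 candidates here rather than against the whole index.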
I just checked in my latest code for LIRe and it looks like it’s nearly ready for a v0.8 release. Major changes include the use of Lucene 3.0.1, some bug fixes on descriptors, several new test files (including one that shows how to do an LSA with image features) and of course an updated demo application. While everything needs a bit more testing as well as a documentation update, I can offer a pre-compiled demo here. All changed and added sources can be found in the SVN.