Lire - Lucene Image REtrieval
The LIRE (Lucene Image REtrieval) library provides a simple way to create a Lucene index of image features for content-based image retrieval (CBIR). There is no complete list of features, but these are some of them:
- MPEG-7 descriptors ScalableColor, ColorLayout and EdgeHistogram
- CEDD and FCTH (contributed by Savvas Chatzichristofis)
- Color histograms (HSV and RGB), Tamura & Gabor, auto color correlogram, JPEG coefficient histogram (common global descriptors)
- Visual words based on SIFT and SURF
- Visual words based on SIMPLE
- Approximate fast search based on hashing and metric indexing.
Furthermore, methods for searching the index based on Lucene are provided.
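To make the idea of a global descriptor more concrete, here is a minimal sketch of an RGB color histogram computed over a whole image. It uses only the Java standard library and is not LIRE's implementation: the class name ColorHistogramSketch, the method extract and the choice of 4 bins per channel are assumptions made for illustration.

```java
import java.awt.image.BufferedImage;

/** Illustrative only: a tiny global RGB color histogram, not LIRE's own code. */
final class ColorHistogramSketch {
    /** Bins per color channel (an assumed value for this sketch). */
    static final int BINS = 4;

    /** Returns a histogram with BINS^3 entries, normalized so the entries sum to 1. */
    static double[] extract(BufferedImage image) {
        double[] histogram = new double[BINS * BINS * BINS];
        int width = image.getWidth();
        int height = image.getHeight();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                // Quantize each 8-bit channel into one of BINS buckets.
                int r = ((rgb >> 16) & 0xFF) * BINS / 256;
                int g = ((rgb >> 8) & 0xFF) * BINS / 256;
                int b = (rgb & 0xFF) * BINS / 256;
                histogram[(r * BINS + g) * BINS + b]++;
            }
        }
        // Normalize by the pixel count so images of different sizes are comparable.
        double pixels = (double) width * height;
        for (int i = 0; i < histogram.length; i++) {
            histogram[i] /= pixels;
        }
        return histogram;
    }
}
```

The resulting vector describes the image as a whole, which is what makes it a global feature: two images can then be compared by measuring the distance between their vectors.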
The LIRE library started out as part of the Caliph & Emir project and aimed to provide the CBIR features of Caliph & Emir to other Java projects in an easy and lightweight way. In the meantime it has turned into a big and interesting project of its own.
- How to create an index with Lire?
- How to search through the index with Lire?
- Frequently Asked Questions
I recommend starting with the SimpleApplication package of LIRE, which covers the most commonly needed functionality, including indexing, search and extraction of image features for use in other applications. It's also a good idea to work with the current SVN version of LIRE: How to check out and set up LIRE in the IDEA IDE.
Note that LIRE comes with Apache Ant build files named build.xml. You can use their tasks to create the jar from the source code as soon as you have Ant installed, or if you are using an IDE with Ant support, like IDEA, Eclipse or NetBeans. Apache Ant can be found at the Apache Ant Project Page.
If you are searching for the Solr plugin of LIRE ... it's still under construction. Some global features are working fine, and it's based on Solr 4.10.2. It can be found at BitBucket. It has been reported to work on distributed installations.
How does Lire actually work?
Lire employs global image features for content-based image retrieval. For more information on the underlying methods and techniques you should consult the basic literature on content-based image retrieval:
- Visual Information Retrieval using Java and LIRE (Lux & Marques, 2013)
- Image Retrieval: Ideas, Influences, and Trends of the New Age (Datta et al., 2008)
- Content-Based Image Retrieval at the End of the Early Years (Smeulders et al., 2000)
Furthermore, it uses the Java search engine Lucene to provide
- linear search (opening each and every indexed document and comparing it to the query feature)
- approximate indexing based on metric spaces, following the work of G. Amato
- approximate indexing based on locality sensitive hashing
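The linear search mode listed above can be sketched in a few lines of plain Java. This is an illustration only and does not use the LIRE or Lucene API: the class LinearSearchSketch, its method names, the in-memory list standing in for the index, and the use of L1 distance are all assumptions for this example (each LIRE feature defines its own distance function).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: brute-force nearest-neighbor search over feature vectors. */
final class LinearSearchSketch {
    /** L1 (city block) distance between two feature vectors of equal length. */
    static double l1Distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Returns the positions of the k indexed features closest to the query, nearest first. */
    static List<Integer> search(List<double[]> index, double[] query, int k) {
        // Visit every indexed feature once and compare it to the query feature.
        double[] distances = new double[index.size()];
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < index.size(); i++) {
            distances[i] = l1Distance(index.get(i), query);
            ids.add(i);
        }
        // Rank all candidates by distance and keep the best k.
        ids.sort(Comparator.comparingDouble(i -> distances[i]));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }
}
```

Because every indexed feature is compared to the query, the cost of this approach grows with the number of indexed images. The two approximate approaches trade some accuracy for examining fewer candidates.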
Parallel indexing with the ParallelIndexer, running with 8 threads on an AMD A10 with 4 cores at 4.4 GHz under 64-bit Windows 7 and extracting 7 features at once including hashing, is down to ~180 ms per image. On an Intel Core i7, i.e. the 4770K, it runs a lot faster, and using an SSD speeds up the process even more. On a Core i7, extracting a single feature with the ParallelIndexer is typically faster than 1 MP images can be read from a (magnetic) hard disk.
Search time depends on the index size: it is down to a few ms for indexes of 100k images or fewer, and increases linearly with the number of images.