
LIRE Use Case “What Anime is this?”

I do not often hear of applications built with LIRE, but when I do, I really appreciate it. The use case of What Anime is this? is exceptional in many ways: first of all, LIRE was applied very well and really solves a problem there; second, Soruly Ho tuned it to search through over 360 million images on a single server with very reasonable response times.

The web page built by Soruly Ho provides a search interface to (re-)find frames in Anime videos. Not being into Anime myself, I still know it is hand-drawn or computer animation, and it is hugely popular among fans … and there are a lot of them.

Soruly Ho was kind enough to compile some background information on his project:

Thanks to the LIRE Solr Integration Project, I was able to develop the first prototype just 12 hours after I met LIRE, without touching a line of the source code! After setting up the web server and Solr, I just had to write a few scripts to put all the pieces together. To analyze the video, I use ffmpeg to extract each frame as a jpg file with the timecode as the file name. Then, ParallelSolrIndexer analyzes all these images and generates an XML file. Before loading this XML into Solr, I use a Python script to put the video path and timecode into the title field. Finally, I wrote a few lines of JavaScript to use the Solr REST API to submit the image URL to the LireRequestHandler. After some magic, it returns a list of matching images sorted by similarity, with the original video path and timecode in the title field. The idea is pretty simple. Every developer can build this.
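
As an illustration of the frame extraction step, such an ffmpeg call could look roughly like this (a sketch, not Soruly Ho's actual script; the file name pattern is illustrative, and at a fixed extraction rate the timecode can be computed back from the frame number):

$> ffmpeg -i episode-01.mp4 -vf fps=12 frames/episode-01-%08d.jpg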

But scaling is challenging. There are over 15,000 hours of video indexed in my search engine. Assuming they are all 24 fps, there would be 1.3 billion frames in total. This is too big to fit in my server (which is just a high-end PC). Video always plays forward in time, so I use a running window to remove duplicate frames. Unlike real-life video, most anime is actually drawn at 12 fps or less, so this method reduces the number of frames by a significant 70%. Out of the many feature classes supported by LIRE, I only use the Color Layout Descriptor and drop the others to save space, memory and computation time. Now, each analyzed frame in my Solr index occupies only 197 bytes. Still, solely relying on one image descriptor already achieves very high accuracy.

Even after such optimization, the remaining 366 million frames are still so many that queries would often time out. So I studied and modified a little bit of the LireRequestHandler. (It is great that LIRE is free and open source!) Instead of using the performance-killing BooleanClause.Occur.SHOULD, I search the hashes with BooleanClause.Occur.MUST one by one until a good match is found. I am only interested in images with similarity > 90%, i.e. there is at least one common hash if I select 10 out of 100 hash values at random. The search completes in at most 10 iterations; otherwise, I assume there is no match. But random selection is not good because results are inconsistent and thus cannot be cached. So I ran an analysis on the hash distribution and always start searching from the least populated hash. This way, similarity calculation is performed on a smaller set of images. The Color Layout Descriptor does not produce an evenly distributed hash on anime: the least populated hash matches only a few frames, while the most populated hash matches over 277 million frames.

The last performance issue is keeping a 67.5 GB index with just 32 GB RAM, which I think can be solved with just more RAM.
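
From this description, a rough reconstruction of the probing strategy might look as follows (my sketch, not Soruly Ho's actual patch; the field name "hashes" and the helpers sortByAscendingDocFreq() and goodMatch() are hypothetical):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

// Probe the query image's hash terms one at a time, least populated first,
// instead of OR-ing all of them into one huge boolean query.
static TopDocs probeHashes(IndexSearcher searcher, String[] queryHashes) throws IOException {
    String[] hashes = sortByAscendingDocFreq(searcher, queryHashes); // hash distribution table
    for (int i = 0; i < Math.min(10, hashes.length); i++) {
        BooleanQuery query = new BooleanQuery(); // Lucene 4.x style
        query.add(new TermQuery(new Term("hashes", hashes[i])), BooleanClause.Occur.MUST);
        TopDocs hits = searcher.search(query, 100);
        if (goodMatch(hits)) return hits; // re-rank these by Color Layout distance
    }
    return null; // no good match within 10 probes: assume there is none
}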

The actual source I have modified and my hash distribution table can be found on GitHub.

You can try What Anime is this? yourself at https://whatanime.ga/. Thanks to Soruly Ho for sharing his thoughts and building this great search engine!

LIRE 1.0b2 released


Today, the first official beta version of LIRE 1.0 has been released. After loads of internal tests we decided to declare it quasi-stable. There are loads of new features compared to 0.9.5, including metric space indexing, DocValues-based searching, the SIMPLE approach for localizing global descriptors, new global descriptors like CENTRIST, a lot of performance tweaks, and so on.

For those of you using the nightly build or the repository version not much has changed; everyone else might check out the new APIs and possibilities, starting from the SimpleApplication collection and moving over to the LIRE documentation.

You will find the pre-compiled binaries on the download page, hosted by ITEC / Klagenfurt University.

 

 

LireDemo 0.9.4 beta released

The current LireDemo 0.9.4 beta release features a new indexing routine, which is much faster than the old one. It’s based on the producer-consumer principle and makes — hopefully — optimal use of I/O and up to 8 cores of a system. Moreover, the new PHOG feature implementation is included and you can give it a try. Furthermore JCD, FCTH and CEDD got a more compact representation of their descriptors and use much less storage space now. Several small changes include parameter tuning on several descriptors and so on. All the changes have been documented in the CHANGES.txt file in the SVN.
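
The principle, in miniature (a sketch of the general pattern, not LireDemo's actual code; extractAndIndex() is a hypothetical stand-in for the feature extraction step): one I/O-bound producer fills a bounded queue with files, while several CPU-bound workers drain it.

import java.io.File;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

static void indexAll(List<File> files, int workers) throws InterruptedException {
    final File POISON = new File(""); // sentinel telling a worker to stop
    BlockingQueue<File> queue = new LinkedBlockingQueue<>(100); // bounded, throttles the reader
    Thread[] consumers = new Thread[workers];
    for (int i = 0; i < workers; i++) {
        (consumers[i] = new Thread(() -> {
            try {
                for (File f = queue.take(); f != POISON; f = queue.take())
                    extractAndIndex(f); // CPU-bound feature extraction
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        })).start();
    }
    for (File f : files) queue.put(f); // the I/O-bound producer
    for (int i = 0; i < workers; i++) queue.put(POISON); // one sentinel per worker
    for (Thread t : consumers) t.join();
}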


Serialization of LIRE global features updated

In the current SVN version, three global features have been revisited in terms of serialization. This was necessary as the index of the web demo, with 300k images, already exceeded 1.5 GB.

Feature         before        now
EdgeHistogram   320 bytes     40 bytes
JCD             1344 bytes    84 bytes
ColorLayout     14336 bytes   504 bytes

This significant reduction in space leads to (i) smaller indexes, (ii) reduced I/O time, and (iii) therefore, to faster search.

How was this done? Basically, it's clever organization of bytes. In the case of JCD the histogram has 168 entries, each in [0,15], so basically half a byte. Therefore, you can stuff two of these values into one byte, but you have to take care of the fact that Java only supports bit-wise operations on ints, and that bytes are signed. So the trick is to create an integer in [0, 2^8-1] and then subtract 128 to get it into byte range. The inverse is done for reading. The rest is common bit shifting.
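
A minimal sketch of that packing trick (illustrative only; see JCD.java for the real thing):

int a = 11, b = 3; // two histogram entries, each assumed in [0,15]
// pack: build an int in [0,255], then shift it into the signed byte range
byte packed = (byte) (((a << 4) | b) - 128);
// unpack: undo the shift, then split the two half-bytes again
int combined = packed + 128; // back to [0,255]
int hi = combined >> 4;      // upper four bits, equals a
int lo = combined & 0x0F;    // lower four bits, equals b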

The code can be seen either in the JCD.java file in the SVN, or in the snippet at pastebin.com for your convenience.

Canny Edge Detector in Java

With the implementation of the PHOG descriptor I came across the situation that no well-performing Canny Edge Detector in pure Java was available. "Pure" in my case means that it just takes a Java BufferedImage instance and computes the edges. Therefore, I had to implement my own :)

As a result there is now a "simple implementation" available as part of LIRE. It takes a BufferedImage and returns another BufferedImage, which contains all the edges as black pixels, while the non-edges are white. The thresholds as well as the blurring filter used for preprocessing can be changed in code. Usage is dead simple:

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

BufferedImage in = ImageIO.read(new File("testdata/wang-1000/128.jpg"));
CannyEdgeDetector ced = new CannyEdgeDetector(in, 40, 80); // low/high thresholds
ImageIO.write(ced.filter(), "png", new File("out.png"));

The result is the picture below:

(Image: cannyedge)

Pyramid Histogram of Oriented Gradients (PHOG) implemented in LIRE

Yesterday I checked in the latest LIRE revision featuring the PHOG descriptor. It basically goes along the image's edge lines (using the Canny Edge Detector) and builds a fuzzy histogram of gradient directions. Furthermore, it does that on different pyramid levels: the image is split up like a quad-tree and all sub-images get their own histogram. All histograms of all levels and sub-images are concatenated and used for retrieval, as sketched below. First tests on the SIMPLIcity data set have shown that the current configuration of PHOG included in LIRE outperforms the EdgeHistogram descriptor.
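
The pyramid idea in a bare-bones sketch (my illustration, not LIRE's actual code; it assumes per-pixel gradient orientations in [0, pi) and omits the fuzzy binning and normalization of the real descriptor):

import java.util.ArrayList;
import java.util.List;

// One orientation histogram per cell and level, all concatenated.
static double[] phogSketch(double[][] orientation, boolean[][] edge, int bins, int levels) {
    int h = orientation.length, w = orientation[0].length;
    List<Double> feature = new ArrayList<>();
    for (int level = 0; level <= levels; level++) {
        int cells = 1 << level; // 1x1, 2x2, 4x4, ... sub-images, like a quad-tree
        for (int cy = 0; cy < cells; cy++)
            for (int cx = 0; cx < cells; cx++) {
                double[] hist = new double[bins];
                for (int y = cy * h / cells; y < (cy + 1) * h / cells; y++)
                    for (int x = cx * w / cells; x < (cx + 1) * w / cells; x++)
                        if (edge[y][x]) // only pixels on edge lines contribute
                            hist[(int) (orientation[y][x] / Math.PI * bins) % bins]++;
                for (double v : hist) feature.add(v);
            }
    }
    double[] result = new double[feature.size()];
    for (int i = 0; i < result.length; i++) result[i] = feature.get(i);
    return result;
}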

You can find the latest version of LIRE in the SVN & in the nightly builds.

Links

  • A. Bosch, A. Zisserman & X. Munoz. 2007. Representing shape with a spatial pyramid kernel. In Proceedings of CIVR '07.

Large image data sets with LIRE – some new numbers

People have lately been asking whether LIRE can do more than linear search, and I have always answered: yes, it should … but you know, I never tried. Finally I got around to indexing the MIR-FLICKR data set plus some of my Flickr-crawled photos, and ended up with an index of 1,443,613 images. I used CEDD as the main feature and a hashing algorithm to put multiple hashes per image into Lucene, to be interpreted as words. By tuning similarity, employing a boolean query, and adding a re-rank step, I ended up with a pretty decent approximate retrieval scheme, which is much faster and does not lose too many images on the way, i.e. the method has an acceptable recall. The image below shows the numbers along with a sample query. Linear search took more than a minute, while the hashing-based approach did (nearly) the same thing in less than a second. Note that this is just a sequential, straightforward implementation, so no performance optimization has been done. Also, the hashing approach has not yet been investigated in detail, i.e. there are some parameters that still need tuning … but let's say it's a step in the right direction.

(Image: Results-CEDD-Hashing)
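
In code, the scheme could look roughly like this (a sketch, not LIRE's exact implementation; Lucene 4.x API, with illustrative field names "hashes" and "cedd"):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import net.semanticmetadata.lire.imageanalysis.CEDD;

// Fetch a coarse candidate set via the hashes, then re-rank by true CEDD distance.
static List<ScoreDoc> approximateSearch(IndexSearcher searcher, String[] queryHashes, CEDD query) throws IOException {
    BooleanQuery bq = new BooleanQuery();
    for (String h : queryHashes) // each hash value is treated like a word
        bq.add(new TermQuery(new Term("hashes", h)), BooleanClause.Occur.SHOULD);
    TopDocs candidates = searcher.search(bq, 1000); // fast, approximate result
    List<ScoreDoc> reRanked = new ArrayList<>();
    for (ScoreDoc sd : candidates.scoreDocs) {      // exact distance on candidates only
        CEDD c = new CEDD();
        c.setByteArrayRepresentation(searcher.doc(sd.doc).getBinaryValue("cedd").bytes);
        reRanked.add(new ScoreDoc(sd.doc, query.getDistance(c)));
    }
    reRanked.sort((s1, s2) -> Float.compare(s1.score, s2.score)); // smallest distance first
    return reRanked;
}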

Updates on LIRE (SVN rev 39)

LIRE is not a sleeping beauty, so there is something going on in the SVN. I recently checked in updates to Lucene (now 4.2) and Commons Math (now 3.1.1). I also removed some deprecated leftovers from Lucene 3.x.

The most notable addition, however, is the Extractor / Indexor class pair. They are command line applications that allow you to extract global image features from images, put them into an intermediate data file, and then, with the help of Indexor, write them to an index. All images are referenced relative to the intermediate data file, so this approach can be used to preprocess a whole lot of images from different computers on a network file system. Extractor also takes a file list of images as input (one image per line) and can therefore easily be run in parallel: just split your global file list into n smaller, non-overlapping ones and run n Extractor instances. As the extraction part is the slow one, this should allow for a significant speed-up.

Extractor is run with

$> Extractor -i <infile> -o <outfile> -c <configfile>
  • <infile> gives the images, one per line. Use “dir /s /b *.jpg > list.txt” to create a compatible list on Windows.
  • <outfile> gives the location and name of the intermediate data file. Note: it has to be in a folder that is a parent of all images!
  • <configfile> gives the list of features as a Java Properties file. The supported features are listed below. The properties file looks like:
    feature.1=net.semanticmetadata.lire.imageanalysis.CEDD
    feature.2=net.semanticmetadata.lire.imageanalysis.FCTH

Indexor is run with

Indexor -i <input-file> -l <index-directory>
  • <input-file> is the output file of Extractor, the intermediate data file.
  • <index-directory> is the directory of the index the images will be added to (the index is appended to, not overwritten).
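
Put together, a complete run might look like this (file names are illustrative; as noted above, the data file has to sit in a folder parent to all images):

$> dir /s /b *.jpg > list.txt
$> Extractor -i list.txt -o data.out -c features.properties
$> Indexor -i data.out -l index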

Features supported by Extractor:

  • net.semanticmetadata.lire.imageanalysis.CEDD
  • net.semanticmetadata.lire.imageanalysis.FCTH
  • net.semanticmetadata.lire.imageanalysis.OpponentHistogram
  • net.semanticmetadata.lire.imageanalysis.JointHistogram
  • net.semanticmetadata.lire.imageanalysis.AutoColorCorrelogram
  • net.semanticmetadata.lire.imageanalysis.ColorLayout
  • net.semanticmetadata.lire.imageanalysis.EdgeHistogram
  • net.semanticmetadata.lire.imageanalysis.Gabor
  • net.semanticmetadata.lire.imageanalysis.JCD
  • net.semanticmetadata.lire.imageanalysis.JpegCoefficientHistogram
  • net.semanticmetadata.lire.imageanalysis.ScalableColor
  • net.semanticmetadata.lire.imageanalysis.SimpleColorHistogram
  • net.semanticmetadata.lire.imageanalysis.Tamura

Lire 0.9.3 released

I just uploaded Lire 0.9.3 to the all new Google Code page. This is the first version with full support for Lucene 4.0. Run time and memory performance are comparable to the version using Lucene 3.6. I’ve made several improvements in terms of speed and memory consumption along the way, mostly within the CEDD feature. Also I’ve added two new features:

  • JointHistogram – a 64-bin RGB color histogram joined with pixel rank in the 8-neighborhood, normalized with the max-norm, quantized to [0,127], and using JSD as distance function
  • OpponentHistogram – a 64-bin histogram utilizing the opponent color space, normalized with the max-norm, quantized to [0,127], and using JSD as distance function

Both features are fast in extraction (the second one naturally being faster, as it does not investigate the neighborhood) and yield nice, visually very similar results in search. See also the image below showing four queries with each of the new features: the first one of a pair is always based on JointHistogram, the second on OpponentHistogram.

(Image: Samples-JointHistogram)
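
For illustration, the color transform at the heart of the OpponentHistogram could be sketched like this (assuming the common definition of the opponent color space and 4 quantization steps per channel to arrive at 64 bins; not LIRE's actual code):

// Map an RGB pixel (each channel in 0..255) to one of 64 opponent-space bins.
static int opponentBin(int r, int g, int b) {
    double o1 = (r - g) / Math.sqrt(2.0);           // red-green opponent channel
    double o2 = (r + g - 2.0 * b) / Math.sqrt(6.0); // yellow-blue opponent channel
    double o3 = (r + g + b) / Math.sqrt(3.0);       // intensity channel
    // normalize each channel to [0,1], quantize to 4 steps, combine to one bin
    int q1 = Math.min(3, (int) (4 * (o1 * Math.sqrt(2.0) / 255.0 + 1) / 2));
    int q2 = Math.min(3, (int) (4 * (o2 * Math.sqrt(6.0) / 510.0 + 1) / 2));
    int q3 = Math.min(3, (int) (4 * (o3 * Math.sqrt(3.0) / 765.0)));
    return (q1 << 4) | (q2 << 2) | q3;
}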

 

I also changed the Histogram interface to double[], as the double type is much faster than float in the 64-bit Oracle Java 7 VM. A major bug fix concerned the JSD dissimilarity function (see below); as a consequence, many histograms were switched from L1 to JSD, depending on which performed better on the SIMPLIcity data set (see TestWang.java in the sources).
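
For reference, the Jensen-Shannon divergence of two normalized histograms p and q is JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m) with m = (p + q) / 2, which in a minimal sketch (not LIRE's exact code) reads:

// Assumes p and q are normalized to sum 1; 0 * log(0) is treated as 0.
static double jsd(double[] p, double[] q) {
    double sum = 0d;
    for (int i = 0; i < p.length; i++) {
        double m = (p[i] + q[i]) / 2d;
        if (p[i] > 0) sum += 0.5 * p[i] * Math.log(p[i] / m);
        if (q[i] > 0) sum += 0.5 * q[i] * Math.log(q[i] / m);
    }
    return sum;
}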

Final addition is the Lire-SimpleApplication, which provides two classes for indexing and search with CEDD, ready to compile with all libraries and an Ant build file. This may — hopefully — help those that still seek Java enlightenment 😀

Finally this just leaves to say to all of you: Merry Christmas and a Happy New Year!

News on LIRE performance

In the course of finishing the book, I reviewed several aspects of the LIRE code and came across some bugs, including one in the Jensen-Shannon divergence. This dissimilarity measure had never been used actively in any feature, as it did not work out in retrieval evaluation the way it was meant to. After two hours of staring at the code the realization finally came: in Java, the conditional operator "x ? y : z" has lower precedence than almost any other operator, including '+'. Hence,

System.out.print(true ? 1 : 0 + 1) prints '1',

while

System.out.print((true ? 1 : 0) + 1) prints '2'

With this problem identified, I was finally able to fix the implementation of the Jensen-Shannon divergence and arrived at new retrieval evaluation results on the SIMPLIcity data set:

Feature                  MAP    P@10   Error rate
Color Histogram – JSD    0.450  0.704  0.191
Joint Histogram – JSD    0.453  0.691  0.196
Color Correlogram        0.475  0.725  0.171
Color Layout             0.439  0.610  0.309
Edge Histogram           0.333  0.500  0.401
CEDD                     0.506  0.710  0.178
JCD                      0.510  0.719  0.177
FCTH                     0.499  0.703  0.209

Note that the color histogram in the first row now performs similarly to the "good" descriptors in terms of precision at ten and error rate. Also note that a new feature crept in: Joint Histogram, a histogram combining pixel rank and RGB-64 color.

All the new stuff can be found in SVN and in the nightly builds (starting tomorrow :)