In the current SVN version three global features have been re-visited in terms of serialization. This was necessary as the index of the web demo with 300k images already exceed 1.5 GB.
Feature | prior | now |
EdgeHistogram | 320 bytes | 40 bytes |
JCD | 1344 bytes | 84 bytes |
ColorLayout | 14336 bytes | 504 bytes |
This significant reduction in space leads to (i) smaller indexes, (ii) reduced I/O time, and (iii) therefore, to faster search.
How was this done? Basically it’s clever organization of bytes. In the case of JCD the histogram has 168 entries, each in [0,127], so basically half a byte.Therefore, you can stuff 2 of these values into one byte, but you have to take care of the fact, that Java only supports bit-wise operations on ints and bytes are signed. So the trick is to create an integer in [0, 2^8-1] and then subtract 128 to get it into byte range. The inverse is done for reading. The rest is common bit shifting.
The code can be seen either in the JCD.java file in the SVN, or in the snippet at pastebin.com for your convenience.