I just uploaded Lire 0.9.3 to the all new Google Code page. This is the first version with full support for Lucene 4.0. Run time and memory performance are comparable to the version using Lucene 3.6. I’ve made several improvements in terms of speed and memory consumption along the way, mostly within the CEDD feature. Also I’ve added two new features:
JointHistogram – a 64 bit RGB color histogram joined with pixel rank in the 8-neighborhood, normalized with max-norm, quantized to [0,127], and JSD for a distance function
Opponent Histogram – a 64 bit histogram utilizing the opponent color space, normalized with max-norm, quantized to [0,127], and JSD for a distance function
Both features are fast in extraction (the second one naturally being faster as it does not investigate the neighborhood) and yield nice, visually very similar results in search. See also the image below showing 4 queries, each with the new features. The first one of a pair is always based on JointHistogram, the second is based on the OpponentHistogram (click ko see full size).
I also changed the Histogram interface to double as the double type is so much faster than float in 64 bit Oracle Java 7 VM. Major bug fix was in the JSD dissimilarity function. So many histograms now turned to use JSD instead of L1, depending on whether they performed better in the SIMPLIcity data set (see TestWang.java in the sources).
Final addition is the Lire-SimpleApplication, which provides two classes for indexing and search with CEDD, ready to compile with all libraries and an Ant build file. This may — hopefully — help those that still seek Java enlightenment 😀
Finally this just leaves to say to all of you: Merry Christmas and a Happy New Year!
Apache Commons has a nice sub project called Sanselan. It’s a pure Java image library for reading and writing images from and to PNG, PSD (partially), GIF, BMP, ICO, TGA, JPEG and TIFF. It also supports EXIF, IPTC and XMP metadata formats, read for all, write for some. Examples for reading and writing images, EXIF, guessing image formats etc. are provided in the source package. Currently Sanselan is available in version 0.9.7 and the release date of this version seems to be in 2009. I’m not sure if this counts as abandoned project, but it definitely doesn’t count as alive 🙂
While writing a scientific paper on tag recommendation I checked – just out of curiosity – the share of images tagged by their uploaders on Flickr. I found out that 4 out of five images are untagged and that less than 15% of images have 2 or more tags.
My method and detailed results: In general one would need a random sample for such an investigation, but a truly random sample is hard to obtain without access to the data base. Therefore I just grabbed 20,004 images from the RSS feed for recent uploads and counted the number of tagged images. Easy enough I also computed the confidence interval:
In my sample 3,650 images were tagged with at least one tag, that makes p1=18.25%
With alpha=0.99 p1 is in [16.84, 19.66].
That leaves more than 4 out of 5 images untagged.
Also in my sample 2,628 images were tagged with at least two tags, that makes p2=13,14%
With alpha=0.99 p2 is in [11.9, 14.37].
That means that less than 15% of the images images have more than one tag.
In the morning I was listenting to the talk of Horst Bischof on robust people detection in surveillance scenarios. In my opinion he gave a great talk: He manages to show the nature and results of their research and motivate the usefulness and significance of results based on context and related work. He points out what the achievements are and what hasn’t been touched by his research group and why. He also visualized his results using videos which was appreciated by the audience. If I findout where the videos can be found I’ll blog the link.
While I still have some things to prepare for tomorrow I will visit Horst Bischof’s keynote talk on “Robust Person Detection for Surveillance using Online Learning” and I hope I will find some time to blog about the conference.