Searching with Lire in big datasets

October 27, 2011 on 10:17 am | In General, Java, Software | No Comments

Having received several complaints about the slowness of Lire when searching in 100k+ documents, I took the time to write a small how-to that explains approaches for searching in (relatively) big data sets.

Lire can create indexes with lots of different features (descriptors such as RGB color histograms or CEDD). While this gives flexibility at search time, as we can select the feature when we create a query, the index tends to get bigger and bigger and searches take longer and longer.

With a data set of 121,379 images, the index created with the features selected by default in Lire Demo has a size of 14.3 GB on disk. In contrast, an index storing just the CEDD feature along with the image identifier has a size of 29 MB.

The size of the index also slows down linear search. With the index stripped down to the CEDD feature and the identifier, searching takes roughly 0.33 seconds (on an AMD quad-core computer with 4 GB RAM and Java 1.7), while searching the big index takes 7 minutes and 3 seconds.

So if you want to index and search big data sets (more than 100,000 images, for instance), I recommend that you

  • select which features you need,
  • create the index with a minimum set of features,
  • possibly split the index per feature and select the index on the fly instead of the feature, and
  • optionally load the index into RAM.

For more on loading the index to RAM and the option to use local features read on in the developer wiki.
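The recommendations above can be sketched in plain Java. This is only an illustration of the idea (one small index per feature, held in RAM, searched linearly), not Lire's API: the class `PerFeatureIndexSketch`, the feature names and the L1 distance are all assumptions made for the example, and a real descriptor such as CEDD defines its own distance measure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch: one small in-memory index per feature, searched linearly. */
class PerFeatureIndexSketch {

    /** One index entry: the image identifier plus a single feature vector. */
    static final class Entry {
        final String imageId;
        final double[] feature;

        Entry(String imageId, double[] feature) {
            this.imageId = imageId;
            this.feature = feature;
        }
    }

    // One list per feature name (e.g. "cedd", "rgb"); the names are illustrative only.
    private final Map<String, List<Entry>> indexes = new HashMap<>();

    void add(String featureName, String imageId, double[] feature) {
        indexes.computeIfAbsent(featureName, k -> new ArrayList<>())
               .add(new Entry(imageId, feature));
    }

    /** L1 distance, used here as a stand-in for the descriptor's real metric. */
    static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Linear scan over the index of one feature only; returns the k closest image ids. */
    List<String> search(String featureName, double[] query, int k) {
        List<Entry> index = indexes.getOrDefault(featureName, Collections.emptyList());
        return index.stream()
                .sorted(Comparator.comparingDouble((Entry e) -> l1(e.feature, query)))
                .limit(k)
                .map(e -> e.imageId)
                .collect(Collectors.toList());
    }
}
```

The point of the design is that the query picks an index (`"cedd"` or `"rgb"`) rather than a field inside one big index, so a search only ever touches the data of the feature it needs.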

Lire and Lire Demo v 0.9 released

October 20, 2011 on 12:37 pm | In Dev, General, Java, Multimedia, Software | No Comments

I just released Lire and Lire Demo in version 0.9 on sourceforge.net. Basically it’s the alpha version with additional speed and stability enhancements for bag of visual words (BoVW) indexing. While this was already possible in earlier versions, I refurbished vocabulary creation (k-means clustering) and indexing to support up to 4 CPU cores. I also integrated a function to add documents to BoVW indexes incrementally. The major changes since Lire 0.8 include

  • Major speed-up due to change and re-write of indexing strategies for local features
  • Auto color correlation and color histogram features improved
  • Re-ranking filter based on global features and LSA
  • Parallel bag of visual words indexing and search supporting SURF and SIFT including incremental index updates (see also in the wiki)
  • Added functionality to Lire Demo including support for new Lire features and a new result list view
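For readers new to bag of visual words, here is a minimal sketch of the indexing step, assuming the vocabulary (the k-means cluster centers) already exists. It is not the code in Lire: the class `BovwSketch`, its method names and the squared Euclidean distance are assumptions for illustration. Each local descriptor is assigned to its nearest visual word, the assignments are counted into a histogram per image, and the images are spread over a fixed thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of bag of visual words indexing with a fixed vocabulary. */
class BovwSketch {

    /** Index of the visual word (cluster center) closest to the descriptor, by squared L2. */
    static int nearestWord(double[] descriptor, double[][] vocabulary) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int w = 0; w < vocabulary.length; w++) {
            double dist = 0;
            for (int i = 0; i < descriptor.length; i++) {
                double diff = descriptor[i] - vocabulary[w][i];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = w;
            }
        }
        return best;
    }

    /** Visual word histogram for the local descriptors of one image. */
    static int[] histogram(double[][] descriptors, double[][] vocabulary) {
        int[] hist = new int[vocabulary.length];
        for (double[] d : descriptors) {
            hist[nearestWord(d, vocabulary)]++;
        }
        return hist;
    }

    /** One histogram per image, in input order, computed on a pool of worker threads. */
    static List<int[]> indexAll(List<double[][]> images, double[][] vocabulary, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<int[]>> futures = new ArrayList<>();
            for (double[][] descriptors : images) {
                Callable<int[]> task = () -> histogram(descriptors, vocabulary);
                futures.add(pool.submit(task));
            }
            List<int[]> result = new ArrayList<>();
            for (Future<int[]> f : futures) {
                result.add(f.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("indexing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the vocabulary stays fixed, a new image can be added later by computing just its own histogram, which is the idea behind incremental updates.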

Lire 0.8 released

March 11, 2010 on 4:15 pm | In General | 2 Comments

I just released LIRe v0.8. LIRe – Lucene Image Retrieval – is a Java library for easy content based image retrieval. Based on Lucene, it doesn’t need a database and works reliably and rather fast. The major change in this version is the support of Lucene 3.0.1, which has a changed API and better performance on some OS. A critical bug was fixed in the Tamura feature implementation. It now definitely performs better :) Hidden in the depths of the code there is an implementation of the approximate fast indexing approach of G. Amato. It copes with the problem of linear search and provides a method for fast approximate retrieval for huge repositories (millions?). Unfortunately I haven’t tested with millions, just with tens of thousands, which proves that it works, but it doesn’t show how fast.

NetBeans Community Approves NetBeans 6.7 for Prime Time Release

June 24, 2009 on 9:49 am | In Development, IDE, Netbeans | No Comments

The NetBeans community acceptance survey has voted the last NetBeans 6.7 RC stable enough to be shipped. While this sounds great, there is one minor detail I consider critical for the significance of the survey: only 182 people responded (re-engineered from 144 people being 79%). If we go with common numbers in empirical research, ~5% of the population take part in surveys like these, and therefore I conclude that the size of the NetBeans community is around 3,600 people.
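The back-of-the-envelope estimate can be written down as a small calculation. The class and method names below are made up for illustration, and the 5% response rate is the rule of thumb from the text, not a measured value.

```java
/** Hypothetical helper reproducing the survey estimate from the text. */
class SurveyEstimate {

    /** Total number of respondents if `count` answers make up `share` (0..1) of them. */
    static long respondents(int count, double share) {
        return Math.round(count / share);
    }

    /** Population size under an assumed response rate (0..1). */
    static long population(long respondents, double responseRate) {
        return Math.round(respondents / responseRate);
    }
}
```

With the numbers from the survey, `respondents(144, 0.79)` gives 182 and `population(182, 0.05)` gives 3640, which rounds to the roughly 3,600 people mentioned above.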

NetBeans is actually having quite a hard time, with Eclipse pressing from the open source side and IDEA pressing from the commercial alternatives. The free IDEs of Microsoft’s .NET family also affect the scene. However, I still think that if NetBeans manages to advance from the “I can do all” principle to a small and lean application development environment featuring a fast and intelligent editor and a WYSIWYG GUI builder, there is definitely a chance.

Visual VM is Part of Java 1.6 Update 7

July 14, 2008 on 1:59 pm | In Development, Java | 2 Comments

Java 1.6 u7 was released recently by Sun. While it does not bring major changes, it brings along some bug fixes and solves some security issues. However, there is one main addition: VisualVM. This is a really great developer tool: it connects to running VMs and shows “some statistics” about them. Besides memory usage and thread information, it also allows some basic profiling. In my opinion Sun did a good job in including VisualVM in the package! Note that this thing is built on the NetBeans Platform ;-)

Finding duplicate code …

July 2, 2008 on 10:58 am | In Development, Teaching | 1 Comment

I recently found myself in a situation where I had to figure out how implementation clusters had implicitly formed within a group of students. All of them were given a task (with 4 subtasks) for a whole semester. Everyone was meant to do the task alone, but collaboration was allowed. However, I needed to know who helped whom and – of course – who helped whom with source code.

A colleague had a similar problem and pointed me to PMD CPD (= PMD Copy & Paste Detector). This tool works lightning fast and has a GUI :) Also it’s open source -> respect!

Lire SVN build for Java 1.5

May 30, 2008 on 1:29 pm | In CaliphEmir, Dev, Development, Imaging, Java, Lire, LireDemo, Releases | No Comments

Due to requests I took some time and built a Java 1.5 version instead of the 1.6 versions. A simple compile with 1.5 wouldn’t help as I use the swing layout classes of NetBeans (now integrated in Java 1.6), so imports have to be re-adjusted and the library has to be added. Furthermore I created an explicit build target in Caliph to create a 1.5 version of the cbir jar file. This snapshot works fine with MacOS (as far as I’ve heard) and on Windows.

Lire development: a big next step ..

May 29, 2008 on 9:12 am | In Dev, Development, General, Imaging, Java, Lire, LireDemo, Multimedia, OpenSource, Releases | No Comments

While it has been quiet around Lire for some time, development has recently been pushed forward. I switched to SVN for development and integrated simple RGB color histograms as a feature for comparison with the MPEG-7 features. Savvas Chatzichristofis contributed the CEDD feature, which works great! Marko Keuschnig and Christian Penz contributed implementations for the Gabor texture feature and the Tamura texture features, where the latter is already in the SVN. I also integrated the new features in LireDemo. A new version – already compiled – can be downloaded here: liredemo-svn-2008-05-29-jdk16.tar.bz2. Note that Java 1.6 is required.

NetBeans 6.1 Released

April 30, 2008 on 12:51 pm | In Development, General, Java, Netbeans, Releases | No Comments

The new NetBeans IDE 6.1 was released 2 days ago. Changes are more incremental than fundamental, but it now features support for JavaScript and code completion for JavaDoc. Furthermore, support for MySQL has been added. Release notes can be found here.

Computer Games: Parallax Scrolling & Sprites

April 8, 2008 on 1:33 pm | In Development, Games, General, Java | No Comments

Currently I’m preparing the talk on multimedia issues in games that I will give next Friday in the computer games lesson. To underline my words and slides with some code, I also wrote a small Java program that visualizes sprite animation and a star field background. The coding was great fun – the third scrolling shooter I coded … always a pleasure :)

However, there is one thing I found out while coding: ready-to-use sprite animation image strips are hard to get. There is a little tool called “simple explosion maker” that came in handy, and the SpriteLib of Flying Yogi is rather cool, but I miss a great deal of online Creative Commons content. Perhaps someone could point me there :)
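For anyone who has not worked with sprite strips, the two calculations at the heart of such a demo are small. The sketch below is not the program from the talk. It assumes a horizontal strip of equally sized frames and a star field that scrolls vertically and wraps around, and the class `SpriteSketch` with its method names is made up for illustration.

```java
/** Hypothetical sketch of sprite strip animation and parallax star scrolling. */
class SpriteSketch {

    /**
     * X offset of the frame to draw from a horizontal strip of equally sized frames.
     * The animation loops once all frames have been shown.
     */
    static int frameX(int stripWidth, int frameCount, long elapsedMillis, int millisPerFrame) {
        int frameWidth = stripWidth / frameCount;
        int frame = (int) ((elapsedMillis / millisPerFrame) % frameCount);
        return frame * frameWidth;
    }

    /**
     * Y position of a star after `tick` updates. Layers with a higher speed appear
     * closer to the viewer, which gives the parallax effect; stars wrap at the bottom.
     */
    static int starY(int startY, int tick, int layerSpeed, int screenHeight) {
        return Math.floorMod(startY + tick * layerSpeed, screenHeight);
    }
}
```

When drawing with Java 2D, the frame offset would typically feed the source rectangle of `Graphics.drawImage`, so that only one frame of the strip is copied to the screen per repaint.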

© 2004-2010 by Mathias Lux
Contents of this page are licensed under the Creative Commons Attribution-Share Alike 3.0 Austria License.