This shows you the differences between two versions of the page.
|
lire:manydocs [2011/10/27 03:57] mlux created |
lire:manydocs [2011/10/27 04:10] (current) mlux |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== How to search in big data sets ====== | ====== How to search in big data sets ====== | ||
| - | Lire has the ability to create indexes with lots of different features (descriptors, like RGB color histograms or CEDD). While this opens the opportunity to flexibility at search time as we can select the feature at the time we create a query, the index tends to get bigger and bigger. | + | Lire has the ability to create indexes with lots of different features (descriptors, like RGB color histograms or CEDD). While this opens the opportunity to flexibility at search time as we can select the feature at the time we create a query, the index tends to get bigger and bigger, and searches take longer and longer. |
| + | |||
| + | ===== Reducing the number of features ===== | ||
| With a data set of 121,379 images the index created with the features selected by default in Lire Demo has a size of 14.3 GB on disk. In contrast, an index storing just the CEDD feature along with the image identifier has a size of 29 MB. | With a data set of 121,379 images the index created with the features selected by default in Lire Demo has a size of 14.3 GB on disk. In contrast, an index storing just the CEDD feature along with the image identifier has a size of 29 MB. | ||
| Line 14: | Line 16: | ||
| - create the index with a minimum set of features, and | - create the index with a minimum set of features, and | ||
| - possibly split the index per feature and select the index on the fly instead of the feature | - possibly split the index per feature and select the index on the fly instead of the feature | ||
| + | - load the index into RAM (see below) | ||
| + | |||
| + | ===== Loading the index to RAM ===== | ||
| + | If your index is small enough, you can load it into the main memory of your computer to further reduce access time. Lucene offers an easy way to do this with the following statement: | ||
| + | |||
| + | <code java> | ||
| + | IndexReader reader = IndexReader.open(new RAMDirectory(FSDirectory.open(new File(indexPath)))); | ||
| + | </code> | ||
| + | |||
| + | Please note that the index has to fit into the physical free main memory. Otherwise the approach will lead to swapping done by your operating system and everything will get even slower. | ||
| + | |||
| + | ===== Using local features ===== | ||
| + | Lire now supports visual word based approaches, so if you plan to use SURF or similar features you might [[lire:bovw|use the bag of visual words approach]]. Search time for the above-mentioned index of 121,379 images and a size of 14.3 GB is ~0.033 seconds, which is even faster than searching the stripped-down index. Please note that this is an approach for local features, not global ones. The results will be very different from the results achieved with global features. | ||
| + | |||
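
The note in "Loading the index to RAM" about the index having to fit into memory can be checked before the index is opened. The following is a minimal sketch using only the Java standard library; it is not part of Lire or Lucene. The class name `IndexSizeCheck`, its methods, the default path `index` and the safety factor of 1.5 are all assumptions made for illustration. Since `RAMDirectory` keeps its data on the JVM heap, the sketch compares the on-disk index size against the heap the JVM may still claim (limited by `-Xmx`), which is a stricter condition than free physical memory alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of Lire or Lucene. */
final class IndexSizeCheck {

    /** Sums the sizes of all regular files directly inside the index directory. */
    static long indexSizeBytes(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** Heap the JVM may still claim: maximum heap minus what is currently in use. */
    static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    /**
     * Returns true if the index, scaled by a safety factor, fits into the budget.
     * The factor is an assumed margin for object overhead and the application itself.
     */
    static boolean fitsInMemory(long indexBytes, long budgetBytes, double safetyFactor) {
        return indexBytes * safetyFactor <= budgetBytes;
    }

    public static void main(String[] args) throws IOException {
        // "index" is a placeholder path; pass the real index directory as first argument.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "index");
        long size = indexSizeBytes(indexDir);
        long heap = availableHeapBytes();
        boolean ok = fitsInMemory(size, heap, 1.5);
        System.out.println(ok ? "load the index into RAM" : "open the index from disk");
    }
}
```

If the check fails, opening the index directly from disk avoids the swapping described above. The 1.5 factor is only a starting point and should be tuned for the actual application.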