Category Archives: Tagging

How to get a lot of photos …

I’m currently testing a new implementation of an approximate search index for content-based image retrieval. The performance tests in particular have become interesting, as I didn’t have access to a really large data set. So what to do?

I have written plenty of spiders and grabbers before, so I knew that there is a lot of data available on Flickr 🙂 But I was still looking for an easy way to get at it. Here is my approach (using bash, of course):

wget -q -O - "http://api.flickr.com/services/feeds/photos_public.gne?format=atom" | grep -o '.............static.*m.jpg' | wget -i -

Why should this work?

  • The first wget command fetches the list of recent photos as an Atom feed and writes it to standard output.
  • The grep command extracts the URLs of all the medium-sized pictures (suffix “m.jpg”).
  • The run of dots and the “static” are just a nice trick to match only the right URLs, the ones pointing to the real image content.
  • Finally, the second wget reads the URLs from standard input and downloads the images from the server.

Issuing this command once should get you about 25 photos in one go. With a bash loop or a cron job you can of course get a lot more, completely unattended 🙂
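A minimal sketch of such a loop, under a few assumptions: the function names, the target directory, the number of rounds and the pause are all made up for illustration, and the pattern is the same idea as in the one-liner, written with an explicit repeat count (13 is my reading of the run of dots, which would cover a prefix such as `http://farm1.`).

```shell
#!/usr/bin/env bash
# Sketch only: repeat the one-liner to collect photos unattended.
# FEED_URL is the URL from the post; everything else is illustrative.

FEED_URL='http://api.flickr.com/services/feeds/photos_public.gne?format=atom'

# Read a feed on stdin and print the image URLs it contains.
# sort -u drops URLs that appear more than once in the same feed.
extract_image_urls() {
    grep -oE '.{13}static.*m\.jpg' | sort -u
}

# Fetch the feed <rounds> times, sleeping <pause> seconds between rounds,
# and store the images in <dir>. wget -nc skips files that already exist.
grab_photos() {
    local rounds=$1 pause=$2 dir=$3 i
    mkdir -p "$dir"
    for ((i = 0; i < rounds; i++)); do
        wget -q -O - "$FEED_URL" | extract_image_urls | wget -q -nc -P "$dir" -i -
        sleep "$pause"
    done
}

# Example call, commented out so that sourcing this file causes no traffic:
# grab_photos 100 60 ./flickr_photos
```

The pause matters more than it looks: the feed only changes as new photos are uploaded, so polling it without a break would mostly return the same URLs again.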

Less than 20% of Flickr images tagged …

While writing a scientific paper on tag recommendation I checked, just out of curiosity, the share of images tagged by their uploaders on Flickr. I found that four out of five images are untagged and that fewer than 15% of images have two or more tags.

My method and detailed results: in general one would need a random sample for such an investigation, but a truly random sample is hard to obtain without access to the database. Therefore I just grabbed 20,004 images from the RSS feed for recent uploads and counted the number of tagged images. Computing the confidence intervals was easy enough, so I did that too:

  • In my sample 3,650 images were tagged with at least one tag, which makes p1=18.25%.
    • At a confidence level of 0.99, p1 lies in [16.84%, 19.66%].
    • That leaves more than 4 out of 5 images untagged.
  • Also in my sample 2,628 images were tagged with at least two tags, which makes p2=13.14%.
    • At a confidence level of 0.99, p2 lies in [11.9%, 14.37%].
    • That means that less than 15% of the images have more than one tag.
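For readers who want to redo the arithmetic, here is a small sketch using the textbook normal-approximation (Wald) interval for a proportion. This is an assumption on my side: the post does not say which method was used, and the function name `proportion_ci` is made up. The counts are the ones given above, and z = 2.576 corresponds to a two-sided 99% confidence level.

```shell
#!/usr/bin/env bash
# Sketch only: Wald confidence interval for a proportion, p +/- z * sqrt(p(1-p)/n).
# Usage: proportion_ci <successes> <sample size> [z]   (z defaults to 2.576, i.e. 99%)
proportion_ci() {
    local tagged=$1 total=$2 z=${3:-2.576}
    awk -v k="$tagged" -v n="$total" -v z="$z" 'BEGIN {
        p  = k / n                       # sample proportion
        hw = z * sqrt(p * (1 - p) / n)   # half-width of the interval
        printf "p=%.2f%% ci=[%.2f%%, %.2f%%]\n", 100 * p, 100 * (p - hw), 100 * (p + hw)
    }'
}

proportion_ci 3650 20004   # images with at least one tag
proportion_ci 2628 20004   # images with at least two tags
```

With these inputs the approximation gives roughly 17.5% to 19.0% for p1 and 12.5% to 13.8% for p2, which is narrower than the intervals listed above, so the original figures presumably come from a different, more conservative calculation. The conclusions (more than four out of five images untagged, fewer than 15% with two or more tags) hold under both.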

Del.icio.us Tag Co-Occurrence Demo App

You probably all know del.icio.us, the social bookmarking service. As I use it a lot and also did some research in this direction recently (see e.g. here), I wanted to try out more 🙂 While preparing the Multimedia Information Systems course this semester I checked to what extent LSA (latent semantic analysis) can be applied to tags, and I also built a small demo application. The demo fetches the RSS feed for a given user name (leave the name field blank for random names, separate multiple names with commas) and computes a co-occurrence matrix after a latent semantic analysis. Note that the feed contains only the last ~30 entries. You can then select a tag from the combo box and find the related ones.

The tool can be accessed via Java Web Start here. Drop me a line to let me know whether you like it or not.

Related links: