While writing a scientific paper on tag recommendation I checked – just out of curiosity – the share of images tagged by their uploaders on Flickr. I found that four out of five images are untagged and that fewer than 15% of images have two or more tags.
My method and detailed results: In general one would need a random sample for such an investigation, but a truly random sample is hard to obtain without access to the database. Therefore I just grabbed 20,004 images from the RSS feed for recent uploads and counted the number of tagged images. Since it was easy enough, I also computed the confidence intervals:
- In my sample 3,650 images were tagged with at least one tag, which makes p1 = 18.25%.
- At a confidence level of 99% p1 lies in [16.84%, 19.66%].
- That leaves more than 4 out of 5 images untagged.
- Also in my sample 2,628 images were tagged with at least two tags, which makes p2 = 13.14%.
- At a confidence level of 99% p2 lies in [11.9%, 14.37%].
- That means that less than 15% of the images have more than one tag.
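For readers who want to redo this kind of estimate, here is a minimal sketch of a confidence interval for a proportion. It assumes the simple normal-approximation (Wald) interval and treats the feed sample as if it were random. The post does not say which interval method was used, so the bounds this sketch produces need not match the figures above. The function name `wald_interval` is made up for illustration.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.99):
    """Normal-approximation (Wald) interval for a proportion.

    Returns (p, lower, upper) as fractions between 0 and 1.
    Assumption: the n observations behave like a simple random sample.
    """
    p = successes / n
    # Two-sided critical value, e.g. roughly 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


if __name__ == "__main__":
    # Counts taken from the sample described above.
    for label, tagged in (("at least one tag", 3650), ("at least two tags", 2628)):
        p, low, high = wald_interval(tagged, 20004)
        print(f"{label}: p={p:.2%}, 99% interval [{low:.2%}, {high:.2%}]")
```

Other interval methods (Wilson, Clopper-Pearson) or a correction for the non-random feed sample would give somewhat different bounds.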
- Ajax mosaic builder – The blog entry describing the Ajax way of mosaic creation
- Lire Demo – The offline way for image mosaic creation (also featuring colored images and formats different from BMP)
You might all know del.icio.us, the social bookmarking service. As I use it a lot and also did some research in this direction recently (see e.g. here), I wanted to try out more 🙂 While preparing the Multimedia Information Systems course this semester I checked to what extent LSA (latent semantic analysis) can be applied to tags and also made a small demo application. The demo fetches the RSS feed for a given user name (leave the name field blank for random names, separate multiple names with commas) and computes a co-occurrence matrix after a latent semantic analysis. Note that only the last ~30 entries are in the feed. You can then select a tag from the combo box and find the related ones.
The tool can be accessed via Java Web Start here. Drop me a line whether you like it or not.
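To give an idea of what "a co-occurrence matrix after a latent semantic analysis" can look like, here is a small sketch. It is not the code of the demo: the bookmark data is invented, all function names are hypothetical, and a plain power iteration stands in for a proper SVD routine so that the example has no dependencies. The idea is that the tag co-occurrence matrix is rebuilt from its k strongest eigenvectors only, which is equivalent to a rank-k LSA of the tag/bookmark matrix.

```python
import math
import random


def cooccurrence(posts):
    """Tag-by-tag co-occurrence counts; posts is a list of tag lists."""
    tags = sorted({t for post in posts for t in post})
    idx = {t: i for i, t in enumerate(tags)}
    matrix = [[0.0] * len(tags) for _ in tags]
    for post in posts:
        unique = set(post)
        for a in unique:
            for b in unique:
                matrix[idx[a]][idx[b]] += 1.0
    return tags, matrix


def top_eigenpairs(sym, k, iters=300, seed=0):
    """Largest k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    work = [row[:] for row in sym]
    n = len(work)
    pairs = []
    for _ in range(min(k, n)):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in work]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a * b for a, b in zip(work[i], v)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def related_tags(posts, tag, k=2):
    """Rank the other tags by their rank-k smoothed co-occurrence with `tag`."""
    tags, matrix = cooccurrence(posts)
    n = len(tags)
    smooth = [[0.0] * n for _ in range(n)]
    for lam, v in top_eigenpairs(matrix, k):
        for i in range(n):
            for j in range(n):
                smooth[i][j] += lam * v[i] * v[j]
    row = smooth[tags.index(tag)]
    others = [t for t in tags if t != tag]
    return sorted(others, key=lambda t: row[tags.index(t)], reverse=True)


if __name__ == "__main__":
    # Invented bookmarks, one tag list per entry.
    posts = [["python", "code"], ["python", "programming"], ["code", "programming"],
             ["photo", "flickr"], ["photo", "camera"], ["flickr", "camera"]]
    print(related_tags(posts, "python"))
```

With real feeds the number of kept dimensions k is a tuning choice; with only ~30 entries per user the matrix stays small enough for this naive approach.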
While Adobe Flash (formerly Macromedia Flash) managed to take the top position among web-based tools for watching video, it was stuck with FLV video files for a long time. With version 9, H.264 (or AVC, as it is called in the context of MPEG-4) encoded videos are now supported along with popular container formats like mp4, mov or 3gp. In short: Flash can now play iPod and PSP videos.
Now for the meat: How to get your AVC files into a web page?
- Encode your video file. Use MediaCoder or any other tool to create an iPod or PSP compatible video. Make sure you select AVC as the target codec. It’s called MPEG-4 AVC in both the PSP and the iPod extension of MediaCoder, for example (see screenshot).
- Download the FLV Mediaplayer and unzip it somewhere.
- Create the web page where your video should be played and move the FLV Mediaplayer, its associated files and the video into a subdirectory of the page’s directory.
- Use the setup wizard to create the code for embedding your player.
- Select “Mediaplayer with a single FLV”
- Source: [relative subdirectory]/mediaplayer.flv
- Fill in width & height (you chose them in your encoding tool)
- Put down the path to the .mp4 file
- Copy the resulting code to your web page and test it in a browser
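If the player stays black, one thing worth ruling out is that the file was not encoded as AVC after all. The following is a rough sketch of such a check, not part of any of the tools mentioned above. It relies on two facts about MP4-style files: they usually start with an `ftyp` box, and H.264 tracks are labelled with the four-character code `avc1`. The function name is hypothetical, and a plain byte search like this is only a heuristic: it can give false positives, and older MOV files without an `ftyp` box are rejected.

```python
def looks_like_avc_mp4(path, chunk_size=1 << 20):
    """Heuristic: True if the file starts like an MP4/MOV/3GP container
    and contains the 'avc1' code used for H.264 video tracks."""
    with open(path, "rb") as f:
        header = f.read(12)
        # Bytes 4..8 of the first box hold its type; 'ftyp' marks the container family.
        if len(header) < 12 or header[4:8] != b"ftyp":
            return False
        f.seek(0)
        tail = b""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            # Keep the last 3 bytes so a code split across two chunks is still found.
            if b"avc1" in tail + chunk:
                return True
            tail = chunk[-3:]
```

Calling `looks_like_avc_mp4("video/myclip.mp4")` (the path is just an example) before uploading gives a quick sanity check. A proper tool would parse the box structure instead of searching raw bytes.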
I already blogged about my presentation at the barcamp, but here is a follow-up report on the first day: The first glance at the registration desk showed that this was organized perfectly. The welcome session started on schedule and was moderated by Georg Holzer. Right after that Monika Meurer talked about running a blog for a living. She and Achim Meurer had a whole lot of knowledge to share on this topic (see for instance here and here). After Monika’s talk I did my own presentation and went to lunch.
The afternoon started for me with Andreas Augustin’s WikiQuiz presentation. This is a collaborative game supporting user-generated as well as semi-automatically generated questions for a Trivial Pursuit style game. After that I talked to a whole lot of people while drinking coffee and missed some sessions. I got back on track just in time to see Zemanta, a machine learning based tool that recommends links, images and tags for blog posts. The tool is impressive, and the startup story was fascinating too. The Zemanta guys talked about Seedcamps and venture capitalists, things & concepts far away from my current environment 🙂
After that some more discussion followed and the last two tracks were the Flex Intro from Trinitec and the Lightning Talk session, where each presenter only had 3 minutes to perform.
Today the barcamp senza confini started. The organization worked like a charm and everything went perfectly (kudos go to Georg Holzer and the whole team). The first day was great: I heard a lot of interesting ideas and talked to a lot of interesting people. For myself I chose a short slot and gave the presentation on “Popularity, Power Laws and Interestingness” mentioned in the last post. Although it was a bit technical, I hope I managed to describe some basic concepts as well as my idea. For later review, here are my slides:
Barcamp in Klagenfurt takes place on Feb. 2nd-3rd
8th 2008. That means to me: Time to think about possible topics. As I had some lively discussions about Wikipedia lately and my last Barcamp talk was about wisdom of the crowds I’ve got several ideas in this direction:
- Who’s right? Case James Surowiecki vs. Scott Adams … While Surowiecki wrote a book about the “Wisdom of the Crowds” Scott Adams said: “You cannot underestimate general stupidity”.
- Wisdom of the Krauts: Is the German Wikipedia better? Some always say the Germans are better at generating truthful content. This would be a shoot-out 🙂
- My Internet is always right or “The Citing of the Herds” … Why you should not rely on the internet as your single source of information. Contrary to what one might guess, this is not (only) aimed at Wikipedia, but also includes Google Scholar, Springerlink, ACM and Computer.org.
Searching for statistics on video usage on the internet I found rather interesting stats for web-based video usage at comScore. In their latest press release focusing on video stats they state that in the U.S. 28.3% of the videos watched in September 2007 were delivered by Google sites (mainly YouTube).
Some other facts from the press release: In September 2007 internet users in the U.S. watched 9 billion videos. 3 out of 4 U.S. internet users watched at least one online video, and they watched 181 minutes on average.
While this is surely interesting I’d like to see some figures from other areas and countries too.
Google recently announced that the messenger built into the GMail web interface supports AIM. No word about ICQ there, but rest assured it works 🙂 As AIM is somewhat related to ICQ (I’ll spare you the details) you can just use your ICQ # and password. Meebo won’t like this as half of its functions are now integrated in GMail. Have fun chatting!
Update: How to use the new feature? Use Settings -> Chat “Sign into a different AIM account” or the “Options” combo box in the chat window (see also screenshot, the AIM icon denotes an AIM contact).
With the emergence of large scale social network communities such as flickr, myspace and youtube, we are witnessing media use and production on an unprecedented scale. The purpose of this special issue is to address the technical challenges that emerge through the use of media in large user communities. Communities who use media as part of a network can impact content analysis (e.g. detection of emergent semantics), multimedia systems (e.g. network optimization due to knowledge of relationships among people) and application research (e.g. novel group authoring). Ubiquitous use of multimedia can also impact the way communities form. We believe that a systematic analysis of community-generated media will reveal new insights about how people interact – social dynamics, the evolution of topics and trends, groups and communities. We believe that the research can reveal new synergies between multimedia content, systems and application research areas and computational social analysis. The focus of this special issue shall be novel computational aspects of shared media among multiple people