Applying old things to new platforms has become common in recent times, so here’s my contribution. I recently developed a video summary tool based on FFMPEG and Lire for a friend … just to test if common approaches are usable in a specific domain. Video summarization – especially of short videos – is a rather easy thing. You just need to find a number of frames with maximized pairwise difference, to cover a maximized visual range of the video. I applied my tool to YouTube and got the following summaries for the “hippo bathing” video:
Based on the CEDD descriptor the most important keyframe is really chosen well – just watch the video to know what I mean 🙂
With the auto color correlogram feature the dog is not explicitly part of the picture. However the first frame chosen (the big one) gives a good impression of the “bathing” part.
With the Gabor texture feature the dog is prominent in the first frame. Note that the result is much the same as the result with the Tamura texture feature, which is not shown here.
With the most simple feature (RGB color histogram with L2 distance) the summary also looks appealing. There is a frame featuring the dog, one showing the whole scene and one for the hippo.
All in all I think the results are quite appealing. The runtime of my implementation is a fraction of actual video play time. Perhaps I’ll find some time to present the whole thing tomorrow at the barcamp 😉
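The selection idea described above (pick frames with maximized pairwise difference) can be sketched in a few lines of Python. This is only a minimal sketch and not the actual tool: it assumes each frame has already been reduced to a feature vector (for instance an RGB histogram), it uses a greedy farthest-point strategy, and it starts with the frame closest to the mean feature vector. The function names `l2` and `select_keyframes` are made up for this illustration.

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_keyframes(features, k):
    """Greedily pick up to k frame indices that lie far apart in feature space.

    Assumption: `features` is a list of equal-length numeric vectors,
    one per sampled frame.
    """
    if not features or k <= 0:
        return []
    n, dim = len(features), len(features[0])
    # Start with the frame closest to the mean vector (an assumed heuristic).
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    first = min(range(n), key=lambda i: l2(features[i], mean))
    chosen = [first]
    # min_dist[i] = distance from frame i to its nearest already chosen frame.
    min_dist = [l2(f, features[first]) for f in features]
    while len(chosen) < min(k, n):
        nxt = max(range(n), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0:
            break  # every remaining frame duplicates a chosen one
        chosen.append(nxt)
        for i in range(n):
            min_dist[i] = min(min_dist[i], l2(features[i], features[nxt]))
    return chosen
```

Swapping `l2` for another distance function, or the feature vectors for another descriptor, is what gives different summaries for the same video.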
Today I found a compression test on different H.264 (and one MPEG-4 ASP) encoders. They tested the compression to visual quality ratio and found out that the MainConcept encoder (German company, recently acquired by DivX) performs best. The difference to the next rank however is marginal and x264 is second. Deciding based on a compression to price ratio will therefore be very easy 🙂
Have you ever wondered why there is no title or description in your AVI files, just like you expect it to be there in your MP3 files? Well, AVI does not really support these things. However the Moving Picture Experts Group has defined the MP4 container format for audiovisual information (which is very much like the MOV container) and did not forget about metadata: They defined a way to put it in there. However, as always with standards, where a whole lot of people talk and try to find common ground, there is not one simple way (like ID3), but several complicated ways to choose from. Therefore applications like VLC or Winamp do not support MP4 metadata out of the box. Due to the strong relation to the MOV format, however, iTunes supports MP4 metadata.
Thanks to Markus Waltl I got some links for tools reading and manipulating MP4 metadata. They all have in common that they are rather slow:
AtomicParsley – A command line tool for reading and setting metadata in MPEG-4 files. Has also a good explanation of the MP4 atoms here.
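To get a feeling for what such tools have to walk through, here is a minimal Python sketch that lists the atoms (boxes) in a chunk of MP4 data. It relies only on the basic box header layout (a 4-byte big-endian size followed by a 4-byte type, with size 1 meaning that a 64-bit size follows and size 0 meaning "until the end"). It is not how AtomicParsley works internally, and `iter_boxes` and `make_box` are hypothetical helper names.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            if offset + 16 > end:
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box, stop instead of guessing
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type, payload=b""):
    """Build a tiny box for demonstration purposes only."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic sample, not a playable file: an ftyp box plus a moov box holding udta.
sample = make_box(b"ftyp", b"isom") + make_box(b"moov", make_box(b"udta"))
top_level = [box_type for box_type, _, _ in iter_boxes(sample)]
```

Container boxes such as `moov` can be descended into by calling `iter_boxes` again with the payload offsets that were returned.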
I found it rather late, but it’s still interesting: YouTube brought a round up of the 10 most popular videos in 2007 (actually I only found 9) based on comments, views, responses, etc. Especially interesting in my opinion is the Battle at Kruger video: The video length is 8:24, so it’s a rather long video and the actual storyline is rather boring in the beginning. How could this video ever be buzzing? I’d have stopped it at 0:30 at the latest, so popularity is an interesting concept in my opinion 🙂
However I grabbed the stats and prepared the following visualizations giving you an impression on the popularity of those videos in quite impressive numbers (click to see them full size):
Searching for statistics on video usage on the internet I found rather interesting stats for web based video usage at comscore. In their latest press release focusing on video stats they state that in the U.S. 28.3 % of the videos watched in September 2007 were delivered by Google sites (mainly YouTube).
Some other facts from the press release: In September 2007 9 billion videos were watched by internet users in the U.S. 3 of 4 U.S. internet users watched at least 1 online video and they watched 181 minutes on average.
While this is surely interesting I’d like to see some figures from other areas and countries too.
Checking the proceedings after the ACM Multimedia reminded me of the last slide of Mike Gleicher. His paper was a best paper nominee and dealt with post processing of videos to enhance the camera motion quality (straight motion and zoom, remove shaking etc.). He presented notable results and at the end of his talk he gave some answers:
I don’t know.
No, we don’t introduce cuts.
The details are in the paper, send me email if its not clear.
Friends in industry say they can do the camera motion estimation robustly, in real time.
Yes, I would like to go to Oktoberfest Friday.
Our in-painter builds a 4 second mosaic for each frame.
Logarithms and exponents of 3×3 matrices can be computed robustly and efficiently with iterative methods.
Yes, this slide is an old joke –but I haven’t used it in years.
Ref. Michael Gleicher and Feng Liu. Re-Cinematography: Improving the Camera Dynamics of Casual Video. ACM Multimedia 2007, best paper nominee. September 2007. [on his homepage]
While listening to what is the last talk of the day for me, I’ll try to put my impressions together into a full picture of the first two days of the ACM Multimedia 2007. The location is quite charming: Augsburg is nice and the university here has a lot of nice ‘landscape’ (~green nothingness) around 🙂
The picture to the right shows W. Wahlster doing his keynote yesterday. While it was impressive how many parallel and interconnected research activities one can run with such an amount of funding, the content of the talk was more like selling an EU IP 😀
Today’s first keynote was rather cool: R. Fageth from CeWe talked about their company and gave lots of impressive figures on how people submit photos for printing. They for instance receive 4 TB of digital photos via upload for printing each day, while the mean image maximum age of an order is ~80 days.
The overall impression of the conference is: Many good contributions and lots of demos & posters to see. Many – but not all – present novelties or interesting ideas. Also interesting is the sheer number of posters/demos/papers dealing with social media. Seems to hit the research 🙂
As already noted in the last post, LireDemo – the Swing based demonstrator of the capabilities of the LIRe library – now includes a mosaic creation option. So what actually is the mosaic? Let’s explain it this way: You have one input image, which should be represented through multiple other images (in the index). The mosaic image tries to look like the input image but replaces segments of the input image with images from the index. The example in the above image shows the input image on the left and the mosaic on the right hand side (click for a larger version). Special thanks go to Lukas Esterle and Manuel Warum, who contributed the mosaic-engine!
So how can one make such a mosaic image? (i) A first step is to select an input image. (ii) Then configure the number of tiles per row and column of the mosaic. (iii) Click “Start” to run the mosaic engine. (iv) After some processing time, which can be rather long depending on the number of tiles and the size of the input image, you will see the result. (v) Save using the button in the bottom right corner.
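The general idea behind such a mosaic can be sketched in Python. Note that this is not the mosaic-engine contributed to LireDemo: it is a toy version that assumes images are plain nested lists of (r, g, b) tuples and that every indexed image is summarized by its mean color, whereas the real engine presumably matches on Lire's image features. `mean_color` and `build_mosaic` are hypothetical names.

```python
def mean_color(pixels):
    """Average (r, g, b) over a non-empty iterable of pixel tuples."""
    pixels = list(pixels)
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def build_mosaic(image, tiles_per_row, tiles_per_col, index):
    """Return a grid naming, for every tile, the best matching indexed image.

    Assumptions: `image` is a list of rows of (r, g, b) tuples, `index` maps
    an image name to its mean color, and the tile counts do not exceed the
    image dimensions (so no tile is empty).
    """
    height, width = len(image), len(image[0])
    grid = []
    for ty in range(tiles_per_col):
        y0 = ty * height // tiles_per_col
        y1 = (ty + 1) * height // tiles_per_col
        row = []
        for tx in range(tiles_per_row):
            x0 = tx * width // tiles_per_row
            x1 = (tx + 1) * width // tiles_per_row
            tile = mean_color(
                image[y][x] for y in range(y0, y1) for x in range(x0, x1)
            )
            # Pick the indexed image whose mean color is nearest to the tile.
            best = min(
                index,
                key=lambda name: sum(
                    (a - b) ** 2 for a, b in zip(index[name], tile)
                ),
            )
            row.append(best)
        grid.append(row)
    return grid
```

The nested loops also show why the processing time grows with the number of tiles and the size of the input image: every pixel is visited once, and every tile is compared against every image in the index.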
Having tested the YouTube Mobile portal I wanted to get the videos downloaded for my mobile phone. Unfortunately two facts hindered me from doing this: (i) Getting the video from an RTP stream is not that trivial and (ii) YouTube has only part of its videos converted for mobile access. Therefore I needed a way to get YouTube videos to my mobile phone 😀
First of all you need to get the actual video from YouTube. Surf to the video you like and download it using e.g. YouTube Downloader. Do not forget to rename your video to .flv.
Then you will have to convert it to 3gpp. FFMPEG will do the trick. On Linux you might have to compile it yourself (as AMR and AAC encoders are not built into many precompiled packages). On Windows you might use precompiled binaries, e.g. from here.
Then let ffmpeg do the job for you:
Windows: ffmpeg.exe -i .mov -acodec aac -s 352x288 .3gp
Linux: ffmpeg -i .mov -acodec aac -s 352x288 .3gp
The command line shown above encodes the video to H263 with a bitrate of 200 kbps and AAC with a bitrate of 64 kbps. The video needs to follow the constraints of 3gpp and is therefore changed to resolution 352×288. This is the maximum possible with 3gpp. Alternatively one can also use the AMR audio codec, but this one has a rather low quality.
A ‘really strange approach’ for those of you running Linux, but not willing to compile: You can run the precompiled Windows binaries with Wine ;-D
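If you want to script the conversion for more than one video, the command can be assembled in Python. This is a sketch under a few assumptions: the file names `clip.flv` and `clip.3gp` are placeholders, the codecs and bitrates are spelled out explicitly instead of relying on defaults, and the `-b:v` / `-b:a` spelling is the one used by current ffmpeg releases (older builds used `-b` and `-ab`). Whether a given build accepts it still depends on the encoders compiled in.

```python
def ffmpeg_3gp_command(source, target, size="352x288",
                       video_kbps=200, audio_kbps=64, executable="ffmpeg"):
    """Build the argument list for converting a video to 3GP.

    The default size and bitrates are the values mentioned in the text;
    the helper itself is hypothetical and only assembles the arguments.
    """
    return [
        executable,
        "-i", source,                # input file
        "-vcodec", "h263",           # H.263 video
        "-b:v", f"{video_kbps}k",    # video bitrate
        "-acodec", "aac",            # AAC audio
        "-b:a", f"{audio_kbps}k",    # audio bitrate
        "-s", size,                  # target resolution, width x height
        target,                      # output file, container chosen by extension
    ]


# Placeholder file names, not taken from the original post.
command = ffmpeg_3gp_command("clip.flv", "clip.3gp")
```

The resulting list can be handed to `subprocess.run(command, check=True)`. On Windows pass `executable="ffmpeg.exe"`.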