Intelligent algorithms for data mining and information retrieval are key technologies for coping with the information needs of our media-centered society. Methods for text-based information retrieval receive special attention, owing to the important role of written text, the ubiquity of the World Wide Web, and the enormous impact of Web communities and social media on our lives.
The development of advanced information retrieval solutions requires understanding and combining methods from different research areas, including machine learning, data mining, computational linguistics, artificial intelligence, user interaction and modeling, Web engineering, and distributed systems. This workshop provides a common platform for presenting and discussing new solutions, novel ideas, and specific tools focusing on text-based information retrieval. Contributions are welcome on the following classic and ongoing topics, among others:
Theory. Retrieval models, language models, similarity measures, formal analysis
Web Search. Ranking, indexing, semantic search, query classification and segmentation, relevance feedback, vertical search
Personalization and User Mining. Just-in-time retrieval, personalized retrieval, context detection, profile mining
Multilinguality. Cross-language retrieval, machine translation, language identification
Evaluation. Corpus construction, experiment design, performance measures
Text Mining and Classification. Web mining, text reuse, topic identification, sentiment analysis
NLP. Information extraction, text summarization and simplification, named entity recognition, question answering
Social Media Analysis. Community mining, social network analysis, trend analysis, information diffusion
Information Quality. Text quality assessment, quality-based ranking, readability assessment, trust and author reputation
Big Data Text Analytics. Parallel and distributed retrieval, online algorithms, scalability
Semantic Web. Meta data analysis and tagging, knowledge extraction, inference, maintenance
The workshop is held for the eleventh time. Past editions were characterized by a stimulating atmosphere and attracted high-quality contributions from all over the world.
Accepted papers will appear in the proceedings of DEXA’14 Workshops published by the Conference Publishing Services (CPS) of IEEE Computer Society.
Submissions to TIR 2014 must be original, unpublished contributions.
Papers are limited to 5 pages in IEEE format (two columns, A4) and must be written in English.
Submission is made electronically in PDF format using our conference management system ConfDriver.
Submitted papers will be peer-reviewed by at least three experts in the field.
At least one author of each accepted paper is required to register for the DEXA’14 conference, attend the workshop, and present the paper.
April 24, 2014: Deadline for paper submission (24:00 CET)
May 12, 2014: Notification to authors
May 20, 2014: Camera-ready copy due
September 1 – 5, 2014: DEXA’14 conference
Maik Anderka (Co-Chair), University of Paderborn, Germany
Michael Granitzer (Co-Chair), University of Passau, Germany
As an integral part of the ACM MMSys conference since 2011, the Dataset Track provides an opportunity for researchers and practitioners to make their work available (and citable) to the multimedia community. MMSys encourages and recognizes dataset sharing, and seeks contributions in all areas of multimedia (not limited to MM systems). Authors publishing datasets will benefit by increasing the public awareness of their effort in collecting the datasets.
Submission deadline is Nov. 11th 2013! Make sure not to miss it! See also the Call for Papers
The Solr plugin itself is fully functional for Solr 4.4 and the source is available at https://bitbucket.org/dermotte/liresolr. A markdown document, README.md, explains what can be done with the plugin and how to install it. Basically, it supports content-based search and content-based re-ranking of text searches, and it brings along a custom field implementation and sub-linear search based on hashing.
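The idea behind hashing-based sub-linear search can be sketched roughly as follows. This is a minimal, hypothetical illustration (bit-sampling locality-sensitive hashing, with a plain dict standing in for an inverted index over hash terms), not the plugin's actual implementation:

```python
import random

random.seed(42)

DIM = 64          # feature vector dimensionality (e.g. a global image descriptor)
NUM_BITS = 12     # length of the hash code; shorter codes yield more candidates

# Bit-sampling LSH: each hash bit compares the feature at a randomly chosen
# dimension against a random threshold.
PLANES = [(random.randrange(DIM), random.random()) for _ in range(NUM_BITS)]

def hash_code(feature):
    """Map a feature vector (values in [0, 1]) to a short binary hash string."""
    return "".join("1" if feature[d] > t else "0" for d, t in PLANES)

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Index: hash code -> list of (doc_id, feature). In Solr this role is played
# by an inverted index over hash terms; here a dict stands in for it.
docs = {i: [random.random() for _ in range(DIM)] for i in range(1000)}
index = {}
for doc_id, feat in docs.items():
    index.setdefault(hash_code(feat), []).append((doc_id, feat))

def search(query, k=5):
    # Sub-linear candidate retrieval by hash lookup, then exact re-ranking
    # of the small candidate set by true distance.
    candidates = index.get(hash_code(query), [])
    return sorted(candidates, key=lambda df: l1(query, df[1]))[:k]

results = search(docs[0])
```

Only the documents sharing the query's hash bucket are compared exactly, which is the trade-off that makes the search sub-linear at the cost of possibly missing some near neighbors.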
The new LIRE web demo is based on Apache Solr and features an index of the MIRFLICKR data set. The new architecture allows for extremely fast retrieval. Moreover, there's a new walk-through video with some short peeks behind the scenes. The source of the plugin will be released in the near future.
The beta update features (i) improvements in local feature handling, i.e., stronger quantization of local feature histograms and several bug fixes; (ii) critical bug fixes for CEDD and JCD, which were not thread-safe; and (iii) improvements to the ParallelExtractor and Indexor classes as well as the intermediate binary format.
The ACM Multimedia Systems conference (http://www.mmsys.org) provides a forum for researchers, engineers, and scientists to present and share their latest research findings in multimedia systems. While research about specific aspects of multimedia systems is regularly published in the various proceedings and transactions of the networking, operating system, real-time system, and database communities, MMSys aims to cut across these domains in the context of multimedia data types. This provides a unique opportunity to view the intersections and interplay of the various approaches and solutions developed across these domains to deal with multimedia data types. Furthermore, MMSys provides an avenue for communicating research that addresses multimedia systems holistically.
As an integral part of the conference since 2011, the Dataset Track provides an opportunity for researchers and practitioners to make their work available (and citable) to the multimedia community. MMSys encourages and recognizes dataset sharing, and seeks contributions in all areas of multimedia (not limited to MM systems). Authors publishing datasets will benefit by increasing the public awareness of their effort in collecting the datasets.
In particular, authors of datasets accepted for publication will receive:
Dataset hosting from MMSys for at least 5 years
Citable publication of the dataset description in the proceedings published by ACM
15 minutes oral presentation time at the MMSys 2014 Dataset Track
All submissions will be peer-reviewed by at least two members of the MMSys 2014 technical program committee. Datasets will be evaluated by the committee on the basis of the collection methodology and the value of the dataset as a resource for the research community.
Authors interested in submitting a dataset should
(A) Make their data available by providing a public URL for download
(B) Write a short paper describing:
motivation for data collection and intended use of the data set,
the format of the data collected,
the methodology used to collect the dataset, and
basic characterizing statistics from the dataset.
Papers should be at most 6 pages long (in PDF format) prepared in the ACM style and written in English.
Data set paper submission deadline: November 11, 2013
While I knew that performance did not skyrocket with Lucene 4.0, I only now came around to finding out why. Unfortunately, the field compression technique applied in Lucene 4.x compresses each and every stored field … and decompresses it upon access. This adds considerable overhead when reading the index in a linear way, which is exactly one of the main access patterns of LIRE.
The image shows a screenshot of the CPU sampler in VisualVM: 58.7% of the CPU time goes to the LZ4 decompression routine. That's quite a lot and makes a huge difference for search. If anyone has a workaround of sorts, I'd be happy to hear about it.
Update (2013-07-03): With the great help of the people from the lucene-user list, I found at least a speed-up. In the current SVN version there is a novel LireCustomCodec for stored fields, which speeds up decompression a lot. Moreover, there is now an in-memory caching approach implemented in the GenericFastImageSearcher class, which is turned off by default but speeds up search time (as a trade-off for memory and init time) by holding image features in memory. It has been tested with up to 1.5M images.
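The trade-off behind that caching approach can be sketched like this; a hypothetical Python illustration (names and the distance measure are assumptions, not the actual Java code), where the decompression cost is paid once at init time and subsequent linear scans run entirely over cached features:

```python
# Stand-in for reading a stored (compressed) field from the index: in
# Lucene 4.x every access decompresses the field, which is what makes
# repeated linear scans over stored fields slow.
def read_stored_feature(doc_id, store):
    return list(store[doc_id])  # imagine LZ4 decompression cost here

class CachedSearcher:
    """Trade memory and init time for fast repeated linear scans."""

    def __init__(self, store):
        # Pay the per-document decompression cost exactly once, up front.
        self.cache = [read_stored_feature(i, store) for i in range(len(store))]

    def search(self, query, k=3):
        # Linear scan over in-memory features, ranked by L1 distance.
        dists = [(sum(abs(a - b) for a, b in zip(query, feat)), i)
                 for i, feat in enumerate(self.cache)]
        return [i for _, i in sorted(dists)[:k]]

# Toy "index" of 100 documents with 8-dimensional features.
store = {i: [float(i + j) for j in range(8)] for i in range(100)}
searcher = CachedSearcher(store)
top = searcher.search(store[42])
```

Every query after the first init avoids touching the stored fields entirely, which matches the reported behavior: slower startup and higher memory use, but much faster search.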
The ACM Multimedia Open-Source Software Competition celebrates the invaluable contribution of researchers and software developers who advance the field by providing the community with implementations of codecs, middleware, frameworks, toolkits, libraries, applications, and other multimedia software. This year will be the sixth year in running the competition as part of the ACM Multimedia program.
To qualify, software must be provided with source code and licensed in such a manner that it can be used free of charge in academic and research settings. For the competition, the software will be built from the sources. All source code, license, installation instructions and other documentation must be available on a public web page. Dependencies on non-open source third-party software are discouraged (with the exception of operating systems and commonly found commercial packages available free of charge). To encourage more diverse participation, previous years’ non-winning entries are welcome to re-submit for the 2013 competition. Student-led efforts are particularly encouraged.
Authors are highly encouraged to prepare as much documentation as possible, including examples of how the provided software might be used, download statistics or other public usage information, etc. Entries will be peer-reviewed to select entries for inclusion in the conference program as well as an overall winning entry, to be recognized formally at ACM Multimedia 2013. The criteria for judging all submissions include broad applicability and potential impact, novelty, technical depth, demo suitability, and other miscellaneous factors (e.g., maturity, popularity, student-led, no dependence on closed source, etc.).
Authors of the winning entry, and possibly additional selected entries, will be invited to demonstrate their software as part of the conference program. In addition, accepted overview papers will be included in the conference proceedings.
The LIRE web demo now includes an RGB color histogram as well as the MPEG-7 edge histogram implementation. The color histogram works well, for instance, for line art, such as this query. The edge histogram works fine for clear, global edge distributions, as in queries such as this one. However, it performs differently from PHOG. An example of the difference is this PHOG query compared to the corresponding edge histogram query. The image below shows both queries.
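An RGB color histogram of the kind used in the demo can be sketched as follows; the bin count and the distance measure here are assumptions for illustration, not LIRE's exact parameters:

```python
# Quantize each RGB channel into a few bins and count pixels per combined
# bin; images with similar color distributions then have small L1 distances.
BINS = 4  # bins per channel -> 4*4*4 = 64-bin histogram

def rgb_histogram(pixels):
    """Build a normalized 64-bin histogram from (r, g, b) tuples in 0..255."""
    hist = [0] * (BINS ** 3)
    for r, g, b in pixels:
        idx = ((r * BINS // 256) * BINS * BINS
               + (g * BINS // 256) * BINS
               + (b * BINS // 256))
        hist[idx] += 1
    total = float(len(pixels))
    return [h / total for h in hist]  # normalize so image size doesn't matter

def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Two mostly-red toy "images" should be closer to each other than to a blue one.
red_a = [(250, 10, 10)] * 90 + [(200, 60, 30)] * 10
red_b = [(240, 20, 5)] * 80 + [(180, 50, 40)] * 20
blue = [(10, 20, 240)] * 100

d_red = l1_distance(rgb_histogram(red_a), rgb_histogram(red_b))
d_cross = l1_distance(rgb_histogram(red_a), rgb_histogram(blue))
```

Because such a histogram ignores spatial layout entirely, it behaves very differently from edge-based descriptors like the MPEG-7 edge histogram or PHOG, which is consistent with the diverging query results described above.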