
Interactive and Informative Community Question Answering with Multimedia Support

IJCSEC Front Page

Community question answering generally helps us seek information for our queries and doubts. It enables community members to post questions and to answer them. However, existing community question answering forums usually provide only textual answers, which are not informative enough for many questions. To improve understanding and provide additional information, we propose a novel scheme that allows community members to post multimedia content, i.e., images and videos. Unlike many multimedia question answering (MMQA) research efforts, which try to answer questions directly with image and video data, this scheme can deal with more complex questions.
Keywords: Community question answering, answer medium selection, multimedia search, multimedia data selection and presentation.
Question answering (QA) is a technique for automatically answering questions posed in natural language. Compared with keyword-based search systems, it greatly facilitates communication between humans and computers, because users state their intention naturally in plain sentences. It also avoids tedious browsing through the vast quantity of content that search engines return in order to find the correct answers. However, fully automated question answering still faces challenges that are not easy to tackle, such as the deep understanding of complex questions and the sophisticated syntactic, semantic and contextual processing needed to generate answers. In most cases, an automated approach cannot obtain results as good as those generated by human intelligence.
Community question answering has emerged as a popular alternative for acquiring information online, owing to the following facts. First, information seekers are able to post specific questions on any topic and obtain answers provided by other participants. By leveraging community efforts, they are able to get better answers than by simply using search engines. Second, in comparison with automated question answering systems, community question answering usually receives answers of better quality, as they are generated by human intelligence. Third, over time, a tremendous number of question-answer pairs have accumulated in the forums' repositories, which facilitates the preservation and search of answered questions.
Despite their great success, existing community question answering forums mostly support only textual answers, as shown in Figure 1. Unfortunately, textual answers may not provide sufficiently natural and easy-to-grasp information. Figure 1 (a) and (b) illustrate two examples. For the questions "What is Bluetooth and how does it work" and "Do anybody know how to make pizza", the answers are described in long sentences. Clearly, it would be much better if accompanying videos and images visually demonstrated the process or the object. The textual answers in community question answering can therefore be significantly enhanced by adding multimedia content, which gives answer seekers more comprehensive information and a better experience. In fact, users often post URLs that link to supplementary images or videos in their textual answers. For example, for the questions in Figure 1 (c) and (d), the best answers on Yahoo! Answers (Y!A) both contain video URLs. This further confirms that multimedia content is useful in answering many questions. However, existing community question answering forums do not provide adequate support for using media information.
In this paper, we propose a novel scheme that enriches community-contributed textual answers in community question answering with appropriate media data. Figure 2 shows a schematic illustration of the approach. It contains three main components:
(1) Answer medium selection. Given a question-answer pair, this component predicts whether the textual answer should be enriched with media information and which kind of media data should be added. Specifically, the pair is categorized into one of four classes: text, text + image, text + video, and text + image + video. Depending on the class, the scheme automatically collects images, videos, or a combination of both to enrich the original textual answer.
(2) Multimedia search. In order to collect multimedia data, we need to generate informative queries. Given a question-answer pair, this component extracts three queries. The most informative query is then selected by a three-class classification model.
(3) Multimedia data selection and presentation. Based on the generated queries, we vertically collect image and video data with multimedia search engines. We then perform reranking and duplicate removal to obtain a set of accurate and representative images or videos to enrich the textual answers.
It is worth mentioning that several research efforts have already been dedicated to automatically answering questions with multimedia data, i.e., the so-called Multimedia Question Answering (MMQA). For example, Yang et al. proposed a technology that supports factoid question answering in news video. Yeh et al. presented a photo-based question answering system for finding information about physical objects. Li et al. proposed an approach that leverages YouTube video collections as a source to automatically find videos that describe cooking techniques. However, these approaches usually work in narrow domains and can hardly be generalized to handle questions in broad domains. This is because automatic MMQA first requires understanding the question, which is not an easy task.
Our proposed approach does not aim to answer questions directly. Instead, we enrich the community-contributed answers with multimedia content. Our strategy splits the large gap between question and multimedia answer into two smaller gaps, i.e., the gap between question and textual answer and the gap between textual answer and multimedia answer. In our scheme, the first gap is bridged by the crowd-sourced intelligence of community members, so we can focus on the second gap. Our scheme can therefore also be viewed as an approach that tackles the MMQA problem by jointly exploring human and computer intelligence. Figure 3 demonstrates the difference between conventional MMQA approaches and an MMQA framework based on our scheme.
It is worth noting that, although the proposed approach is automated, it can also involve human interaction. For example, our approach can provide a set of candidate images and videos based on textual answers, and answerers can manually choose several candidates for final presentation.
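To make the three components of the scheme more concrete, the following Python sketch shows how their inputs and outputs could fit together. It is an illustration under stated assumptions, not the implementation used in this work: the names `AnswerMedium`, `media_to_collect`, `select_query`, `remove_duplicates` and `jaccard` are hypothetical, the three-class query selection model is passed in as a callable because the text does not specify it, and the greedy similarity threshold is only an assumed stand-in for the duplicate removal step.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class AnswerMedium(Enum):
    """The four answer medium classes named in the text."""
    TEXT = "text"
    TEXT_IMAGE = "text + image"
    TEXT_VIDEO = "text + video"
    TEXT_IMAGE_VIDEO = "text + image + video"


def media_to_collect(medium: AnswerMedium) -> Tuple[str, ...]:
    """Map a predicted class to the media types that should be searched for."""
    mapping = {
        AnswerMedium.TEXT: (),
        AnswerMedium.TEXT_IMAGE: ("image",),
        AnswerMedium.TEXT_VIDEO: ("video",),
        AnswerMedium.TEXT_IMAGE_VIDEO: ("image", "video"),
    }
    return mapping[medium]


def select_query(candidates: Sequence[str],
                 choose: Callable[[Sequence[str]], int]) -> str:
    """Return one of three candidate queries.

    `choose` stands in for the three-class classification model; it is an
    assumption here and must return an index in {0, 1, 2}.
    """
    if len(candidates) != 3:
        raise ValueError("expected exactly three candidate queries")
    index = choose(candidates)
    if index not in (0, 1, 2):
        raise ValueError("the model must return 0, 1 or 2")
    return candidates[index]


def remove_duplicates(ranked: Sequence[str],
                      similarity: Callable[[str, str], float],
                      threshold: float = 0.9) -> List[str]:
    """Greedy duplicate removal over an already reranked list (assumed strategy).

    An item is kept only if it is less similar than `threshold` to every
    item kept before it, so the order of the reranked list is preserved.
    """
    kept: List[str] = []
    for item in ranked:
        if all(similarity(item, other) < threshold for other in kept):
            kept.append(item)
    return kept


def jaccard(a: str, b: str) -> float:
    """Toy similarity on lowercased word sets, standing in for visual similarity."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0
```

In this sketch the media items are represented by their titles only. A real system would compare image or video features instead, and the threshold value would have to be tuned on data.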

