Interactive and Informative Community Question Answering with Multimedia Support

Abstract
Generally, we turn to online communities to seek information for our queries and doubts. A community question answering forum enables community members to post questions and to provide answers to them. However, existing community question answering forums usually provide only textual answers, which are not informative enough for many questions. To improve understanding and provide additional information, we propose a novel scheme that allows community members to enrich answers with multimedia content, i.e., images and videos. Unlike several existing MMQA research efforts, which directly answer questions with image and video data, this scheme can deal with more complex questions.
Keywords: Community question answering, answer medium selection, multimedia search, multimedia data selection and presentation.
I. Introduction
Question answering (QA) is a technique for automatically answering questions posed in natural language. Compared with keyword-based search systems, it greatly facilitates communication between humans and computers by letting users state their intentions in plain sentences. It also spares users the tedious browsing of the vast quantity of content returned by search engines in search of correct answers. However, fully automated question answering still faces challenges that are not easy to tackle, such as the deep understanding of complex questions and the sophisticated syntactic, semantic, and contextual processing required to generate answers. In most cases, automated approaches cannot obtain results as good as those generated by human intelligence.
Community question answering has emerged as a popular alternative for acquiring information online, owing to the following facts. First, information seekers can post specific questions on any topic and obtain answers provided by other participants. By leveraging community effort, they can get better answers than by simply using search engines. Second, in comparison with automated question answering systems, community question answering usually receives answers of better quality, as they are generated by human intelligence. Third, over time, a tremendous number of question-answer pairs have accumulated in forum repositories, which facilitates the preservation and search of answered questions.
Despite their great success, existing community question answering forums mostly support only textual answers, as shown in Figure 1. Unfortunately, textual answers may not provide sufficiently natural and easy-to-grasp information. Figure 1(a) and (b) illustrate two examples. For the questions "What is Bluetooth and how does it work?" and "Do anybody know how to make pizza?", the answers are described in long sentences. Clearly, it would be much better if accompanying videos and images visually demonstrated the process or the object. Therefore, textual answers in community question answering can be significantly enhanced by adding multimedia content, which provides answer seekers with more comprehensive information and a better experience. In fact, users often post URLs that link to supplementary images or videos in their textual answers.
For example, for the questions in Figure 1(c) and (d), the best answers on Yahoo! Answers (Y!A) both contain video URLs. This further confirms that multimedia content is useful in answering many questions, yet existing community question answering forums do not provide adequate support for using media information. In this paper, we propose a novel scheme that can enrich community-contributed textual answers in community question answering with appropriate media data. Figure 2 shows a schematic illustration of the approach. It contains three main components: (1) answer medium selection, (2) multimedia search, and (3) multimedia data selection and presentation.
(1) Answer medium selection. Given a question answering pair, this component predicts whether the textual answer should be enriched with media information and, if so, which kind of media data should be added. Specifically, it categorizes the pair into one of four classes: text, text + image, text + video, and text + image + video. This means the scheme will automatically collect images, videos, or a combination of images and videos to enrich the original textual answer.
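To make this concrete, the sketch below shows one way such a four-class classifier could be set up. The paper does not specify the features or the learning algorithm here, so the TF-IDF representation, the linear SVM, and the toy training data are illustrative assumptions rather than the authors' method.

```python
# A minimal sketch of answer medium selection as four-class text
# classification. Assumptions (not specified in this paper): TF-IDF
# features over the concatenated QA pair and a linear SVM classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

CLASSES = ["text", "text + image", "text + video", "text + image + video"]

# Hypothetical training data: concatenated question/answer strings,
# each labeled with the medium that best suits the answer.
qa_texts = [
    "What is Bluetooth and how does it work? Bluetooth is a short-range ...",
    "Do anybody know how to make pizza? First prepare the dough, then ...",
]
labels = ["text + image", "text + video"]

medium_classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(),
)
medium_classifier.fit(qa_texts, labels)

def select_answer_medium(question: str, answer: str) -> str:
    """Predict which kind of media should enrich the textual answer."""
    return medium_classifier.predict([question + " " + answer])[0]
```

A real deployment would train on a large set of labeled QA pairs; the two examples above only show the shape of the data.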
(2) Multimedia search. In order to collect multimedia data, we need to generate informative queries. Given a question answering pair, this component extracts three candidate queries, and the most informative one is selected by a three-class classification model.
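As a rough illustration, the sketch below generates three candidate queries from a QA pair. The assumption that the candidates are drawn from the question, the answer, and their combination, as well as the simple stopword-based keyword extraction, are ours; the text above only says that three queries are extracted and one is chosen by a three-class classifier.

```python
# A minimal sketch of candidate query generation for multimedia search.
# Assumption: one query each from the question, the answer, and the
# combined QA pair; keywords are the most frequent non-stopword terms.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "i", "is", "are", "how", "what", "do",
             "does", "to", "and", "it", "of", "in", "on", "for", "with"}

def keywords(text: str, k: int = 5) -> str:
    """Keep the k most frequent non-stopword terms as a search query."""
    terms = [t for t in re.findall(r"[a-z]+", text.lower())
             if t not in STOPWORDS]
    return " ".join(term for term, _ in Counter(terms).most_common(k))

def candidate_queries(question: str, answer: str) -> list[str]:
    """Three candidates; a trained three-class model would pick one."""
    return [
        keywords(question),                 # from the question
        keywords(answer),                   # from the answer
        keywords(question + " " + answer),  # from the whole QA pair
    ]
```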
(3) Multimedia data selection and presentation. Based on the generated queries, we vertically collect image and video data with multimedia search engines. We then perform reranking and duplicate removal to obtain a set of accurate and representative images or videos to enrich the textual answers.
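The text does not name a particular duplicate-removal method; the sketch below uses a perceptual (average) hash with a Hamming-distance threshold, which is one common way to detect near-duplicate images among search results.

```python
# A minimal sketch of near-duplicate removal over retrieved images,
# using an average hash (an assumption; the paper does not specify
# the technique). Requires Pillow (PIL).
from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    """64-bit perceptual hash: bit is 1 where a pixel exceeds the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def deduplicate(image_paths: list[str], max_dist: int = 5) -> list[str]:
    """Keep an image only if it is far, in Hamming distance, from all kept ones."""
    kept_paths, kept_hashes = [], []
    for path in image_paths:
        h = average_hash(path)
        if all(bin(h ^ other).count("1") > max_dist for other in kept_hashes):
            kept_paths.append(path)
            kept_hashes.append(h)
    return kept_paths
```

Reranking would then order the remaining images or videos before presentation.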
It is worth mentioning that several research efforts have already been dedicated to automatically answering questions with multimedia data, i.e., the so-called multimedia question answering (MMQA). For example, Yang et al. proposed a technique that supports factoid question answering in news video. Yeh et al. presented a photo-based question answering system for finding information about physical objects. Li et al. proposed an approach that leverages YouTube video collections as a source for automatically finding videos that describe cooking techniques. However, these approaches usually work in certain narrow domains and can hardly be generalized to handle questions in broad domains. This is because, in order to accomplish automatic MMQA, we first need to understand the questions, which is not an easy task.
Our proposed approach does not aim to answer questions directly; instead, we enrich community-contributed answers with multimedia content. Our strategy splits the large gap between a question and a multimedia answer into two smaller gaps: the gap between the question and the textual answer, and the gap between the textual answer and the multimedia answer. In our scheme, the first gap is bridged by the crowdsourced intelligence of community members, so we can focus on bridging the second. Therefore, our scheme can also be viewed as an approach that accomplishes MMQA by jointly exploiting human and computer intelligence. Figure 3 demonstrates the difference between conventional MMQA approaches and an MMQA framework based on our scheme. It is worth noting that, although the proposed approach is automated, human interaction can also be incorporated. For example, our approach can provide a set of candidate images and videos based on a textual answer, and answerers can manually choose several candidates for the final presentation.
References:
- [1] D. Mollá and J. L. Vicedo, "Question answering in restricted domains: An overview," Comput. Linguist., vol. 33, no. 1, pp. 41–61, 2007.
- [2] A. Haubold, A. Natsev, and M. R. Naphade, "Semantic multimedia retrieval using lexical query expansion and model-based reranking," in Proc. IEEE Int. Conf. Multimedia and Expo, 2006, pp. 1761–1764.
- [3] M. Wang, K. Yang, X.-S. Hua, and H.-J. Zhang, "Towards a relevant and diverse search of social images," IEEE Trans. Multimedia, vol. 12, no. 8, pp. 829–842, 2010.
- [4] S. A. Quarteroni and S. Manandhar, "Designing an interactive open domain question answering system," J. Natural Lang. Eng., vol. 15, no. 1, pp. 73–95, 2008.
- [5] Y.-C. Wu and J.-C. Yang, "A robust passage retrieval algorithm for video question answering," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 10, pp. 1411–1421, 2008.
- [6] Y.-C. Wu, C.-H. Chang, and Y.-S. Lee, "Cross-language video question/answering system," in Proc. IEEE Int. Symp. Multimedia Software Engineering, 2004, pp. 294–301.
- [7] Z.-J. Zha, X.-S. Hua, T. Mei, J. Wang, G.-J. Qi, and Z. Wang, "Joint multi-label multi-instance learning for image classification," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008, pp. 1–8.