
International Journal of Engineering & Technology, 7 (3.20) (2018) 489-493

International Journal of Engineering & Technology


Website: www.sciencepubco.com/index.php/IJET

Research paper

A Design of Faceted Search Engine – a Review


Mohammed Najah Mahdi1, Roslan Ismail2, Abdul Rahim Ahmad2, Kavintheran Thambiratnam3, Mohammed Abdulameer Mohammed4
1,2 Institute of Informatics and Computing in Energy, Universiti Tenaga Nasional, Selangor, Malaysia.
1,2 College of Computer Science & Information Technology, Universiti Tenaga Nasional, Selangor, Malaysia.
3 Photonics Research Centre, University of Malaya, Kuala Lumpur, Malaysia.
4 Al-Rafidain University College, Baghdad, Iraq.
najah.mahdi@uniten.edu.my
*Corresponding Author Email: mhmd@coalrafidain.edu.iq

Abstract

The World Wide Web (WWW) allows people to share information and data from large database repositories globally. The amount of information already spans billions of databases. Searching this information requires specialized tools known generically as search engines (SE). With so much data to handle, search engines need to retrieve meaningful information intelligently, returning only the information of interest to the searcher. Facets (the particular aspects or features of something being searched) can play an important role in helping the user understand an information space better. Query techniques within faceted search make the search results immediate and keep the interaction between searcher and search engine uninterrupted and focused. They can contribute to the user's understanding of the researched terms or topics. Furthermore, they are more fun and interesting to use because users directly manipulate the search controls, and the results can be displayed through a choice of presentations such as text displays, transition animations, and graphs, which brings the process closer to the experience of playing a game. This paper reviews the design of faceted search engines.

Keywords: Information Retrieval; Search Engine; Exploratory Search; Faceted Search Engine.

1. Introduction

Since the advent of the WWW, people have been increasingly using the Internet as the medium to find, discover/encounter, explore, exchange, and make sense of information. Because of this, people now rely heavily on online resources to fulfill many kinds of information needs [1]. There has been a shift from using the Web only for single query-based searches to using it for more complex and exploratory searches. However, online SEs and other search tools have been primarily limited to retrieving information in the form of a set of ranked documents for a given query in an effective and efficient manner [2]. One important aspect that is beyond the present scope of SEs is analyzing the underlying search process of each user performing a specific information search task. Although SEs have evolved in smarter ways to keep track of user search history and preferences in order to suggest queries effectively and personalize the search results, they do not focus on the user's information search task. Thus, they fail to provide search path suggestions such as what query to execute next, which queries to exclude, which Web pages offer useful information for the task, or what information to consider relevant to achieving the user's task goal. Traditional faceted navigation styles allow one to drill down into a subject matter to find very specific documents. One limitation of this, however, is the possibility of obtaining a very “narrow” view of the issue, which is recognized in Kules and Shneiderman's study [3].

2. Motivation

Many SEs give information according to the ranked retrieval (ranked list) model. Generally, the results returned in response to a query are arranged in a ranking based on scoring functions that combine different characteristics of the documents and queries. However, conventional SEs still have some constraints which demand further study, as described in the following questions:
“Results are represented according to their rank; one of the main problems is how to rank the results returned by a SE or a combination of SEs. How do searchers think differently about their search strategy when categorized overviews are available to augment the result list, and how can better accuracy in the search be achieved?” [4, 5].

3. Related Work

This section reviews related work on Information Retrieval, Defining Relevance, Set Retrieval, and Ranked Retrieval.

3.1. Information Retrieval

Information Retrieval (IR) is the process of searching within a document collection for a particular information need, which is expressed as a query [6]. It is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original work is properly cited.

IR typically seeks to find documents in a given collection that are about a given topic or that satisfy a given information need. The topic or information need is expressed by a query generated by the user [7]. Documents that satisfy the given query in the judgment of the user are said to be relevant. Documents that are not about the given topic are said to be non-relevant. In this section we survey related work on classifying, understanding, and designing search interfaces and on techniques to augment search results, but first we need to know what is considered relevant in order to focus on improving relevancy.

3.2. Defining Relevance

Probably the first notion to be defined is the notion of relevance in an IR system, that is, what it means for a SE to retrieve documents that are relevant to the user [7]. The notion of relevance itself has been the source of intense debate amongst researchers, who often disagree on how to measure it [8, 9]. However, the general consensus has been to characterize relevance either through a purely cognitive point of view or solely through a benchmarking approach. The former, which will be addressed later, naturally leads to the design of search user interfaces and to evaluation methods that favor user studies. In this setting, precision and recall provide a natural metric of relevance.

3.3. Set Retrieval

In a Boolean set retrieval model [10], a user enters a query made up of Boolean operators such as AND, NOT, and OR and gets the documents that match that query. The documents are returned as an unordered set, and the precision and/or recall depends on the user's ability to write complex Boolean queries. Boolean search systems can additionally be extended with field operators to search within specific fields of the document collection. For example, a user can find terms within the title, text body, author, and other areas of the documents of interest. The difficulty the general public has with using Boolean search models is well documented [11]. In practice, set retrieval suffers from a clear trade-off between high precision and high recall. Because the documents returned lack any ordering, a user can achieve either very high precision by formulating a very restrictive query or high recall by choosing a very loose one. Users usually have to be experts in formulating complex Boolean queries in order to retrieve the most relevant set. It is important to note, however, that if ranking of the documents returned is not required due to the nature of those documents, and when the domain of interest is reserved to experts, set retrieval can be a fine approach to search. For example, PubMed (www.Pubmed.com) from the United States National Library of Medicine offers an advanced search feature to help users build queries made of Boolean expressions. The user is able to create complex queries restricted to specific fields and made of AND, OR, and NOT operators (see Figure 1). This advanced search feature is helpful to non-expert users, considering that PubMed ranks the articles found by date only.

In order to circumvent the difficulties of the Boolean set model, an interesting compromise consists of ranking the search results. The query can remain fairly loose, but the results returned are ranked according to some metric. In that case, a user looking for books may enter some keywords related to the book and have the results ordered by popularity, price, or location. In the following section we cover these so-called ranked retrieval models.

3.4. Ranked Retrieval

The first ranked retrieval model is the vector space model approach developed by Salton, Wong, and Yang (1975) [13]. In the vector space model, each document is represented by a vector. Each index in the vector corresponds to a word or term found in the document collection. Each component of the vector is a numerical value which reflects the importance, or weight, of the term in the document. The query becomes a vector which is then compared to all the document vectors in the set. A similarity measure, usually the cosine of the angle between vectors, is used to match the query against the documents. The results are then ranked according to how close they are to the user's query. However, the question of properly weighting each term within the document and the collection still remains.

Another major contribution to ranked retrieval and to the vector space model is the work on tf-idf by Sparck Jones [14]. Tf-idf stands for term frequency multiplied by inverse document frequency. Let us assume we have a document collection $D$ of documents $d$, each containing terms $t$. The term frequency $\mathrm{tf}(t, d)$ of a term $t$ within a document $d$ is the number of times $t$ appears in $d$ divided by the total number of terms in $d$:

$$\mathrm{tf}(t, d) = \frac{|\{t \in d\}|}{|d|} \qquad (1)$$

where $\{\cdot\}$ and $|\cdot|$ denote set definition and cardinality of a set, respectively. A high term frequency indicates that a term is more representative of the document content. On the other hand, we can define the document frequency $\mathrm{df}(t, D)$ of a term $t$ within a document collection $D$ as the fraction of documents in $D$ that contain $t$. The inverse document frequency $\mathrm{idf}(t, D)$ is the logged reciprocal of this expression:

$$\mathrm{idf}(t, D) = \log\left(\frac{|D|}{|\{d \in D : t \in d\}|}\right) \qquad (2)$$

The inverse document frequency emphasizes rare terms over common ones. The $\mathrm{tfidf}(t, d, D)$ of a term $t$ within a document $d$ in the collection $D$ is the term frequency multiplied by the inverse document frequency:

$$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D) \qquad (3)$$

Intuitively, a term with high tf-idf is a term which is representative of the document content while not being too popular across the whole corpus. This measure therefore favours terms that are frequent within a document but rare across the collection, that is, the document-specific terms. The terms in the vector space model can now be weighted by tf-idf, and a similarity measure can then be used to rank each document according to the user's query. The vector space model and tf-idf proved to be highly successful for ranking results in a set of documents which had no explicit connections with respect to each other.
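The weighting and ranking steps described in this section can be illustrated with a minimal Python sketch. It assumes whitespace-tokenized documents, a natural logarithm, and an idf of zero for terms that occur in no document; the toy collection `DOCS` and the helper names (`tf`, `idf`, `tfidf_vector`, `cosine`, `rank`) are hypothetical and chosen only for illustration, not taken from any cited system.

```python
import math

def tf(term, doc):
    """Term frequency: occurrences of term in doc over the number of terms in doc."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency: log(|D| / number of documents containing term).
    Assumption: terms found in no document get weight 0."""
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing) if containing else 0.0

def tfidf_vector(doc, docs, vocab):
    """One tf-idf weight per vocabulary term, in vocabulary order."""
    return [tf(t, doc) * idf(t, docs) for t in vocab]

def cosine(u, v):
    """Cosine of the angle between two vectors (0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vocab = sorted({t for d in docs for t in d})
    q = tfidf_vector(query, docs, vocab)
    scored = [(cosine(q, tfidf_vector(d, docs, vocab)), i) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))

# Hypothetical toy collection, tokenized on whitespace.
DOCS = [
    "faceted search engine design".split(),
    "boolean set retrieval model".split(),
    "ranked retrieval with vector space model".split(),
]

ranking = rank("faceted search".split(), DOCS)
for score, index in ranking:
    print(f"{score:.3f}", " ".join(DOCS[index]))
```

Only the first document shares terms with the query, so it is ranked first; the other two receive a similarity of zero. Production systems typically use smoothed or normalized variants of these weights, so the exact scores here should be read as illustrative only.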
Fig. 1: The Boolean Search Interface of PubMed [12]

However, with the advent of the WWW and hypertext collections, researchers started to develop ranking methods based on a notion of document authority. For example, a hypertext collection can be modeled as a graph with links as edges and documents as nodes. That graph can then be harnessed to rank documents based on a certain notion of authority, independently of the user's query. In this respect, Jon Kleinberg's HITS algorithm [15] and Larry Page and Sergey Brin's PageRank [16] were the two

most notable measures of authority (see Figure 2). The latter measure was the basis of Google's search engine.

Fig. 2: PageRanks of Simple Network of Websites [17]

Today the ranking algorithms are much more complex, and PageRank, for example, is just one signal amongst many others used. Numerous other measures of document relevance should also be noted, such as F-score, Mean Average Precision (MAP), or Normalized Discounted Cumulative Gain (NDCG) [18]. Machine learning techniques can be used to train different rankers optimized for a given performance measure. The ranking models produced can even be combined into an ensemble in order to achieve greater performance [19]. Furthermore, with the advent of the social web, search is now sought to be personalized to a specific user's needs and profile.

4. Exploratory Search Engine

Current commercial SEs use a process known as query and response. The user issues a query and receives, as a response, a set of potentially relevant documents. The process has been formalized by Bates [20] in the lookup-based model. As shown in Figure 3, the model comprises four main elements. On the left-hand side, the documents are processed into a summarized form understandable by the user, known as the document surrogates. On the right-hand side, the user's underlying information need is reduced to a query statement. The latter usually takes the form of a set of keywords together with Boolean operators. A match occurs when the document surrogates fit the user's query. The user then investigates the surrogates and, if appropriate, delves into the documents of interest. The process may repeat itself, with the user attempting to find the right query which will yield the right set of documents.

Fig. 3: The Lookup-Based Model According to Bates [20]

The lookup-based model has been identified as best suited for question answering tasks and fact finding [21]. In fact, the process must start with a carefully specified query and should end with precise results. But the results returned, together with their potential relationships, are not intended to be analyzed further.

In the lookup-based model, the answer is assumed to be found in the matched documents, not necessarily in the search results themselves. The query represents a one-shot summary of the user's underlying information need. However, given today's reality of information overload, the lookup-based model appears to fall short of adequately answering the user's insatiable thirst for new information and knowledge. This has led researchers to go beyond this paradigm and look into a new class of information seeking, known as exploratory search [24].

5. Designing Search Interfaces

This paper takes the perspective of improving usability for information seeking tasks, like Rocha, Zhang and González. Nielsen [25] described five usability goals of a user interface, namely: learnability, efficiency, memorability, errors, and satisfaction. They are detailed in Table 1.

Table 1: Usability Goals
Usability Goal | Detail
Learnability | Relates to the ease with which first-time users successfully complete initial jobs using the interface.
Efficiency | Pertains to the speed with which users accomplish their tasks once the initial interface functions are understood.
Memorability | Relates to the ability of a user to return to proficiency following a period of non-use.
Errors | Are essential to gaining understanding from the user interface perspective. We intend to determine the types of errors made, their frequencies, and whether users can surmount them and ultimately become successful in using the interface.
Satisfaction | Logically, errors and the other aforementioned interface aspects affect user satisfaction.

We need to understand clearly the ways in which users are satisfied (or dissatisfied) and to what degree. We can now explain the process of designing an interface in detail by keeping the five aforementioned usability principles in mind.

5.1. Designing Process

At present, web interfaces follow a user-centered approach to design. This process involves a series of steps, as outlined in Figure 4, in which the user is constantly solicited [26].

Fig. 4: User-Centered Design Approach

The steps of the design process are shown in detail in Table 2:
in detail:

Table 2: Designing Steps
Step | Detail
User needs assessment | Usually consists of developing a user needs assessment. This may involve repeated interviews with a variety of users in order to fully understand who they are and what goals they have.
Task analysis | The designer must understand what tasks are necessary for the user to achieve their goal. This step is called task analysis [22] and involves the designer choosing the user goals and tasks which will be supported by the interface. These can take the form of working scenarios that typify anticipated tasks.
Prototype and assessment of usability | Involves the creation of a prototype that will be informally tested by a set of target users. This step is repeated by revising the prototype until the designer and the users are satisfied that the desired usability goals are met. This process is time-consuming and costly, and therefore the designer may select as few user participants as reasonably possible. The latter principle is sometimes referred to as discount usability testing [23].

5.2. Small Details and Aesthetic Design

The presented design guidelines are useful. However, attention to small details can make a significant difference between a successful and a failed interface. For example, the amount of space visually presented to a user in a query box can influence the length of the query. Users seeing a wide entry area will be encouraged to type long queries [27].
Aesthetics has an important role in the user interface experience. The impression generated by the appearance of a design tends to correlate with the user's impression of its quality and with user satisfaction [28]. However, although they provide users with a positive impression of relevance, pages with aesthetic design may actually be less useful than pages with basic design [29]. Previous work [30] found that experiential aspects such as prejudices, evoked memories, and expectations can influence how blind users experience the accessibility of a Website. Hsieh and Cheng [31] worked on the usability of “human-computer interaction” and related the users' experiences to the integration of the design and aesthetic interaction principles required for experiences of aesthetic interaction, so as to make up for past shortcomings. Hotchkiss [32] interviewed a Google vice president and reported that an extensive list of details is carefully considered in the design of the search result page. In the upper left corner, also known as the “sweet spot”, Google ensures that the ads placed are not only relevant but also merge attractively with the search results.

6. A Survey of Faceted Search

A combination of faceted navigation and full-text search leads to faceted search (FS), as indicated in Figure 5. The structured information, or metadata, is browsed using a faceted navigation interface. The remaining unstructured data (or full text) are accessed using a simple search box. After a search is performed, the user can immediately see which facets the results fall into. This step provides further guidance for subsequent searches and refinements. Similar to faceted navigation, FS provides guidance through the space of possible queries and their results. However, these facets nearly always have the same look and feel. They are typically represented as a hierarchical directory of choices. Interfaces that attempt to represent facets and their values with an appropriate look and feel are rare. For example, a user may want to see the location of a product on a map rather than as a list of countries or cities. The user may also be interested in relating different facets to draw insights from the data, as indicated in the next section. The subsequent review is important given that FS should be implemented with a clear understanding of the potential issues and challenges that may arise.

Fig. 5: Faceted Searches at Amazon.com for the Query “Video Games” (figure callouts: access unstructured data with full text search; order the search results; refining by type)

6.1. Organizing Facets

In this subsection, we turn our attention to organizing facets and their respective values. We merely provide common practices and recommendations and by no means claim to be exhaustive, as shown in Figure 6:

Fig. 6: Facets Organizing

The steps of the facet organizing process are shown in detail in Table 3:

Table 3: Facets Organizing Process
Facets Organizing | Details
Static Ordering | The first approach to organizing facets simply involves keeping their location constant throughout the use of the interface. This organization is called “static ordering” and has the advantage of reinforcing the user's mental model of the interface. By keeping each feature of the interface static, a user always knows where to expect it. The drawback of this approach with respect to FS is that several facets may not be relevant to the user's query and therefore may not be useful when shown.
Dynamic Ordering | In contrast to the static ordering of facets, dynamic ordering places facets in an order determined by ranking algorithms that estimate the utility of each facet with respect to the user's query. This approach is particularly useful when a potentially large number of facets are possible, since it is advantageous when only the few most relevant facets apply to the user's query.
Grouping Ordering | Another design option involves grouping related facets based on some notion of similarity. A simple example relates to academic journal search. Users may wish to search according to authors, reviewers, names of institutions, advisers, and so on. We can create an individual facet for each of these items, but alternatively we can simply group them into a facet called “people”. From this grouping, we can organize the aforementioned elements into sub-facets. This method is a useful means of adding several facets in a manner that sensibly facilitates the development and refinement of the user's query while preserving static ordering.
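The difference between static and dynamic ordering in Table 3 can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the result records in `RESULTS`, the facet names, the fixed `STATIC_ORDER`, and in particular the utility estimate used for dynamic ordering (the number of results a facet covers), which stands in for whatever ranking algorithm a real system would use.

```python
from collections import Counter

# Hypothetical search results; each item carries some facet values (metadata).
RESULTS = [
    {"title": "Racing Game", "platform": "PC", "genre": "Racing"},
    {"title": "Puzzle Game", "platform": "PC", "genre": "Puzzle"},
    {"title": "Space Shooter", "platform": "Console", "genre": "Action", "brand": "Acme"},
]

# Static ordering: facets keep the same position for every query.
STATIC_ORDER = ["brand", "platform", "genre"]

def facet_counts(results, facets):
    """Count how many results fall under each value of each facet."""
    counts = {facet: Counter() for facet in facets}
    for item in results:
        for facet in facets:
            if facet in item:
                counts[facet][item[facet]] += 1
    return counts

def static_ordering(counts):
    """Show facets in their fixed order, whether or not they are useful here."""
    return [facet for facet in STATIC_ORDER if facet in counts]

def dynamic_ordering(counts):
    """Rank facets by an assumed utility: how many results they cover.
    Ties keep the static order; facets covering no result are dropped."""
    coverage = {facet: sum(c.values()) for facet, c in counts.items()}
    ranked = sorted(coverage, key=lambda f: (-coverage[f], STATIC_ORDER.index(f)))
    return [facet for facet in ranked if coverage[facet] > 0]

def refine(results, facet, value):
    """Narrow the result set to items with the chosen facet value."""
    return [item for item in results if item.get(facet) == value]

counts = facet_counts(RESULTS, STATIC_ORDER)
static = static_ordering(counts)
dynamic = dynamic_ordering(counts)
pc_only = refine(RESULTS, "platform", "PC")

print("static :", static)
print("dynamic:", dynamic)
print("PC only:", [item["title"] for item in pc_only])
```

With this toy data the rarely populated "brand" facet stays first under static ordering but drops to last under dynamic ordering. Grouping could be layered on top by mapping several facet names to one parent facet, which is left out here to keep the sketch short.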

Creating a hierarchy is a similar solution for presenting facets (static, dynamic, and grouping) that can achieve static ordering while simultaneously ranking the facets. Hierarchical facet values can be used in grouping even for facets that initially lack order. For example, a tree that displays the location of facet values can be formed. The designer can create and enforce any number of hierarchical values that are deemed useful.

6.2. Exploration of Various Faceted Search Approaches

FS allows users to explore or navigate within the document collection. However, most mainstream search systems only feature a fixed mode of interaction. For example, search results are most often depicted as a list of text with minimal interactions, such as sorting or paging. To obtain a new understanding of the data, allowing for multiple interaction modes is necessary. According to White and Roth [24], an exploratory search engine should increase user responsibility and control. This should include letting the user select how the data is visualized depending on the task of interest.

7. Summary

In this paper, we discuss exploratory search and then focus on faceted search and on going beyond the traditional faceted search interface. First, we review ranked retrieval and the exploratory search engine. To promote exploration, the interface should provide instant feedback on the user's potential actions. We also cover the design of search user interfaces, because the user interface of a SE forms the first and last impressions made on a user and is a critical focal point for the user's experience at every stage of the search. It is through the interface that queries are formed and converted into informative answers. The recommendations made in this paper can be a guide for creating an interface that fosters improvements to all aspects and stages of the user's search. Better interface designs assist users in articulating better queries, help them understand the results, and facilitate query modifications if necessary. FS combines faceted navigation with full-text search to help users work with semi-structured content, whilst full-text search alone is for unstructured content.

Acknowledgment

This research was sponsored and supported under the Universiti Tenaga Nasional (UNITEN) internal grant no J510050783 (2018). Many thanks to the Innovation & Research Management Center (iRMC), UNITEN, who provided their assistance and expertise during the research.

References

[1] J. Curran, N. Fenton, and D. Freedman, Misunderstanding the internet: Routledge, 2016.
[2] A. Selcuk, C. Örencik, and E. Savas, "Private search over big data leveraging distributed file system and parallel processing," 2015.
[3] B. Kules and B. Shneiderman, "Users can change their web search tactics: Design guidelines for categorized overviews," Information Processing & Management, vol. 44, pp. 463-484, 2008.
[4] D. Bakrola and S. Gandhi, "Enhancing Web Search Results Using Aggregated Search," in Proceedings of International Conference on ICT for Sustainable Development, 2016, pp. 675-688.
[5] N. Ibrahim, A. H. Chaibi, and H. B. Ghézala, "Scientometric re-ranking approach to improve search results," Procedia Computer Science, vol. 112, pp. 447-456, 2017.
[6] A. N. Langville and C. D. Meyer, Google's PageRank and beyond: The science of search engine rankings: Princeton University Press, 2011.
[7] C. D. Manning, P. Raghavan, and H. Schütze, "Introduction to information retrieval," ed: Cambridge University Press, 2008.
[8] S. Mizzaro, "Relevance: The whole history," JASIS, vol. 48, pp. 810-832, 1997.
[9] T. Saracevic, "Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance," Journal of the American Society for Information Science and Technology, vol. 58, pp. 2126-2144, 2007.
[10] A. Singhal, "Modern information retrieval: A brief overview," IEEE Data Eng. Bull., vol. 24, pp. 35-43, 2001.
[11] D. Wolfram, A. Spink, B. J. Jansen, and T. Saracevic, "Vox populi: The public searching of the web," JASIST, vol. 52, pp. 1073-1074, 2001.
[12] "PubMed," 2017.
[13] G. Salton, A. Wong, and C.-S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, pp. 613-620, 1975.
[14] K. Sparck Jones, "A statistical interpretation of term specificity and its application in retrieval," Journal of documentation, vol. 28, pp. 11-21, 1972.
[15] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM (JACM), vol. 46, pp. 604-632, 1999.
[16] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," Computer networks, vol. 56, pp. 3825-3833, 1998.
[17] J. Park and S.-H. Yook, "Bayesian Inference of Natural Rankings in Incomplete Competition Networks," Scientific Reports, vol. 4, p. 6212, 2014.
[18] K. Järvelin and J. Kekäläinen, "Cumulated gain-based evaluation of IR techniques," ACM Transactions on Information Systems (TOIS), vol. 20, pp. 422-446, 2002.
[19] R. Caruana, A. Niculescu-Mizil, G. Crew, and A. Ksikes, "Ensemble selection from libraries of models," in Proceedings of the twenty-first international conference on Machine learning, 2004, p. 18.
[20] M. J. Bates, "The design of browsing and berrypicking techniques for the online search interface," Online review, vol. 13, pp. 407-424, 1989.
[21] G. Marchionini, "Exploratory search: from finding to understanding," Communications of the ACM, vol. 49, pp. 41-46, 2006.
[22] E. Goodman, M. Kuniavsky, and A. Moed, "Observing the user experience," Burlington, Massachusetts: Morgan Kaufmann, 2012.
[23] J. Nielsen, "Usability 101: Introduction to usability," ed, 2003.
[24] R. W. White and R. A. Roth, "Exploratory search: beyond the query-response paradigm (Synthesis lectures on information concepts, retrieval & services)," Morgan and Claypool Publishers, vol. 3, 2009.
[25] J. Nielsen, "Guerrilla HCI: Using discount usability engineering to penetrate the intimidation barrier," Cost-justifying usability, pp. 245-272, 1994.
[26] B. Shneiderman and C. Plaisant, "Designing the user interface," Reading, Mass.: Addison Wesley Longman, 1998 (edn. 2005).
[27] K. Franzen and J. Karlgren, "Verbosity and interface design," SICS Research Report, 2000.
[28] M. Hassenzahl, "The interplay of beauty, goodness, and usability in interactive products," Human-computer interaction, vol. 19, pp. 319-349, 2004.
[29] T. Ben-Bassat, J. Meyer, and N. Tractinsky, "Economic and subjective measures of the perceived value of aesthetics and usability," ACM Transactions on Computer-Human Interaction (TOCHI), vol. 13, pp. 210-234, 2006.
[30] A. Aizpurua, M. Arrue, and M. Vigo, "Prejudices, memories, expectations and confidence influence experienced accessibility on the Web," Computers in Human Behavior, vol. 51, pp. 152-160, 2015.
[31] H. C. L. Hsieh and N. C. Cheng, "A Theoretical Model for the Design of Aesthetic Interaction," in International Conference on Human-Computer Interaction, 2016, pp. 178-187.
[32] G. Hotchkiss, T. Sherman, R. Tobin, C. Bates, and K. Brown, "Search engine results: 2010," Enquiro Search Solutions, pp. 1-61, 2010.
