
Volume 3, Issue 6, June-2016, pp. 314-320
ISSN (O): 2349-7084
International Journal of Computer Engineering In Research Trends
Available online at: www.ijcert.org
2016, IJCERT All Rights Reserved

Context Based XML Data and Diversification for Keyword Search Queries

1 Mr. RAHUL HON, 2 Mrs. N. SUJATHA
1 Pursuing M.Tech (CSE) from Jagruti Institute of Engineering and Technology
2 Associate Professor, Department of Computer Science and Engineering, Jagruti Institute of Engineering and Technology, Telangana State, India.

Abstract: In a search process, the user enters candidate keywords, a search algorithm executes the corresponding query on the target dataset, and the result is returned as the output of that algorithm. This assumes that the user enters meaningful keywords in order to get an appropriate result set. A confusing, ambiguous, short or indistinct set of keywords leads to irrelevant search results. Search algorithms also rely on exact result fetching, which can return irrelevant results when there is a problem in the input query or its keywords. This system focuses on that problem. By considering each keyword and its relevant context in the XML data, searching is done through an automatic diversification process for XML keyword search. In this way the system may satisfy the user, who gets an analytical result set based on the context of the search keywords. For greater efficiency and to deal with big data, the Hadoop platform is used. Baseline and efficient algorithms are proposed to incrementally compute the top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates are the most relevant to the given query, and they have to cover the maximal number of distinct results. Evaluation on real and synthetic data sets demonstrates the effectiveness of the diversification model and the efficiency of the algorithms.

Keywords: Data Mining, Search Engine Optimization, XML Dataset, Baseline Algorithm, Candidate Keyword, XML Keyword Search, Feature Selection, Diversification Process.

I. INTRODUCTION

Keyword search is the most important information discovery technique because the user does not need to know either a query language or the underlying structure of the data. A large number of techniques are used in XML search systems. Keyword search is a technique used for retrieving data or information. It can be implemented on machine learning databases, and it is also possible on graph structures that combine relational, HTML and XML data. Keyword search uses a number of techniques and algorithms for storing and retrieving data, but these can suffer from low accuracy, fail to give a correct answer, and require a long search time and a large amount of storage space. Data mining or information retrieval is the process of retrieving data from a large database and transforming it into a form that the user can easily understand. One important advantage of keyword search is that the user does not require proper knowledge of database queries. The user simply enters a keyword and gets results related to that keyword. Keyword search on relational databases finds answers among the tuples that are connected through database keys such as


primary keys and foreign keys. This paper also presents the comparative techniques used for keyword search, such as DISCOVER, BANKS, BLINKS, EASE, and SPARK. Existing techniques for information retrieval on real-world databases, together with experimental results, indicate that existing search techniques are not capable of real-world information retrieval and data mining tasks. Data mining is the finding of statistically reliable insights from data. The identification of records that do not match the usual patterns might be interesting and may require further investigation. Association searches for relationships between various attributes, for example milk and bread bought along with jam, so that offering a good discount on the combination can enhance sales. Clustering is the process of grouping together values in the data that have similar patterns when these patterns are not known in advance. For example, by analysing the data we can form one cluster of employees who reach the target more than ten times per week and another of those who make fewer than ten transactions. Classification is the process of grouping the data into different classes on the basis of previously known structures. For example, student percentages above 70% can be classified as distinction, between 60% and 70% as first class, and below 60% as average. Regression attempts to find a function that models the data with the least error, fitting the data onto the function so that one value can be derived from another.

II. LITERATURE SURVEY

The major area of concern in this work is that, by considering the keyword and its relevant context in the XML data, searching should be done using an automatic diversification process for XML keyword search [1].

Various state-of-the-art techniques for keyword search over structured and semi-structured data are discussed in this work. Query optimization, ranking phases and top-k query processing are covered. Different data models, such as XML and graph-structured data, are discussed, along with applications in which keyword-based search is of prime importance. Problems such as diverse data models, query forms (complexity versus expressive power), search quality improvement and evaluation are also discussed [2].

The XRANK system is discussed in this paper. A ranked search technique over XML data is considered. Space-saving and performance-gaining techniques such as the index structure and query evaluation are also a focus. XRANK can help in searching HTML as well as XML documents. Disadvantage: the authors currently take a document-centric view, in which they assume that query results are strictly hierarchical. Index maintenance is a major problem for effective search and is a bottleneck area [3].

An SLCA-based keyword search approach is discussed in this work. The Multiway-SLCA approach (MS) helps to extend keyword search beyond older methods such as AND/OR queries. After LCA analysis, improved algorithms are put forward to solve keyword-based search problems [4].

The Indexed Lookup Eager and Scan Eager algorithms are discussed in this work. XML keyword search according to SLCA semantics is the prime topic of discussion, and these algorithms are used for it. The strength of these algorithms is that they return search results immediately. The implementation of the XKSearch architecture is also discussed. The XKSearch system takes a list of keywords as input and returns the set of Smallest Lowest Common Ancestor nodes [5].

Query and information relevance is calculated so that unnecessary checks are avoided and effective search is achieved. Hence effective text retrieval


and summarization is achieved. Maximal Marginal Relevance (MMR) prevents redundancy. This approach provides highly relevant search results to the end user by effectively minimizing redundancy [6].

In this paper the risk of user dissatisfaction is the major area of concern. To minimize it, a systematic approach to diversifying results is discussed. Several measures such as NDCG, MRR and MAP are discussed in detail, and a greedy algorithm for diversification is used. The aim of diversification is that the user should find the most relevant data among the search results. Another aim of this paper is to minimize the rank of the best-fitting result [7].

This paper also uses a greedy approach. Different datasets are considered so that the approach is tested thoroughly, and relevant documents are expected as the search result [8].

Using a test collection based on the TREC question answering track, this paper discusses a framework that achieves novelty and diversity. In this approach each document is linked with the relevant information in it. A chunk of information is thereby attached to the document, which helps at search time. This piece of information covers content as well as document properties. The major drawback of this approach is that unusual features of a document may cause judging errors. Some raw data related to the document may delay the search result [9].

Analysis of past queries provides proper direction for diversification. Past query reformulation reveals the exact query-related behavior of the user. The client's data requests, ranked structure and queries are observed and analysed at the client side to produce a properly diversified result. Large query logs from a search engine are analyzed in this paper [10].

Single-swap and multi-swap algorithms are used in this paper. Differentiation of search results is carried out on structured data. The degree of difference is quantified so that it represents the accuracy of the search result. Features are extracted from the search results and are prominently considered in the calculation [11].

By considering query results and their redundancy, a new scheme of re-ranking query interpretations is discussed to diversify the search result. For sub-topics and relevance, new measures named α-nDCG-W and WS-recall are proposed. An algorithm named the Diversification algorithm is used. For database query search, a query similarity measure and a greedy algorithm are used to obtain diversified query interpretations and their relevance [12].

III. METHODOLOGY

Data Mining Search Engine: Search Engine Optimization is the procedure of improving the visibility of a website or webpage in a search engine's unpaid search results by increasing its Search Engine Results Page ranking. Optimization may target different types of search, such as image search, local search, video search, academic search, news search and industry-specific vertical search. It can also be defined as the process of affecting the visibility of a website or webpage in a search engine. XML is an immense and dynamic data collection that includes a vast number of hyperlinks and volumes of data usage information, and hence requires effective data mining. But huge data is still a challenge in knowledge discovery. Web pages have dynamic data and do not follow any uniform structure. Web pages contain a huge amount of raw data that is not indexed; therefore searching in web data has


become more complex, time-consuming and difficult. The Web contains not only static data but also data that requires timely updating, such as news, stock markets and live channels. People from different communities have different backgrounds and use the internet for different purposes. Many have different interests and lack knowledge of internet usage. Hence users get lost within the huge amount of data. A given user generally focuses on only a tiny portion of the Web, dismissing the rest as uninteresting data that serves only to swamp the desired search results.

Keyword-based search: This includes search that uses keyword indices or manually built directories to find documents with specified keywords or topics, e.g. engines such as Google or Yahoo.

Querying deep Web sources: Here information, such as amazon.com's book data and realtor.com's real-estate data, hides behind searchable database query forms that, unlike the surface web, cannot be accessed through static URL links.

Random surfing: This follows web linkage pointers.

Query-based search engines: Programs that search databases and internet sites for the documents containing keywords specified by a user. Their primary function is providing a search for gathering and reporting information available on the internet or a portion of the internet. The requirements are communicated to the search engine so that it can recommend the most relevant websites related to the search. The search is driven by the given requirement and uses the text on the page and the titles and descriptions that are given. When we use the term search engine in relation to XML, we usually refer to the actual search forms that search through databases of XML and HTML documents available all over. Crawler-based search engines are those that use automated software to read the information on the actual website.

IV. SYSTEM STUDY

EXISTING SYSTEM:

The problem of diversifying keyword search was first studied in the IR community. Most of these approaches perform diversification as a post-processing or reranking step of document retrieval, based on the analysis of the result set and/or the query logs. In IR, keyword search diversification is designed at the topic or document level.

Liu et al. is the first work to measure the difference of XML keyword search results by comparing their feature sets. However, the selection of the feature set is limited to metadata in XML, and it is also a method of post-process search result analysis.

DISADVANTAGES OF EXISTING SYSTEM:

When the given keyword query contains only a small number of vague keywords, it becomes a very challenging problem to derive the user's search intention because of the high ambiguity of this type of keyword query.


Although user involvement is sometimes helpful in identifying the search intentions of keyword queries, a user's interactive process may be time-consuming when the size of the relevant result set is large.

It is not always easy to get the useful taxonomy and query logs. In addition, the diversified results in IR are often modeled at the document level.

A large number of structured XML queries may be generated and evaluated.

There is no guarantee that the structured queries to be evaluated can find matched results, because of the structural constraints.

The process of constructing structured queries has to rely on the metadata information in XML data.

PROPOSED SYSTEM:

To address the existing issues, we will develop a method of providing diverse keyword query suggestions to users based on the context of the given keywords in the data to be searched. By doing this, users may choose their preferred queries or modify their original queries based on the returned diverse query suggestions.

To address the existing limitations and challenges, we initiate a formal study of the diversification problem in XML keyword search, which can directly compute the diversified results without retrieving all the relevant candidates.

Towards this goal, given a keyword query, we first derive the co-related feature terms for each query keyword from the XML data based on mutual information in probability theory, which has been used as a criterion for feature selection. The selection of our feature terms is not limited to the labels of XML elements.

Each combination of the feature terms and the original query keywords may represent one of the diversified contexts (also denoted as specific search intentions). We then evaluate each derived search intention by measuring its relevance to the original keyword query and the novelty of its produced results.

To efficiently compute diversified keyword search, we propose one baseline algorithm and two improved algorithms based on the observed properties of diversified keyword search results.
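The derivation of feature terms and search intentions described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the XML data is assumed to be already flattened into a list of token sets (one set per XML entity), the score is one common co-occurrence form of mutual information, and the names `mutual_information`, `feature_terms` and `search_intentions` are hypothetical.

```python
import math
from itertools import product


def mutual_information(units, keyword, term):
    """Co-occurrence score P(k,t) * log(P(k,t) / (P(k) * P(t))).

    `units` is a list of token sets (assumption: one set per XML entity).
    """
    n = len(units)
    n_k = sum(1 for u in units if keyword in u)
    n_t = sum(1 for u in units if term in u)
    n_kt = sum(1 for u in units if keyword in u and term in u)
    if n_kt == 0:
        return 0.0  # the two terms never co-occur
    return (n_kt / n) * math.log((n_kt * n) / (n_k * n_t))


def feature_terms(units, keyword, exclude, m):
    """Return up to m (term, score) pairs with positive score, best first."""
    vocab = set().union(*units) - set(exclude)
    scored = [(t, mutual_information(units, keyword, t)) for t in vocab]
    scored = [(t, s) for t, s in scored if s > 0]
    scored.sort(key=lambda ts: (-ts[1], ts[0]))  # ties broken alphabetically
    return scored[:m]


def search_intentions(units, query, m):
    """Combine one feature term per keyword; rank by aggregated score."""
    per_keyword = [feature_terms(units, k, query, m) for k in query]
    intentions = []
    for combo in product(*per_keyword):
        # Original keywords plus the chosen feature terms, without duplicates.
        terms = list(dict.fromkeys(list(query) + [t for t, _ in combo]))
        score = sum(s for _, s in combo)
        intentions.append((terms, score))
    intentions.sort(key=lambda ts: -ts[1])  # descending aggregated score
    return intentions


# Toy data, invented for illustration only.
units = [
    {"database", "query", "xml", "keyword"},
    {"database", "query", "sql"},
    {"java", "query", "language"},
    {"database", "system", "xml"},
]
for terms, score in search_intentions(units, ["database", "query"], m=2):
    print(terms, round(score, 4))
```

Each printed line is one candidate search intention: the original keywords extended with one feature term per keyword, ordered by the sum of the feature-term scores.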


ADVANTAGES OF PROPOSED SYSTEM:

Reduces the computational cost.

Efficiently computes the new SLCA results.

The proposed diversification algorithms can return qualified search intentions and results to users in a short time.
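To make the SLCA results and the relevance and novelty criteria more concrete, the following Python sketch computes Smallest Lowest Common Ancestor nodes by brute force and then picks k queries greedily. It rests on several assumptions that are not taken from the paper: nodes are identified by Dewey labels stored as tuples, novelty is the fraction of a query's SLCA results not already returned by a previously chosen query, and the gain is relevance multiplied by novelty. The function names are hypothetical, and the brute-force enumeration is for illustration only, not the efficient computation the paper refers to.

```python
from itertools import product


def lca(a, b):
    """LCA of two Dewey labels is their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def is_ancestor(a, b):
    """True if label a is a proper ancestor of label b."""
    return len(a) < len(b) and b[:len(a)] == a


def slca(keyword_nodes):
    """keyword_nodes: one list of Dewey labels per query keyword."""
    if not keyword_nodes or any(not nodes for nodes in keyword_nodes):
        return []
    candidates = set()
    for combo in product(*keyword_nodes):  # one match per keyword
        anc = combo[0]
        for label in combo[1:]:
            anc = lca(anc, label)
        candidates.add(anc)
    # Keep only LCAs that have no other LCA below them.
    return sorted(c for c in candidates
                  if not any(is_ancestor(c, d) for d in candidates))


def select_top_k(candidates, k):
    """candidates: list of (query, relevance, slca_results) triples."""
    chosen, seen = [], set()
    remaining = list(candidates)

    def gain(c):
        results = set(c[2])
        novelty = len(results - seen) / len(results) if results else 0.0
        return c[1] * novelty  # assumed score: relevance times novelty

    while remaining and len(chosen) < k:
        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break  # nothing new left to contribute
        chosen.append(best[0])
        seen |= set(best[2])
        remaining.remove(best)
    return chosen


# Toy example: two keywords matched under two different subtrees.
matches = [[(0, 0, 0), (0, 1, 0)], [(0, 0, 1), (0, 1, 2, 0)]]
print(slca(matches))
queries = [("q1", 0.9, [(0, 0), (0, 1)]),
           ("q2", 0.8, [(0, 0), (0, 1)]),
           ("q3", 0.5, [(0, 2)])]
print(select_top_k(queries, 2))
```

In the toy run, the second query is skipped because all of its results were already returned by the first, so the less relevant but novel third query is selected instead.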

V. SYSTEM ARCHITECTURE:

Search engine optimization is the procedure of improving the visibility of a webpage in a search engine's natural results by increasing its search engine page ranking. It may target different types of search, such as image, hyperlink, HTML, XML, video and industry search, and is defined as the process of affecting the visibility of a webpage in a search engine. The database is a huge and dynamic collection that includes highlighting points and volumes of data usage information, and hence requires effective mining, which is a challenge in knowledge discovery. XML pages are more complex than text data and do not follow any uniform structure. They contain raw data that is not indexed, so searching in web data has become more complex, time-consuming and difficult.

The procedure generates queries from the original keyword query and the data to be searched. Given a keyword query q, we first retrieve the corresponding feature terms for each query keyword and then construct a matrix in which the feature terms are sorted by their mutual information scores. Each combination of feature terms in the matrix represents a search intention. The aggregated mutual information score of each search intention represents, to some extent, the confidence of the context of the query keywords. Without other knowledge, we generate the search intentions and then check the corresponding queries in descending order of their aggregated mutual information scores. We take 20 feature terms for each query keyword and then generate all the possible search intentions, from which we further identify the top-k qualified and diversified queries for the original query. The baseline algorithm retrieves the pre-computed feature terms of the given keyword query from the XML data T, then generates all the possible intended queries based on the retrieved feature terms, and at last computes the SLCAs as keyword search results for each query and measures its diversification score. Unlike traditional XML keyword search, the algorithm has to detect and remove the


duplicated results by comparing the generated results, which may cover multiple.

VI. CONCLUSION

This work presented a method to search for diversified analyses of a keyword query over XML data based on the contexts of the query keywords in the data. The diversification of the contexts was measured by exploring their relevance to the original query and the novelty of their results. Furthermore, the framework provides efficient algorithms based on the observed properties of XML keyword search analysis. Our comparative study demonstrated the efficiency of the proposed algorithms by running a substantial number of queries over XMark datasets. At the same time, we also verified the effectiveness of our diversification model by analyzing the returned search intentions for the given keyword queries over the DBLP dataset, with search intentions and results returned to users in a short time.

REFERENCES

[1] J. G. Carbonell and J. Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries," in Proc. SIGIR, 1998, pp. 335-336.

[2] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong, "Diversifying search results," in Proc. 2nd ACM Int. Conf. Web Search Data Mining, 2009, pp. 5-14.

[3] H. Chen and D. R. Karger, "Less is more: Probabilistic models for retrieving fewer relevant documents," in Proc. SIGIR, 2006, pp. 429-436.

[4] C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Buttcher, and I. MacKinnon, "Novelty and diversity in information retrieval evaluation," in Proc. SIGIR, 2008, pp. 659-666.

[5] F. Radlinski and S. T. Dumais, "Improving personalized web search using result diversification," in Proc. SIGIR, 2006, pp. 691-692.

[6] Z. Liu, P. Sun, and Y. Chen, "Structured search result differentiation," J. Proc. VLDB Endowment, vol. 2, no. 1, pp. 313-324, 2009.

[7] E. Demidova, P. Fankhauser, X. Zhou, and W. Nejdl, "DivQ: Diversification for keyword search over structured databases," in Proc. SIGIR, 2010, pp. 331-338.

[8] N. Sarkas, N. Bansal, G. Das, and N. Koudas, "Measure-driven keyword-query expansion," J. Proc. VLDB Endowment, vol. 2, no. 1, pp. 121-132, 2009.

[9] N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa, "Seeking stable clusters in the blogosphere," in Proc. 33rd Int. Conf. Very Large Data Bases, 2007, pp. 806-817.

[10] S. Brin, R. Motwani, and C. Silverstein, "Beyond market baskets: Generalizing association rules to correlations," in Proc. SIGMOD Conf., 1997, pp. 265-276.

[11] W. DuMouchel and D. Pregibon, "Empirical bayes screening for multi-item associations," in Proc. 7th ACM SIGKDD Int. Conf.

ABOUT THE AUTHORS

Mr. RAHUL HON is pursuing an M.Tech degree in Computer Science and Engineering from Jagruti Institute of Engineering and Technology, Telangana State, India.

Mrs. N. SUJATHA is presently working as Associate Professor in the Department of Computer Science and Engineering, Telangana State, India. She has published several research papers in both International and National conferences and Journals.
