V3i608 PDF
Abstract: In the searching process, the user enters candidate search keywords, a search algorithm executes the corresponding query on the targeted dataset, and the result is returned as the output of that algorithm. The user is expected to enter meaningful keywords in order to get an appropriate result set. A confusing set of keywords, or keywords that are ambiguous, too short, or indistinct, leads to irrelevant search results. In addition, search algorithms fetch exact matches, which can be irrelevant when there is a problem in the input query or its keywords. This system focuses on that problem. By considering the keywords and their relevant context in the XML data, searching should be done through an automatic diversification process for XML keyword search. In this way the system may satisfy the user, who gets an analytical result set based on the context of the search keywords. For greater efficiency and to deal with big data, the Hadoop platform is used. Baseline and efficient algorithms are proposed to incrementally compute the top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates must be most relevant to the given query, and they must cover the maximal number of distinct results. Evaluation on real and synthetic data sets demonstrates the effectiveness of the diversification model and the efficiency of the algorithms.
Keywords: Data Mining, Search Engine Optimization, XML Dataset, Baseline Algorithm, Candidate Keyword, XML Keyword Search, Feature Selection, Diversification Process.
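The two selection criteria named in the abstract (relevance to the given query and coverage of distinct results) can be illustrated with a small greedy sketch. This is not the baseline or incremental algorithm proposed in this work. The function name `select_diversified_candidates`, the scoring rule (relevance multiplied by the number of newly covered results), and the sample scores and result identifiers are all assumptions made for illustration only.

```python
from typing import Dict, List, Set


def select_diversified_candidates(
    relevance: Dict[str, float],
    results: Dict[str, Set[str]],
    k: int,
) -> List[str]:
    """Greedily pick up to k query candidates (hypothetical scoring rule).

    At each step the candidate with the largest gain is chosen, where
    gain = relevance * (number of its results that are not yet covered).
    """
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = set(relevance)

    while remaining and len(selected) < k:
        def gain(candidate: str) -> float:
            return relevance[candidate] * len(results[candidate] - covered)

        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # no remaining candidate contributes a new result
        selected.append(best)
        covered |= results[best]
        remaining.remove(best)

    return selected


# Illustrative data only: scores and result ids are made up.
example_relevance = {
    "xml keyword search": 0.9,
    "xml keyword index": 0.8,
    "xml search engine": 0.5,
}
example_results = {
    "xml keyword search": {"r1", "r2", "r3"},
    "xml keyword index": {"r2", "r3"},
    "xml search engine": {"r4", "r5"},
}

print(select_diversified_candidates(example_relevance, example_results, 2))
# prints ['xml keyword search', 'xml search engine']
```

In this toy run the second most relevant candidate is skipped because all of its results are already covered by the first pick, which is the effect the coverage criterion is meant to have.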
primary key and foreign keys. This system also presents the comparative techniques used for keyword search, such as DISCOVER, BANKS, BLINKS, EASE, and SPARK. Existing techniques for information retrieval on real-world databases, as well as experimental results, indicate that existing search techniques are not capable of real-world information retrieval and data mining tasks. Data mining is the discovery of statistically reliable insights from data. Anomaly detection is the identification of records that do not match the usual patterns; such records might be interesting and require further investigation. Association searches for relationships between attributes, for example that milk and bread are bought along with jam, so offering a good discount on the combination can enhance sales. Clustering is the process of grouping together values in the data that have similar patterns when these patterns are not known in advance. For example, analysing the data, we make one cluster of employees who reach the target more than ten times per week and another of those who make fewer than ten transactions. Classification is the process of grouping the data into different classes on the basis of previously known structures. For example, we classify a student percentage above 70% as distinction, between 60% and 70% as first class, and below 60% as average. Regression attempts to find a function that models the data with the least error, fitting the data onto the function so that one value can be derived from another.

II. LITERATURE SURVEY

The major concern of this work is that searching should be carried out through an automatic diversification process for XML keyword search, taking into account the keyword and its relevant context in the XML data [1].

This paper discusses various state-of-the-art techniques for keyword search over structured and semi-structured data. Query optimization, ranking phases, and top-k query processing are discussed, along with different data models such as XML and graph-structured data. Applications of these concepts in which keyword-based search is of prime importance are also discussed. The paper also covers problems such as Diverse Data Models, Query Forms: Complexity versus Expressive Power, Search Quality Improvement, and Evaluation [2].

This paper discusses the XRANK system, which performs ranked keyword search over XML data. It also focuses on space-saving and performance-improving techniques such as index structures and query evaluation. XRANK can be used to search HTML as well as XML documents. Disadvantage: the authors currently take a document-centric view, in which they assume that query results are strictly hierarchical. Index maintenance is a major problem for effective search and is a bottleneck [3].

This paper discusses an SLCA-based keyword search approach. The Multiway-SLCA approach (MS) helps to advance keyword search beyond older methods based on AND/OR semantics. After an analysis of LCA, improved algorithms are put forward to solve keyword-based search problems [4].

This paper discusses the Indexed Lookup Eager and Scan Eager algorithms. Its prime topic is keyword search over XML according to SLCA semantics, for which these algorithms are used. The strength of these algorithms is that they return search results quickly. The implementation of the XKSearch architecture is also discussed. The XKSearch system takes a list of keywords as input and returns the set of Smallest Lowest Common Ancestor (SLCA) nodes [5].

The relevance between the query and the information is calculated so that unnecessary checks are avoided and an effective search is achieved. Hence effective text retrieval
and summarization are achieved. The Maximal Marginal Relevance (MMR) criterion reduces redundancy. This approach provides highly relevant search results to the end user by effectively minimizing redundancy [6].

In this paper, the risk of user dissatisfaction is the major area of concern. To minimize it, a systematic approach to diversifying results is discussed. Several measures such as NDCG, MRR, and MAP are discussed in detail, and a greedy algorithm for diversification is used. The aim of diversification is that the user should find the most relevant data among the search results. Another aim of this paper is to minimize the rank of the best-fitting result [7].

This paper also uses a greedy approach. Different datasets are used to test the approach thoroughly, and relevant documents are expected as the search result [8].

Using a test collection based on the TREC question answering track, this paper discusses a framework which achieves novelty and

query logs from a search engine are analyzed in this paper [10].

Single-swap and multi-swap algorithms are used in this paper. Differentiation of search results is carried out on structured data. The degree of difference is quantified so that it represents the accuracy of the search result. Features are extracted from the search results and are prominently considered in the calculation [11].

Considering the query results and their redundancy, this paper discusses a new scheme, re-ranking of query interpretations, to diversify the search result. For sub-topics and relevance, newly proposed measures such as α-nDCG-W and WS-recall are put forward. An algorithm named the Diversification algorithm is used. For keyword search over databases, a query similarity measure and a greedy algorithm are used to obtain diversified query interpretations and their relevance [12].

III. METHODOLOGY

Data Mining Search Engine: Search Engine Optimization is the procedure of improving the
become more complex, time-consuming, and difficult. The Web contains not only static data but also data that requires timely updating, such as news, stock markets, and live channels. People from different communities have different backgrounds and use the internet for different purposes. Many have different interests and lack knowledge of internet usage. Hence the user gets lost within a huge amount of data. A given user generally focuses on only a tiny portion of the Web, dismissing the rest as uninteresting data that serves only to swamp the desired search

documents available all over. Crawler-based search engines are those that use automated software to read the information on the actual website.

IV. SYSTEM STUDY

EXISTING SYSTEM:

The problem of diversifying keyword search was first studied in the IR community. Most of them

V. SYSTEM ARCHITECTURE:

duplicated results by comparing the generated results may cover multiple.

VI. CONCLUSION

[5] F. Radlinski and S. T. Dumais, "Improving personalized web search using result diversification," in Proc. SIGIR, 2006, pp. 691-692.