Term Paper: Managing Database
MANAGING DATABASE
SUBMITTED TO:
Respected Gargi mam
SUBMITTED BY:
ANKUR SINGH
RE3801A29
CAP 200
Problem statement
To achieve this goal, the database should allow for the addition of
extensible data types. When new data types exist in a DBMS, new
operators for these types may be needed. For example, if a DBMS is
extended with the data type “box”, a user may want to issue a query
to find all boxes that overlap one another. An “overlap” operator is
therefore appropriate. In addition to extensible operators, the
built-in access methods for native data types, which rely on existing
data structures (e.g., B-trees, hash tables), may not be suitable for
storing user-defined data types. For example, in Geographic
Information Systems (GIS) that require data types such as regions and
lines, queries that use intersection and existence operators cannot
use B-Trees as an efficient or useful access method. In this
situation, it may be appropriate to use an R-tree or K-D-B tree data
structure. When extensible data types use these new data structures
in their access methods, the problem of query optimization comes into
play. Therefore, a DBMS that allows the extension of data types should
also pass relevant performance information to the query optimizer.
• Given:
o A core DBMS with built-in data types, operators, access
methods, and a query plan optimizer
• Find:
o A framework for adding user-defined data types, along with
relevant operators, access methods, and statistics estimation
techniques for query plan optimization
• Objective:
o Minimize the amount of work for implementing new data
types
o Possibility of re-using existing data structures (e.g., B-Tree)
in access methods for user-defined data types
• Constraints:
o Possible safety loopholes when implementing new access
methods
o Performance (e.g., of transaction management, query plan)
of DBMS using new data types
Major contributions
• Access methods: the author describes how new access paths can
be implemented to efficiently support extensible data types.
Motivation
Key Concepts
In this example, length is a fixed amount of space that the data type
will occupy, while the input and output properties define routines that
will convert the data type to and from character strings for storage.
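To make the roles of length, input, and output concrete, the
following is a minimal sketch in Python rather than the paper's own
definition syntax. The names Box, box_input, box_output, and BOX_TYPE,
the comma-separated string format, and the length of 32 bytes (four
8-byte numbers) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box stored as two opposite corners (assumed layout)."""
    x1: float
    y1: float
    x2: float
    y2: float


def box_input(text: str) -> Box:
    """Input routine: parse a character string such as '0,0,2,3' into a Box."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {text!r}")
    return Box(*parts)


def box_output(box: Box) -> str:
    """Output routine: convert a Box back into its character-string form."""
    return ",".join(repr(v) for v in (box.x1, box.y1, box.x2, box.y2))


# Hypothetical catalog entry tying the three properties together.
BOX_TYPE = {
    "name": "box",
    "length": 32,          # assumed: four 8-byte floating-point values
    "input": box_input,
    "output": box_output,
}
```

A value written with the output routine can be read back with the
input routine, which is the round trip the DBMS relies on when it
stores and retrieves the type.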
Operator Definition
Because of space constraints, we assume that the reader has a basic
understanding of operators in a DBMS. Such operators could be any of
the set {=, <, >}. To define an operator for a user-defined type, the
method described in the paper uses a structure similar to that of the
type definitions.
Here, the operator definition encompasses both right and left operand
types, along with precedence level if multiple operators exist. The
file attribute stores the procedure that performs the operator logic.
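As an illustration of these attributes, the sketch below registers an
“overlap” operator for boxes in a small in-memory catalog. It is not
the paper's syntax: the names OperatorDef, define_operator, and
apply_operator are hypothetical, the precedence value is arbitrary,
and a Python function stands in for the procedure that the file
attribute would point to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# A box as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2 (assumed layout).
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class OperatorDef:
    name: str
    left_type: str        # type of the left operand
    right_type: str       # type of the right operand
    result_type: str
    precedence: int       # used when several operators appear together
    procedure: Callable[..., object]  # stands in for the "file" attribute


def box_overlap(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical operator catalog keyed by name and operand types.
OPERATORS: Dict[Tuple[str, str, str], OperatorDef] = {}


def define_operator(op: OperatorDef) -> None:
    OPERATORS[(op.name, op.left_type, op.right_type)] = op


def apply_operator(name: str, left_type: str, right_type: str, left, right):
    """Look up the operator for these operand types and run its procedure."""
    return OPERATORS[(name, left_type, right_type)].procedure(left, right)


define_operator(OperatorDef("overlap", "box", "box", "boolean", 3, box_overlap))
```

Keying the catalog on the operand types as well as the name lets the
same operator symbol be defined separately for different data types.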
Access Methods
Access methods are the routines for managing access to disk-based
data structures supported by the system. An example of such a data
structure is a B+-Tree. In a B+-Tree, all data is saved at the leaf
level, while the internal nodes only contain search keys and tree
pointers. The leaf nodes are also stored as a linked list, making
range queries easy.
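The following sketch shows why the linked leaves make range queries
easy. It is a deliberately reduced two-level tree (one root node over
a chain of leaves) with unique integer keys, not a full B+-Tree with
insertion and node splitting. The names Leaf, Root, build_tree, and
range_scan are assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int] = field(default_factory=list)
    next: Optional["Leaf"] = None   # link to the next leaf in key order


@dataclass
class Root:
    separators: List[int]   # search keys only, no data
    children: List[Leaf]    # tree pointers


def build_tree(sorted_keys: List[int], capacity: int = 3) -> Root:
    """Pack sorted, unique keys into leaves and chain the leaves together."""
    leaves = [Leaf(list(sorted_keys[i:i + capacity]))
              for i in range(0, len(sorted_keys), capacity)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    # Each separator is the smallest key of the leaf to its right.
    separators = [leaf.keys[0] for leaf in leaves[1:]]
    return Root(separators, leaves)


def range_scan(root: Root, low: int, high: int) -> List[int]:
    """Return every key k with low <= k <= high."""
    if not root.children:
        return []
    # Descend once to the leaf that may contain `low` ...
    leaf: Optional[Leaf] = root.children[bisect_right(root.separators, low)]
    result: List[int] = []
    # ... then follow the leaf links instead of going back to the root.
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result
```

Only one descent from the root is needed; the rest of the range is
read by walking the leaf chain.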
A B+-Tree would work very well in this case since the operator (OPR)
is ‘<=’. The access method would start at the root node and follow
the leftmost pointer to the node pointing to data values d1, d2, and
d3. A B+-Tree works well for the integer data type. However, if the
extended data type is a box, the access methods may require a
different data structure, such as an R-Tree, which is better suited
to spatial data. To extend access methods, the paper defines access
method templates. Each template defines an access method, along with
the operator information necessary to implement that access method.
The paper gives an example of a template for a B-Tree.
In this template, only the <= operator is required (reading from the
opt column, it is the only value of “req”) since it is the only
operator necessary to implement a B-Tree. Other columns in this
template define the left and right operands as well as the result for
a given operator. Along with this template, an access method table
must also be in place, which defines a collection of operators that
satisfy the template. This table also contains values that the query
processor may use to estimate the number of tuples that satisfy the
operator qualification, and the number of pages touched when using
the operator to compare a key field to a constant. The paper gives an
example of such a table in the context of regular integer operators
for a B-Tree, along with “box” operators (AE – area equal, AL – area
less-than, AG – area greater-than) that are used in a B-Tree access
method.
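The idea that one ordered access method can serve several operator
classes can be sketched as follows. A sorted Python list stands in
for the B-Tree, and each class is reduced to a key function: integers
are ordered by value and boxes by area. The class SortedIndex and its
methods equal, less, and greater (playing the roles of AE, AL, and AG
for boxes) are hypothetical names, and the sketch omits the tuple and
page estimates that the access method table would also carry.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, Dict, List, Tuple

# A box as (x1, y1, x2, y2); the layout is an assumption of this sketch.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    return abs(b[2] - b[0]) * abs(b[3] - b[1])


# Each operator class supplies the ordering the access method sorts by.
OPERATOR_CLASSES: Dict[str, Callable] = {
    "int-ops": lambda n: n,   # ordinary integer comparison
    "area-op": area,          # boxes compared by area (AE, AL, AG)
}


class SortedIndex:
    """Sorted-list stand-in for a B-Tree, ordered by an operator class."""

    def __init__(self, items: List, operator_class: str):
        self._key = OPERATOR_CLASSES[operator_class]
        self._items = sorted(items, key=self._key)
        self._keys = [self._key(i) for i in self._items]

    def equal(self, value) -> List:
        """Items whose key equals that of `value` (AE for boxes)."""
        k = self._key(value)
        return self._items[bisect_left(self._keys, k):bisect_right(self._keys, k)]

    def less(self, value) -> List:
        """Items whose key is strictly smaller (AL for boxes)."""
        return self._items[:bisect_left(self._keys, self._key(value))]

    def greater(self, value) -> List:
        """Items whose key is strictly larger (AG for boxes)."""
        return self._items[bisect_right(self._keys, self._key(value)):]
```

The index code itself never inspects a box; it only needs the
ordering that the chosen operator class provides, which is what lets
an existing data structure be reused for a new type.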
In this case, both the box (defined as the area-op class) and
integer (defined as the int-ops class) operators are defined for use
with a B-Tree. The paper also defines a “using class” clause to
change a relation to use a particular access method. For instance,
if a user wanted a relation storing “box” information to use the
operators AE, AL, and AG within the B-Tree access method, they
would issue the command:
Query Optimization
Query optimization is a function of many database management
systems that examines multiple query plans for satisfying a
particular query. Most optimizers consider statistics when
analyzing query plans. The statistical categories are usually in
the area of CPU cost and disk storage service time. The optimizer
also examines different query paths by looking at the available
indexes and relational table join techniques to choose an optimal
query path. As a simple example, consider the query
Select employee.name
From employee
Where employee.level = 5
In this case, the query optimizer will want to find the cheapest
way to find all employees with a level of 5. The query could
scan all tuples in the employee relation to find the employees
with level equal to 5. However, if an index exists on the
employee level column, the number of operations will be
greatly reduced, as the query can use this index to scan only a
subset of employee records (i.e., employees with level 5). In
the case of join ordering, consider three tables A, B, and C that
must be joined to satisfy a query. Table A contains 50 records,
while B and C each contain 400,000 records. The job of the query
optimizer is to find the join order and join method that
optimize query performance. In this case, if table B is first
joined with table C and the result is then joined with table A,
the plan can take several orders of magnitude more time than a
plan that first joins tables A and C [5]. Also, if hash join is a
feasible strategy for joining A and C, the optimizer may choose
this option over a nested-loop join. In this case, hash-join is
appealing since table A is small enough to fit in memory,
resulting in a one-pass join algorithm.
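The one-pass behaviour can be seen in a minimal in-memory hash join,
sketched below. The smaller input is loaded into a hash table once,
and the larger input is then read a single time while probing that
table. The function name hash_join and the representation of records
as dictionaries are assumptions made for illustration, and the sketch
applies only to equality joins.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def hash_join(small: Iterable[Row], large: Iterable[Row],
              small_key: str, large_key: str) -> Iterator[Tuple[Row, Row]]:
    """Equality join: build a hash table on `small`, probe it with `large`."""
    # Build phase: the small table (e.g., 50 records) must fit in memory.
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in small:
        table[row[small_key]].append(row)
    # Probe phase: the large table is scanned exactly once.
    for row in large:
        for match in table.get(row[large_key], ()):
            yield match, row
```

With a nested-loop join, every record of the large table would be
compared against every record of the small one; here each large
record costs a single hash lookup.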
o Stups: an estimate of the number of records satisfying the
clause Where rel-name.field-name OPR value
o Selectivity factor S: the expected number of records that
satisfy the clause Where relname-1.field-1 OPR relname-2.field-2
o Whether merge-sort is feasible for the operator
o Whether hash-join is a feasible joining strategy for this
operator
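One way an optimizer might consume these four properties is sketched
below. We assume, for illustration only, that Stups and S are given
as fractions between 0 and 1, so that they scale the table sizes into
expected record counts. The names OperatorProperties,
restriction_rows, join_rows, and join_methods, as well as the numeric
values in the example, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OperatorProperties:
    stups: float         # assumed fraction for "field OPR value"
    s: float             # assumed fraction for "field-1 OPR field-2"
    merge_sort_ok: bool  # can the operator be evaluated by merge-sort?
    hash_join_ok: bool   # can the operator be evaluated by hash-join?


def restriction_rows(props: OperatorProperties, table_rows: int) -> float:
    """Expected records satisfying: Where rel-name.field-name OPR value."""
    return props.stups * table_rows


def join_rows(props: OperatorProperties, left_rows: int, right_rows: int) -> float:
    """Expected records satisfying: Where relname-1.field-1 OPR relname-2.field-2."""
    return props.s * left_rows * right_rows


def join_methods(props: OperatorProperties) -> List[str]:
    """Join strategies the optimizer is allowed to consider for this operator."""
    methods = ["nested-loop"]  # always applicable
    if props.merge_sort_ok:
        methods.append("merge-sort")
    if props.hash_join_ok:
        methods.append("hash-join")
    return methods


# Hypothetical entries: equality supports every strategy, while a spatial
# "overlap" operator on boxes leaves only the nested-loop join.
EQUALS = OperatorProperties(stups=0.25, s=0.5, merge_sort_ok=True, hash_join_ok=True)
OVERLAP = OperatorProperties(stups=0.5, s=0.25, merge_sort_ok=False, hash_join_ok=False)
```

Supplying these values with each new operator is what allows the
optimizer to cost plans involving user-defined types without knowing
anything else about them.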
Assumptions
Rewrite
In general, this paper is very well organized and its ideas are
presented in a succinct manner. If we were to rewrite the paper
today, we would focus on improving the following points:
References
[3] “B+ Trees.” Wikipedia, The Free Encyclopedia. 17 Sep 2006, 10:55
UTC. Wikimedia Foundation, Inc. 10 Aug 2004
<http://en.wikipedia.org/wiki/B%2B_tree>.