
Term Paper: Managing Database

This document discusses including new data types in relational database systems. It proposes a framework that allows for defining new data types, operators for those types, access methods to efficiently store and query the new types, and integrating the new types and operators into the query optimizer. The framework addresses limitations of only supporting built-in types by enabling applications in other domains like engineering and science to leverage database systems. It presents defining new types and operators using a simple syntax, templates for extending access methods, and accounting for new types and operators in query planning. The goal is to minimize work needed to implement new types while reusing existing structures when possible.

Uploaded by Ankur Singh
Copyright © Attribution Non-Commercial (BY-NC)

Term Paper

MANAGING DATABASE

SUBMITTED TO: Respected Gargi Ma'am
SUBMITTED BY: ANKUR SINGH, RE3801A29

CAP 200

Inclusion of New Types in Relational Data Base Systems

Problem statement

The needs of business processing applications were the impetus for many of the built-in data types (e.g., floating point, money, date) and operators (e.g., +, -) found in commercial database management systems. However, these built-in types are of little use for a wider range of applications in areas such as engineering and scientific research. Applications used for scientific research, for example, require a database that can store large, complex structures and query this data efficiently. Geographic applications usually require data types such as points, lines, and polygons. Other current examples include the storage of images and other multimedia data. Thus, a database management system needs extendible data types to serve the wider community of users and applications that rely on it.

To achieve this goal, a DBMS should allow new data types to be added. When new data types exist in a DBMS, new operators for these types may also be needed. For example, if a DBMS is extended with the data type “box”, a user may want to issue a query to find all boxes that overlap one another, which calls for an “overlap” operator. In addition to extensible operators, the built-in access methods for native data types, which rely on existing data structures (e.g., B-trees, hash tables), may not be suitable for storing user-defined data types. For example, in Geographic Information Systems (GIS) that require data types such as regions and lines, queries that use intersection and existence operators cannot use B-trees as an efficient or useful access method. In this situation, it may be more appropriate to use data structures such as R-trees or K-D-B trees. When extensible data types use these new data structures in their access methods, query optimization becomes an issue. Therefore, a DBMS that allows the extension of data types should also pass relevant performance information to the query optimizer.
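The “overlap” example can be made concrete with a small sketch. It assumes that a box is an axis-aligned rectangle given by two corner points and that touching edges count as overlapping. The `Box` class and the `overlaps` and `overlapping_pairs` functions are illustrative names, not part of the paper. The pairwise loop examines every pair of boxes. That quadratic work is exactly what a spatial access method such as an R-tree is meant to avoid.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by lower-left (x1, y1) and upper-right (x2, y2) corners."""
    x1: float
    y1: float
    x2: float
    y2: float


def overlaps(a: Box, b: Box) -> bool:
    """Hypothetical 'overlap' operator: true if the boxes share any area or boundary."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def overlapping_pairs(boxes: dict[str, Box]) -> list[tuple[str, str]]:
    """Naive O(n^2) evaluation of 'find all boxes that overlap one another'."""
    return [
        (name_a, name_b)
        for (name_a, a), (name_b, b) in combinations(boxes.items(), 2)
        if overlaps(a, b)
    ]


boxes = {"b1": Box(0, 0, 2, 2), "b2": Box(1, 1, 3, 3), "b3": Box(5, 5, 6, 6)}
print(overlapping_pairs(boxes))  # [('b1', 'b2')]
```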

The query optimizer should be aware of the cost of user-defined operations, know how to optimize these new operations, and select the best execution plans. To summarize, a DBMS that allows extensible data types should provide the following four features:

1) A method for defining new data types
2) A method for defining operators for these new data types
3) A method for implementing access paths for these new data types
4) A method for allowing the query optimizer to process new commands for new data types and operators

The formal problem statement this paper addresses is as follows:

• Given:
o A core DBMS with built-in data types, operators, access methods, and a query plan optimizer

• Find:
o A framework for adding user-defined data types, along with relevant operators, access methods, and statistics-estimation techniques for query plan optimization

• Objective:
o Minimize the amount of work needed to implement new data types
o Re-use existing data structures (e.g., B-Trees) in access methods for user-defined data types where possible

• Constraints:
o Possible safety loopholes when implementing new access methods
o Performance of a DBMS that uses new data types (e.g., transaction management, query plans)

Major contributions

This paper discusses a complete framework for implementing user-defined data types. It presents a solution addressing the four main areas listed in the previous section. To the best of our knowledge, the contributions presented here constitute the first comprehensive solution for extendible data types in a relational database management system. Portions of the framework (namely, the solutions to points 1 and 2 in the previous section) come from previous work by the author [2], but are included in this paper to provide a complete picture of the extensible data type solution. The major contributions therefore correspond to the four needs that arise when implementing extensible data types. Each contribution is discussed in the next section on key concepts.

• Definition of abstract data types (ADT): the author offers a method for defining extensible data types within a DBMS.

• Definition of ADT operators: the author offers a method for defining operators for the new extensible data types.

• Access methods: the author describes how new access paths can be implemented to efficiently support extensible data types.

• Query optimization: the author describes how query optimization takes place inside the DBMS when extensible data types are present.

Motivation

The needs of business processing applications were the impetus for many of the built-in data types (e.g., floating point, money, date) and operators (e.g., +, -) found in commercial database management systems. However, these built-in types are of little use for a wider range of applications in areas such as engineering and scientific research. Applications used for scientific research, for example, require a database that can store large, complex structures and query this data efficiently. Geographic applications usually require data types such as points, lines, and polygons. Other current examples include the storage of images and other multimedia data. Thus, a database management system needs extendible data types to serve the wider community of users and applications that rely on it.

Key Concepts

Data type definition


For reasons of space, we assume that the reader understands the concept of native types in a DBMS or programming language. For a database that allows extendible data types, the method described in this paper uses a simple syntax to define a new data type:

Define type-name length = value,
Input = file-name,
Output = file-name

In this definition, length is the fixed amount of space that a value of the type occupies. The input and output properties name the files containing the routines that convert the type between its character-string representation and its internal (stored) form.
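Here is a rough Python sketch of what such a definition implies; it is not written in the paper's own syntax. It defines a fixed-length “box” type whose internal form is assumed to be 16 bytes (four 32-bit integers). An input routine parses a string such as `(0,0,2,3)` into that form, and an output routine formats it back. The string format, the byte layout, and the `TypeDef`, `CATALOG`, and `define_type` names are hypothetical. In the paper, the routines live in the files named by `Input` and `Output`.

```python
import struct
from dataclasses import dataclass
from typing import Callable

BOX_FORMAT = "<4i"  # assumed layout: four little-endian 32-bit ints (x1, y1, x2, y2)


@dataclass(frozen=True)
class TypeDef:
    """Hypothetical catalog entry mirroring 'Define type-name length = ..., Input = ..., Output = ...'."""
    name: str
    length: int                     # fixed size of the internal form, in bytes
    input: Callable[[str], bytes]   # character string -> internal form
    output: Callable[[bytes], str]  # internal form -> character string


def box_input(text: str) -> bytes:
    """Input routine: parse '(x1,y1,x2,y2)' into the fixed-length internal form."""
    coords = [int(part) for part in text.strip().strip("()").split(",")]
    if len(coords) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(coords)}")
    return struct.pack(BOX_FORMAT, *coords)


def box_output(raw: bytes) -> str:
    """Output routine: format the internal form back into '(x1,y1,x2,y2)'."""
    return "(" + ",".join(str(c) for c in struct.unpack(BOX_FORMAT, raw)) + ")"


CATALOG: dict[str, TypeDef] = {}


def define_type(typedef: TypeDef) -> None:
    """Register a new type in the (hypothetical) system catalog."""
    CATALOG[typedef.name] = typedef


define_type(TypeDef("box", struct.calcsize(BOX_FORMAT), box_input, box_output))

raw = CATALOG["box"].input("(0,0,2,3)")
print(len(raw), CATALOG["box"].output(raw))  # 16 (0,0,2,3)
```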

Operator Definition
For reasons of space, we assume that the reader has a basic understanding of operators in a DBMS, such as =, <, and >. To define an operator for a user-defined type, the method described in the paper uses a structure similar to the type definition:

Define operator token = value,
Left-operand = type-name,
Right-operand = type-name,
Result = type-name,
Precedence-level like operator-2,
File = file-name

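Here, the operator definition specifies the left and right operand types and the result type. It also gives a precedence level, stated as being like that of an existing operator (operator-2), which determines evaluation order when an expression contains several operators. The file attribute names the file containing the procedure that performs the operator logic.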
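The following sketch mirrors the operator definition as a hypothetical catalog entry. It assumes that boxes are `(x1, y1, x2, y2)` tuples and reuses the AE/AL/AG area operators from the paper's access method example later in this section. It also interprets `Precedence-level like operator-2` as “same precedence as operator-2”. The precedence numbers for built-in operators and all function names (`define_operator`, `evaluate`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

Box = tuple[int, int, int, int]  # assumed representation: (x1, y1, x2, y2)


def area(b: Box) -> int:
    x1, y1, x2, y2 = b
    return (x2 - x1) * (y2 - y1)


@dataclass(frozen=True)
class OperatorDef:
    """Hypothetical catalog entry mirroring 'Define operator token = value, ...'."""
    token: str
    left: str        # left-operand type name
    right: str       # right-operand type name
    result: str      # result type name
    precedence: int  # copied from the operator named in 'Precedence-level like ...'
    procedure: Callable[[Any, Any], Any]  # stands in for the routine in 'File = ...'


# Assumed precedence levels for the built-in comparison operators.
BUILTIN_PRECEDENCE = {"=": 3, "<": 3, ">": 3}
OPERATORS: dict[str, OperatorDef] = {}


def define_operator(token: str, left: str, right: str, result: str,
                    precedence_like: str, procedure: Callable[[Any, Any], Any]) -> None:
    """Register an operator whose precedence equals that of an existing operator."""
    if precedence_like in OPERATORS:
        level = OPERATORS[precedence_like].precedence
    else:
        level = BUILTIN_PRECEDENCE[precedence_like]
    OPERATORS[token] = OperatorDef(token, left, right, result, level, procedure)


def evaluate(token: str, left_type: str, lhs: Any, right_type: str, rhs: Any) -> Any:
    """Check the operand types against the definition, then run the operator's procedure."""
    op = OPERATORS[token]
    if (left_type, right_type) != (op.left, op.right):
        raise TypeError(f"{token} expects ({op.left}, {op.right}), got ({left_type}, {right_type})")
    return op.procedure(lhs, rhs)


# Box comparisons by area, named as in the paper's example (AE, AL, AG).
define_operator("AE", "box", "box", "boolean", "=", lambda a, b: area(a) == area(b))
define_operator("AL", "box", "box", "boolean", "<", lambda a, b: area(a) < area(b))
define_operator("AG", "box", "box", "boolean", "AL", lambda a, b: area(a) > area(b))

print(evaluate("AL", "box", (0, 0, 1, 1), "box", (0, 0, 2, 2)))  # True
```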

Access Methods
Access methods are the routines that manage access to the disk-based data structures supported by the system. One example of such a data structure is the B+-Tree. In a B+-Tree, all data is stored at the leaf level, while the internal nodes contain only search keys and tree pointers. The leaf nodes are also linked together as a list, which makes range queries easy.

[Figure: example B+-tree (image not reproduced)]
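To illustrate why linked leaves make range queries easy, here is a deliberately simplified, bulk-loaded B+-Tree with a single internal node. The class names, the leaf size of 3, and the use of in-memory lists instead of disk pages are assumptions. A real B+-Tree also supports inserts, node splits, and more levels. A scan descends once to the starting leaf and then follows the leaf chain, stopping at the first key past the upper bound.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list[int]
    records: list[str]
    next_leaf: Optional["Leaf"] = None  # leaves are chained for range scans


@dataclass
class BPlusTree:
    """Bulk-loaded two-level B+-tree: one internal node of separator keys over a leaf chain.

    A simplification for illustration: no inserts, splits, or deeper levels.
    """
    separators: list[int]  # first key of every leaf except the first
    leaves: list[Leaf]

    @classmethod
    def bulk_load(cls, items: list[tuple[int, str]], leaf_size: int = 3) -> "BPlusTree":
        items = sorted(items)
        leaves = [
            Leaf([k for k, _ in items[i:i + leaf_size]], [r for _, r in items[i:i + leaf_size]])
            for i in range(0, len(items), leaf_size)
        ]
        for left, right in zip(leaves, leaves[1:]):
            left.next_leaf = right
        return cls([leaf.keys[0] for leaf in leaves[1:]], leaves)

    def range_scan(self, low: Optional[int], high: int) -> list[str]:
        """Return records with low <= key <= high; low=None means 'key <= high'."""
        if not self.leaves:
            return []
        # Descend from the internal node; with no lower bound, start at the leftmost leaf.
        leaf = self.leaves[0 if low is None else bisect_left(self.separators, low)]
        out: list[str] = []
        while leaf is not None:
            for key, record in zip(leaf.keys, leaf.records):
                if key > high:
                    return out  # keys are sorted, so nothing further can qualify
                if low is None or key >= low:
                    out.append(record)
            leaf = leaf.next_leaf  # follow the leaf-level linked list
        return out


tree = BPlusTree.bulk_load([(k, f"d{k}") for k in range(1, 9)])
print(tree.range_scan(None, 3))  # ['d1', 'd2', 'd3']
print(tree.range_scan(4, 7))     # ['d4', 'd5', 'd6', 'd7']
```

With keys 1 through 8 stored as records d1 to d8, the query `key <= 3` starts at the leftmost leaf. This matches the walk-through of the `retrieve ... where relation.key <= 3` query discussed in this section.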

The paper describes how to extend access methods so that they either re-use existing data structures or use completely new ones, depending on the properties of the user-defined data type. For instance, suppose a user issues the query [4]:

retrieve (target-list) where relation.key <= 3

A B+-Tree works very well in this case because the operator (OPR) is ‘<=’. The access method would start at the root node and follow the leftmost pointer to the node pointing to data values d1, d2, and d3. A B+-Tree works well for the integer data type. However, if the extended data type is a box, the access method may require a different data structure, such as an R-Tree, which is better suited to spatial data. To extend access methods, the paper defines access method templates. Each template defines an access method, along with the operator information necessary to implement that access method. The paper gives an example of a template for a B-Tree.

In this template, only the <= operator is required (it is the only operator marked “req” in the opt column), since it is the only operator necessary to implement a B-Tree. Other columns in this template define the left and right operand types and the result type for each operator. Along with this template, an access method table must also be in place. This table defines collections of operators that satisfy the template. It also contains values that the query processor may use to estimate two quantities: the number of tuples that satisfy the operator qualification, and the number of pages touched when the operator is used to compare a key field to a constant. The paper gives an example of such a table. It contains the regular integer operators for a B-Tree, along with “box” operators (AE for area equal, AL for area less-than, AG for area greater-than) that are used in a B-Tree access method.
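The template idea can be sketched as a small check that an operator class fills every required slot of a template. Following the text, only `<=` is marked required for the B-Tree. The optional slots, the hypothetical `eq-only-ops` class, and the dictionary layout of the access method table are assumptions. The tuple and page estimate columns are left out.

```python
# Hypothetical B-Tree template: each operator slot is "req" (required) or "opt" (optional).
# The text states that only "<=" is required; the optional slots are assumed for illustration.
BTREE_TEMPLATE: dict[str, str] = {"=": "opt", "<": "opt", "<=": "req", ">": "opt", ">=": "opt"}

# Hypothetical access-method table: operator class -> {template slot: operator that fills it}.
# The tuple- and page-estimate columns described in the text are omitted here.
AM_TABLE: dict[str, dict[str, str]] = {
    "int-ops": {"=": "=", "<": "<", "<=": "<=", ">": ">", ">=": ">="},
    "eq-only-ops": {"=": "EQ"},  # hypothetical class that supplies only an equality operator
}


def missing_required(template: dict[str, str], op_class: dict[str, str]) -> list[str]:
    """Required template slots that the operator class does not fill."""
    return sorted(slot for slot, cond in template.items() if cond == "req" and slot not in op_class)


def satisfies(template: dict[str, str], op_class: dict[str, str]) -> bool:
    """True if the class fills every required slot and no slot the template does not define."""
    return not missing_required(template, op_class) and set(op_class) <= set(template)


for name, op_class in AM_TABLE.items():
    print(name, satisfies(BTREE_TEMPLATE, op_class), missing_required(BTREE_TEMPLATE, op_class))
# int-ops True []
# eq-only-ops False ['<=']
```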

In this case, both the box operators (defined as the area-op class) and the integer operators (defined as the int-ops class) are defined for use with a B-Tree. The paper also defines a “using class” clause to change a relation to use a particular access method. For instance, suppose a user wanted a relation storing “box” information to use the operators AE, AL, and AG within the B-Tree access method. They would issue the command:

modify box to B-Tree on desc using area-op
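One way to picture `modify box to B-Tree on desc using area-op` is an index that orders boxes by area. Then AE, AL, and AG against a constant box become binary searches. This sketch rests on stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, the indexed `desc` values are kept in a dictionary keyed by a record name, and a sorted Python list stands in for the disk-based B-Tree. It illustrates re-use: the ordered-search machinery stays the same, and only the ordering supplied by the operator class differs.

```python
from bisect import bisect_left, bisect_right

Box = tuple[int, int, int, int]  # assumed representation: (x1, y1, x2, y2)


def area(b: Box) -> int:
    x1, y1, x2, y2 = b
    return (x2 - x1) * (y2 - y1)


class AreaOrderedIndex:
    """Stand-in for 'modify box to B-Tree on desc using area-op'.

    Entries are kept sorted by box area, so AE/AL/AG against a constant box become
    binary searches. A sorted in-memory list replaces the disk-based B-Tree here.
    """

    def __init__(self, rows: dict[str, Box]):
        self._entries = sorted((area(box), name) for name, box in rows.items())
        self._areas = [a for a, _ in self._entries]

    def search(self, opr: str, value: Box) -> list[str]:
        target = area(value)
        lo = bisect_left(self._areas, target)   # first entry with area >= target
        hi = bisect_right(self._areas, target)  # first entry with area > target
        if opr == "AE":
            hits = self._entries[lo:hi]
        elif opr == "AL":
            hits = self._entries[:lo]
        elif opr == "AG":
            hits = self._entries[hi:]
        else:
            raise ValueError(f"operator {opr!r} is not in the area-op class")
        return [name for _, name in hits]


index = AreaOrderedIndex({"b1": (0, 0, 1, 1), "b2": (0, 0, 2, 2), "b3": (0, 0, 1, 4), "b4": (0, 0, 3, 3)})
print(index.search("AE", (5, 5, 7, 7)))  # ['b2', 'b3']
print(index.search("AL", (5, 5, 7, 7)))  # ['b1']
print(index.search("AG", (5, 5, 7, 7)))  # ['b4']
```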

The access methods themselves are implemented through procedure calls that use the access method information previously defined. Two examples of these procedure calls are:

Open(relation-name) – returns a pointer to a structure containing information about the relation
Get-first(descriptor, OPR, value) – returns the first record that satisfies the “where key OPR value” clause
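The procedure-call interface can be sketched as an abstract class. `open` and `get_first` follow the two calls named above. Several pieces are assumptions added to make the example complete and runnable: `get_next`, the `Descriptor` fields, the convention that a record's first field is its key, and the `HeapScan` implementation (a plain sequential scan).

```python
import operator
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Optional

Record = tuple[Any, ...]  # by assumption, record[0] is the key field

# Comparison operators that an OPR argument may name.
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq, "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}


@dataclass
class Descriptor:
    """Structure that open() returns: relation information plus scan state (assumed layout)."""
    relation: str
    rows: list[Record]
    position: int = 0
    opr: str = "="
    value: Any = None


class AccessMethod(ABC):
    """Procedure-call interface in the style of Open / Get-first; get_next is an assumed addition."""

    @abstractmethod
    def open(self, relation_name: str) -> Descriptor: ...

    @abstractmethod
    def get_first(self, desc: Descriptor, opr: str, value: Any) -> Optional[Record]: ...

    @abstractmethod
    def get_next(self, desc: Descriptor) -> Optional[Record]: ...


class HeapScan(AccessMethod):
    """Simplest possible access method: a sequential scan of an in-memory relation."""

    def __init__(self, relations: dict[str, list[Record]]):
        self.relations = relations

    def open(self, relation_name: str) -> Descriptor:
        return Descriptor(relation_name, self.relations[relation_name])

    def get_first(self, desc: Descriptor, opr: str, value: Any) -> Optional[Record]:
        # Remember the 'where key OPR value' qualification and restart the scan.
        desc.position, desc.opr, desc.value = 0, opr, value
        return self.get_next(desc)

    def get_next(self, desc: Descriptor) -> Optional[Record]:
        test = OPS[desc.opr]
        while desc.position < len(desc.rows):
            record = desc.rows[desc.position]
            desc.position += 1
            if test(record[0], desc.value):
                return record
        return None


am = HeapScan({"emp": [(7, "ann"), (2, "bob"), (5, "cy"), (1, "di")]})
desc = am.open("emp")
rec = am.get_first(desc, "<=", 5)
while rec is not None:
    print(rec)  # (2, 'bob'), then (5, 'cy'), then (1, 'di')
    rec = am.get_next(desc)
```

A B-Tree access method would implement the same three calls. It would descend the tree in `get_first` instead of scanning from the start, which is why the interface, rather than the storage structure, is what the rest of the DBMS depends on.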

In the case of extensible data types, new access methods may have to handle tasks such as logging, concurrency control, and buffer management. For logging, if a DBMS supports logical logging, the access methods must implement REDO and UNDO routines that the log manager calls when it rolls log events forward or backward. For concurrency control, the access method may have to make calls (e.g., read, begin, abort) to a DBMS scheduler, which in turn responds with yes, no, or abort for each request. Finally, if buffer management is a concern for access method designers, the author suggests that a set of procedures (e.g., get, fix, unfix, put, order) be made available so that the access method can manipulate the buffer pool.
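For the logging requirement, here is a minimal sketch of logical logging in an access method. Each change is logged as a logical operation before it is applied. UNDO applies the logical inverse when a transaction aborts, and REDO re-applies committed operations during recovery. The `LoggedIndex` class, its set-of-unique-keys storage, and the log record layout are hypothetical. Concurrency control, compensation records, and durable log storage are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    txn: int
    action: str  # logical operation: "insert" or "delete"
    key: int


class LoggedIndex:
    """Toy access method over a set of unique keys that logs each logical change first."""

    def __init__(self) -> None:
        self.keys: set[int] = set()
        self.log: list[LogRecord] = []

    def insert(self, txn: int, key: int) -> None:
        if key not in self.keys:  # log only changes that actually happen
            self.log.append(LogRecord(txn, "insert", key))
            self.keys.add(key)

    def delete(self, txn: int, key: int) -> None:
        if key in self.keys:
            self.log.append(LogRecord(txn, "delete", key))
            self.keys.discard(key)

    def redo(self, rec: LogRecord) -> None:
        """Roll forward: re-apply the logged operation."""
        if rec.action == "insert":
            self.keys.add(rec.key)
        else:
            self.keys.discard(rec.key)

    def undo(self, rec: LogRecord) -> None:
        """Roll backward: apply the logical inverse of the logged operation."""
        if rec.action == "insert":
            self.keys.discard(rec.key)
        else:
            self.keys.add(rec.key)

    def abort(self, txn: int) -> None:
        """Undo one transaction's changes in reverse log order."""
        for rec in reversed(self.log):
            if rec.txn == txn:
                self.undo(rec)

    @classmethod
    def recover(cls, log: list[LogRecord], committed: set[int]) -> "LoggedIndex":
        """Rebuild from an empty state by redoing committed transactions in log order."""
        index = cls()
        for rec in log:
            if rec.txn in committed:
                index.redo(rec)
        index.log = list(log)
        return index


idx = LoggedIndex()
idx.insert(1, 10)
idx.insert(2, 20)
idx.delete(2, 10)
idx.abort(2)  # undo transaction 2: re-add 10, then remove 20
print(sorted(idx.keys))                                       # [10]
print(sorted(LoggedIndex.recover(idx.log, committed={1}).keys))  # [10]
```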

Query Optimization
Query optimization is a function of many database management systems that examines multiple query plans for satisfying a particular query. Most optimizers consider statistics when analyzing query plans; these statistics usually concern CPU cost and disk service time. The optimizer also examines different query paths by looking at the available indexes and join techniques in order to choose an optimal query path. As a simple example, consider the query

Select employee.name
From employee
Where employee.level = 5

In this case, the query optimizer will want to find the cheapest way to find all employees with level 5. The query could scan all tuples in the employee relation to find the employees with level equal to 5. However, if an index exists on the employee level column, the number of operations will be greatly reduced, as the query can use this index to scan only a subset of employee records (i.e., employees with level 5). In the case of join ordering, consider three tables A, B, and C that must be joined to satisfy a query. Table A contains 50 records, while B and C each contain 400,000 records. The job of the query optimizer is to find the join order and join methods that optimize query performance. In this case, a plan that first joins table B with table C and then joins the result with table A can take several orders of magnitude longer than a plan that first joins tables A and C [5]. Also, if hash join is a feasible strategy for joining A and C, the optimizer may choose it over a nested-loop join. Hash join is appealing here because table A is small enough to fit in memory, resulting in a one-pass join algorithm.
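The one-pass hash join described above can be sketched as follows. The small relation (A) is hashed into an in-memory table, and the large relation (C) is streamed once against it. The row layout, the column names, and the synthetic data sizes are assumptions chosen so the example runs quickly. A nested-loop join is included only as a correctness baseline.

```python
from collections import defaultdict
from typing import Any, Iterable, Iterator

Row = dict[str, Any]


def hash_join(build: Iterable[Row], probe: Iterable[Row], build_key: str, probe_key: str) -> Iterator[Row]:
    """One-pass equi-join: hash the small input in memory, then stream the large input once."""
    table: defaultdict[Any, list[Row]] = defaultdict(list)
    for row in build:                     # build phase over the small relation (e.g. A)
        table[row[build_key]].append(row)
    for row in probe:                     # probe phase: a single pass over the large relation
        for match in table.get(row[probe_key], ()):
            yield {**match, **row}


def nested_loop_join(outer: list[Row], inner: list[Row], outer_key: str, inner_key: str) -> Iterator[Row]:
    """Baseline for comparison: compares every pair of rows."""
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                yield {**o, **i}


# Synthetic data (not from the paper): a 50-row A and a larger C that references A.
a = [{"a_id": n, "a_name": f"a{n}"} for n in range(50)]
c = [{"c_id": n, "a_ref": n % 1000} for n in range(4_000)]
matches = list(hash_join(a, c, "a_id", "a_ref"))
print(len(matches))  # 200
```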

When user-defined types and operators are present in a DBMS, the query optimizer must have a way to estimate selectivity and to determine which join methods are available for tables containing these new types. Only then can it make decisions like those described above. Otherwise, optimization becomes a daunting (if not impossible) task. This paper proposes that four pieces of information be available when defining an extensible data type operator [4]:

• Stups: an estimate of the number of records satisfying the clause:
o Where rel-name.field-name OPR value
• Selectivity factor S: used to estimate the expected number of records that satisfy the clause:
o Where relname-1.field-1 OPR relname-2.field-2
• Whether merge-sort is a feasible joining strategy for the operator
• Whether hash-join is a feasible joining strategy for the operator
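Here is a sketch of how an optimizer might consume the four pieces of information listed above. It makes three assumptions: `stups` is a function of the relation size and the constant, the expected join size is S × N1 × N2, and join methods follow a simple preference order (hash-join if the smaller input fits in memory, then merge-sort, then nested loops). The `OperatorStats` class, the function names, and the statistic values for the two box operators are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class OperatorStats:
    """Hypothetical optimizer information attached to a user-defined operator."""
    stups: Callable[[int, Any], float]  # (relation size, constant) -> est. records for 'field OPR value'
    selectivity: float                   # S for 'rel-1.field-1 OPR rel-2.field-2' (assumed fraction)
    merge_sort_ok: bool                  # is merge-sort a feasible join strategy?
    hash_join_ok: bool                   # is hash-join a feasible join strategy?


def estimate_join_rows(stats: OperatorStats, n1: int, n2: int) -> float:
    """Assumes the expected join size is S * N1 * N2."""
    return stats.selectivity * n1 * n2


def choose_join_method(stats: OperatorStats, n1: int, n2: int, memory_rows: int) -> str:
    """Pick a join method from the operator's flags (assumed preference order)."""
    if stats.hash_join_ok and min(n1, n2) <= memory_rows:
        return "hash-join"   # smaller input fits in memory: one-pass hash join
    if stats.merge_sort_ok:
        return "merge-sort"
    return "nested-loop"     # always applicable, usually the most expensive


# Illustrative, made-up statistics for two box operators.
area_equal = OperatorStats(stups=lambda n, _box: 0.01 * n, selectivity=0.01,
                           merge_sort_ok=True, hash_join_ok=True)
overlap = OperatorStats(stups=lambda n, _box: 0.05 * n, selectivity=0.001,
                        merge_sort_ok=False, hash_join_ok=False)

print(choose_join_method(area_equal, 50, 400_000, memory_rows=1_000))  # hash-join
print(choose_join_method(overlap, 50, 400_000, memory_rows=1_000))     # nested-loop
print(round(estimate_join_rows(overlap, 50, 400_000)))                 # 20000
```

An operator such as “overlap” has no natural equality or total ordering, so neither flag can be set and the optimizer falls back to nested loops. This is why the paper asks the operator's definer, rather than the optimizer, to state which strategies are feasible.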

With this information in place, the query optimizer has enough information to choose a better query plan than an arbitrary selection when a query is issued on user-defined data types.

Validation

The author mainly provides a general framework for adding user-defined types to a database. As mentioned previously, the methods for defining extensible data types and their operators were presented in [2] and implemented in the INGRES DBMS at UC Berkeley. For access paths and query optimization, the author does not mention whether these methods have been implemented in a DBMS. He therefore appears to be presenting only the vision and rationale for implementing the access method and query optimization constructs for extensible data types.

An actual implementation of access methods and query optimization was probably beyond the scope of this paper, so the ideas could not be validated through experimental evidence. However, in Sections 3 and 4, the author provides good case studies (through examples) when discussing his proposals for implementing access methods and performing query optimization in the context of user-defined types and operators.

Assumptions

The author discusses the performance of extensible data types in the context of implementation on commercial systems, writing: “An ‘industrial strength’ implementation might choose to specify the user types which an installation wants at the time the DBMS is installed” [4]. This is an alternative to dynamically linking user-defined routines for the extensible data types. While this would certainly be a performance benefit, the author does not discuss whether it could actually happen in a commercial setting. It seems that commercial database vendors would want to keep user-defined code away from the native code.

The author also implicitly assumes that creating new constructs (i.e., data types and operators) is better than using built-in data types to model these non-standard types. In other words, he assumes that the cost of this custom work (coupled with long-term support) is justified by avoiding the query-logic complexity that arises when native data types are used (as presented in section …).


When discussing the implementation of access methods, the author limits his discussion to support for single key fields. Furthermore, the author assumes single-dimension access methods. These two assumptions seem valid given the scope of the paper, as it discusses a whole framework for extensible types, operators, access methods, and query optimization. Making these assumptions allows the author to cover each topic, rather than covering one particular topic (e.g., access methods) in depth while glossing over the others.

Rewrite

In general, this paper is very well organized and its ideas are
presented in a succinct manner. If we were to rewrite the paper
today, we would focus on improving the following points:

• Add a discussion of query rewrite to the query optimization section. The author did not discuss query rewrite in the context of user-defined types.

• An actual implementation of access methods in a DBMS (such as INGRES) may be beyond the scope of this paper. However, the paper could include simulation data and a broader discussion of the performance drawbacks of extensible data types.

• Add more discussion of how this proposal, along with the proposals in …, provides a complete solution for adding extensible data types to a DBMS.
References

[1] Hellerstein, J. and Stonebraker, M., “Anatomy of a Database System,” Readings in Database Systems, Cambridge, Mass.: MIT Press, 2005, pp. 42-95.

[2] Stonebraker, M. et al., “Application of Abstract Data Types and Abstract Indices to CAD Data,” Proc. Engineering Applications Stream of Database Week/83, San Jose, CA, May 1983.

[3] “B+ Trees,” Wikipedia, The Free Encyclopedia, 17 Sep 2006, 10:55 UTC, Wikimedia Foundation, Inc., 10 Aug 2004, <http://en.wikipedia.org/wiki/B%2B_tree>.

[4] Stonebraker, M., “Inclusion of New Types in Relational Data Base Systems,” Proceedings of ICDE, 1986.
