
Emerging research fields across database management systems
Rijul Chauhan
Department of Computer Science Engineering
Delhi Technological University, New Delhi, India
rijulchauhan4@gmail.com

Abstract
The database community is continuously exploring multidisciplinary research fields, so that general-purpose databases can now support multiple data models, extend capabilities such as spatial and graph processing, and provide data virtualization, distributed storage, and in-memory storage. These new research avenues become evident from the papers written by doctoral students and professors at universities worldwide. This paper surveys the emerging fields they describe.

1. SURVEY OF TOPICS

Our survey divides the topic into different research domains. We start with automatic database administration and tuning, followed by data mining changes from data streams, and then information retrieval and extraction.

1.1 Automatic Database Administration and Tuning
Today’s database systems all have numerous features, making it very difficult to choose among these features for the needs of the specific applications using them. For example, building indexes and views can improve performance on a given query workload, but it is often very difficult to select the necessary indexes and views because such decisions depend on how the queries are executed. As the cost of hardware has dropped dramatically, the cost of the humans who tune and manage database systems now often dominates the cost of ownership. To reduce this cost, it is desirable to automate database tuning and administration. There are many unsolved research problems in this area. First, very little work has been done on automatically tuning system parameters, and it is challenging to predict system performance after changing such parameters. Second, very little is known about how to adjust a system as its workload changes. Third, even knowing the various features to tune, it remains challenging to identify system bottlenecks.

1.2 Data Mining Changes from Data Streams
Data mining, the science of discovering meaningful knowledge in data, has always been a core area of research. A large number of emerging applications, such as network flow analysis, e-business, and online stock market analysis, have to handle various data streams. It is demanding to conduct advanced analysis and data mining over fast and large data streams to capture trends, patterns, and exceptions. Recently, some interesting results have been reported for modelling and handling data streams, such as monitoring statistics over streams and query answering. Previous studies (e.g., [21, 29]) argue that mining data streams is challenging because random access to fast and large data streams may be impossible; thus, multi-pass algorithms (i.e., ones that load data items into main memory multiple times) are often infeasible. Another problem in the area of data mining is frequent subgraph mining. Interesting research problems on mining changes in data streams can be divided into three
categories: modelling and representation of changes, mining methods, and interactive exploration of changes. Solutions to these problems have been outlined, together with preliminary experimental validation focusing on query optimization and time complexity. To the best of our knowledge, these problems have not yet been researched systematically, and the above list is by no means complete. We believe that thorough studies of these issues will bring many challenges, opportunities, and benefits to stream data processing, management, and analysis.
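Because random access to a stream is impossible and multi-pass algorithms are infeasible, stream mining typically relies on one-pass, bounded-memory summaries. As an illustrative sketch (not an algorithm from the surveyed papers), the classic Misra–Gries algorithm finds frequent items in a single pass:

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k-1 counters.

    Any item occurring more than len(stream)/k times is guaranteed
    to survive among the returned candidates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# 'a' appears 6 times in a stream of 10 (> 10/3), so it must survive.
stream = ['a', 'b', 'a', 'c', 'a', 'a', 'd', 'a', 'b', 'a']
print(sorted(misra_gries(stream, k=3)))  # → ['a']
```

The summary never holds more than k-1 counters, so memory stays constant no matter how long the stream runs; this is exactly the trade-off the multi-pass discussion above motivates.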
1.3 Information retrieval
Recent years have seen an opening of the border between database research and information retrieval. Storing data and querying data go hand in hand, and this applies to both structured and unstructured data. In many scenarios, queries can be classified into “lookup searches” and “exploratory searches”: in lookup searches, users “look up” details on topics known to them; in exploratory searches, they “explore” new information. One area of information retrieval gaining more and more attention lately is the exploitation of the deep Web. Tjin-Kam-Jet proposes to address this challenge in a distributed environment. Given that the deep Web is up to two orders of magnitude larger than the surface Web, he argues that distribution might be the key to scalability. This work proposes to automatically convert free-text queries into structured queries for complex Web forms, in order to make the deep Web more easily searchable. Challenges include developing a formal query description syntax, translating queries with the correct interpretation, bridging the gap between user expectations and system capabilities, adapting query descriptions for resource selection, ranking the top-k resources, merging results from resources to maximize precision and recall, and ranking suggestions for users with respect to resources. He aims to evaluate this solution with a prototype system and user studies, the criteria being processing time and user satisfaction.

1.4 Information extraction
Information extraction, in its widest sense, is the extraction of structured data from unstructured data. One domain where large corpora of unstructured text would particularly benefit from information extraction is the medical domain. A medical report is a natural-language description of diagnoses, treatments, or medications, together with structured information about the patient. The goal is to extract a chronology of events from the reports. Such chronologies can then be used to review a patient’s history or to gather statistical data about the effectiveness or consequences of medical treatments. The task is challenging because the reports use medical jargon and colloquial temporal expressions (“two days ago”). The paper conducts two initial case studies: in the first, machine learning on medical reports is used to determine whether patients qualify for leukemia trials; in the second, a bio-specimen repository is augmented with data from medical reports. This additional data facilitates the classification of tissue probes and also information retrieval on the specimen database. Intensive research has been conducted on the challenges that arise in data integration. The first challenge is how to support interoperability of sources that have different data models (relational, XML, etc.), schemas, data representations, and querying interfaces. Wrapper techniques have been developed to solve these issues. The second challenge is how to model source contents and user queries, and two approaches have been widely adopted. In the local-as-view (LAV) approach, a collection of global predicates is used to describe source contents as views and to formulate user queries. Given a user query, the mediation system decides how to answer the query by synthesizing source views; this is called answering queries using views. Many techniques have been developed to solve this problem, and they can also be used in other database applications such as data warehousing and query optimization. Another approach to data integration, called the global-as-view (GAV),
assumes that user queries are posed directly on global views that are defined on source relations. In this approach, a query plan can be generated using a view-expansion process; researchers mainly focus on efficient query processing in this case. The third challenge is how to process and optimize queries when sources have limited query capabilities. For instance, the Amazon.com source can be viewed as a database that provides book information. However, we cannot easily download all its books; instead, we query the source by filling out Web search forms and retrieving the results. Studies have been conducted on how to model and compute source capabilities, how to generate plans for queries, and how to optimize queries in the presence of limited capabilities.
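The GAV idea can be sketched in a few lines. In this minimal illustration (the source and view names are hypothetical, not drawn from any surveyed system), each global view is defined as a query over source relations, and a user query posed on the global view is answered by expanding the view definition into a plan over the sources:

```python
# Hypothetical sources: one exports (isbn, title), the other (isbn, price).
source_titles = [("111", "Databases"), ("222", "Streams")]
source_prices = [("111", 40), ("222", 55)]

def global_book_view():
    """GAV view definition: Book(isbn, title, price) is *defined as*
    a join over the source relations, so expanding the view yields
    a query plan directly over the sources."""
    prices = dict(source_prices)
    return [(isbn, title, prices[isbn])
            for isbn, title in source_titles if isbn in prices]

def answer_query(predicate):
    """Answer a user query on the global view by view expansion:
    evaluate the view definition, then apply the query's predicate."""
    return [row for row in global_book_view() if predicate(row)]

# "Find books cheaper than 50" posed against the global view:
print(answer_query(lambda row: row[2] < 50))  # → [('111', 'Databases', 40)]
```

Under LAV the arrow is reversed: the sources would be described as views over global predicates, and answering the same query would require synthesizing those views rather than simply expanding a definition.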
1.5 Network Analysis
Network analysis is a set of techniques derived from network theory, which has evolved from computer science to demonstrate the power of social network influences. Using network analysis in domain analysis can add another layer of methodological triangulation by providing a different way to read and interpret the same data. The use of network analysis in knowledge organization domain analysis is recent and still evolving. The visualization technique involves mapping relationships among entities based on the symmetry or asymmetry of their relative proximity. Network analysis can be illustrated as a series of steps: choosing a threshold, applying the threshold to a correlation matrix to produce an adjacency matrix, and producing the network from the adjacency matrix. Like factor analysis, network analysis can begin with a correlation matrix of associations among a set of observed variables. In the first and second steps, a threshold is chosen and applied to “binarize” or “dichotomize” the correlation matrix, creating an adjacency matrix: correlations with an absolute value above the threshold are given a “1” and those below a “0”. (The binarization process is optional; an alternative, although computationally more complex, option is to construct a weighted network.) From the adjacency matrix, a network can be straightforwardly constructed: each observed variable is represented as a “node” in the network, and any pair of nodes with a “1” in the adjacency matrix is given an “edge”, or connection, between them. Note that the choice of threshold is a controversial one and can have a significant effect on the structure of the resulting network. The choice may depend on several factors: the size of the sample from which the data was drawn, the choice of type I error rate, the density of the resulting network, and the domain from which the data was drawn. Network metrics should ideally be applied across a range of thresholds to demonstrate that the result is not based on an arbitrary threshold choice. Fortunately, most network scientists are sensitive to this issue, and many networks have been observed to have robust community structure across a range of thresholds.
Network analysis in GIS is based on the mathematical sub-disciplines of graph theory and topology. Any network consists of a set of connected vertices and edges. Graph theory describes, measures, and compares graphs or networks. The topological properties of networks are connectivity, adjacency, and incidence; these properties serve as a basis for analysis. Simple examples of networks in GIS are streets, power lines, and city centerlines.
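The threshold-then-binarize construction described above can be sketched as follows. The correlation matrix here is made up for illustration; it is not data from any study, and it also shows concretely why the threshold choice matters:

```python
def binarize(corr, threshold):
    """Dichotomize a correlation matrix: |r| above the threshold -> 1, else 0."""
    n = len(corr)
    return [[1 if i != j and abs(corr[i][j]) > threshold else 0
             for j in range(n)] for i in range(n)]

def edges(adjacency):
    """Read the network off the adjacency matrix: one edge per '1' above the diagonal."""
    n = len(adjacency)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if adjacency[i][j]]

# Made-up correlations among three observed variables (nodes 0, 1, 2).
corr = [[ 1.0, 0.8, -0.2],
        [ 0.8, 1.0,  0.5],
        [-0.2, 0.5,  1.0]]

# A different threshold yields a different network structure:
print(edges(binarize(corr, 0.4)))  # → [(0, 1), (1, 2)]
print(edges(binarize(corr, 0.6)))  # → [(0, 1)]
```

Running the same data through two thresholds produces two different topologies, which is precisely why metrics should be reported across a range of thresholds rather than at a single arbitrary cut-off.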
1.6 Role of Database Management Systems in GIS
Since the early ’90s, Geographical Information Systems (GIS) have become sophisticated systems for maintaining and analysing spatial and thematic information on spatial objects. DBMSs are increasingly important in GIS, since DBMSs are traditionally used to handle large volumes of data and to ensure the logical consistency and integrity of data, which have also become major requirements in GIS. Today, spatial data is mostly part of a complete work and information process. In many organisations there is a need to implement GIS functionality as part of a central Database Management System (DBMS), at least at the conceptual level, in which spatial data and alphanumerical data are maintained in one integrated environment. Consequently, the DBMS occupies a central place in the new generation of GIS architectures. Much progress has been made in managing spatial and non-spatial information for objects in one integrated DBMS environment, called a geo-DBMS. The Open Geospatial Consortium (OGC) has contributed largely to this progress: it adopted the ISO 19107 international standard (ISO 2001) as Topic 1 of its Abstract Specifications, Feature Geometry. These Abstract Specifications provide conceptual schemas for describing the spatial characteristics of spatial objects (geographic features, in OGC terms) and a set of spatial operations consistent with these schemas and with vector geometry and topology up to three dimensions embedded in 3D space. According to the specifications, a spatial object is represented by two structures: a geometrical structure (simple feature) and a topological structure (complex feature). While the geometrical structure provides direct access to the coordinates of individual objects, the topological structure encapsulates information about their spatial relationships. Currently, no 3D primitive is implemented. However, most DBMSs, including Oracle, Postgres, IBM, Ingres, and Informix, support the storage of simple features in 3D space; in general, it is possible to store, for example, a polygon in 3D. 3D volumetric objects can be stored in a geometrical model as polyhedrons (bodies with flat faces) built from 3D polygons, in two ways: as a set of polygons or as a multipolygon.

[Figure: Topological structure in the spatial DBMS of the Netherlands]
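The “set of polygons” representation of a polyhedron can be sketched as follows. This is a simplified illustration, not how any particular geo-DBMS stores solids internally (real systems use spatial types rather than plain lists); it also shows the kind of cheap consistency check such a representation enables:

```python
# A polyhedron stored as a set of flat 3D polygons: a unit cube given
# as six faces, each face a ring of (x, y, z) vertices.
cube_faces = [
    [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],  # bottom
    [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)],  # top
    [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)],  # front
    [(0, 1, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1)],  # back
    [(0, 0, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1)],  # left
    [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)],  # right
]

def euler_characteristic(faces):
    """V - E + F over the face set; equals 2 for a simple closed
    polyhedron, a quick sanity check on a stored solid."""
    vertices = {v for face in faces for v in face}
    edges = set()
    for face in faces:
        # Walk each face ring, collecting undirected edges.
        for a, b in zip(face, face[1:] + face[:1]):
            edges.add(frozenset((a, b)))
    return len(vertices) - len(edges) + len(faces)

print(euler_characteristic(cube_faces))  # → 2 (8 vertices - 12 edges + 6 faces)
```

The alternative mentioned above, a single multipolygon, stores the same six rings as one geometry value instead of a face list; the topological structure would additionally record which faces share which edges.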
 
2. ACKNOWLEDGEMENT
The author wishes to sincerely acknowledge the guidance and support of his mentors, Ms. Ayushi Vijhani and Professor K. C. Tiwari, Multidisciplinary Centre for Geoinformatics, Delhi Technological University, New Delhi.

3. REFERENCES
[1] A. Rajaraman, Y. Sagiv, and J. D. Ullman. Answering queries using templates with binding patterns.
[2] S. Chaudhuri and V. R. Narasayya. Index merging. In ICDE 1999, pages 296–303, 1999.
[3] P. Domingos and G. Hulten. Mining high-speed data streams. In Proc. 2000 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD’00), pages 71–80, Boston, MA, Aug. 2000.
[4] D. Zhang, A. Markowetz, V. J. Tsotras, D. Gunopulos, and B. Seeger. Efficient computation of temporal aggregates with range predicates. In ACM International Symposium on Principles of Database Systems (PODS), 2001.
[5] D. Zhang, D. Gunopulos, V. J. Tsotras, and B. Seeger. Temporal aggregation over data streams using multiple granularities. In Proceedings of the International Conference on Extending Database Technology (EDBT), 2002.
[6] P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. DIMACS Series in Discrete Mathematics and Theoretical Computer Science: Special Issue on External Memory Algorithms and Visualization, A:39–70, 1999.
[7] A. Nica and A. S. Varde, editors. Proceedings of the Third Ph.D. Workshop in CIKM, PIKM 2010, Nineteenth ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, Canada, Oct. 2010. ACM.
[8] A. Gupta and I. S. Mumick, editors. Materialized Views: Techniques, Implementations and Applications. MIT Press, June 1999.
[9] K. S. Candan, W.-S. Li, Q. Luo, W.-P. Hsiung, and D. Agrawal. Enabling dynamic content caching for database-driven web sites. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, Santa Barbara, California, USA, May 2001.
[10] S. Saltenis, C. Jensen, S. Leutenegger, and M. Lopez. Indexing the positions of continuously moving objects. SIGMOD, 2000.
[11] M. Hadjieleftheriou, G. Kollios, and V. Tsotras. Performance evaluation of spatio-temporal selectivity techniques. SSDBM, 2003.
[12] J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continuous data streams. In Proc. 2001 ACM SIGMOD Int. Conf. Management of Data (SIGMOD’01), pages 13–24, Santa Barbara, CA, May 2001.
[13] M. Vazirgiannis, Y. Theodoridis, and T. Sellis. Spatio-temporal composition and indexing for large multimedia applications. Multimedia Systems, 6(4):284–298, 1998.
[14] D. Quass, A. Gupta, I. S. Mumick, and J. Widom. Making views self-maintainable for data warehousing. In Proc. of the 1996 Intl. Conf. on Parallel and Distributed Information Systems, pages 158–169, December 1996.
[15] https://www.google.com
[16] https://www.ieee.org/conferences/publishing/templates.html