
Distributed Database vs Centralized Database

A centralized database is a database in which data is stored and maintained in a single location. This is the traditional approach for storing data in large enterprises. A distributed database is a database in which data is stored on storage devices that are not located in the same physical location, but the database is controlled by a central Database Management System (DBMS).

What is a Centralized Database?

In a centralized database, all the data of an organization is stored in a single place, such as a mainframe computer or a server. Users in remote locations access the data over a Wide Area Network (WAN) using the application programs provided for that purpose. The centralized database (the mainframe or the server) must be able to satisfy all the requests coming to the system, so it can easily become a bottleneck. But since all the data resides in a single place, it is easier to maintain and back up. Furthermore, it is easier to maintain data integrity, because once data is stored in a centralized database, outdated copies are not left behind in other places.

What is a Distributed Database?

In a distributed database, the data is stored on storage devices located in different physical locations. They are not attached to a common CPU, but the database is controlled by a central DBMS. Users access the data in a distributed database over the WAN. To keep a distributed database up to date, it uses replication and duplication processes. The replication process identifies changes in the distributed database and applies those changes to make sure that all copies of the distributed database look the same. Depending on the number of distributed databases, this process can become very complex and time consuming. The duplication process identifies one database as the master database and duplicates that database to the other sites. This process is not as complicated as replication, but it makes sure that all the distributed databases hold the same data.

What is the difference between a Distributed Database and a Centralized Database?

While a centralized database keeps its data on storage devices that are in a single location connected to a single CPU, a distributed database system keeps its data on storage devices that may be located in different geographical locations and managed by a central DBMS. A centralized database is easier to maintain and keep up to date, since all the data is stored in a single location. Furthermore, it is easier to maintain data integrity and avoid the need for data duplication. But all requests to access data are processed by a single entity, such as a single mainframe, so it can easily become a bottleneck. With distributed databases, this bottleneck can be avoided because the databases are parallelized, balancing the load between several servers. But keeping the data up to date in a distributed database system requires additional work, which increases the cost and complexity of maintenance and also requires additional software for this purpose. Furthermore, designing databases for a distributed system is more complex than designing for a centralized database.
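To make the replication and duplication processes above concrete, here is a minimal Python sketch. All names and data structures are invented for this illustration; a real distributed DBMS also handles conflicts, failures, and ordering, which are ignored here.

```python
# Minimal sketch of the two update strategies described above.
# All structures are hypothetical and purely illustrative.

master = {"emp_1": {"name": "Alice", "dept": 10}}
replicas = [dict(master), dict(master)]          # copies held at remote sites
change_log = []                                  # changes made at the master

def write(key, value):
    """Apply a change at the master and record it for replication."""
    master[key] = value
    change_log.append((key, value))

def replicate():
    """Replication: apply only the logged changes to every replica."""
    for key, value in change_log:
        for replica in replicas:
            replica[key] = value
    change_log.clear()

def duplicate():
    """Duplication: overwrite every replica with a full copy of the master."""
    for i in range(len(replicas)):
        replicas[i] = dict(master)

write("emp_2", {"name": "Bob", "dept": 20})
replicate()                                      # all copies now look the same
assert all(r == master for r in replicas)
```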

OLTP vs. OLAP

OLTP: current data; short database transactions; online update/insert/delete; normalization is promoted; high volume of transactions; transaction recovery is necessary.
OLAP: current and historical data; long database transactions; batch update/insert/delete; denormalization is promoted; low volume of transactions; transaction recovery is not necessary.


We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.

- OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. An OLTP database holds detailed, current data, and the schema used to store transactional data is the entity model (usually 3NF).

- OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used in data mining. An OLAP database holds aggregated, historical data, stored in multidimensional schemas (usually a star schema).

The following table summarizes the major differences between OLTP and OLAP system design.

OLTP System (Online Transaction Processing; operational system) vs. OLAP System (Online Analytical Processing; data warehouse)

Source of data
  OLTP: Operational data; OLTPs are the original source of the data.
  OLAP: Consolidation data; OLAP data comes from the various OLTP databases.

Purpose of data
  OLTP: To control and run fundamental business tasks.
  OLAP: To help with planning, problem solving, and decision support.

What the data reveals
  OLTP: A snapshot of ongoing business processes.
  OLAP: Multi-dimensional views of various kinds of business activities.

Inserts and updates
  OLTP: Short and fast inserts and updates initiated by end users.
  OLAP: Periodic long-running batch jobs refresh the data.

Queries
  OLTP: Relatively standardized and simple queries returning relatively few records.
  OLAP: Often complex queries involving aggregations.

Processing speed
  OLTP: Typically very fast.
  OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.

Space requirements
  OLTP: Can be relatively small if historical data is archived.
  OLAP: Larger due to the existence of aggregation structures and historical data; requires more indexes than OLTP.

Database design
  OLTP: Highly normalized with many tables.
  OLAP: Typically de-normalized with fewer tables; uses star and/or snowflake schemas.

Backup and recovery
  OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
  OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.

source: www.rainmakerworks.com
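To make the contrast concrete, here is a small, self-contained Python sketch using the standard library sqlite3 module; the table and data are invented for illustration. The first statements are OLTP-style short, row-level inserts and updates, while the final query is OLAP-style, aggregating over many rows at once.

```python
import sqlite3

# Hypothetical sales table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, "
             "year INTEGER, amount REAL)")

# OLTP-style workload: many short transactions touching single rows.
conn.execute("INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
             ("New York", 2012, 120.0))
conn.execute("INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
             ("Boston", 2012, 80.0))
conn.execute("UPDATE sales SET amount = amount + 10 WHERE id = ?", (1,))
conn.commit()

# OLAP-style query: an aggregation summarizing many rows at once.
cursor = conn.execute(
    "SELECT region, year, SUM(amount) AS total "
    "FROM sales GROUP BY region, year ORDER BY total DESC")
for row in cursor:
    print(row)
```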

Semi Join vs Bloom Join

Semi join and Bloom join are two joining methods used in query processing for distributed databases. When processing queries in distributed databases, data needs to be transferred between databases located at different sites. This can be an expensive operation, depending on the amount of data that needs to be transferred. Therefore, when processing queries in a distributed database environment, it is important to optimize the queries to minimize the amount of data transferred between sites. Semi join and Bloom join are two methods that can be used to reduce the amount of data transferred and perform efficient query processing.

What is Semi Join?

Semi join is a method used for efficient query processing in distributed database environments. Consider a situation where an Employee database (holding information such as the employee's name, the department number she works for, etc.) is located at site 1 and a Department database (holding information such as department number, department name, location, etc.) is located at site 2. Suppose we want to obtain the employee name and the name of the department she works for (only for departments located in New York) by executing a query at a query processor located at site 3. There are several ways data could be transferred between the three sites to achieve this, but it is not necessary to transfer whole databases between the sites: only the attributes (or tuples) required for the join need to be transferred to execute the query efficiently. Semi join reduces the amount of data shipped between the sites. Only the join column is transferred from one site to the other, and that transferred column is then used to reduce the size of the relations shipped between the remaining sites. For the above example, you can transfer just the department number and department name of tuples with location = New York from site 2 to site 1, perform the join at site 1, and transfer the final relation back to site 3.

What is Bloom Join?

As mentioned earlier, Bloom join is another method used to avoid transferring unnecessary data between sites when executing queries in distributed database environments. In a Bloom join, rather than transferring the join column itself, a compact representation of the join column is transferred between the sites. Bloom join uses a Bloom filter, which employs a bit vector to answer membership queries. First, a Bloom filter is built from the join column and transferred between the sites, and then the join operations are performed.

What is the difference between Semi Join and Bloom Join?

Although both semi join and Bloom join are used to minimize the amount of data transferred between sites when executing queries in a distributed database environment, Bloom join reduces the amount of data (number of tuples) transferred compared to semi join by using Bloom filters, which employ a bit vector to determine set membership. Therefore, using Bloom join is generally more efficient than using semi join.
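The following minimal Python sketch illustrates the idea behind both techniques on the Employee/Department example above. Everything here (relation layouts, hash functions, filter size) is invented for illustration; a real distributed DBMS would choose these details carefully and ship the data over the network rather than between in-memory lists.

```python
# Site 2: Department(dept_no, dept_name, location)
departments = [(10, "Sales", "New York"), (20, "HR", "Boston"),
               (30, "IT", "New York")]
# Site 1: Employee(name, dept_no)
employees = [("Alice", 10), ("Bob", 20), ("Carol", 30), ("Dave", 40)]

# --- Semi join: ship only the join column (plus needed attributes) ---
ny_depts = [(no, name) for (no, name, loc) in departments if loc == "New York"]
shipped_semi = ny_depts                         # sent from site 2 to site 1
result = [(emp, dname) for (emp, dno) in employees
          for (no, dname) in shipped_semi if dno == no]

# --- Bloom join: ship a bit vector summarizing the join column ---
M = 64                                          # filter size (illustrative)

def bloom_add(bits, key):
    for seed in (0, 1):                         # two simple hash functions
        bits |= 1 << (hash((seed, key)) % M)
    return bits

def bloom_maybe_contains(bits, key):
    return all(bits & (1 << (hash((seed, key)) % M)) for seed in (0, 1))

bits = 0
for (no, name, loc) in departments:
    if loc == "New York":
        bits = bloom_add(bits, no)
shipped_bloom = bits                            # a single integer is shipped

# Site 1 keeps only employees whose dept_no might be in the filter;
# false positives are possible and are removed by the final join.
candidates = [(emp, dno) for (emp, dno) in employees
              if bloom_maybe_contains(shipped_bloom, dno)]

print(result)       # [('Alice', 'Sales'), ('Carol', 'IT')]
print(candidates)   # all New York employees, possibly a few false positives
```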

Supervised vs. unsupervised learning

From a theoretical point of view, supervised and unsupervised learning differ only in the causal structure of the model. In supervised learning, the model defines the effect that one set of observations, called inputs, has on another set of observations, called outputs. In other words, the inputs are assumed to be at the beginning of the causal chain and the outputs at the end. The model can include mediating variables between the inputs and outputs. In unsupervised learning, all the observations are assumed to be caused by latent variables; that is, the observations are assumed to be at the end of the causal chain. In practice, models for supervised learning often leave the probability distribution of the inputs undefined. Such a model of the inputs is not needed as long as the inputs are available, but if some of the input values are missing, it is not possible to infer anything about the outputs. If the inputs are also modelled, then missing inputs cause no problem, since they can be treated as latent variables, as in unsupervised learning.
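As a concrete, if simplified, illustration of the distinction, here is a small self-contained Python/NumPy sketch; the data and model choices are invented for this example. The supervised model maps given inputs to outputs (least-squares regression), while the unsupervised model explains the observations through latent variables (principal components found via the SVD).

```python
import numpy as np

rng = np.random.default_rng(0)

# Observations: 200 samples of 5-dimensional inputs and a scalar output.
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

# Supervised learning: model the effect of inputs X on outputs y.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit

# Unsupervised learning: explain X itself through latent variables.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
latent = Xc @ Vt[:2].T                          # 2 latent coordinates per sample

print(np.round(w_hat, 2))        # close to true_w
print(latent.shape)              # (200, 2)
```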

Figure 2: The causal structure of (a) supervised and (b) unsupervised learning. In supervised learning, one set of observations, called inputs, is assumed to be the cause of another set of observations, called outputs, while in unsupervised learning all observations are assumed to be caused by a set of latent variables.

Figure 2 illustrates the difference in the causal structure of supervised and unsupervised learning. It is also possible to have a mixture of the two, where both input observations and latent variables are assumed to have caused the output observations. With unsupervised learning it is possible to learn larger and more complex models than with supervised learning. This is because in supervised learning one is trying to find the connection between two sets of observations. The difficulty of the learning task increases exponentially in the number of steps between the two sets and that is why supervised learning cannot, in practice, learn models with deep hierarchies. In unsupervised learning, the learning can proceed hierarchically from the observations into ever more abstract levels of representation. Each additional hierarchy needs to learn only one step and therefore the learning time increases (approximately) linearly in the number of levels in the model hierarchy. If the causal relation between the input and output observations is complex -- in a sense there is a large causal gap -- it is often easier to bridge the gap using unsupervised learning instead of supervised learning. This is depicted in figure 3. Instead of finding the causal pathway from inputs to outputs, one starts building the model upwards from both sets of observations in the hope that in higher levels of abstraction the gap is easier to bridge. Notice also that the input and output observations are in symmetrical positions in the model.

Figure 3: Unsupervised learning can be used for bridging the causal gap between input and output observations. The latent variables in the higher levels of abstraction are the causes for both sets of observations and mediate the dependence between inputs and outputs.

RDBMS vs ORDBMS

A Relational Database Management System (RDBMS) is a Database Management System (DBMS) that is based on the relational model. Most popular DBMSs currently in use are RDBMSs. An Object-Relational Database Management System (ORDBMS) is also a DBMS; it extends the RDBMS to support a broader class of applications and attempts to create a bridge between the relational and object-oriented paradigms.

As mentioned earlier, an RDBMS is based on the relational model, and data in an RDBMS is stored in the form of related tables. So a relational database can simply be seen as a collection of one or more relations, or tables, with columns and rows. Each column corresponds to an attribute of the relation, and each row corresponds to a record that consists of data values for an entity. RDBMSs were developed from the hierarchical and network models, the two earlier database systems. The main elements of an RDBMS are the concepts of relational integrity and normalization. These concepts are based on the 13 rules for a relational system (rules 0 through 12, commonly known as Codd's 12 rules) developed by E. F. (Ted) Codd. Three important fundamentals should be followed by an RDBMS: first, all information must be held in the form of tables; second, each value in a table column should be atomic (no repeating groups); and finally, the use of Structured Query Language (SQL).

The biggest advantage of RDBMSs is the ease with which users can create, access, and extend data. After a database is created, users can add new data categories to the database without changing the existing applications. There are also some notable limitations of RDBMSs. One limitation is their lack of efficiency when working with languages other than SQL, along with the fact that all information must be in tables where relationships between entities are defined by values. Furthermore, RDBMSs are not well suited to handling data such as images, digital audio, and video. Currently, most of the dominant DBMSs, such as IBM's DB2 family, Oracle, and Microsoft's Access and SQL Server, are actually RDBMSs.

As mentioned earlier, an ORDBMS provides a middle ground between an RDBMS and an object-oriented database (OODBMS). You could simply say that an ORDBMS puts an object-oriented front end on an RDBMS. When an application communicates with an ORDBMS, it normally acts as though the data is stored as objects. The ORDBMS then converts the object information into data tables with rows and columns and handles the data as if it were stored in an RDBMS. Likewise, when the data is retrieved, the ORDBMS returns a complex object created by reassembling the simple data.

The biggest advantage of an ORDBMS is that it provides methods to convert data between the RDBMS format and the OODBMS format, so the programmer does not need to write code to convert between the two formats, and database access is easy from an object-oriented language. Even though RDBMSs and ORDBMSs are both DBMSs, they differ in how they interact with applications. Applications using an RDBMS have to do extra work when storing complex data, while an ORDBMS inherently provides support for this. But due to the internal conversion between data formats, the performance of an ORDBMS can be degraded. Therefore, choosing one over the other depends on the data that needs to be stored and managed.
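As a rough sketch of the object-to-table conversion described above, the following self-contained Python example maps a hypothetical Employee object to a flat row and back. This only illustrates the idea; a real ORDBMS performs this mapping inside the database engine, not in application code.

```python
from dataclasses import dataclass

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Employee:
    emp_id: int
    name: str
    address: Address          # nested object: "complex" data for an RDBMS

def to_row(emp: Employee) -> tuple:
    """Flatten the object into a row of simple column values."""
    return (emp.emp_id, emp.name, emp.address.street, emp.address.city)

def from_row(row: tuple) -> Employee:
    """Reassemble the complex object from the stored row."""
    emp_id, name, street, city = row
    return Employee(emp_id, name, Address(street, city))

row = to_row(Employee(1, "Alice", Address("5th Ave", "New York")))
assert from_row(row) == Employee(1, "Alice", Address("5th Ave", "New York"))
```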

Read more: http://www.differencebetween.com/difference-between-rdbms-and-vs-ordbms/#ixzz1re3cIyub
