DBT 1

Unit 1 discusses database concepts including: 1. Entity-relationship diagrams and relational algebra operations like select, project, and join. 2. Functional dependencies and Armstrong's rules for inferring dependencies. Normalization aims to eliminate transitive dependencies. 3. Database storage includes memory, disks, RAID configurations for fault tolerance, and buffers to improve performance between memory and disks. 4. Records can be stored with fixed or variable lengths. Blocking factors determine how many records fit in a block. Physical addresses map blocks to storage locations.



UNIT 1

1 - ER Diagram

Entity: represents an object in the miniworld

Attributes: properties that describe an entity, each associated with a data type (int, string, date, etc.)

value set/domain: the set of values associated with an attribute

relationship: relates entities

degree: the number of participating entities

unary operations - select (σ), project (π), rename (ρ)

relational algebra / set theory - union, intersection, difference, Cartesian product

binary operations - join, division

relational operations - outer join, outer union

aggregate functions - sum, count, avg, min, max

key constraint - a primary key must be unique

key integrity constraint - a key cannot be null

referential integrity constraint - a foreign key either points to an existing primary key or is null
Eg - CONSTRAINT pk_name PRIMARY KEY (attribute_name), or CONSTRAINT fk_name FOREIGN KEY (attribute_name) REFERENCES parent_table(attribute_name)

Functional Dependencies - X → Y holds when the value of X determines the value of Y (eg - if two tuples have the same value for X, they must have the same value for Y)

Armstrong's Inference Rules - for the three main rules remember RAT (Reflexive, Augmentation, Transitive)

IR1 - reflexive: if Y is a subset of X, then X→Y

IR2 - augmentation: if X→Y, then XZ→YZ

IR3 - transitive: if X→Y and Y→Z, then X→Z

IR4 - decomposition: if X→YZ, then X→Y and X→Z

IR5 - union: if X→Y and X→Z, then X→YZ

IR6 - pseudotransitive: if X→Y and WY→Z, then WX→Z
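A quick way to see IR1-IR3 in action is the attribute-closure algorithm. The sketch below is my own minimal Python, not from the notes; FDs are assumed to be given as (lhs, rhs) pairs of attribute sets.

```python
def closure(attrs, fds):
    """Compute X+ (all attributes determined by attrs) under a list of FDs."""
    result = set(attrs)              # IR1: X trivially determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # IR3 (with IR2): if result already covers lhs, it determines rhs too
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# e.g. A -> B and B -> C give {A}+ = {A, B, C}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(closure({"A"}, fds))  # {'A', 'B', 'C'}
```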

Normalization - a top-down approach, testing a relation to check which normal form it satisfies

each subsequent normal form also satisfies the previous normal form

1NF, 2NF, 3NF - based on functional dependencies

4NF - based on multivalued dependencies

5NF - based on join dependencies

1NF - the domain of an attribute should contain only atomic, i.e. simple and indivisible, values; an attribute cannot hold a set or tuple of values, so divide a set of values into multiple rows

2NF - every non-prime attribute should be fully dependent on the primary key, not partially dependent on any key; for a relation whose primary key is composed of two attributes, an attribute that depends on only one of them is not allowed; in that case split the relation into two, the second one containing only the key attribute involved in the partial dependency

3NF - no transitive dependencies; if one exists, split into two relations: one with the primary key and its full dependencies, and the other with the transitive dependency, making the determinant the primary key of the new relation

BCNF - for every non-trivial dependency X→A that holds, X must be a superkey of R; eg - student and course together give time, and time gives course, so split into two relations;
the LHS of every dependency should be a primary key or super key

4NF - in 1NF we split multivalued attributes into multiple rows, but if two attributes each have multiple values, eg 3 each, we would need 9 rows for the same person; instead split into two relations - one with the person and the first attribute giving 3 rows, and the other with the person and the second attribute giving 3 rows

5NF - lossless decomposition, so no join dependency; when joining the previously divided tables, the result should have the same rows as the original, with no spurious rows

test for a non-additive or lossless join - create a table with all attributes across the top and all relations involved in the join down the left side; initially fill the whole table with b(i,j) symbols, except set cell (i,j) to a(j) if relation i contains attribute j; then apply the given dependencies: if X → Y, then for all rows that agree on X, make their Y cells agree too, preferring a(j); at the end, if any one row is all a's, the join is lossless
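A minimal Python sketch of this tableau (chase) test, with my own names; relations are given as attribute sets and FDs as (lhs, rhs) set pairs:

```python
def lossless(relations, attributes, fds):
    """Tableau test for a lossless join (a sketch, not the notes' exact wording)."""
    # row i gets 'a' in column j if relation i has attribute j, else a unique b symbol
    rows = [{a: "a" if a in rel else f"b{i}{a}" for a in attributes}
            for i, rel in enumerate(relations)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}                      # rows that agree on all lhs columns
            for row in rows:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in group}
                    target = "a" if "a" in symbols else min(symbols)  # prefer 'a'
                    for row in group:
                        if row[a] != target:
                            row[a] = target
                            changed = True
    return any(all(row[a] == "a" for a in attributes) for row in rows)

# R(A,B,C) split into R1(A,B) and R2(B,C) with B -> C is lossless:
print(lossless([{"A", "B"}, {"B", "C"}], {"A", "B", "C"}, [({"B"}, {"C"})]))  # True
```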

3 - Storage

cache - stored on registers, closest to the CPU

main memory - on a separate chip

secondary memory - magnetic disks or solid state devices

tertiary memory - optical disks, tapes, etc.

volatile memory is used by the CPU in real time and is erased when the system is turned off or reset; non-volatile memory is permanent

DAS - directly attached storage

NAS - network attached storage

SAN - storage area network

Disk Failures

intermittent failure - a read or write attempt fails but succeeds on retry; detected with a parity check

media decay - a sector is permanently lost and can never be accessed

write failure - a power outage while writing means the data is never stored; the sector can neither be written nor can the partially written part be read

disk crash - the entire disk becomes unreadable

RAID - redundant array of independent disks

level 0 - striped disk array without fault tolerance; data is broken down into blocks; if one disk fails everything is lost, so not to be used for mission-critical tasks

level 1 - mirroring and duplexing; one write or two reads per mirrored pair; high overhead; no rebuild needed in case of disk failure (the mirror takes over)

level 2 - Hamming-code ECC (parity); a high ratio of ECC disks to data disks for small words, so inefficient and high cost, but a simpler design and a single transaction rate; the ECC code confirms correct data or corrects disk errors on the fly

level 3 - parallel transfer with parity; striped; high read and write transfer rates; less parity per data disk, so a low ECC ratio and good efficiency, but resource intensive

level 4 - independent data disks with a shared parity disk; high efficiency, but the worst write transaction rate; low ECC ratio meaning good efficiency

level 5 - independent data disks with distributed parity blocks; highest read rate, medium write rate, high efficiency; parity is kept for each block group instead of on a separate parity-only disk; the most complex design

level 6 - independent data disks with 2 independent parity schemes (XOR generation); protects against multiple bad block failures; same as level 5 with extended fault tolerance: two independent parity computations with separate algorithms give protection against double disk failure (see the XOR sketch after this list)

level 10 - combination of levels 1 and 0: mirroring plus striping; high reliability and high performance, but expensive, high overhead, and limited scalability (minimum 4 disks)

level 50 - more fault tolerant than RAID 5 but double the parity overhead; very expensive; a striped version of RAID 3
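The XOR idea behind the parity levels, as a minimal sketch with made-up toy blocks: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the survivors.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR byte-strings of equal length, position by position."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]   # three data blocks (toy sizes)
parity = xor_blocks(data)                        # what RAID 3/4/5 stores

# the second data disk fails: rebuild its block from the survivors plus parity
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```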

flash storage - sits between DRAM and magnetic disk; high density and performance; fast access speed, but an entire block must be erased before it can be rewritten; USB (universal serial bus) drives are the most common type

database buffer - minimizes the number of transfers between main memory and disk; it is difficult to keep so many blocks in main memory, so the buffer acts as temporary storage that reduces the latency of accessing the disk

buffer manager - responsible for allocating buffer space to data; allocates space for data not already in the buffer, throwing out an existing block if there is no space; a request for a block on disk brings it from disk into memory and passes the block's address back to make it available to the user (including when the user wants to access a previously thrown-out block)

replacement strategy - typically LRU (least recently used)

pinned blocks - blocks that cannot be written back to disk for the moment; a timeout is set for writing them back

forced output - write a block back to disk even when its buffer space is not needed, so that data isn't lost in case of a system crash
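A minimal sketch of the LRU policy described above; the class, the fake disk read, and the pinned flag are my own toy stand-ins, not a real DBMS API.

```python
from collections import OrderedDict

class BufferManager:
    """Toy LRU buffer pool; pinned blocks are never evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pool = OrderedDict()              # block_id -> (data, pinned)

    def fetch(self, block_id):
        if block_id in self.pool:
            self.pool.move_to_end(block_id)    # mark as most recently used
            return self.pool[block_id][0]
        if len(self.pool) >= self.capacity:    # no space: evict the LRU unpinned block
            for victim, (_, pinned) in self.pool.items():
                if not pinned:
                    del self.pool[victim]      # a real manager would write it back first
                    break
        data = f"<block {block_id} from disk>" # stand-in for an actual disk read
        self.pool[block_id] = (data, False)
        return data

buf = BufferManager(capacity=2)
buf.fetch(1); buf.fetch(2); buf.fetch(3)       # block 1 is evicted (least recently used)
print(list(buf.pool))                          # [2, 3]
```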

fixed length records - consist of a header giving info about the record, like a schema reference, the total length of the record, and a timestamp; the rest of the record consists of fixed-length attribute fields as defined when creating the table, eg - here name is 30 bytes, address is 225, etc.

if the records don't use up the entire block, the rest of the space is wasted

a block holds a block header followed by record 1, record 2, …, record n; there is no separate header for each record

Blocking factor = floor(B/R),
where B is the block length and R is the fixed length of each record; floor because only whole records fit (a quick arithmetic check follows)
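A quick check of the formula with made-up sizes:

```python
B, R = 4096, 300          # block size and fixed record length in bytes (made-up)
bfr = B // R              # floor division: only whole records fit in a block
wasted = B - bfr * R      # unused bytes at the end of each block
print(bfr, wasted)        # 13 records per block, 196 bytes wasted
```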

physical address - tells us where in the secondary storage management system the data is actually stored; it indicates which storage device or disk holds the block → the cylinder number → the track number → the block number → the offset within the block

a map table relating logical addresses to their corresponding physical addresses is stored on disk

an offset table gives the offset of each record within the block; a record's address is then the block's physical address plus the offset

in SQL Server - data is stored as pages of 8 KB each, each starting with a 96-byte header

extent - consists of 8 pages; the basic unit to which space is allocated

every block and record has a database address (in the server's address space) and a memory address (in virtual memory)

swizzle - translate from the database address space to the virtual memory address space, to avoid the repeated cost of translating; done when a block is moved from secondary storage to main memory

a pointer consists of - a bit indicating whether it is a database address or is swizzled, AND the database or memory pointer itself

swizzling - when one record in memory references (points to) another record in memory, the first would normally translate the pointer's database entry to a virtual memory address and then find it in memory; by "swizzling" we cut out the latency of using the translation table and directly use a pointer to the required location within memory itself
types of swizzling -

automatic - as soon as a block comes into memory, its addresses and pointers are added to the translation table

on demand - leave all pointers unswizzled; when a pointer is referenced, swizzle it

no swizzling - never swizzle a pointer

programmer control of swizzling - explicitly tell which blocks in memory to swizzle

memory back to disk → unswizzle: the memory address needs to be replaced by the database address again
a block is said to be pinned if it cannot be sent back to disk safely; a bit telling this is in the header, and we need to make sure a block is not pinned (by unswizzling pointers into it) before writing it back
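A minimal sketch of the swizzled-bit pointer representation described above; the tuple encoding, the translation table, and the toy addresses are my own illustration.

```python
# Each pointer is (swizzled?, address): a DB address before translation,
# a memory address after. The translation table is filled as blocks load.
translate = {"DB:0x2000": "MEM:0x7f00"}    # db address -> in-memory address (toy values)

def deref(ptr):
    swizzled, addr = ptr
    if swizzled:
        return addr                        # already a memory address: no lookup cost
    return translate[addr]                 # unswizzled: pay for the table lookup

p = (False, "DB:0x2000")
mem = deref(p)
p = (True, mem)                            # on-demand swizzling: rewrite the pointer
```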

variable length records - fixed-length fields first, variable-length after, so no extra space is wasted; the record header stores the length of the fixed part and a pointer to the beginning of the variable part (a byte-layout sketch follows)

if the record doesn't fit in one block, keep a pointer to the next block, with a previous-pointer from that block back to the current block; a bit in the header says whether the record is a fragment, and whether it is the first fragment or the last fragment
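A minimal sketch of that layout with a hypothetical record (emp_id, salary, name); the 4-byte header stores the total length and the offset where the variable part begins.

```python
import struct

def encode(emp_id, salary, name):
    """Fixed fields first; header stores total length and variable-part offset."""
    fixed = struct.pack("<ii", emp_id, salary)      # fixed-length part (8 bytes)
    var = name.encode("utf-8")                      # variable-length part
    header = struct.pack("<hh", 4 + len(fixed) + len(var), 4 + len(fixed))
    return header + fixed + var

def decode(record):
    total, var_off = struct.unpack_from("<hh", record, 0)
    emp_id, salary = struct.unpack_from("<ii", record, 4)
    return emp_id, salary, record[var_off:total].decode("utf-8")

rec = encode(7, 50000, "Amanpreet")
print(decode(rec))   # (7, 50000, 'Amanpreet')
```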

Record Modification -
insert - if the file is not ordered, make space anywhere; if ordered, find the location where the record should be stored, and if there's no space slide records to make room; if there's no room at all, create an overflow block

delete - slide records and compact the space, to keep one unused region available

update - no problem for fixed-length records; for variable-length records slide records to make or reclaim space: for an increase in size follow the insert steps, for a reduction in size follow the deletion steps

6 - Column Store vs Row Store

column store - stores the values of a single attribute together in a page; efficient, high speed while querying; better for SELECTs that only need to see selected columns and for aggregate functions; good for OLAP but bad for OLTP

Eg: Amazon Redshift, Apache Cassandra, MariaDB

row store - stores data tuple by tuple; better for insert, update, and delete since it doesn't require splitting tuples and stitching them back together

7 - Index Structures
index structure - a structure for faster retrieval; an index file associates a value of the search key with its location in the data file; a primary index is on the primary key, and secondary indexes are on other attributes

multilevel indexing - indexes on indexes, done using B-trees

secondary indexes - unlike a primary index, find records given the value of one or more fields and return the record's current address; a bucket can be placed in between for indirection

document retrieval - uses inverted indexes: conceptually every word is a boolean attribute saying whether that word exists in the document; keep a secondary index for each word, using buckets for the inverted indirection
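A tiny sketch of the inverted index idea, with made-up documents: map each word to the set (bucket) of documents containing it, then intersect buckets to answer queries.

```python
from collections import defaultdict

docs = {1: "cat sat on the mat", 2: "the cat ran", 3: "dogs ran fast"}

inverted = defaultdict(set)                # word -> bucket of document ids
for doc_id, text in docs.items():
    for word in text.split():
        inverted[word].add(doc_id)

print(inverted["cat"] & inverted["ran"])   # documents containing both words: {2}
```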

B-trees - the root is at the first layer and the last layer holds the leaves; keys in the leaf nodes are copies from the data file and are ordered/sorted from left to right; they allow lookup, insertion, and deletion of blocks with very little disk I/O, and if any rebalancing is needed, only two leaves and their parent are affected (the notes work an example: insert 40 and delete 7)
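A minimal lookup-only sketch of the tree shape described above (B+-tree style, with data in the leaves); the node layout and sample keys are my own, and insertion/splitting is omitted.

```python
from bisect import bisect_right

class Node:
    """Interior node: keys separate children; leaf node: keys map to records."""
    def __init__(self, keys, children=None, records=None):
        self.keys = keys          # sorted search keys
        self.children = children  # None for a leaf
        self.records = records    # None for an interior node

def lookup(node, key):
    while node.children is not None:
        # child i covers keys below keys[i]; bisect picks the right subtree
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.records[node.keys.index(key)]
    return None

leaf1 = Node([2, 7], records=["r2", "r7"])
leaf2 = Node([13, 40], records=["r13", "r40"])
root = Node([13], children=[leaf1, leaf2])
print(lookup(root, 40))   # 'r40'
```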

Hash tables - use a hash function on the key to get a bucket number and store the record in that bucket; when searching, use the same hash function to find where it was stored; a good hash function distributes records equally amongst the buckets

insert - for key K, run the hash function h(K); if the bucket for h(K) has space, insert there, else insert into an overflow block

delete - in the bucket for h(K), delete the record for K, then move the remaining records around to consolidate the available space

efficient, since a lookup is one disk I/O and an insert or delete only two disk I/Os; but as the file grows there will be more and more overflow blocks, degenerating into linear search
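A toy sketch of the static scheme just described; bucket count, capacity, and the list-as-overflow-chain are my own simplifications.

```python
BUCKETS, CAPACITY = 4, 2                     # made-up sizes
buckets = [[] for _ in range(BUCKETS)]       # each list = primary block + overflow chain

def insert(key, record):
    b = buckets[hash(key) % BUCKETS]
    b.append((key, record))                  # entries beyond CAPACITY model overflow

def lookup(key):
    # one "disk I/O" to the right bucket, then a scan of its overflow chain
    return [r for k, r in buckets[hash(key) % BUCKETS] if k == key]

insert(42, "record-42"); insert(17, "record-17")
print(lookup(42))   # ['record-42']
```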

extensible hash table - the bucket array holds pointers to blocks instead of holding the data blocks; pointers are much smaller, so the array accommodates more entries, and its length is always a power of 2 (2^i)

the bucket array is stored in main memory and the bins/buckets are stored on disk

convert the hash value into binary, take the first i bits, and follow the array pointer indexed by those bits to the bucket

insert - for key K, compute h(K) and take its first i bits; index into the bucket array at that value and follow the pointer to block B; if there is space, insert the record; if not, check the number j stored in the block, which tells us how many bits of the hash value are used to determine membership in that block; if j < i, split block B into two, distribute its records based on bit j+1, and adjust the pointers; if j = i, first increment i by 1, doubling the length of the bucket array to make space, then split

worked example (from the notes' figure): inserting 0000 goes into the first bucket; then to enter 0111, split the block, making j = 2 so two bits are considered, and i gets incremented until it reaches 3; 000 and 001 both point to the first bucket holding 0000 and 0001; for blocks where j equals i there is only one array entry mapping to them; for blocks where j < i there are multiple mappings

so: check the block's local depth j against the global depth i,

→ if they are equal, double the bucket array by incrementing i; if before it was 2 and the array covered 00, 01, 10, 11, it now considers 3 bits, giving length 8
→ if j is lesser, just split that block and increment its j
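A runnable toy version of the scheme above; the class names, block capacity, and 8-bit hash width are my own assumptions, not the notes' figures.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth          # local depth j: bits that decide membership
        self.keys = []

class ExtensibleHash:
    """Toy extensible hash table (a sketch, not textbook code)."""
    def __init__(self, capacity=2, bits=8):
        self.capacity = capacity    # records per block (made up)
        self.bits = bits            # width of the hash value
        self.i = 1                  # global depth: bucket array has 2**i entries
        self.dir = [Bucket(1), Bucket(1)]

    def _index(self, key):
        h = hash(key) & ((1 << self.bits) - 1)
        return h >> (self.bits - self.i)          # first i bits of the hash

    def insert(self, key):
        b = self.dir[self._index(key)]
        if len(b.keys) < self.capacity:
            b.keys.append(key)
            return
        if b.depth == self.i:                     # j == i: double the bucket array
            self.dir = [x for x in self.dir for _ in (0, 1)]
            self.i += 1
        b.depth += 1                              # split: one more bit decides membership
        new = Bucket(b.depth)
        for idx, x in enumerate(self.dir):        # upper half of b's slots -> new bucket
            if x is b and (idx >> (self.i - b.depth)) & 1:
                self.dir[idx] = new
        old, b.keys = b.keys, []
        for k in old + [key]:                     # redistribute by retrying the insert
            self.insert(k)

h = ExtensibleHash()
for k in [0x12, 0x9A, 0xE7, 0x45, 0x3C]:
    h.insert(k)
print(h.i, sum(len(b.keys) for b in set(h.dir)))  # global depth grew to 2; 5 keys stored
```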

Linear hash tables - n buckets; overflow blocks are allowed, but kept to less than 1 overflow block on average per bucket; the number of bits used for bucket numbers is ceiling(log2(n))

instead of doubling every time, just add one bucket at a time

to start we have buckets 0 and 1; when bucket 0 fills up we create a bucket 2, so 0 becomes 00 and 2 becomes 10; 00 and 10 now use mod 4, but bucket 1 still uses mod 2 since it hasn't changed yet; the array doesn't need to be forcefully doubled the way it is in extensible hashing
track the ratio of the number of stored values to the number of possible slots: if it is below a predefined percentage, new records can just go to an overflow block; once the percentage is exceeded, a split is done
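A runnable toy version of linear hashing as described; the threshold value and class layout are my own assumptions.

```python
import math

class LinearHash:
    """Toy linear hashing: grow by ONE bucket when average load passes a threshold."""
    def __init__(self, threshold=1.7):
        self.n = 2                      # current number of buckets
        self.buckets = [[], []]
        self.threshold = threshold      # max average records per bucket (made up)
        self.count = 0

    def _index(self, key):
        i = max(1, math.ceil(math.log2(self.n)))
        m = hash(key) & ((1 << i) - 1)                   # last i bits of the hash
        return m if m < self.n else m - (1 << (i - 1))   # bucket m doesn't exist yet

    def insert(self, key):
        self.buckets[self._index(key)].append(key)
        self.count += 1
        if self.count / self.n > self.threshold:
            self._grow()

    def _grow(self):
        i = max(1, math.ceil(math.log2(self.n + 1)))
        split = self.n - (1 << (i - 1))   # old bucket sharing low bits with the new one
        self.n += 1
        self.buckets.append([])
        keys, self.buckets[split] = self.buckets[split], []
        for k in keys:                    # redistribute between split and new bucket
            self.buckets[self._index(k)].append(k)

h = LinearHash()
for k in range(10):
    h.insert(k)
print(h.n, h.buckets)   # buckets were added one at a time, never doubled
```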

Multiple key indexes - two layers of indexing: the first indexes the first attribute, and each of its entries points to an index on the second attribute
query for age = 50 and salary > 50 ⇒ look up 50 in the first layer; the pointer to the index table of the second attribute will then get the rest of the matching records
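A tiny sketch of that two-level lookup with made-up data: the outer index is on age, each entry pointing to an inner index on salary.

```python
# outer index on age -> inner index on salary -> record (toy data, my names)
index = {
    50: {45000: "rec1", 60000: "rec2"},
    51: {30000: "rec3"},
}

# age = 50 AND salary > 50000: one outer lookup, then a scan of the inner index
inner = index.get(50, {})
hits = [rec for sal, rec in inner.items() if sal > 50000]
print(hits)   # ['rec2']
```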

R trees - subregions can have overlaps

search - start at the root and examine which subregions contain point P; if 0 regions, point P is not in any data region; if at least 1 region, recursively search for P in each matching child (see the sketch after this list)

insert - start at the root and find the subregion into which region R fits; if more than one is possible, pick one; if no subregion fits, expand the subregion where the expansion is as little as possible; otherwise insert into the leaf, then split the leaf if it overflows
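A minimal point-search sketch: because subregions may overlap, every containing child must be searched. Rectangles, the node encoding, and the sample regions are my own toy choices.

```python
def contains(rect, p):
    (x1, y1, x2, y2), (x, y) = rect, p
    return x1 <= x <= x2 and y1 <= y <= y2

def search(node, p):
    """node = (rect, children-list) for interior nodes, (rect, data) for leaves."""
    rect, payload = node
    if not contains(rect, p):
        return []
    if isinstance(payload, list):                 # interior: recurse into ALL children
        return [hit for child in payload for hit in search(child, p)]
    return [payload]                              # leaf data region containing p

leaf1 = ((0, 0, 4, 4), "region-A")
leaf2 = ((3, 3, 8, 8), "region-B")                # overlaps leaf1
root = ((0, 0, 8, 8), [leaf1, leaf2])
print(search(root, (3.5, 3.5)))                   # ['region-A', 'region-B']
```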

Bitmap indexes - if an attribute has 3 possible values (a traffic light can be green, yellow, or red), use a bitmap for it; say there are 7 traffic lights, the first three red, the next three yellow, and the last one green; the bitmaps are then - red: 1110000, yellow: 0001110, green: 0000001

to check when a light is red or yellow, OR the two bitmaps to get 1111110; if another attribute's condition gives 1011001, AND these two to get 1011000 - only those lights satisfy the query
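The same traffic-light arithmetic, checked with Python ints as bitmaps (leftmost written bit = light 1):

```python
red    = 0b1110000
yellow = 0b0001110
green  = 0b0000001

red_or_yellow = red | yellow          # 0b1111110
other_attr    = 0b1011001             # bitmap from the second condition
both          = red_or_yellow & other_attr
print(f"{both:07b}")                  # 1011000 -> lights 1, 3, and 4 match the query
```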

10 - Indexes in SQL

CREATE INDEX <index_name> ON <table_name>(<attribute_name>);

CREATE UNIQUE INDEX <index_name> ON <table_name>(<attribute_name>);

CREATE INDEX <composite_index_name> ON <table_name>(<attribute_name1>, <attribute_name2>);

indexes are automatically created for primary key and unique key constraints - these are called implicit indexes

DROP INDEX <index_name>;


avoid indexes on - small tables, columns with a high number of null values, columns that are frequently manipulated

