
Lecture 1 Intro

This document provides an overview of the CS564 Database Management Systems course. The course will cover the basics of how to use and manage data using a relational database model. It will teach students how to design databases, query databases using SQL, and build applications that incorporate databases. The course is split into sections that cover topics like database foundations, design, and internals. Students will complete problem sets, a group programming project, a midterm, and a final exam as part of their assessment. The goal is for students to learn fundamental database concepts and gain practical skills in working with databases.

Uploaded by

Peter L. Montez

CS564: Database Management Systems

Lecture 1: Course Overview

Acks: Chris Ré

Big science is data driven.

Increasingly many companies see themselves as data driven.

Even more “traditional” companies…

The world is increasingly driven by data…

This class teaches the basics of how to use & manage data.

Today’s Lecture

1. Introduction, admin & setup
  • ACTIVITY: Jupyter “Hello World!”

2. Overview of the relational data model
  • ACTIVITY: SQL in Jupyter

3. Overview of DBMS topics: Key concepts & challenges

Section 1

1. Introduction, admin & setup

Section 1

What you will learn about in this section


1. Motivation for studying DBs

2. Administrative structure

3. Course logistics

4. Overview of lecture coverage

5. ACTIVITY: Jupyter “Hello World!”

Section 1 > Introduction

New tech. Same Principles.

Section 1 > Introduction

Why should you study databases?

• Mercenary: make more $$$
  • Startups need DB talent right away, since they have few employees
  • Massive industry…
• Intellectual:
  • Science has gone from data poor to data rich
    • …and often has no idea how to handle the data!
  • Fundamental ideas to/from all of CS:
    • Systems, theory, AI, logic, stats, analysis…

Many great computer systems ideas started in DB.

Section 1 > Introduction

What this course is (and is not)

• Discuss the fundamentals of data management
  • How to design databases, query databases, and build applications with them
  • How to debug them when they go wrong!
  • Not how to be a DBA or how to tune Oracle 12g
• We’ll cover how database management systems work
  • And some basic principles of how to build them

Section 1 > Administrative > Course Staff

Who we are…
Instructor (me): Theo Rekatsinas
• Faculty in the Department of Computer Sciences and part of the UW-Database Group
• Research: data integration and cleaning, statistical analytics, and machine learning
• thodrek@cs.wisc.edu
• Office hours: Wed 4:00-5:00pm (after class), Fri 11:00am-12:00pm @ CS 4361

Section 1 > Administrative > Course Staff

Teaching Assistants (TAs)


“TAs are humans too!”

Section 1 > Administrative > Course Staff

Minzhen Vishnu

Section 1 > Administrative

Communication w/ Course Staff

• Piazza: https://piazza.com/wisc/fall2017/cs5643
  The goal is to get you to answer each other’s questions so you can benefit and learn from each other.
• Class email: cs564fall17@gmail.com
• Office hours
  OHs are listed on the course website!
• By appointment!

Section 1 > Administrative

Course Website:

https://thodrek.github.io/cs564-fall17/
Course Email:

cs564fall17@gmail.com
Section 1 > Logistics

Lectures
• Lecture slides cover the essential material
  • This is your best reference.
  • We are trying to move away from the textbook, but we will provide pointers to it
  • Recommended textbooks are listed on the website
• We try to cover the same material in many ways: lectures, lecture notes, homework, exams (no shock)
• Attendance makes your life easier…

Section 1 > Logistics

Attendance
• You should attend lectures, plus the guest lecture
  • Guest lectures are fun. Great guests: they want to meet you! Show up!
• Attendance is for your benefit…
  • People who did not attend did worse ☹
  • People who did not attend used more course resources ☹
  • People who did not attend were less happy with the course ☹

Section 1 > Logistics

Graded Elements

• Two Problem Sets (15%)
• Programming project (30%)
  • Split into four parts
  • Auction base: Experience with a DB application.
  • Implementation of DB internals
• Midterm (20%)
• Final exam (35%)

All due dates are posted on the website!

Section 1 > Logistics

Un-Graded Elements
• Readings are provided to help you!
  • Only items in lecture, homework, or the project are fair game.
• Activities are, again, mainly to help / be fun!
  • They occur during class; they are not graded, but they count as part of the lecture material (fair game as well)
• Jupyter Notebooks are provided
  • These are optional but hopefully helpful.
  • Redesigned so that you can ‘interactively replay’ parts of lecture

Section 1 > Logistics

What is expected from you

• Attend lectures
  • If you don’t, it’s at your own peril
• Be active and think critically
  • Ask questions, post comments on forums
• Do programming and homework projects
  • Start early and be honest
• Study for tests and exams

Section 1 > Logistics

Problem Sets
• Two problem sets at the beginning of the course
• Individual assignments
  • Python plus Jupyter notebooks
• 1 week per problem set
  • Ask questions, post comments on forums
  • Start early!

Section 1 > Logistics

Project
• Split into four parts
  • Two parts cover DB applications
  • Two parts cover DB internals
• In groups of 3
  • One person per team emails the group info by Wednesday 9/13.
  • Send it to cs564fall17@gmail.com with the subject “CS564-3: Project Group”
  • Include your names and University IDs
• Python (DB design) and C++ (DB internals)
• Varying duration for different parts (at least 2 weeks each)
  • Exact dates are posted on the website
  • Ask questions, post comments on forums
  • Start early!

Section 1 > Logistics

To encourage awesomeness
Bonus assignments, activities, and projects… Some extremes…

1. “I was hung over when I took the test.” Intended to make up for silly mistakes.

2. “I want to be a research star!” There will be some challenging assignments that could lead to a publication (e.g., the ACM SIGMOD undergrad competition)

Section 1 > Lectures

Lectures: 1st part – from a user’s perspective

1. Foundations: Relational data models & SQL
  • Lectures 2-4
  • How to manipulate data with SQL, a declarative language
  • Reduced expressive power, but the system can do more for you

2. Database Design: Design theory and constraints
  • Lectures 5-8
  • Designing relational schemas to keep your data from getting corrupted

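To make “declarative” concrete, here is a minimal sketch using Python's built-in sqlite3 module (an assumption: the course does not prescribe SQLite). The loop spells out how to scan, filter, and sort; the SQL query only states which rows are wanted and leaves the how to the system. The table follows the Students relation used later in this lecture; the extra row for 'Ann' is hypothetical.

```python
import sqlite3

# Hypothetical student rows (illustrative only; 'Ann' is not in the lecture).
rows = [("101", "Bob", 3.2), ("123", "Mary", 3.8), ("150", "Ann", 3.5)]

# Imperative: we spell out HOW to scan, filter, and sort the data ourselves.
imperative = sorted(name for (_sid, name, gpa) in rows if gpa > 3.4)

# Declarative: we state WHAT we want; the DBMS decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", rows)
declarative = [name for (name,) in conn.execute(
    "SELECT name FROM Students WHERE gpa > 3.4 ORDER BY name")]

print(imperative)   # ['Ann', 'Mary']
print(declarative)  # ['Ann', 'Mary']
```

Both give the same answer; the SQL version leaves scan order, index use, and the sorting method to the DBMS.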
Section 1 > Lectures

Lectures: 2nd part – database internals

3. Introduction to database systems
  • Lectures 9-11
  • Data Storage and IO models
  • Buffer Manager and File Organization
  • External sorting

4. Indexing and Hashing
  • Lectures 12-15
  • Intro to indexing
  • B+ Tree, Hash, and Bitmap Indexes

5. Query processing
  • Lectures 16-20
  • Access methods and operators
  • Joins
  • Relational algebra and Query optimization

Section 1 > Lectures

Lectures: 3rd part – transactions

6. Transactions
  • Lectures 21-22
  • Transactions from a user’s perspective
  • Logging and Locking

7. Bonus
  • Guest Lecture and Lecture 23
  • Stratis Viglas from Google
  • Machine Learning meets Data Management

Section 1 > Lectures

Lectures: A note about the format of notes

Take note!!
These are asides / notes (you still need to know these in general!)

Definitions are in blue, with the concept being defined in bold & underlined

Main point of slide / key takeaway at bottom

Warnings: pay attention here!

Section 1 > ACTIVITY

Jupyter Notebook “Hello World”

• Jupyter notebooks are interactive shells which save output in a nice notebook format
  FYI: Jupyter notebooks are also called IPython notebooks, but they handle other languages too.
• They can also display markdown, LaTeX, HTML, js…
• You’ll use these for:
  • in-class activities
  • interactive lecture supplements/recaps
  • homeworks, projects, etc., if helpful!
  Note: you do need to know or learn Python for this course!

Section 1 > ACTIVITY

Jupyter Notebook Setup

1. HIGHLY RECOMMENDED: Install on your laptop via the instructions on the next slide / Piazza
2. Other options: run via one of the alternative methods:
   1. Ubuntu VM
   2. CS Machines
   Please help out your peers by posting issues / solutions on Piazza!
3. Come to office hours if you need help with installation!

As a general policy in upper-level CS courses, Windows is not officially supported.

Section 1 > ACTIVITY

Jupyter Notebook Setup

https://thodrek.github.io/cs564-fall17/misc/jupyter_install.html

Section 1 > ACTIVITY

Activity-1-1.ipynb

Section 2

2. Overview of the relational data model

Section 2

What you will learn about in this section


1. Definition of DBMS

2. Data models & the relational data model

3. Schemas & data independence

4. ACTIVITY: Jupyter + SQL

Section 2 > DBMS

What is a DBMS?
• A database: a large, integrated collection of data
• Models a real-world enterprise
  • Entities (e.g., Students, Courses)
  • Relationships (e.g., Alice is enrolled in CS564)

A Database Management System (DBMS) is a piece of software designed to store and manage databases

Section 2 > Data models

A Motivating, Running Example

• Consider building a course management system (CMS):
  • Entities:
    • Students
    • Courses
    • Professors
  • Relationships:
    • Who takes what
    • Who teaches what

Section 2 > Data models

Data models
• A data model is a collection of concepts for describing data
• The relational model of data is the most widely used model today
  • Main concept: the relation, essentially a table
• A schema is a description of a particular collection of data, using the given data model
  • E.g., every relation in a relational data model has a schema describing its attributes, their types, etc.

Section 2 > Data models

Modeling the Course Management System

• Logical Schema
  • Students(sid: string, name: string, gpa: float)
  • Courses(cid: string, cname: string, credits: int)
  • Enrolled(sid: string, cid: string, grade: string)

Relations:

Students
sid | name | gpa
101 | Bob  | 3.2
123 | Mary | 3.8

Courses
cid | cname | credits
564 | 564-2 | 4
308 | 417   | 2

Enrolled
sid | cid | grade
123 | 564 | A

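The logical schema above can be written down directly as SQL DDL. A minimal sketch, assuming SQLite via Python's sqlite3 module, with the schema's string, float, and int types mapped to SQLite's TEXT, REAL, and INTEGER; it loads exactly the rows shown in the three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One CREATE TABLE per relation in the logical schema.
conn.executescript("""
CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
""")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [("101", "Bob", 3.2), ("123", "Mary", 3.8)])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [("564", "564-2", 4), ("308", "417", 2)])
conn.execute("INSERT INTO Enrolled VALUES ('123', '564', 'A')")

# Each relation is a set of rows that all conform to its schema.
# (No ORDER BY: relations are unordered, so row order is not guaranteed.)
for table in ("Students", "Courses", "Enrolled"):
    print(table, conn.execute(f"SELECT * FROM {table}").fetchall())
```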
Section 2 > Data models

Modeling the Course Management System

• Logical Schema
  • Students(sid: string, name: string, gpa: float)
  • Courses(cid: string, cname: string, credits: int)
  • Enrolled(sid: string, cid: string, grade: string)

Students
sid | name | gpa
101 | Bob  | 3.2
123 | Mary | 3.8

Courses
cid | cname | credits
564 | 564-2 | 4
308 | 417   | 2

Enrolled
sid | cid | grade
123 | 564 | A

Corresponding keys: Enrolled.sid matches Students.sid, and Enrolled.cid matches Courses.cid.

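The corresponding keys are what let us connect the relations. A minimal sketch, again assuming SQLite: the correspondence is declared as PRIMARY KEY / REFERENCES (foreign key) constraints, which is an assumption since the slide only marks which columns correspond. The keys are then followed in a join to answer “which courses is Mary taking?”.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (
    sid   TEXT REFERENCES Students(sid),
    cid   TEXT REFERENCES Courses(cid),
    grade TEXT
);
INSERT INTO Students VALUES ('101', 'Bob', 3.2), ('123', 'Mary', 3.8);
INSERT INTO Courses  VALUES ('564', '564-2', 4), ('308', '417', 2);
INSERT INTO Enrolled VALUES ('123', '564', 'A');
""")

# Follow Enrolled.sid -> Students.sid and Enrolled.cid -> Courses.cid.
mary = conn.execute("""
    SELECT c.cid, c.credits, e.grade
    FROM Students s
    JOIN Enrolled e ON e.sid = s.sid
    JOIN Courses  c ON c.cid = e.cid
    WHERE s.name = 'Mary'
""").fetchall()
print(mary)  # [('564', 4, 'A')]

# An enrollment pointing at a non-existent student violates the foreign key.
rejected = False
try:
    conn.execute("INSERT INTO Enrolled VALUES ('999', '564', 'B')")
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```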
Section 2 > Schemata

Other Schemata…
• Physical Schema: describes data layout [Administrators]
  • Relations as unordered files
  • Some data in sorted order (index)
• Logical Schema: Previous slide
• External Schema (Views) [Applications]:
  • Course_info(cid: string, enrollment: integer)
  • Derived from other tables

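An external schema such as Course_info is typically defined as a view over the logical schema. A minimal sketch, assuming SQLite; the view definition (counting Enrolled rows per course, with a LEFT JOIN so a course with no students shows 0) is our guess at how enrollment is derived, not a definition given in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
INSERT INTO Courses  VALUES ('564', '564-2', 4), ('308', '417', 2);
INSERT INTO Enrolled VALUES ('123', '564', 'A');

-- External schema: Course_info(cid, enrollment), derived from other tables.
CREATE VIEW Course_info AS
    SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
    FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
    GROUP BY c.cid;
""")

# Applications query the view as if it were a stored table.
info = conn.execute("SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(info)  # [('308', 0), ('564', 1)]

# The view is computed from the base tables, so new enrollments show up automatically.
conn.execute("INSERT INTO Enrolled VALUES ('101', '564', 'B')")
after = conn.execute("SELECT enrollment FROM Course_info WHERE cid = '564'").fetchone()
print(after)  # (2,)
```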
Section 2 > Schemata

Data independence
Concept: Applications do not need to worry about how the data is structured and stored

Logical data independence: protection from changes in the logical structure of the data
  I.e., we should not need to ask: can we add a new entity or attribute without rewriting the application?

Physical data independence: protection from physical layout changes
  I.e., we should not need to ask: which disks are the data stored on? Is the data indexed?

One of the most important reasons to use a DBMS

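Both kinds of independence can be seen in a few lines. A minimal sketch, assuming SQLite: the application's query text never changes while we (1) add an index, a physical change, and (2) add a new attribute, a logical change. The index name and the `email` column are hypothetical. The answer stays the same; at most the query plan changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [("101", "Bob", 3.2), ("123", "Mary", 3.8)])

# The "application": a fixed query whose text never changes below.
APP_QUERY = "SELECT name FROM Students WHERE sid = '123'"

def run_app():
    return conn.execute(APP_QUERY).fetchall()

before = run_app()

# Physical change: add an index (hypothetical name). The optimizer may now use it.
conn.execute("CREATE INDEX idx_students_sid ON Students(sid)")
after_index = run_app()

# Logical change: add a new attribute (hypothetical column).
conn.execute("ALTER TABLE Students ADD COLUMN email TEXT")
after_alter = run_app()

print(before, after_index, after_alter)  # all three are [('Mary',)]
# The plan (not the result) reflects the index; its exact text varies by SQLite version.
print(conn.execute("EXPLAIN QUERY PLAN " + APP_QUERY).fetchall())
```

Note that the query names its columns; an application using SELECT * would see the new column appear.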


Section 2 > ACTIVITY

Activity-1-2.ipynb

Section 3

3. Overview of DBMS topics: Key concepts & challenges

Section 3

What you will learn about in this section


1. Transactions

2. Concurrency & locking

3. Atomicity & logging

4. Summary

Section 3 > DBMS Challenges

Challenges with Many Users

• Suppose that our CMS application serves 1000s of users or more. What are some challenges?

• Security: Different users, different roles
  We won’t look at this much in this course, but it is extremely important

• Performance: Need to provide concurrent access
  Disk/SSD access is slow; DBMSs hide the latency by doing more CPU work concurrently

• Consistency: Concurrency can lead to update problems
  The DBMS allows users to write programs as if they were the only user

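To see why concurrency can lead to update problems, here is a deterministic simulation of the classic “lost update” anomaly. It is an illustrative sketch, not how a DBMS executes transactions: two hypothetical transactions each read a shared balance, add a deposit, and write it back, and we control how their steps interleave.

```python
def run(schedule, balance=100):
    """Execute a list of (txn, op) steps against one shared balance.

    Each transaction deposits a fixed amount: it reads the balance into a
    private variable, then writes back (what it read + its deposit).
    """
    deposits = {"T1": 10, "T2": 20}
    local = {}  # each transaction's private copy of the value it read
    for txn, op in schedule:
        if op == "read":
            local[txn] = balance
        elif op == "write":
            balance = local[txn] + deposits[txn]
    return balance

serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run(serial))       # 130: both deposits applied
print(run(interleaved))  # 120: T1's deposit was overwritten (lost update)
```

Running the transactions one after the other gives the expected total; letting both read before either writes silently loses T1's deposit.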
Section 3 > DBMS Challenges

Transactions
• A key concept is the transaction (TXN): an atomic sequence of db actions (reads/writes)
  Atomicity: An action either completes entirely or not at all

Transfer $3k from a10 to a20:
1. Debit $3k from a10
2. Credit $3k to a20

Before the transfer:
Acct | Balance
a10  | 20,000
a20  | 15,000

After the transfer:
Acct | Balance
a10  | 17,000
a20  | 18,000

Written naively, in which states is atomicity preserved?
• Crash before 1,
• After 1 but before 2,
• After 2.

DB always preserves atomicity!

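The transfer above can be wrapped in a transaction so that a crash between step 1 and step 2 leaves no trace. A minimal sketch, assuming SQLite through Python's sqlite3; the `Accounts` table name is hypothetical, and the “crash” is simulated by raising an exception after the debit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (acct TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                 [("a10", 20000), ("a20", 15000)])
conn.commit()

def transfer(amount, crash_midway=False):
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE acct = 'a10'",
                     (amount,))                      # 1. Debit a10
        if crash_midway:
            raise RuntimeError("simulated crash after step 1")
        conn.execute("UPDATE Accounts SET balance = balance + ? WHERE acct = 'a20'",
                     (amount,))                      # 2. Credit a20

def balances():
    return dict(conn.execute("SELECT acct, balance FROM Accounts ORDER BY acct"))

try:
    transfer(3000, crash_midway=True)
except RuntimeError:
    pass
print(balances())  # {'a10': 20000, 'a20': 15000}: the debit was rolled back

transfer(3000)
print(balances())  # {'a10': 17000, 'a20': 18000}
```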
Section 3 > DBMS Challenges

Transactions
• A key concept is the transaction (TXN): an atomic sequence of db actions (reads/writes)
  • If a user cancels a TXN, it should be as if nothing happened!
  Atomicity: An action either completes entirely or not at all

• Transactions leave the DB in a consistent state
  • Users may write integrity constraints, e.g., ‘each course is assigned to exactly one room’
  Consistency: An action results in a state which conforms to all integrity constraints

However, note that the DBMS does not understand the real meaning of the constraints; the consistency burden is still on the user!

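Integrity constraints are declared to the DBMS, which then rejects any action that would violate them. A minimal sketch, assuming SQLite, using a hypothetical `Assignments(cid, room)` table with made-up room names. The UNIQUE constraint enforces only the “at most one room per course” half of the example; requiring every course to have a room would need more. As the slide warns, the DBMS checks the rule as written and has no idea whether it is the right rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: cid is UNIQUE, so a course appears in at most one row.
conn.execute("""
    CREATE TABLE Assignments (
        cid  TEXT NOT NULL UNIQUE,
        room TEXT NOT NULL
    )""")
conn.execute("INSERT INTO Assignments VALUES ('564', 'Room A')")
conn.commit()

try:
    with conn:  # the violating action is rolled back as a unit
        conn.execute("INSERT INTO Assignments VALUES ('564', 'Room B')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# The database is still in a state that satisfies the constraint.
print(conn.execute("SELECT * FROM Assignments").fetchall())  # [('564', 'Room A')]
```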
Section 3 > DBMS Challenges

Challenge: Scheduling Concurrent Transactions
• The DBMS ensures that the execution of {T1,…,Tn} is equivalent to some serial execution
  A set of TXNs is isolated if their effect is as if all were executed serially

• One way to accomplish this: Locking
  • Before reading or writing, a transaction acquires a lock from the DBMS and holds it until the end

• Key Idea: If Ti wants to write to an item x and Tj wants to read x, then Ti and Tj conflict. Solution via locking:
  • Only one winner gets the lock
  • The loser is blocked (waits) until the winner finishes

What if Ti and Tj need X and Y, and Ti asks for X before Tj, and Tj asks for Y before Ti? Deadlock! One is aborted…

All concurrency issues handled by the DBMS…

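The deadlock scenario can be detected with a wait-for graph: an edge Ti → Tj means Ti is waiting for a lock that Tj holds, and a cycle means deadlock. The sketch below is a toy under stated assumptions (exclusive locks only, one holder per item, and the victim is the highest-numbered transaction in the cycle); the slide does not say which victim policy a DBMS uses, and real lock managers are far more involved. `LockManager` and its methods are hypothetical names.

```python
class LockManager:
    """Toy exclusive-lock table with wait-for-graph deadlock detection."""

    def __init__(self):
        self.holder = {}     # item -> transaction currently holding its lock
        self.waits_for = {}  # txn -> txn whose lock it is blocked on

    def request(self, txn, item):
        owner = self.holder.get(item)
        if owner is None or owner == txn:
            self.holder[item] = txn
            self.waits_for.pop(txn, None)
            return "granted"
        # Conflict: only one winner holds the lock; the loser waits.
        self.waits_for[txn] = owner
        cycle = self._find_cycle(txn)
        if cycle:
            victim = max(cycle)  # assumption: abort the highest-numbered TXN
            self.abort(victim)
            return f"deadlock {cycle}; aborted {victim}"
        return "blocked"

    def _find_cycle(self, start):
        # Follow wait-for edges from `start`; revisiting a node means a cycle.
        path, txn = [start], self.waits_for.get(start)
        while txn is not None:
            if txn in path:
                return path[path.index(txn):]
            path.append(txn)
            txn = self.waits_for.get(txn)
        return None

    def abort(self, txn):
        # Release the victim's locks; anyone waiting on it may now retry.
        self.holder = {i: t for i, t in self.holder.items() if t != txn}
        self.waits_for = {w: h for w, h in self.waits_for.items()
                          if w != txn and h != txn}


lm = LockManager()
print(lm.request("T1", "X"))  # granted
print(lm.request("T2", "Y"))  # granted
print(lm.request("T1", "Y"))  # blocked (T2 holds Y)
print(lm.request("T2", "X"))  # deadlock ['T2', 'T1']; aborted T2
print(lm.request("T1", "Y"))  # granted (retry after T2 was aborted)
```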


Section 3 > DBMS Challenges

Ensuring Atomicity & Durability

• DBMS ensures atomicity even if a TXN crashes!

• One way to accomplish this: Write-ahead logging (WAL)
  Write-ahead Logging (WAL): Before any action is finalized, a corresponding log entry is forced to disk

• Key Idea: Keep a log of all the writes done.
  • After a crash, the partially executed TXNs are undone using the log
  We assume that the log is on “stable” storage

All atomicity issues also handled by the DBMS…

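The write-ahead rule can be illustrated with a toy key-value store. A minimal sketch under loud assumptions: the “disk” and the “stable” log are plain Python objects, each log record stores the old value (an undo log), and recovery simply rolls back every transaction that has no COMMIT record. Undo-only logging like this also assumes a transaction's data changes reach disk before its COMMIT record; here writes hit the toy “disk” immediately. Real WAL implementations (e.g., ARIES-style redo/undo) are considerably more involved, and `ToyWALStore` is a hypothetical name.

```python
class ToyWALStore:
    """Undo-only write-ahead logging over an in-memory 'disk'."""

    def __init__(self, data):
        self.disk = dict(data)  # pretend this is the database on disk
        self.log = []           # pretend this is the log on stable storage

    def write(self, txn, key, value):
        # WAL rule: append (force) the log record BEFORE changing the data.
        self.log.append(("UPDATE", txn, key, self.disk.get(key)))
        self.disk[key] = value

    def commit(self, txn):
        self.log.append(("COMMIT", txn))

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        # Undo uncommitted updates, newest first, restoring the logged old values.
        for rec in reversed(self.log):
            if rec[0] == "UPDATE" and rec[1] not in committed:
                _, _, key, old = rec
                if old is None:
                    self.disk.pop(key, None)
                else:
                    self.disk[key] = old

store = ToyWALStore({"a10": 20000, "a20": 15000})
store.write("T1", "a10", 17000)  # 1. Debit $3k from a10
# ... crash here, before step 2 and before T1 commits ...
store.recover()
print(store.disk)  # {'a10': 20000, 'a20': 15000}
```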


Section 3 > Summary

A Well-Designed DBMS makes many people happy!
• End users and DBMS vendors
  • Reduces cost and makes money
• DB application programmers
  • Can handle more users, faster, for cheaper, and with better reliability / security guarantees!
• Database administrators (DBA)
  • Easier time designing logical/physical schemas, handling security/authorization, tuning, crash recovery, and more…
  • Must still understand DB internals

Section 3 > Summary

Summary of DBMS
• DBMSs are used to maintain, query, and manage large datasets.
  • They provide concurrency, recovery from crashes, quick application development, integrity, and security
• Key abstractions give data independence
• DBMS R&D is one of the broadest, most exciting fields in CS. Fact!

