Lecture 1 Intro
Acks: Chris Ré
Big science is data driven.
Increasingly, many companies see
themselves as data driven.
Even more “traditional” companies…
The world is increasingly
driven by data…
Section 1
1. Introduction
2. Administrative structure
3. Course logistics
Section 1 > Introduction
• Intellectual:
  • Science: data poor to data rich
    • No idea how to handle the data!
  • Fundamental ideas to/from all of CS:
    • Systems, theory, AI, logic, stats, analysis…
Section 1 > Administrative > Course Staff
Who we are…
Instructor (me): Theo Rekatsinas
• Faculty in the Computer Sciences Department and part of the UW Database Group
• Research: data integration and cleaning, statistical analytics, and machine learning.
• thodrek@cs.wisc.edu
• Office hours: Wed 4:00-5:00 pm (after class), Fri 11:00 am-12:00 pm @ CS 4361
Minzhen, Vishnu
Section 1 > Administrative
• By appointment!
Course Website:
https://thodrek.github.io/cs564-fall17/
Course Email:
cs564fall17@gmail.com
Section 1 > Logistics
Lectures
• Lecture slides cover the essential material
• They are your best reference.
• We are trying to get away from the book, but we will provide pointers to it
• Recommended textbooks are listed on the website
Attendance
• You should attend all lectures, including the guest lecture
• Guest lectures are fun. Great guests: they want to meet you! Show up!
Graded Elements
Ungraded Elements
• Readings are provided to help you!
• Only items in lectures, homework, or the project are fair game.
Problem Sets
• Two problem sets at the beginning of the course
• Individual assignments
• Python plus Jupyter notebooks
Project
• Split into four parts
• Two parts cover DB applications
• Two parts cover DB internals
• In groups of 3
• One person per team emails the group info by Wednesday 9/13.
• Send it to cs564fall17@gmail.com with the subject "CS564-3: Project Group"
• Include each member's name and University ID
To encourage awesomeness
• Bonus assignments, activities, and projects, intended to make up for silly mistakes
• Some extremes…
  1. I was hung over when I took the test.
Section 1 > Lectures
5. Query processing
• Lectures 16-20
• Access methods and operators
• Joins
• Relational algebra and query optimization
7. Bonus
• Guest Lecture and Lecture 23
• Stratis Viglas from Google
• Machine Learning meets Data Management
Section 1 > ACTIVITY
save output in a nice notebook format, called IPython notebooks, but they
handle other languages too.
• They can also display Markdown, LaTeX, HTML, JavaScript…
2. Other options: run via one of the alternative methods:
  1. Ubuntu VM.
  2. CS machines.
Please help out your peers by posting issues / solutions on Piazza!
https://thodrek.github.io/cs564-fall17/misc/jupyter_install.html
Activity-1-1.ipynb
Section 2
Section 2 > DBMS
What is a DBMS?
• A large, integrated collection of data
• Entities:
  • Students
  • Courses
  • Professors
Section 2 > Data models
Data models
• A data model is a collection of concepts for describing data
• The relational model of data is the most widely used model today
  • Main concept: the relation, essentially a table
  • E.g., every relation in a relational data model has a schema describing its column names, types, etc.
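To make "a relation with a schema" concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption, standing in for any relational DBMS, and the Students relation and its attributes are hypothetical.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# A relation with a schema: each attribute has a name and a declared type.
# Students(sid, name, gpa) is a hypothetical example relation.
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("53666", "Jones", 3.4), ("53688", "Smith", 3.2)],
)

# The schema is itself stored by the DBMS and can be queried.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]
print(schema)  # [('sid', 'TEXT'), ('name', 'TEXT'), ('gpa', 'REAL')]

# The rows are tuples that conform to that schema.
rows = conn.execute("SELECT sid, name, gpa FROM Students ORDER BY sid").fetchall()
print(rows)  # [('53666', 'Jones', 3.4), ('53688', 'Smith', 3.2)]
```

SQLite treats declared column types as type affinities rather than strictly enforced types, so other DBMSs may be stricter about the schema.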
Other Schemata…
• Physical Schema (Administrators): describes data layout
  • Relations as unordered files
  • Some data in sorted order (index)
• External Schema (Views) (Applications):
  • Course_info(cid: string, enrollment: integer)
  • Derived from other tables
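A small sketch of the two schemata, again assuming SQLite via Python's sqlite3 module. The Enrolled table, its rows, and the index name are hypothetical; the index stands for a physical-schema decision, and the Course_info view is an external schema derived from another table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual (logical) schema: a hypothetical Enrolled relation.
conn.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("s1", "cs564", "A"), ("s2", "cs564", "B"), ("s1", "cs540", "AB")],
)

# Physical schema decision: keep an index (a sorted access path) on cid.
conn.execute("CREATE INDEX enrolled_cid ON Enrolled(cid)")

# External schema: the Course_info view, derived from Enrolled.
conn.execute(
    "CREATE VIEW Course_info AS "
    "SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid"
)

# Applications query the view as if it were a table.
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)  # [('cs540', 1), ('cs564', 2)]
```

The view stores no data of its own; its contents are recomputed from Enrolled whenever it is queried.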
Section 2 > Schemata
Data independence
Concept: Applications do not need to worry about how the data is structured and stored.
Logical data independence: protection from changes in the logical structure of the data.
  I.e., should not need to ask: can we add a new entity or attribute without rewriting the application?
Physical data independence: protection from physical layout changes.
  I.e., should not need to ask: which disks are the data stored on? Is the data indexed?
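A minimal sketch of both kinds of independence, assuming SQLite via Python's sqlite3 module; the Students table, the index, and the added email column are hypothetical. The same application query returns the same answer after a physical change (adding an index) and after a logical change (adding an attribute).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)", [("s1", "Jones"), ("s2", "Smith")])


def app_query(c):
    # The "application": it names only the columns it needs.
    return c.execute("SELECT sid, name FROM Students ORDER BY sid").fetchall()


before = app_query(conn)

# Physical change: add an index. The query text stays the same.
conn.execute("CREATE INDEX students_name ON Students(name)")
after_physical = app_query(conn)

# Logical change: add a new attribute. The query still works unchanged.
conn.execute("ALTER TABLE Students ADD COLUMN email TEXT")
after_logical = app_query(conn)

print(before == after_physical == after_logical)  # True
```

This only covers additive logical changes; more disruptive restructurings usually rely on views to keep old applications working.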
Activity-1-2.ipynb
Section 3
4. Summary
Section 3 > DBMS Challenges
Transactions
• A key concept is the transaction (TXN): an atomic sequence of db actions (reads/writes)
• If a user cancels a TXN, it should be as if nothing happened!
Atomicity: An action either completes entirely or not at all
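A sketch of atomicity, assuming SQLite via Python's sqlite3 module. The Accounts table and the transfer function are hypothetical: the transfer is one TXN of two writes, and a simulated cancel after the first write is rolled back, so the database looks as if nothing happened.

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/COMMIT/ROLLBACK are issued by hand below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO Accounts VALUES ('alice', 100), ('bob', 50)")


def transfer(c, src, dst, amount, fail_midway=False):
    """Hypothetical TXN: debit src and credit dst; both writes happen or neither does."""
    c.execute("BEGIN")
    try:
        c.execute("UPDATE Accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            # Simulates a crash or a user cancelling the TXN after the first write.
            raise RuntimeError("transaction cancelled")
        c.execute("UPDATE Accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        c.execute("COMMIT")
    except Exception:
        c.execute("ROLLBACK")  # undo the partial debit
        raise


def balances(c):
    return dict(c.execute("SELECT name, balance FROM Accounts ORDER BY name"))


transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
print(after_ok)  # {'alice': 70, 'bob': 80}

try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
after_fail = balances(conn)
print(after_fail)  # {'alice': 70, 'bob': 80}: as if nothing happened
```

Without BEGIN/ROLLBACK, the failed call would have left alice debited and bob not credited.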
• DB application programmers
  • Can handle more users, faster, for cheaper, and with better reliability / security guarantees!
Section 3 > Summary
Summary of DBMS
• DBMSs are used to maintain, query, and manage large datasets.
• They provide concurrency, recovery from crashes, quick application development, integrity, and security
• DBMS R&D is one of the broadest, most exciting fields in CS. Fact!