
SQL L

Not all databases are relational databases. Relational databases structure data into tables with rows and columns, while NoSQL databases keep all information in documents or key-value pairs without relation. For example, a relational database may separate customer and subscription data into tables, while a NoSQL database may store all customer and subscription data together in a single document. SQL is commonly used to interact with relational databases, allowing users to query, manipulate, and get insights from structured data.

Question

In the context of this exercise 301, are all databases relational?

Answer

No, not all databases are relational databases. Databases can also be non-relational, and these are
commonly referred to as NoSQL databases.

NoSQL databases are structured differently from relational databases. In a relational database,
data is split into tables that are related to one another, while a NoSQL database typically keeps
related information together in one place, in the form of key-value pairs or documents.

For example, consider a database of people and their subscriptions to a newsletter.

Using a relational database structure, we might separate the information for each person, having
their id, name, and email, in one table named customers, and then another table for
the subscriptions, having the newsletter name, and other information associated with the
subscriptions.

With a NoSQL database, instead of separating information in this way, we might just have a single
document with the person’s information, as well as the subscription information, all in one
document. As you might tell, this type of structuring might have some benefits and some cons
compared to relational databases.

Relational Databases

Nice work! In one line of code, you returned information from a relational database.

SELECT * FROM celebs;

We’ll take a look at what this code means soon. For now, let’s focus on what relational databases are
and how they are organized.

A relational database is a database that organizes information into one or more tables. Here, the
relational database contains one table.

A table is a collection of data organized into rows and columns. Tables are sometimes referred to
as relations. Here the table is celebs.

A column is a set of data values of a particular type. Here, id, name, and age are the columns.

A row is a single record in a table. The first row in the celebs table has:

An id of 1

A name of Justin Bieber

An age of 22

All data stored in a relational database is of a certain data type. Some of the most common data
types are:

INTEGER, a positive or negative whole number

TEXT, a text string

DATE, the date formatted as YYYY-MM-DD

REAL, a decimal value


Instructions

Now that you have an understanding of what relational databases are, let’s take a closer look at SQL
syntax.

id  name             age
1   Justin Bieber    22
2   Beyonce Knowles  33
3   Jeremy Lin       26
4   Taylor Swift     26

Database Schema

celebs

name type

id INTEGER

name TEXT

age INTEGER

Rows: 4

What are some ways that SQL is used in data science?

Answer

In the field of data science, there is a process, called the Data Science process, which provides a
structured approach to performing experiments with data. In this data science process, some
important steps include obtaining data, cleaning and organizing data, and exploring the data. Using
tools like SQL, scientists are able to perform these important steps.

When obtaining data, scientists can use SQL to store it in tables and databases.

Once they have obtained the data, they can clean and organize it using built-in functions
and clauses, for example by grouping rows according to some qualities or conditions. They can
also separate the data into different tables based on what kind of information it represents.

After these steps, they will be able to explore the data by utilizing many of the built-in SQL functions,
which include functions such as AVG() which returns the average value of a column.

With its many built-in functions and ease-of-use, SQL is a very useful tool to have for data science.

MANIPULATION

Statements
The code below is a SQL statement. A statement is text that the database recognizes as a valid
command. Statements always end in a semicolon ;.

CREATE TABLE table_name (


   column_1 data_type,
   column_2 data_type,
   column_3 data_type
);

Let’s break down the components of a statement:

CREATE TABLE is a clause. Clauses perform specific tasks in SQL. By convention, clauses are written in
capital letters. Clauses can also be referred to as commands.

table_name refers to the name of the table that the command is applied to.

(column_1 data_type, column_2 data_type, column_3 data_type) is a parameter. A parameter is a
list of columns, data types, or values that are passed to a clause as an argument. Here, the
parameter is a list of column names and the associated data type.

The structure of SQL statements varies. The number of lines used does not matter. A statement can be
written all on one line, or split up across multiple lines if it makes it easier to read. In this course, you
will become familiar with the structure of common statements.

Instructions

1.

Let’s take a closer look at the statement we wrote before. In the code editor, type:

SELECT * FROM celebs;

Run the code to observe the results, and ask yourself:

Which parts of the statement are the clauses?

What table are we applying the command to?

Uncover the hint to view the answers, and then proceed to the next exercise.

SELECT and FROM are the clauses here.

We are applying the command to the celebs table.

Question

In the context of this exercise 285, does every SQL statement follow this structure?

Answer

In general, SQL statements will usually follow a structure similar to the given example SQL
statement, but depending on what you are trying to do, they can also look very different.

The following are some examples of different kinds of SQL statements you might use, which will be
covered throughout this lesson.
This statement selects all rows from a table.

SELECT * FROM table_name;

This statement will create a new table.

CREATE TABLE students (

column_1 data_type,

column_2 data_type

);

This statement will update a row in a table.

UPDATE table_name

SET column_1 = new_value

WHERE id = 3;

In the context of this exercise 123, what happens if we try to create a table with an existing name?

Answer

When you try to create a table with an already existing table name, you will receive an error
message, and no table will be modified or created.

Because SQLite (used in the exercises) is case insensitive for most syntax, including names, this
applies to any casing of the table name. For instance, given the celebs table from this exercise, if you
try to run the following, it will throw an error, because the table name already exists.

CREATE TABLE Celebs (

id INTEGER,

name TEXT,

age INTEGER

);
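
To see this error for yourself outside the course editor, here is a minimal sketch. It assumes Python's built-in sqlite3 module and a throwaway in-memory database, neither of which is part of the original exercise.

```python
import sqlite3

# Throwaway in-memory SQLite database (an assumption for this illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE celebs (id INTEGER, name TEXT, age INTEGER)")

# Table names are case insensitive in SQLite, so "Celebs" clashes with "celebs".
try:
    conn.execute("CREATE TABLE Celebs (id INTEGER, name TEXT, age INTEGER)")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)
```

If a table may or may not exist yet, SQLite also accepts CREATE TABLE IF NOT EXISTS, which skips the creation instead of raising an error.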

MANIPULATION

Insert

The INSERT statement inserts a new row into a table.

You can use the INSERT statement when you want to add new records. The statement below enters
a record for Justin Bieber into the celebs table.

INSERT INTO celebs (id, name, age)


VALUES (1, 'Justin Bieber', 22);

INSERT INTO is a clause that adds the specified row or rows.


celebs is the table the row is added to.

(id, name, age) is a parameter identifying the columns that data will be inserted into.

VALUES is a clause that indicates the data being inserted.

(1, 'Justin Bieber', 22) is a parameter identifying the values being inserted.

1: an integer that will be added to id column

'Justin Bieber': text that will be added to name column

22: an integer that will be added to age column

Instructions

1.

Add a row to the table. In the code editor, type:

INSERT INTO celebs (id, name, age)


VALUES (1, 'Justin Bieber', 22);

Look at the Database Schema. How many rows are in the celebs table?

Hint

Make sure there is a set of parentheses around the column names and values to be inserted into the
table!

Notice the single quotes around Justin Bieber. This is because text strings require quotes around
them, while numbers don’t.

There should be 1 row in the celebs table now.

2.

Add three more celebrities to the table. Beneath your previous INSERT statement type:

INSERT INTO celebs (id, name, age)


VALUES (2, 'Beyonce Knowles', 33);

INSERT INTO celebs (id, name, age)


VALUES (3, 'Jeremy Lin', 26);

INSERT INTO celebs (id, name, age)


VALUES (4, 'Taylor Swift', 26);

Look at the Database Schema. How many rows are in the celebs table now?

Make sure to enter the three new INSERT statements beneath the first INSERT statement.

There should be 4 rows in the celebs table now.


How do we see what information is stored in these rows? Head to the next exercise to find out!

Question

In the context of this exercise 259, is there a shorter way to insert multiple rows in a table?

Answer

Yes, instead of inserting each row in a separate INSERT statement, you can actually insert multiple
rows in a single statement.

To do this, you can list the values for each row separated by commas, following the VALUES clause of
the statement.

Here is how it would look:

INSERT INTO table_name (col1, col2, col3)

VALUES

(row1_val1, row1_val2, row1_val3),

(row2_val1, row2_val2, row2_val3),

(row3_val1, row3_val2, row3_val3);
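
As a concrete version of the template above, the sketch below inserts three of the celebs rows with a single statement. It assumes Python's built-in sqlite3 module and an in-memory database purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE celebs (id INTEGER, name TEXT, age INTEGER)")

# One INSERT statement, three rows: each parenthesized group is one row.
conn.execute(
    """
    INSERT INTO celebs (id, name, age)
    VALUES
      (2, 'Beyonce Knowles', 33),
      (3, 'Jeremy Lin', 26),
      (4, 'Taylor Swift', 26);
    """
)

row_count = conn.execute("SELECT COUNT(*) FROM celebs").fetchone()[0]
print(row_count)
```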

MANIPULATION

Select

SELECT statements are used to fetch data from a database. In the statement below, SELECT returns
all data in the name column of the celebs table.

SELECT name FROM celebs;

1. SELECT is a clause that indicates that the statement is a query. You will use SELECT every time you
query data from a database.
2. name specifies the column to query data from.
3. FROM celebs specifies the name of the table to query data from. In this statement, data is queried
from the celebs table.

You can also query data from all columns in a table with SELECT.

SELECT * FROM celebs;

* is a special wildcard character that we have been using. It allows you to select every column in a
table without having to name each one individually. Here, the result set contains every column in
the celebs table.

SELECT statements always return a new table called the result set.

Instructions
1.

Let’s take a closer look at SELECT and retrieve all the names in the celebs table. In the code editor,
type:

SELECT name FROM celebs;

Don’t forget to include the FROM clause and the name of the table which we are selecting the data
from!

The result should only have a single column (name).

Question

In the context of this exercise 131, what order are rows selected from a table?

Answer

In SQLite, a simple query like the ones in this lesson will usually return rows in the order that
they appear in the table, from top to bottom. For example, a statement like
SELECT * FROM table_name; will typically return the rows from the first row that appears down to
the bottom row. This order is not guaranteed, though: without an ORDER BY clause, the database is
free to return rows in any order.

Later on in the other SQL lessons, you will also learn about clauses such as ORDER BY, which allow
you to set a certain order for how the rows will be returned in the result set.

MANIPULATION

Alter

The ALTER TABLE statement changes an existing table. In this exercise, we use it to add a new
column. You can use this command when you want to add columns to a table. The statement below adds
a new column twitter_handle to the celebs table.

ALTER TABLE celebs


ADD COLUMN twitter_handle TEXT;

1. ALTER TABLE is a clause that lets you make the specified changes.
2. celebs is the name of the table that is being changed.
3. ADD COLUMN is a clause that lets you add a new column to a table:

twitter_handle is the name of the new column being added

TEXT is the data type for the new column

4. NULL is a special value in SQL that represents missing or unknown data. Here, the rows that
existed before the column was added have NULL (∅) values for twitter_handle.
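
The NULL behavior described in point 4 can be checked with a short sketch. It assumes Python's built-in sqlite3 module (which reports NULL as None) and an in-memory database, both chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE celebs (id INTEGER, name TEXT, age INTEGER)")
conn.execute("INSERT INTO celebs (id, name, age) VALUES (1, 'Justin Bieber', 22)")

# Add the column after the row already exists.
conn.execute("ALTER TABLE celebs ADD COLUMN twitter_handle TEXT")

# The pre-existing row has no value for the new column, so it reads back as NULL (None).
rows = conn.execute("SELECT id, name, twitter_handle FROM celebs").fetchall()
print(rows)
```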

Instructions

1.
Add a new column to the table. In the code editor, type:

ALTER TABLE celebs


ADD COLUMN twitter_handle TEXT;

SELECT * FROM celebs;

Question

In the context of this exercise 109, can we add a column at a specific position to a table?

Answer

No, unfortunately, in SQLite (used in these exercises) you cannot specify the position at which a
column is added to a table.

By default, a new column will always be added at the end of the table. For most intents and
purposes, this should not affect much, since you can always select the columns in any order, for
instance, like

SELECT col3, col1, col2 FROM table_name;

If column order is very important, then an alternative is to create a new table and add the columns
in the specific order they should appear.

MANIPULATION

Update

The UPDATE statement edits a row in a table. You can use the UPDATE statement when you want to
change existing records. The statement below updates the record with an id value of 4 to have
the twitter_handle @taylorswift13.

UPDATE celebs
SET twitter_handle = '@taylorswift13'
WHERE id = 4;

1. UPDATE is a clause that edits a row in the table.


2. celebs is the name of the table.
3. SET is a clause that indicates the column to edit.

twitter_handle is the name of the column that is going to be updated

@taylorswift13 is the new value that is going to be inserted into the twitter_handle column.

4. WHERE is a clause that indicates which row(s) to update with the new column value. Here the row
with a 4 in the id column is the row that will have the twitter_handle updated to @taylorswift13.

Instructions

1.

Update the table to include Taylor Swift’s twitter handle. In the code editor, type:

UPDATE celebs
SET twitter_handle = '@taylorswift13'
WHERE id = 4;

SELECT * FROM celebs;


Hint

Double-check your statement character by character:

Did you include the underscore in twitter_handle?

Did you include the @ in Taylor’s twitter_handle?

Notice the single quotes around @taylorswift13. This is because text strings require quotes around
them, while numbers don’t.

Question

In the context of this exercise 152, how is ALTER different from UPDATE?

Answer

Although similar in the sense that both statements will modify a table, these statements are quite
different.

The ALTER statement is used to modify columns. With ALTER, you can add columns, remove them,
or even modify them.

The UPDATE statement is used to modify rows. However, UPDATE can only update a row, and
cannot remove or add rows.

MANIPULATION

Delete

The DELETE FROM statement deletes one or more rows from a table. You can use the statement
when you want to delete existing records. The statement below deletes all records in
the celebs table with no twitter_handle:

DELETE FROM celebs


WHERE twitter_handle IS NULL;

DELETE FROM is a clause that lets you delete rows from a table.

celebs is the name of the table we want to delete rows from.

WHERE is a clause that lets you select which rows you want to delete. Here we want to delete all of
the rows where the twitter_handle column IS NULL.

IS NULL is a condition in SQL that returns true when the value is NULL and false otherwise.

Instructions
1.

Delete all of the rows that have a NULL value in the twitter handle column. In the code editor, type
the following:

DELETE FROM celebs


WHERE twitter_handle IS NULL;

SELECT * FROM celebs;

How many rows exist in the celebs table now?


Hint

Did you type:

SELECT * FROM celebs;

after your deletion statement?

There should only be 1 row left.

Question

In the context of this exercise 107, what if we only want to delete a specific number of rows?

Answer

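To delete only a specific number of rows, you can use the LIMIT clause. The value provided
for LIMIT is the maximum number of rows to affect. Support for LIMIT in DELETE statements varies
between databases. In SQLite, it is only available when the library was built with the
SQLITE_ENABLE_UPDATE_DELETE_LIMIT option.

For example, this statement will only delete the first 5 rows that match the condition:

DELETE FROM table_name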

WHERE condition

LIMIT 5;
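
Where DELETE with LIMIT is not available, a subquery can pick the rows to remove instead. The sketch below is one way to do that, not the course's own solution. It assumes Python's built-in sqlite3 module, an in-memory database, and a celebs table with an id primary key; the rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE celebs (id INTEGER PRIMARY KEY, name TEXT, twitter_handle TEXT)"
)
conn.executemany(
    "INSERT INTO celebs (id, name, twitter_handle) VALUES (?, ?, ?)",
    [
        (1, "Justin Bieber", None),
        (2, "Beyonce Knowles", None),
        (3, "Jeremy Lin", None),
        (4, "Taylor Swift", "@taylorswift13"),
    ],
)

# Delete at most 2 matching rows: the subquery decides which ids are removed.
conn.execute(
    """
    DELETE FROM celebs
    WHERE id IN (
      SELECT id FROM celebs
      WHERE twitter_handle IS NULL
      ORDER BY id
      LIMIT 2
    )
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM celebs ORDER BY id")]
print(remaining_ids)
```

The ORDER BY inside the subquery makes it explicit which matching rows count as "the first" ones.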

MANIPULATION

Constraints

Constraints that add information about how a column can be used are invoked after specifying the
data type for a column. They can be used to tell the database to reject inserted data that does not
adhere to a certain restriction. The statement below sets constraints on the celebs table.

CREATE TABLE celebs (


   id INTEGER PRIMARY KEY,
   name TEXT UNIQUE,
   date_of_birth TEXT NOT NULL,
   date_of_death TEXT DEFAULT 'Not Applicable'
);

1. PRIMARY KEY columns can be used to uniquely identify the row. Attempts to insert a row with an
identical value to a row already in the table will result in a constraint violation which will not allow
you to insert the new row.

2. UNIQUE columns have a different value for every row. This is similar to PRIMARY KEY except a
table can have many different UNIQUE columns.

3. NOT NULL columns must have a value. Attempts to insert a row without a value for a NOT
NULL column will result in a constraint violation and the new row will not be inserted.

4. DEFAULT columns take an additional argument that will be the assumed value for an inserted row
if the new row does not specify a value for that column.
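
To watch each of the four constraints in action, here is a small sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the try_insert helper and the dates are hypothetical placeholders, not values from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE celebs (
      id INTEGER PRIMARY KEY,
      name TEXT UNIQUE,
      date_of_birth TEXT NOT NULL,
      date_of_death TEXT DEFAULT 'Not Applicable'
    )
    """
)
# Placeholder date, used only to satisfy NOT NULL.
conn.execute(
    "INSERT INTO celebs (id, name, date_of_birth) VALUES (1, 'Taylor Swift', '1990-01-01')"
)


def try_insert(sql):
    """Hypothetical helper: run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    # Same id as an existing row: violates PRIMARY KEY.
    "duplicate id": try_insert(
        "INSERT INTO celebs (id, name, date_of_birth) VALUES (1, 'Jeremy Lin', '1990-01-01')"
    ),
    # Same name as an existing row: violates UNIQUE.
    "duplicate name": try_insert(
        "INSERT INTO celebs (id, name, date_of_birth) VALUES (2, 'Taylor Swift', '1990-01-01')"
    ),
    # No date_of_birth given: violates NOT NULL.
    "missing date_of_birth": try_insert(
        "INSERT INTO celebs (id, name) VALUES (3, 'Jeremy Lin')"
    ),
}

# date_of_death was never specified for row 1, so DEFAULT filled it in.
default_value = conn.execute(
    "SELECT date_of_death FROM celebs WHERE id = 1"
).fetchone()[0]

print(results, default_value)
```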

Instructions

1.

Create a new table with constraints on the values. In the code editor type:

CREATE TABLE awards (


   id INTEGER PRIMARY KEY,
   recipient TEXT NOT NULL,
   award_name TEXT DEFAULT 'Grammy'
);

How many tables do you see in the database schema on the right?


Hint

Common errors:

Missing the commas after the first and second column declarations.

Missing the data type or constraints of each column.

What are some reasons to apply constraints to a table?

Answer

Applying constraints to a table can be useful, and provide some important benefits such as reliability
and consistency of your data. The following are a few reasons you might consider for applying
constraints to a table.

One reason for adding constraints is to prevent invalid data in the table. This is very important,
because invalid data can cause issues and unexpected results from calculations. With constraints in
place, we do not have to worry about new data being added that violates our rules and causes bigger
issues.

Similar to the previous point, constraints can let us prevent missing data, which is usually stored
as NULL within the table. Instead of having missing values set to NULL, we can set constraints so that
the missing values are given some default value instead, like 0. This can make some calculations
easier to do.

Another important reason to add a constraint is for uniqueness, usually in the form of values like
the id, or identifier column. By using a constraint like PRIMARY KEY, we can ensure that every
row has its own unique id value.

SQL is a programming language designed to manipulate and manage data stored in relational
databases.

A relational database is a database that organizes information into one or more tables.

A table is a collection of data organized into rows and columns.

A statement is a string of characters that the database recognizes as a valid command.

CREATE TABLE creates a new table.

INSERT INTO adds a new row to a table.

SELECT queries data from a table.

ALTER TABLE changes an existing table.

UPDATE edits a row in a table.

DELETE FROM deletes rows from a table.

Constraints add information about how a column can be used.

Instructions

In this lesson, we have learned SQL statements that create, edit, and delete data. In the upcoming
lessons, we will learn how to use SQL to retrieve information from a database!

Question

In the context of this lesson 156, are there any other commonly used SQL commands?

Answer

The SQL commands covered in this lesson are probably the most common ones you will encounter
or need to use when working with tables. Other available commands are more situational and not as
commonly used.

One such command is DROP TABLE, which you can use to permanently remove a table from a
database. Deleting tables is generally not a frequent occurrence, so you might only use this once in a
while. Other commands, such as ANALYZE, which is used to obtain statistics about a table, are also
not as common and you might only use them in certain situations.

For a full list of all the commands provided by SQLite, which is used in the Codecademy courses, you
can check out the official documentation.

QUERIES

Introduction

In this lesson, we will be learning different SQL commands to query a single table in a database.

One of the core purposes of the SQL language is to retrieve information stored in a database. This is
commonly referred to as querying. Queries allow us to communicate with the database by asking
questions and returning a result set with data relevant to the question.

We will be querying a database with one table named movies.

Let’s get started!

Fun fact: IBM started out SQL as SEQUEL (Structured English QUEry Language) in the 1970s to query
databases.

Instructions

1.

We should get acquainted with the movies table.

In the editor, type the following:

SELECT * FROM movies;

What are the column names?

QUERIES

Select

Previously, we learned that SELECT is used every time you want to query data from a database
and * means all columns.

Suppose we are only interested in two of the columns. We can select individual columns by their
names (separated by a comma):

SELECT column1, column2


FROM table_name;

To make it easier to read, we moved FROM to another line.

Line breaks don’t mean anything specific in SQL. We could write this entire query in one line, and it
would run just fine.

Instructions

1.

Let’s only select the name and genre columns of the table.

In the code editor, type the following:

SELECT name, genre


FROM movies;


2.

Now we want to include a third column.

Edit your query so that it returns the name, genre, and year columns of the table.

Question

When writing SQL queries, do the commands, like SELECT and FROM have to be all capital letters?

Answer

No, SQLite, which Codecademy uses, is case-insensitive when it comes to clauses
like SELECT and FROM, which can be cased in any way. This is different from other programming
languages such as Python where casing is quite important.

Example

/* Both of the following queries will return the same result. */

SELECT * FROM table_name;

select * from table_name;

QUERIES

As

Knowing how SELECT works, suppose we have the code below:

SELECT name AS 'Titles'


FROM movies;

Can you guess what AS does?

AS is a keyword in SQL that allows you to rename a column or table using an alias. The new name
can be anything you want as long as you put it inside of single quotes. Here we renamed
the name column as Titles.

Some important things to note:

Although it’s not always necessary, it’s best practice to surround your aliases with single quotes.

When using AS, the columns are not being renamed in the table. The aliases only appear in the
result.
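
The second point can be verified directly: the alias shows up in the result set, while the table keeps its original column name. This sketch assumes Python's built-in sqlite3 module, an in-memory database, and a cut-down movies table with a single sample row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the movies table (an assumption for this illustration).
conn.execute("CREATE TABLE movies (id INTEGER, name TEXT)")
conn.execute("INSERT INTO movies (id, name) VALUES (1, 'Se7en')")

# Column names of the result set: the alias appears here.
cursor = conn.execute("SELECT name AS 'Titles' FROM movies")
result_columns = [col[0] for col in cursor.description]

# Column names stored in the table itself: unchanged by the alias.
table_columns = [row[1] for row in conn.execute("PRAGMA table_info(movies)")]

print(result_columns, table_columns)
```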

Instructions
1.

To showcase what the AS keyword does, select the name column and rename it with an alias of your
choosing.

Place the alias inside single quotes, like so:

SELECT name AS '______'


FROM movies;

Note in the result, that the name of the column is now your alias.



2.

Edit the query so that instead of selecting and renaming the name column, select
the imdb_rating column and rename it as IMDb.


Hint

The AS syntax is as follows:

SELECT column AS 'Nickname'


FROM table_name;

To rename the imdb_rating to IMDb:

SELECT imdb_rating AS 'IMDb'


FROM movies;

Put single quotes around the alias.

SQL commands end with a ;.

There should only be one column in the result and its name should now be IMDb.

Question

Can we alias multiple columns in a single query?

Answer

Yes, you can alias multiple columns at a time in the same query.

It is advisable to always include quotes around the new alias. Quotes are required when the alias
contains spaces, punctuation marks, special characters or reserved keywords, so always quoting is
simply more convenient than deciding case by case.

Example

SELECT course_id AS "Course ID", exercise_id AS "Exercise ID"

FROM bugs;

QUERIES

Distinct

When we are examining data in a table, it can be helpful to know what distinct values exist in a
particular column.

DISTINCT is used to return unique values in the output. It filters out all duplicate values in the
specified column(s).

For instance,

SELECT tools
FROM inventory;

might produce:

tools

Hammer

Nails

Nails

Nails

By adding DISTINCT before the column name,

SELECT DISTINCT tools


FROM inventory;

the result would now be:

tools

Hammer

Nails

Filtering the results of a query is an important skill in SQL. It is easier to see the different
possible genres in the movies table after the data has been filtered than to scan every row in the
table.

Instructions

1.
Let’s try it out. In the code editor, type:

SELECT DISTINCT genre


FROM movies;

What are the unique genres?


Hint

The different genres are:

action

comedy

horror

romance

drama

The empty set symbol ∅ is just an empty value. DISTINCT recognizes empty values, too.

2.

Now, change the code so we return the unique values of the year column instead.


Hint

Suppose we only want to query the distinct results from a column. We will use the syntax:

SELECT DISTINCT column


FROM table_name;

Following this format, the code below returns the unique values of the year column:

SELECT DISTINCT year


FROM movies;

In the result, there should only be one column with all the distinct years.

Note: You might’ve noticed that there appears to be an empty set symbol ∅ near the bottom of the
results (right below 1987 and above 2017). It is not a bug! DISTINCT recognizes empty values, too.

Question

Can we apply DISTINCT to a SELECT query with multiple columns?

Answer

Yes, the DISTINCT clause can be applied to any valid SELECT query. It is important to note
that DISTINCT treats a row as a duplicate only when it matches another row in all of the selected
columns. One copy of each distinct combination is kept.

Feel free to test this out in the editor to see what happens!

Example

Let’s assume that in the Codecademy database there is a table bugs which stores information about
opened bug reports. It might have columns
like course_id, exercise_id, reported_by, reported_date, report_url, etc. For the purpose of this
example, let’s say that this is our table:

id  course_id  exercise_id  reported_by
1   5          4            Tod
2   5          4            Alex
3   5          3            Roy
4   5          4            Roy
5   7          4            Alex
6   7          8            Tod
7   14         2            Alex
8   14         4            Tod
9   14         6            Tod
10  14         2            Roy

The Community Manager would like to know the names of the users who have reported bugs in order to
send them a special Thank You note. We can use a SELECT query with DISTINCT keyword to pick
unique values from the reported_by column:

> SELECT DISTINCT reported_by FROM bugs;

reported_by

Alex

Tod

Roy

Awesome! Exactly what we were expecting!

Our coworker would like to know in which exercises bugs have been reported. This gets trickier
because now we have to query two columns: course_id and exercise_id. Let’s try to use the same
approach as before:

> SELECT DISTINCT course_id, exercise_id FROM bugs;


course_id  exercise_id
14         2
5          4
14         4
14         6
5          3
7          4
7          8

Is this the result we were hoping for? Yes. It is true that there are duplicated values in
the course_id and exercise_id, but every row is unique (there are no two rows with the same value
in course_id and exercise_id).
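
Both queries from this example can be reproduced with the sketch below. It assumes Python's built-in sqlite3 module and an in-memory database, and it loads the ten sample rows from the bugs table above. An ORDER BY is added only so the printed output is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bugs ("
    "id INTEGER PRIMARY KEY, course_id INTEGER, exercise_id INTEGER, reported_by TEXT)"
)
conn.executemany(
    "INSERT INTO bugs (id, course_id, exercise_id, reported_by) VALUES (?, ?, ?, ?)",
    [
        (1, 5, 4, "Tod"), (2, 5, 4, "Alex"), (3, 5, 3, "Roy"), (4, 5, 4, "Roy"),
        (5, 7, 4, "Alex"), (6, 7, 8, "Tod"), (7, 14, 2, "Alex"), (8, 14, 4, "Tod"),
        (9, 14, 6, "Tod"), (10, 14, 2, "Roy"),
    ],
)

# DISTINCT on one column: each reporter appears once.
reporters = {row[0] for row in conn.execute("SELECT DISTINCT reported_by FROM bugs")}

# DISTINCT on two columns: each (course_id, exercise_id) combination appears once.
pairs = conn.execute(
    "SELECT DISTINCT course_id, exercise_id FROM bugs ORDER BY course_id, exercise_id"
).fetchall()

print(sorted(reporters))
print(pairs)
```

Ten rows collapse to seven pairs because (5, 4) was reported three times and (14, 2) twice.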

QUERIES

Where

We can restrict our query results using the WHERE clause in order to obtain only the information we
want.

The statement below filters the result set to only include top rated movies
(IMDb ratings greater than 8):

SELECT *
FROM movies
WHERE imdb_rating > 8;

How does it work?

The WHERE clause filters the result set to only include rows where the following condition is true.

imdb_rating > 8 is the condition. Here, only rows with a value greater than 8 in
the imdb_rating column will be returned.

The > is an operator. Operators create a condition that can be evaluated as either true or false.

Comparison operators used with the WHERE clause are:

= equal to

!= not equal to

> greater than

< less than

>= greater than or equal to

<= less than or equal to

There are also some special operators that we will learn more about in the upcoming exercises.

Instructions

1.

Suppose we want to take a peek at all the not-so-well-received movies in the database.

In the code editor, type:

SELECT * 
FROM movies
WHERE imdb_rating < 5;

Ouch!


Hint

We are trying to retrieve all the movies with ratings lower than 5.

Common errors:

Missing underscore in the imdb_rating column name.

Missing ; at the end.

2.

Edit the query so that it will now retrieve all the recent movies, specifically those that were released
after 2014.

Select all the columns using *.


Hint

The condition here would be year > 2014

If you add the condition after the WHERE clause, it would look like:

SELECT *
FROM movies
WHERE year > 2014;

Question

Can we compare values of two columns in a WHERE clause?

Answer

Yes, within a WHERE clause you can compare the values of two columns.

When comparing two columns in a WHERE clause, the database checks each row of the table,
comparing the values of the two columns within that row.

Example
/*

This will return all rows where the value in the

x column is greater than the y column value.

*/

SELECT x, y

FROM coordinates

WHERE x > y;
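
Here is the same query run against a few made-up points. The sketch assumes Python's built-in sqlite3 module and an in-memory database; the coordinates rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coordinates (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO coordinates (x, y) VALUES (?, ?)",
    [(1, 2), (5, 3), (4, 4), (9, 0)],
)

# Each row is tested on its own: keep it only if its x is greater than its y.
result = conn.execute(
    "SELECT x, y FROM coordinates WHERE x > y ORDER BY x"
).fetchall()
print(result)
```

The row (4, 4) is left out because the comparison is strictly greater than.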

QUERIES

Like I

LIKE can be a useful operator when you want to compare similar values.

The movies table contains two films with similar titles, ‘Se7en’ and ‘Seven’.

How could we select all movies that start with ‘Se’ and end with ‘en’ and have exactly one character
in the middle?

SELECT * 
FROM movies
WHERE name LIKE 'Se_en';

LIKE is a special operator used with the WHERE clause to search for a specific pattern in a column.

name LIKE 'Se_en' is a condition evaluating the name column for a specific pattern.

Se_en represents a pattern with a wildcard character.

The _ means you can substitute any individual character here without breaking the pattern. The
names Seven and Se7en both match this pattern.

Instructions

1.

Let’s test it out.

In the code editor, type:

SELECT * 
FROM movies
WHERE name LIKE 'Se_en';

Hint

Double-check your query character by character:

Note the single quotes around Se_en.

Note the underscore in it.

Question

Can we apply the LIKE operator to values other than TEXT?

Answer

Yes, you can apply the LIKE operator to numerical values as well.

Whenever you use LIKE, however, you must always wrap the pattern in quotation marks,
whether for matching a number or a string.

Example

/*

This will select movies where the id number

starts with 2 and is followed by any two numbers.

*/

SELECT *

FROM movies

WHERE id LIKE '2__';
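
To see which numbers this pattern actually matches, here is a quick sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the id values are hypothetical and chosen to include near misses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER)")
conn.executemany(
    "INSERT INTO movies (id) VALUES (?)",
    [(2,), (25,), (200,), (231,), (312,), (2000,)],
)

# '2__' means: a 2 followed by exactly two more characters.
matching_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM movies WHERE id LIKE '2__' ORDER BY id")
]
print(matching_ids)
```

Only the three-digit ids beginning with 2 match; 25 is too short and 2000 is too long.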

QUERIES

Like II

The percentage sign % is another wildcard character that can be used with LIKE.

This statement below filters the result set to only include movies with names that begin with the
letter ‘A’:

SELECT * 
FROM movies
WHERE name LIKE 'A%';

% is a wildcard character that matches zero or more missing characters in the pattern. For example:

A% matches all movies with names that begin with letter ‘A’

%a matches all movies that end with ‘a’

We can also use % both before and after a pattern:


SELECT * 
FROM movies
WHERE name LIKE '%man%';

Here, any movie that contains the character sequence ‘man’ anywhere in its name will be returned in
the result.

In SQLite, LIKE is not case sensitive for ASCII letters by default. ‘Batman’ and ‘Man of Steel’ will
both appear in the result of the query above.
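Both behaviours of % can be checked with a small in-memory table, as in this sketch using Python's sqlite3 module. The titles other than 'Batman' and 'Man of Steel' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT)")
conn.executemany(
    "INSERT INTO movies (name) VALUES (?)",
    [("A",), ("Aliens",), ("Batman",), ("Man of Steel",)],
)

def names_like(pattern):
    """Return the movie names matching a LIKE pattern, sorted by name."""
    query = "SELECT name FROM movies WHERE name LIKE ? ORDER BY name"
    return [row[0] for row in conn.execute(query, (pattern,))]

# % also matches zero characters, so the one-letter title 'A' is included.
starts_with_a = names_like("A%")
# With SQLite's default settings, 'man' also matches the 'Man' in 'Man of Steel'.
contains_man = names_like("%man%")
print(starts_with_a)  # ['A', 'Aliens']
print(contains_man)   # ['Batman', 'Man of Steel']
```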

Instructions

1.

In the text editor, type:

SELECT * 
FROM movies
WHERE name LIKE '%man%';

How many movie titles contain the word ‘man’?


2.

Let’s try one more.

Edit the query so that it selects all the information about the movie titles that begin with the word
‘The’.

You might need a space in there!


Hint

The condition should be name LIKE 'The %':

SELECT * 
FROM movies
WHERE name LIKE 'The %';

Notice how the % comes after The.

There is also a space in between because we don’t want words like ‘There’, ‘They’, etc.

Question

When using SQL LIKE operators, how do we search for patterns containing the actual characters “%”
or “_”?

Answer
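When searching for a pattern containing the literal characters % or _, we can place an escape
character immediately before them so that they are matched as ordinary characters and not as
wildcards. The backslash \ is a common choice, similar to its use in Python.

In SQLite, no escape character is active by default, so the pattern must be followed by an ESCAPE
clause that names it. Some other databases, such as MySQL and PostgreSQL, treat \ as the escape
character by default.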

Example

/*
In this pattern, we use an escape character before '%'.
This will only match "%" and not be used like the
wildcard character. The ESCAPE clause declares \ as
the escape character, which SQLite requires.

This query will match any titles that end with
' 100%'.
*/

SELECT *
FROM books
WHERE title LIKE '% 100\%' ESCAPE '\';
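In SQLite specifically, the backslash only acts as an escape character when the pattern is followed by an ESCAPE clause. The sketch below, using Python's sqlite3 module and made-up book titles, compares the same pattern with and without that clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT)")
conn.executemany(
    "INSERT INTO books (title) VALUES (?)",
    [("Giving 100%",), ("The 100 Greatest",), ("100% Wolf",)],
)

# With ESCAPE '\', the sequence \% means a literal percent sign.
with_escape = [
    row[0]
    for row in conn.execute(
        r"SELECT title FROM books WHERE title LIKE '% 100\%' ESCAPE '\'"
    )
]
# Without ESCAPE, SQLite looks for a literal backslash followed by anything.
without_escape = [
    row[0]
    for row in conn.execute(r"SELECT title FROM books WHERE title LIKE '% 100\%'")
]
print(with_escape)     # ['Giving 100%']
print(without_escape)  # []
```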

QUERIES

Is Null

By this point of the lesson, you might have noticed that there are a few missing values in
the movies table. More often than not, the data you encounter will have missing values.

Unknown values are indicated by NULL.

It is not possible to test for NULL values with comparison operators, such as = and !=, because any
comparison with NULL evaluates to NULL and never to true.

Instead, we will have to use these operators:

IS NULL

IS NOT NULL

To filter for all movies with an IMDb rating:

SELECT name
FROM movies
WHERE imdb_rating IS NOT NULL;
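To see why = cannot be used here, the following sketch runs the three variants against a tiny in-memory SQLite table through Python's sqlite3 module. The two movies and their ratings are placeholders, not rows from the lesson's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, imdb_rating REAL)")
conn.executemany(
    "INSERT INTO movies (name, imdb_rating) VALUES (?, ?)",
    [("Alpha", 7.5), ("Beta", None)],  # None is stored as NULL
)

def names_where(condition):
    """Return movie names for a hard-coded WHERE condition."""
    query = f"SELECT name FROM movies WHERE {condition} ORDER BY name"
    return [row[0] for row in conn.execute(query)]

equals_null = names_where("imdb_rating = NULL")   # never true, even for NULL rows
is_null = names_where("imdb_rating IS NULL")
is_not_null = names_where("imdb_rating IS NOT NULL")
print(equals_null, is_null, is_not_null)  # [] ['Beta'] ['Alpha']
```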

Instructions
1.

Now let’s do the opposite.

Write a query to find all the movies without an IMDb rating.

Select only the name column!


Hint

We want to query for movies that have a missing value in their imdb_rating field:

SELECT name
FROM movies
WHERE imdb_rating IS NULL;

Notice how we used IS NULL instead of IS NOT NULL here.

Question

When storing missing data, should I store them as NULL?

Answer

It can depend entirely on how you need the data to be stored and utilized.

Let’s say that you have a table of employee information, which includes their address. Say that we
want to check all rows of this table and find where any addresses are missing. If we store the
addresses as TEXT values, we might choose to store all the missing values as either '' or as NULL.

If we stored the missing address values as an empty string '' then these values are not NULL. Empty
strings are seen as a string of length 0. So, if we ran a query using

WHERE address IS NULL

it would not give us the rows with missing address values. We would have to check using

WHERE address = ''

With a table containing many different data types, it may be helpful and more convenient to store
any missing values in general as just NULL so that we can utilize the IS NULL and IS NOT
NULL operators.
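The difference between an empty string and NULL can be seen in this sketch, which uses Python's sqlite3 module. The employees table and its rows are hypothetical and exist only for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO employees (name, address) VALUES (?, ?)",
    [("Ann", "12 Oak St"), ("Bob", ""), ("Cy", None)],
)

def names_where(condition):
    """Return employee names for a hard-coded WHERE condition."""
    query = f"SELECT name FROM employees WHERE {condition} ORDER BY name"
    return [row[0] for row in conn.execute(query)]

null_only = names_where("address IS NULL")
empty_only = names_where("address = ''")
# If both conventions ended up in the same table, both checks are needed.
either = names_where("address IS NULL OR address = ''")
print(null_only, empty_only, either)  # ['Cy'] ['Bob'] ['Bob', 'Cy']
```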

QUERIES

Between

The BETWEEN operator is used in a WHERE clause to filter the result set within a certain range. It
accepts two values that are either numbers, text or dates.

For example, this statement filters the result set to only include movies with years from 1990 up
to, and including 1999.

SELECT *
FROM movies
WHERE year BETWEEN 1990 AND 1999;

When the values are text, BETWEEN filters the result set to rows within the alphabetical range.

In this statement, BETWEEN filters the result set to only include movies with names that begin with
the letter ‘A’ up to, but not including ones that begin with ‘J’.

SELECT *
FROM movies
WHERE name BETWEEN 'A' AND 'J';

However, if a movie has a name of simply ‘J’, it would actually match. This is
because BETWEEN goes up to the second value — up to ‘J’. So the movie named ‘J’ would be
included in the result set but not ‘Jaws’.
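The edge case with ‘J’ and ‘Jaws’ can be reproduced with a few rows, as in this sketch using Python's sqlite3 module. Apart from 'J' and 'Jaws', the titles are assumptions chosen for the example, including a lowercase one to show that the comparison is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT)")
conn.executemany(
    "INSERT INTO movies (name) VALUES (?)",
    [("Avatar",), ("Inception",), ("J",), ("Jaws",), ("aliens",)],
)

# 'Jaws' sorts after 'J', and lowercase 'aliens' sorts after every uppercase letter.
in_range = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM movies WHERE name BETWEEN 'A' AND 'J' ORDER BY name"
    )
]
print(in_range)  # ['Avatar', 'Inception', 'J']
```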

Instructions

1.

Using the BETWEEN operator, write a query that selects all information about movies
whose name begins with the letters ‘D’, ‘E’, and ‘F’.


Hint

This should be very similar to the second query in the narrative.

BETWEEN 'D' AND 'G' should be the condition:

SELECT *
FROM movies
WHERE name BETWEEN 'D' AND 'G';

This will return all the names that begin with ‘D’, ‘E’, and ‘F’.

BETWEEN 'D' AND 'F' should not be the condition because it would return names that begin with ‘D’
and ‘E’, but not ‘F’ (unless there was a movie with the single letter name of “F”).

And don’t forget to capitalize D and G!

BETWEEN is case-sensitive. If the condition is BETWEEN 'a' AND 'z', it would only return lowercase
(a-z) results and not uppercase (A-Z).

2.

Remove the previous query.

Using the BETWEEN operator, write a new query that selects all information about movies that were
released in the 1970’s.

Hint

In this statement, the BETWEEN operator is being used to filter the result set to only include movies
with years in 1970-1979:

SELECT *
FROM movies
WHERE year BETWEEN 1970 AND 1979;

Remember, BETWEEN two numbers is inclusive of both the first and the second number.

Notice that there is a movie from 1979 in the result.

Also, numeric values (INTEGER or REAL data types) don’t need to be wrapped with single quotes,
whereas TEXT values do.

Question

In SQL, when applying the BETWEEN operator on a range of TEXT values, each value must be
compared somehow to be ordered correctly. What kind of comparison is done on TEXT values?

Answer

In SQLite, as in most programming languages such as Python, TEXT or string values are compared
based on their lexicographical ordering. When the BETWEEN operator is used on a range of TEXT
values, each value is compared against the two bounds in this way.

Lexicographical ordering is basically the ordering you would find words in a dictionary. If we had two
words, they would be compared starting from their first letter, second letter, and so on, until we find
a non-matching letter. The word which has the letter that comes first in the alphabet would
ultimately be sorted to come first in this lexicographical ordering.

If two words have different lengths, but match up to the last letter of the shorter word, the shorter
word will appear first in the ordering.

Example

A = "Alien"
B = "Aliens"
C = "Alike"

/*
Because A and B share the same sequence of characters
up to the last character of A, which is shorter, A < B.

Also, because "k" comes after "e" in the alphabet, C will
come last in the ordering of these 3 words.

A < B < C
*/
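The ordering above can be confirmed directly in SQLite. This sketch uses Python's sqlite3 module; the word 'Zodiac' is an extra assumption added to show that, with SQLite's default comparison, every uppercase letter sorts before every lowercase one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each comparison yields 1 for true and 0 for false.
comparisons = conn.execute(
    "SELECT 'Alien' < 'Aliens', 'Aliens' < 'Alike', 'Zodiac' < 'alien'"
).fetchone()
print(comparisons)  # (1, 1, 1)

# Sorting the three words with ORDER BY gives the same A < B < C order.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT 'Alike' AS word UNION ALL SELECT 'Aliens' "
        "UNION ALL SELECT 'Alien' ORDER BY word"
    )
]
print(ordered)  # ['Alien', 'Aliens', 'Alike']
```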

QUERIES

And

Sometimes we want to combine multiple conditions in a WHERE clause to make the result set more
specific and useful.

One way of doing this is to use the AND operator. Here, we use the AND operator to only return 90’s
romance movies.

SELECT * 
FROM movies
WHERE year BETWEEN 1990 AND 1999
   AND genre = 'romance';

year BETWEEN 1990 AND 1999 is the 1st condition.

genre = 'romance' is the 2nd condition.

AND combines the two conditions.

With AND, both conditions must be true for the row to be included in the result.

Instructions

1.

In the previous exercise, we retrieved every movie released in the 1970’s.

Now, let’s retrieve every movie released in the 70’s, that’s also well received.

In the code editor, type:

SELECT *
FROM movies
WHERE year BETWEEN 1970 AND 1979
  AND imdb_rating > 8;


2.

Remove the previous query.


Suppose we have a picky friend who only wants to watch old horror films.

Using AND, write a new query that selects all movies made prior to 1985 that are also in
the horror genre.


Hint

What are the two conditions?

year < 1985

genre = 'horror'

So your query should look like:

SELECT *
FROM movies
WHERE year < 1985
   AND genre = 'horror';

We indented and placed AND genre = 'horror' on another line just so it is easier to read.

Also, numeric values (1985) don’t need to be wrapped with single quotes, whereas string values do
('horror').

QUERIES

Or

Similar to AND, the OR operator can also be used to combine multiple conditions in WHERE, but
there is a fundamental difference:

AND operator displays a row if all the conditions are true.

OR operator displays a row if any condition is true.

Suppose we want to check out a new movie or something action-packed:

SELECT *
FROM movies
WHERE year > 2014
   OR genre = 'action';

year > 2014 is the 1st condition.

genre = 'action' is the 2nd condition.

OR combines the two conditions.

With OR, if any of the conditions are true, then the row is added to the result.
Instructions

1.

Let’s test this out:

SELECT *
FROM movies
WHERE year > 2014
   OR genre = 'action';


Hint

This retrieves all the movies released after 2014 or in the action genre.

We are putting OR genre = 'action' on another line and indented just so it is easier to read.

2.

Suppose we are in the mood for a good laugh or a good cry.

Using OR, write a query that returns all movies that are either a romance or a comedy.


Hint

What are the two conditions?

genre = 'romance'

genre = 'comedy'

So your query should look like:

SELECT *
FROM movies
WHERE genre = 'romance'
   OR genre = 'comedy';

We indented and placed OR genre = 'comedy' on another line just so it is easier to read.

Are there any good romantic comedies in the list?

Question

In a SQL query, can we write conditions that utilize both AND and OR?

Answer

Yes, queries can combine multiple conditions using AND and OR, with no practical limit on how many
conditions you can combine. However, the more conditions you combine, the more complex the query
becomes. AND is evaluated before OR, so use parentheses to make the intended grouping explicit.

Example

/*
This will select movies with id values
from 10 to 20 inclusive, OR with id
values from 50 to 60 inclusive.
*/

SELECT *
FROM movies
WHERE
  (id >= 10 AND id <= 20)
  OR
  (id >= 50 AND id <= 60);
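When AND and OR are mixed, AND is evaluated first, so parentheses can change the result. The sketch below illustrates this with Python's sqlite3 module; the four movie rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER, genre TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies (id, genre, year) VALUES (?, ?, ?)",
    [(1, "comedy", 1995), (2, "romance", 1995),
     (3, "romance", 2005), (4, "action", 2010)],
)

def ids_where(condition):
    """Return movie ids for a hard-coded WHERE condition."""
    query = f"SELECT id FROM movies WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(query)]

# Read as: comedy OR (romance AND year > 2000).
no_parentheses = ids_where("genre = 'comedy' OR genre = 'romance' AND year > 2000")
# Parentheses force the OR to be evaluated first.
with_parentheses = ids_where("(genre = 'comedy' OR genre = 'romance') AND year > 2000")
print(no_parentheses)    # [1, 3]
print(with_parentheses)  # [3]
```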

QUERIES

Order By

That’s it with WHERE and its operators. Moving on!

It is often useful to list the data in our result set in a particular order.

We can sort the results using ORDER BY, either alphabetically or numerically. Sorting the results
often makes the data more useful and easier to analyze.

For example, if we want to sort everything by the movie’s title from A through Z:

SELECT *
FROM movies
ORDER BY name;

ORDER BY is a clause that indicates you want to sort the result set by a particular column.

name is the specified column.

Sometimes we want to sort things in a decreasing order. For example, if we want to select all of the
well-received movies, sorted from highest to lowest by their year:

SELECT *
FROM movies
WHERE imdb_rating > 8
ORDER BY year DESC;

DESC is a keyword used in ORDER BY to sort the results in descending order (high to low or Z-A).

ASC is a keyword used in ORDER BY to sort the results in ascending order (low to high or A-Z).

The column that we ORDER BY doesn’t even have to be one of the columns that we’re displaying.

Note: ORDER BY always goes after WHERE (if WHERE is present).

Instructions

1.

Suppose we want to retrieve the name and year columns of all the movies, ordered by their name
alphabetically.

Type the following code:

SELECT name, year
FROM movies
ORDER BY name;


2.

Your turn! Remove the previous query.

Write a new query that retrieves the name, year, and imdb_rating columns of all the movies,
ordered highest to lowest by their ratings.


Hint

What are the columns that are selected and the table we are interested in?

SELECT name, year, imdb_rating
FROM movies;

Next, let’s sort them.

SELECT name, year, imdb_rating
FROM movies
ORDER BY imdb_rating DESC;

We added DESC here because we want to sort it in a descending order.

If you run this query, the result will start with movies with an IMDb rating of 9.0 all the way down to
4.2.
Question

In SQL, can we apply ORDER BY with multiple columns?

Answer

Yes, following the ORDER BY, you can list more than one column to order the data by.

When ordering by more than one column, the rows are first sorted by the first column. Rows that
have the same value in that column are then sorted by the next column, and so on.

You can also specify ascending or descending order for each listed column.

Example

/*
This will order on the year, then order the
names in reverse alphabetical
order, preserving the order
of the year column.
*/

SELECT year, name
FROM movies
ORDER BY year ASC, name DESC;
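A small run of that query, sketched here with Python's sqlite3 module, shows how the second column only decides the order among rows that share the same year. The four rows are sample data chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (year INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO movies (year, name) VALUES (?, ?)",
    [(1999, "Fight Club"), (1999, "The Matrix"),
     (1994, "Pulp Fiction"), (1994, "Forrest Gump")],
)

ordered = conn.execute(
    "SELECT year, name FROM movies ORDER BY year ASC, name DESC"
).fetchall()
for row in ordered:
    print(row)
# (1994, 'Pulp Fiction')
# (1994, 'Forrest Gump')
# (1999, 'The Matrix')
# (1999, 'Fight Club')
```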

Limit

We’ve been working with a fairly small table (fewer than 250 rows), but most SQL tables contain
hundreds of thousands of records. In those situations, it becomes important to cap the number of
rows in the result.

For instance, imagine that we just want to see a few examples of records.

SELECT *
FROM movies
LIMIT 10;

LIMIT is a clause that lets you specify the maximum number of rows the result set will have. This
saves space on our screen and can make our queries run faster.

Here, we specify that the result set can’t have more than 10 rows.

LIMIT always goes at the very end of the query. Also, it is not supported in all SQL databases; some
use a different syntax for the same purpose.
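LIMIT sets an upper bound, not an exact row count. The sketch below, using Python's sqlite3 module and 25 generated placeholder rows, shows both a limit smaller than the table and one larger than it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO movies (id, name) VALUES (?, ?)",
    [(i, f"Movie {i}") for i in range(1, 26)],
)

first_ten = conn.execute("SELECT id FROM movies ORDER BY id LIMIT 10").fetchall()
# A limit larger than the table simply returns every row.
everything = conn.execute("SELECT id FROM movies LIMIT 100").fetchall()
print(len(first_ten))   # 10
print(len(everything))  # 25
```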

Instructions

1.

Combining your knowledge of LIMIT and ORDER BY, write a query that returns the top 3 highest
rated movies.

Select all the columns.


Hint

First, what column(s) and table are we interested in?

SELECT *
FROM movies;

Next, sort them by rating (descending so we start from the highest).

SELECT *
FROM movies
ORDER BY imdb_rating DESC;

Lastly, add a LIMIT cap.

SELECT *
FROM movies
ORDER BY imdb_rating DESC
LIMIT 3;

If you run this query, the result will be ‘The Dark Knight’ at an impressive 9.0, ‘Inception’ and ‘Star
Wars: Episode V - The Empire Strikes Back’ tying for second with a rating of 8.8.

Case

A CASE statement allows us to create different outputs (usually in the SELECT statement). It is SQL’s
way of handling if-then logic.

Suppose we want to condense the ratings in movies to three levels:

If the rating is above 8, then it is Fantastic.

If the rating is above 6, then it is Poorly Received.

Else, Avoid at All Costs.

SELECT name,
CASE
  WHEN imdb_rating > 8 THEN 'Fantastic'
  WHEN imdb_rating > 6 THEN 'Poorly Received'
  ELSE 'Avoid at All Costs'
END
FROM movies;

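Each WHEN tests a condition and the following THEN gives us the string if the condition is true. The
conditions are checked from top to bottom, and the first one that is true determines the output.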

The ELSE gives us the string if all the above conditions are false.

The CASE statement must end with END.

In the result, you have to scroll right because the column name is very long. To shorten it, we can
rename the column to ‘Review’ using AS:

SELECT name,
CASE
  WHEN imdb_rating > 8 THEN 'Fantastic'
  WHEN imdb_rating > 6 THEN 'Poorly Received'
  ELSE 'Avoid at All Costs'
END AS 'Review'
FROM movies;
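Because the WHEN conditions are checked in order, a rating of 9.0 is labelled 'Fantastic' even though it is also above 6. The sketch below runs the same CASE through Python's sqlite3 module; the movie names and ratings are placeholders, and the alias is written without quotes here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, imdb_rating REAL)")
conn.executemany(
    "INSERT INTO movies (name, imdb_rating) VALUES (?, ?)",
    [("A", 9.0), ("B", 7.0), ("C", 5.0), ("D", None)],
)

reviews = conn.execute(
    """
    SELECT name,
      CASE
        WHEN imdb_rating > 8 THEN 'Fantastic'
        WHEN imdb_rating > 6 THEN 'Poorly Received'
        ELSE 'Avoid at All Costs'
      END AS review
    FROM movies
    ORDER BY name
    """
).fetchall()
for row in reviews:
    print(row)
# ('A', 'Fantastic')
# ('B', 'Poorly Received')
# ('C', 'Avoid at All Costs')
# ('D', 'Avoid at All Costs')
```

The movie with a missing rating also falls through to ELSE, since no comparison with NULL is true.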

Instructions

1.

Let’s try one on your own.

Select the name column and use a CASE statement to create the second column that is:

‘Chill’ if genre = 'romance'

‘Chill’ if genre = 'comedy'

‘Intense’ in all other cases

Optional: Rename the whole CASE statement to ‘Mood’ using AS.

Give it your best shot! Check hint for the answer.


Hint

This is the final boss!

Your query should look like:

SELECT name,
CASE
  WHEN genre = 'romance' THEN 'Chill'
  WHEN genre = 'comedy'  THEN 'Chill'
  ELSE 'Intense'
END AS 'Mood'
FROM movies;

If the genre is romance, then it is Chill.


If the genre is comedy, then it is Chill.

Else, it is Intense.

Don’t forget the comma after name.

Here is another query that will give us the same result:

SELECT name,
CASE
  WHEN genre = 'romance' OR genre = 'comedy'
   THEN 'Chill'
  ELSE 'Intense'
END AS 'Mood'
FROM movies;

If the genre is romance or comedy, then it is Chill.

Else, it is Intense.

Question

For a CASE statement, do all values provided by THEN have to match a single data type?

Answer

No, for CASE statements, the THEN values do not have to return only a single type of value. In fact,
you can have each THEN in a single CASE statement return different value types such as TEXT, REAL,
and INTEGER.

Example

SELECT
  CASE
    WHEN condition1 THEN 'text'
    WHEN condition2 THEN 100
    WHEN condition3 THEN 3.14
  END AS 'example'
FROM table_name;
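SQLite's typeof() function makes the mixed result types visible. In this sketch, run through Python's sqlite3 module, the samples table and its column n are hypothetical stand-ins for the placeholder conditions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (n INTEGER)")
conn.executemany("INSERT INTO samples (n) VALUES (?)", [(1,), (2,), (3,), (4,)])

types = [
    row[0]
    for row in conn.execute(
        """
        SELECT typeof(
          CASE
            WHEN n = 1 THEN 'text'
            WHEN n = 2 THEN 100
            WHEN n = 3 THEN 3.14
          END)
        FROM samples
        ORDER BY n
        """
    )
]
print(types)  # ['text', 'integer', 'real', 'null']
```

The last row shows that when no WHEN matches and there is no ELSE, the CASE expression yields NULL.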

QUERIES

Review

Congratulations!

We just learned how to query data from a database using SQL. We also learned how to filter queries
to make the information more specific and useful.
Let’s summarize:

SELECT is the clause we use every time we want to query information from a database.

AS renames a column or table.

DISTINCT returns unique values.

WHERE is a popular command that lets you filter the results of the query based on conditions that
you specify.

LIKE and BETWEEN are special operators.

AND and OR combine multiple conditions.

ORDER BY sorts the result.

LIMIT specifies the maximum number of rows that the query will return.

CASE creates different outputs.

Project: Watching the Stock Market

This project will take you off-platform and get you started in your own developer environment!
Never done that before? Not to worry - we’ve shared some resources to help you down below. This
project can be completed entirely on your own - or, you can join the Codecademy Discord and
find someone to work with!

This project is broken down into key questions that your client or company is looking to answer. As a
data scientist, you’ll often become a resource to help businesses answer the key questions about the
efficacy of existing or potential strategies & projects.

Overview

Objective

You are asked by a company to help them make more informed decisions on investments. To start,
you will be watching the stock market, collecting data, and identifying trends!

Pre-requisites

In order to complete this project, we suggest that you have familiarity with the content in the
following courses or lessons on the Codecademy platform:

What is a Relational Database Management System?

Manipulation

Queries

Suggested Technologies

Depending on where you are on your Path, there may be multiple technology options you can use to
complete this project - we suggest the following:

DB Browser for SQLite


Project Tasks

Get started - hosting your project

DB Browser for SQLite is a visual tool for working with SQLite databases. Follow the link to
download the application for your computer.

SQLite can store an entire database in a single file, which usually has a .sqlite or .db extension. To
open a database, select Open Database at the top of the window and browse for the file.
Alternatively, you can choose to create a New Database by saving a file with
the .sqlite or .db extension.

To import data from a CSV file into a table, select “File > Import > Table from CSV file…” and browse
for the CSV file. (Note: All fields imported from the CSV file will have a data type of TEXT. Be sure to
convert fields to numeric type as needed. See here for how to do that.)

There are several tabs near the top of the window for working with the data:

Database Structure: View the tables in your database and the columns they contain.

Browse Data: Browse the data for each table.

Execute SQL: Write and execute SQL queries.

Basic Requirements

Let’s break this project down into a couple different parts.

Manipulation: Collect data on your pick of 5 stocks.

Create a table called stocks, where you will be inserting your data.

Hint: See here for a review of the CREATE TABLE syntax. What data type should each field be?

The stocks table should have a column for symbol, name, datetime, and price.

Collect your data! Choose 3 times throughout the day to document the price of each stock and
continue for at least 1 week. You can do this moving forward, or just take a retroactive look at the
stock market by taking data historically from regular intervals (e.g. the first of the month for the last
six months).

Hint: See here for a review of the INSERT INTO syntax. When inserting the datetime, use the
standard format ‘yyyy-mm-dd hh:mm:ss’. Use the strftime() function to help you get the
datetime of ‘now’.
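One possible way to set up the stocks table and record a price with the current timestamp is sketched below with Python's sqlite3 module. The column types are one reasonable choice, not a prescribed answer, and 'XMPL', 'Example Corp' and the price are placeholder values, not real market data.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stocks (symbol TEXT, name TEXT, datetime TEXT, price REAL)"
)

# strftime with 'now' produces the current UTC time in the requested format.
conn.execute(
    "INSERT INTO stocks (symbol, name, datetime, price) "
    "VALUES ('XMPL', 'Example Corp', strftime('%Y-%m-%d %H:%M:%S', 'now'), 101.25)"
)

row = conn.execute("SELECT symbol, datetime, price FROM stocks").fetchone()
print(row)

# Check that the stored text has the 'yyyy-mm-dd hh:mm:ss' shape.
stamp_ok = re.fullmatch(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", row[1]) is not None
print(stamp_ok)  # True
```

Since 'now' is reported in UTC, adding the 'localtime' modifier as a further argument to strftime() gives the local time instead.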

Queries: Perform basic analysis on the data and identify trends.

What are the distinct stocks in the table?

Query all data for a single stock. Do you notice any overall trends?

Which rows have a price above 100? Between 40 and 50, etc.?

Sort the table by price. What are the minimum and maximum prices?

Additional Challenges

Intermediate Challenge

Explore using aggregate functions to look at key statistics about the data (e.g., min, max, average).

Group the data by stock and repeat. How do the stocks compare to each other?

Group the data by day or hour of day. Does day of week or time of day impact prices?

Which of the rows have a price greater than the average of all prices in the dataset?
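As a starting point for these challenges, the sketch below shows one way to compute per-stock statistics and to find rows priced above the overall average, using Python's sqlite3 module. The symbols 'AAA' and 'BBB' and their prices are invented, and the subquery is just one possible approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO stocks (symbol, price) VALUES (?, ?)",
    [("AAA", 10.0), ("AAA", 20.0), ("BBB", 40.0), ("BBB", 50.0)],
)

# One row of statistics per stock symbol.
per_stock = conn.execute(
    "SELECT symbol, MIN(price), MAX(price), AVG(price) "
    "FROM stocks GROUP BY symbol ORDER BY symbol"
).fetchall()
print(per_stock)  # [('AAA', 10.0, 20.0, 15.0), ('BBB', 40.0, 50.0, 45.0)]

# The subquery computes the overall average (30.0 here) once.
above_average = conn.execute(
    "SELECT symbol, price FROM stocks "
    "WHERE price > (SELECT AVG(price) FROM stocks) ORDER BY price"
).fetchall()
print(above_average)  # [('BBB', 40.0), ('BBB', 50.0)]
```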

Advanced Challenge

In addition to the built-in aggregate functions, explore ways to calculate other key statistics about
the data, such as the median or variance.

Hint: See here and here for possible solutions.

Let’s refactor the data into 2 tables - stock_info to store general info about the stock itself
(i.e., symbol, name) and stock_prices to store the collected data on price (i.e., symbol, datetime,
price).

Hint: You can use the SQL CREATE TABLE AS statement to create a table by copying the columns of
an existing table. Don’t forget to also drop certain columns from the original table and rename it.

Now, we do not need to repeat both symbol and name for each row of price data. Instead, join the 2
tables in order to view more information on the stock with each row of price.

Add more variables to the stock_info table and update the data (e.g., sector, industry, etc).

Resources & Support

Project-specific resources

SQLite Data Types

SQLite Date and Time Functions

SQLite strftime() Function

SQLite Documentation

SQLite Tutorial

General Resources

How to get set-up for coding on your computer

What is a Relational Database Management System?

What you need to know about Git, GitHub & Coding in Teams

How developer teams work

First steps in tackling a group project

Resource on writing pseudocode to get started with off-platform projects

Community Support

Looking for additional help or someone to work with (or somewhere to brag about your finished
project)? Join our Discord to meet other learners like yourself!

Collaborate with other learners on data collection! Then, join the datasets together for more
interesting analysis.

Each learner can collect data on different stocks for a larger sample of stocks.

Each learner can collect data on same 5 stocks, but at different points throughout the day in order to
spot potential daily trends.

Once you’re done…

AGGREGATE FUNCTIONS

Introduction

We’ve learned how to write queries to retrieve information from the database. Now, we are going to
learn how to perform calculations using SQL.

Calculations performed on multiple rows of a table are called aggregates.

In this lesson, we have given you a table named fake_apps which is made up of fake mobile
applications data.

Here is a quick preview of some important aggregates that we will cover in the next five exercises:

COUNT(): count the number of rows

SUM(): the sum of the values in a column

MAX()/MIN(): the largest/smallest value

AVG(): the average of the values in a column

ROUND(): round the values in the column

Let’s get started!

Instructions

1.

Before getting started, take a look at the data in the fake_apps table.

In the code editor, type the following:

SELECT *
FROM fake_apps;

What are the column names?



Hint

The column names are id, name, category, downloads, and price.

Question

In general, what is a function?

Answer

A very general description of a function would be: A set of tasks or procedures that can take in a
value, and return another value based on that input.

Functions in programming are similar to ones that you may have seen in math. For example,

f(x, y) = x^2 + y^2

If we use this function with two input values, it would return the sum of the squares of both values.

Similarly, in SQL, aggregate functions can take in a column name of a table, and will return some
numerical value based on the column values. For example,

SELECT COUNT(col) FROM table;

This will return a single number, which is the number of rows that have non-NULL values in the
column col.

Some functions we will learn about later can even take values directly, instead of just column names,
like

ROUND(10.4, 0)

and will return a value based on the input. The above would result in 10.0.

AGGREGATE FUNCTIONS

Count

The fastest way to calculate how many rows are in a table is to use the COUNT() function.

COUNT() is a function that takes the name of a column as an argument and counts the number of
non-NULL values in that column.

SELECT COUNT(*)
FROM table_name;

Here, we want to count every row, so we pass * as an argument inside the parentheses.
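The difference between COUNT(*) and COUNT() on a column only shows up when the column contains NULL values. The sketch below demonstrates this with Python's sqlite3 module; the four rows are made up and are not the lesson's fake_apps data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fake_apps (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO fake_apps (name, category) VALUES (?, ?)",
    [("a", "Games"), ("b", "Games"), ("c", None), ("d", "Health")],
)

# COUNT(*) counts rows; COUNT(category) skips the row whose category is NULL.
all_rows, with_category = conn.execute(
    "SELECT COUNT(*), COUNT(category) FROM fake_apps"
).fetchone()
print(all_rows, with_category)  # 4 3
```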

Instructions

1.
Let’s count how many apps are in the table.

In the code editor, run:

SELECT COUNT(*)
FROM fake_apps;


2.

Add a WHERE clause in the previous query to count how many free apps are in the table.


Hint

Remember the WHERE clause?

It goes at the end of the previous query, before the semicolon, so the full query becomes:

SELECT COUNT(*)
FROM fake_apps
WHERE price = 0;

WHERE indicates we want to only include rows where the following condition is true.

price = 0 is the condition.

There are 73 free apps in the table.

Question

When using the SQL COUNT() function for a column, does it include duplicate values?

Answer

Yes, when using the COUNT() function on a column in SQL, it will include duplicate values by default.
It essentially counts all rows for which there is a value in the column.

If you wanted to count only the unique values in a column, then you can utilize the DISTINCT clause
within the COUNT() function.

Example

/* This will return 22, the number of distinct category values. */

SELECT COUNT(DISTINCT category)
FROM fake_apps;

AGGREGATE FUNCTIONS

Sum

SQL makes it easy to add all values in a particular column using SUM().

SUM() is a function that takes the name of a column as an argument and returns the sum of all the
values in that column.

What is the total number of downloads for all of the apps combined?

SELECT SUM(downloads)
FROM fake_apps;

This adds all values in the downloads column.

Instructions

1.

Let’s find out the answer!

In the code editor, type:

SELECT SUM(downloads)
FROM fake_apps;


Hint

There are 3,322,760 total downloads.

Question

When do we use the COUNT() function or the SUM() function?

Answer

Although they might appear to perform a similar task, the COUNT() and SUM() functions have very
different uses.

COUNT() takes the name of a column and counts the number of non-NULL values in that column.
COUNT() does not take into account the actual values stored, and only cares whether a value is
present. Each such row is essentially counted as 1 towards the total count.

On the other hand, SUM() takes a column name, and returns the sum of all values in the column,
meaning that it must take into account the actual values stored.

In general, use COUNT() when you want to count how many rows contain a non-NULL value for a
specified column. Use SUM() when you want to get the total sum of all values in a column.
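Running both functions on the same column makes the contrast concrete. In this sketch, which uses Python's sqlite3 module, the download figures are invented and one of them is deliberately left as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fake_apps (name TEXT, downloads INTEGER)")
conn.executemany(
    "INSERT INTO fake_apps (name, downloads) VALUES (?, ?)",
    [("a", 100), ("b", 250), ("c", None)],
)

# COUNT ignores what the values are; SUM adds them up. Both skip the NULL.
count_value, sum_value = conn.execute(
    "SELECT COUNT(downloads), SUM(downloads) FROM fake_apps"
).fetchone()
print(count_value, sum_value)  # 2 350
```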

AGGREGATE FUNCTIONS

Max / Min

The MAX() and MIN() functions return the highest and lowest values in a column, respectively.

How many downloads does the most popular app have?

SELECT MAX(downloads)
FROM fake_apps;

The most popular app has 31,090 downloads!

MAX() takes the name of a column as an argument and returns the largest value in that column.
Here, we returned the largest value in the downloads column.

MIN() works the same way but it does the exact opposite; it returns the smallest value.

Instructions

1.

What is the least number of times an app has been downloaded?

In the code editor, type:

SELECT MIN(downloads)
FROM fake_apps;


Hint

1,387 downloads.

2.

Delete the previous query.

Write a new query that returns the price of the most expensive app.


Hint

SELECT MAX(price)
FROM fake_apps;

$14.99 is the price of the most expensive app.

Question

If multiple rows have the minimum or maximum value, which one is returned when
using MAX/MIN?

Answer

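MAX() and MIN() return a single value, so on their own they give the same result no matter how many
rows contain it. The question only matters when other, non-aggregated columns are selected
alongside them. SQLite fills those columns from a row that contains the minimum or maximum value,
often the topmost such row, but when several rows tie the choice is not guaranteed. Many other
databases reject such a query unless the extra columns appear in a GROUP BY clause.

For example, if the table contained multiple rows with the minimum price of 0.0, then a query
with MIN(price) would take the other column values from one of the rows that had this price value.

Example

/*
In the lesson's table this returned the siliconphase app,
the topmost row that had the minimum price value of the
column, but that choice is not guaranteed when rows tie.
*/

SELECT id, name, MIN(price)
FROM fake_apps;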
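If every row that shares the minimum is wanted, a subquery avoids depending on which single row the database happens to pick. This is a sketch of that alternative using Python's sqlite3 module, with made-up app names and prices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fake_apps (id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO fake_apps (id, name, price) VALUES (?, ?, ?)",
    [(1, "alpha", 0.0), (2, "beta", 0.0), (3, "gamma", 1.99)],
)

# The inner query finds the minimum price; the outer query lists every match.
cheapest = conn.execute(
    "SELECT id, name, price FROM fake_apps "
    "WHERE price = (SELECT MIN(price) FROM fake_apps) ORDER BY id"
).fetchall()
print(cheapest)  # [(1, 'alpha', 0.0), (2, 'beta', 0.0)]
```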

AGGREGATE FUNCTIONS

Average

SQL uses the AVG() function to quickly calculate the average value of a particular column.

The statement below returns the average number of downloads for an app in our database:

SELECT AVG(downloads)
FROM fake_apps;

The AVG() function works by taking a column name as an argument and returns the average value
for that column.

Instructions

1.

Calculate the average number of downloads for all the apps in the table.

In the code editor, type:

SELECT AVG(downloads)
FROM fake_apps;


Hint

16,613.8 average downloads.

2.

Remove the previous query.

Write a new query that calculates the average price for all the apps in the table.

Hint

Which column should go inside the parenthesis?

SELECT AVG(_____)
FROM fake_apps;

The average price is $2.02365.

Question

In SQL, how can we get the average of only the unique values of a column?

Answer

To run the AVG() function on a column such that it only averages the unique values in the column,
we could use the DISTINCT clause right before the column name.

Example

/* Returns 2.02365 */

SELECT AVG(price)
FROM fake_apps;

/* Returns 4.15833.. */

SELECT AVG(DISTINCT price)
FROM fake_apps;
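The effect of DISTINCT is easier to follow with a handful of values. In this sketch, run through Python's sqlite3 module, the four prices are invented so that the arithmetic can be checked by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fake_apps (price REAL)")
conn.executemany(
    "INSERT INTO fake_apps (price) VALUES (?)",
    [(0.0,), (0.0,), (0.0,), (3.0,)],
)

# All rows: (0 + 0 + 0 + 3) / 4. Distinct values only: (0 + 3) / 2.
avg_all, avg_distinct = conn.execute(
    "SELECT AVG(price), AVG(DISTINCT price) FROM fake_apps"
).fetchone()
print(avg_all, avg_distinct)  # 0.75 1.5
```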

AGGREGATE FUNCTIONS

Round

By default, SQL tries to be as precise as possible without rounding. We can make the result table
easier to read using the ROUND() function.

The ROUND() function takes two arguments inside the parentheses:

a column name

an integer

It rounds the values in the column to the number of decimal places specified by the integer.

SELECT ROUND(price, 0)
FROM fake_apps;

Here, we pass the column price and integer 0 as arguments. SQL rounds the values in the column to
0 decimal places in the output.

Instructions

1.

Let’s return the name column and a rounded price column.

In the code editor, type:

SELECT name, ROUND(price, 0)
FROM fake_apps;


2.

Remove the previous query.

In the last exercise, we were able to get the average price of an app ($2.02365) using this query:

SELECT AVG(price)
FROM fake_apps;

Now, let’s edit this query so that it rounds this result to 2 decimal places.

This is a tricky one!


Hint

You can treat AVG(price) just like any other value and place it inside the ROUND function like so:

ROUND(AVG(price), 2)

Here, AVG(price) is the 1st argument and 2 is the 2nd argument because we want to round it to two
decimal places:

SELECT ROUND(AVG(price), 2)
FROM fake_apps;

Question

Does the ROUND() function round up?

Answer

When using the ROUND() function, you can provide a second argument, which is the precision, or
number of decimal places to round the number to.

In SQLite, ROUND() rounds up if the next digit is 5 or greater, and rounds down if it is less
than 5.

For example,

/* This will result in 4.0 */

SELECT ROUND(3.5, 0);

/* This will result in 6.4 */

SELECT ROUND(6.42, 1);

/* This will result in 6.0 */

SELECT ROUND(6.42, 0);
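If you want to check these results yourself, the following sketch runs the same calls through Python's built-in sqlite3 module. It also rounds 2.02365 (the average price quoted earlier) to two decimal places. No table is needed, because ROUND() works on literal values too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each ROUND() call is evaluated by SQLite itself, not by Python.
rounded = conn.execute(
    "SELECT ROUND(3.5, 0), ROUND(6.42, 1), ROUND(6.42, 0), ROUND(2.02365, 2)"
).fetchone()

print(rounded)
```

Note that SQLite returns a floating point value even when rounding to 0 decimal places, which is why the results read 4.0 and 6.0 rather than 4 and 6.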

AGGREGATE FUNCTIONS

Group By I

Oftentimes, we will want to calculate an aggregate for data with certain characteristics.

For instance, we might want to know the mean IMDb ratings for all movies each year. We could
calculate each number by a series of queries with different WHERE statements, like so:

SELECT AVG(imdb_rating)
FROM movies
WHERE year = 1999;

SELECT AVG(imdb_rating)
FROM movies
WHERE year = 2000;

SELECT AVG(imdb_rating)
FROM movies
WHERE year = 2001;

and so on.

Luckily, there’s a better way!

We can use GROUP BY to do this in a single step:

SELECT year,
   AVG(imdb_rating)
FROM movies
GROUP BY year
ORDER BY year;

GROUP BY is a clause in SQL that is used with aggregate functions. It is used together with
the SELECT statement to arrange identical data into groups.

The GROUP BY clause comes after any WHERE clause, but before ORDER BY or LIMIT.
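Here is a small runnable sketch of the GROUP BY query above, using Python's built-in sqlite3 module. The six movies and their ratings are invented for illustration; only the table and column names come from the lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, year INTEGER, imdb_rating REAL)")
# Made-up sample rows, not the course dataset.
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("A", 1999, 8.0),
        ("B", 1999, 6.0),
        ("C", 2000, 7.5),
        ("D", 2001, 5.0),
        ("E", 2001, 9.0),
        ("F", 2001, 4.0),
    ],
)

# One query replaces the three separate WHERE year = ... queries.
yearly_avg = conn.execute(
    """
    SELECT year, AVG(imdb_rating)
    FROM movies
    GROUP BY year
    ORDER BY year
    """
).fetchall()

for year, avg_rating in yearly_avg:
    print(year, avg_rating)
```

Each distinct year becomes one row in the result, and AVG() is computed separately over the rows in that group.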

Instructions

1.

In the code editor, type:

SELECT price, COUNT(*)
FROM fake_apps
GROUP BY price;

Here, our aggregate function is COUNT() and we arranged price into groups.

What do you expect the result to be?

Hint

The result contains the total number of apps for each price.

It is organized into two columns, making it very easy to see the number of apps at each price.

2.

In the previous query, add a WHERE clause to count the total number of apps that have been
downloaded more than 20,000 times, at each price.

Hint

Remember, the WHERE clause goes before the GROUP BY clause:

SELECT price, COUNT(*)
FROM fake_apps
WHERE downloads > 20000
GROUP BY price;

3.

Remove the previous query.

Write a new query that calculates the total number of downloads for each category.

Select category and SUM(downloads).

Hint

First, select the two columns we want:

SELECT category, SUM(downloads)
FROM fake_apps;

Next, group the result for each category by adding a GROUP BY:

SELECT category, SUM(downloads)
FROM fake_apps
GROUP BY category;

Question

When using the GROUP BY clause, do we always have to group by one of the selected columns listed
after SELECT?

Answer

No, you can GROUP BY a column that was not included in the SELECT statement.

For example, this query does not list the price column in the SELECT, but it does group the data by
that column.

SELECT name, downloads
FROM fake_apps
GROUP BY price;

Note that name and downloads are neither grouped nor aggregated here. SQLite accepts this and
returns the values from an arbitrary row of each group, but many other databases reject such a
query.

However, we usually do include the grouped-by column in the SELECT for the sake of clarity, so that
it’s easier to see which rows belong to which group.

AGGREGATE FUNCTIONS

Group By II

Sometimes, we want to GROUP BY a calculation done on a column.

For instance, we might want to know how many movies have IMDb ratings that round to 1, 2, 3, 4, 5.
We could do this using the following syntax:

SELECT ROUND(imdb_rating),
   COUNT(name)
FROM movies
GROUP BY ROUND(imdb_rating)
ORDER BY ROUND(imdb_rating);

However, this query may be time-consuming to write and more prone to error.

SQL lets us use column reference(s) in our GROUP BY that will make our lives easier.

1 is the first column selected

2 is the second column selected


3 is the third column selected

and so on.

The following query is equivalent to the one above:

SELECT ROUND(imdb_rating),
   COUNT(name)
FROM movies
GROUP BY 1
ORDER BY 1;

Here, the 1 refers to the first column in our SELECT statement, ROUND(imdb_rating).
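To confirm that the two forms really are interchangeable, this sketch runs both through Python's built-in sqlite3 module and compares the results. The five movies and their ratings are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, imdb_rating REAL)")
# Made-up sample rows, not the course dataset.
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("A", 7.6), ("B", 8.2), ("C", 6.4), ("D", 5.9), ("E", 9.1)],
)

# Grouping by the full expression.
by_expression = conn.execute(
    """
    SELECT ROUND(imdb_rating), COUNT(name)
    FROM movies
    GROUP BY ROUND(imdb_rating)
    ORDER BY ROUND(imdb_rating)
    """
).fetchall()

# Grouping by column reference: 1 means the first selected column.
by_position = conn.execute(
    """
    SELECT ROUND(imdb_rating), COUNT(name)
    FROM movies
    GROUP BY 1
    ORDER BY 1
    """
).fetchall()

print(by_expression == by_position)
print(by_position)
```

The reference is tied to the position in the SELECT list, so reordering the selected columns changes what GROUP BY 1 means.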

Instructions

1.

Suppose we have the query below:

SELECT category,
   price,
   AVG(downloads)
FROM fake_apps
GROUP BY category, price;

Write the exact query, but use column reference numbers instead of column names after GROUP BY.

Hint

These numbers represent the selected columns:

1 refers to category.

2 refers to price.

3 refers to AVG(downloads)

Now, change the GROUP BY with numbers:

SELECT category,
   price,
   AVG(downloads)
FROM fake_apps
GROUP BY 1, 2;

Note: Even if you use column names instead of numbers, it will still be correct because these two
queries are exactly the same!

Question

Do column references have to follow the order the columns are listed in the SELECT?

Answer

No, once you list the columns after the SELECT, they can be referenced by the order they appeared,
starting from 1 for the first listed column.

You are not limited to referencing them in the exact order they were listed, like

GROUP BY 1, 2, 3

You can freely use the references in any order, like you would normally without using references.

GROUP BY 3, 1, 2

However, when using references, it is important to keep in mind which number refers to which
column, as it can become confusing as you list more columns in the SELECT. It is a convenient
shortcut, but not always the best choice.

AGGREGATE FUNCTIONS

Having

In addition to being able to group data using GROUP BY, SQL also allows you to filter which groups to
include and which to exclude.

For instance, imagine that we want to see how many movies of different genres were produced each
year, but we only care about years and genres with more than 10 movies.

We can’t use WHERE here because we don’t want to filter the rows; we want to filter groups.

This is where HAVING comes in.

HAVING is very similar to WHERE. In fact, all types of WHERE clauses you learned about thus far can
be used with HAVING.

We can use the following for the problem:

SELECT year,
   genre,
   COUNT(name)
FROM movies
GROUP BY 1, 2
HAVING COUNT(name) > 10;

When we want to limit the results of a query based on values of the individual rows, use WHERE.

When we want to limit the results of a query based on an aggregate property, use HAVING.

The HAVING clause always comes after GROUP BY, but before ORDER BY and LIMIT.
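The sketch below shows HAVING filtering whole groups, using Python's built-in sqlite3 module. The six movies are invented, and the threshold is lowered from 10 to 1 so that such a tiny table still shows the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, year INTEGER, genre TEXT)")
# Made-up sample rows, not the course dataset.
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("A", 2000, "drama"),
        ("B", 2000, "drama"),
        ("C", 2000, "comedy"),
        ("D", 2001, "drama"),
        ("E", 2001, "comedy"),
        ("F", 2001, "comedy"),
    ],
)

# Four (year, genre) groups exist; HAVING keeps only those with more than 1 movie.
busy_groups = conn.execute(
    """
    SELECT year, genre, COUNT(name)
    FROM movies
    GROUP BY 1, 2
    HAVING COUNT(name) > 1
    ORDER BY 1, 2
    """
).fetchall()

for row in busy_groups:
    print(row)
```

The groups with a single movie (2000 comedy and 2001 drama) are dropped after the counts are computed, which is something a WHERE clause cannot do.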

Instructions

1.

Suppose we have the query below:

SELECT price,
   ROUND(AVG(downloads)),
   COUNT(*)
FROM fake_apps
GROUP BY price;

It returns the average downloads (rounded) and the number of apps – at each price point.

However, certain price points don’t have very many apps, so their average downloads are less
meaningful.

Add a HAVING clause to restrict the query to price points that have more than 10 apps.

Hint

The total number of apps at each price point would be given by COUNT(*).

SELECT price,
   ROUND(AVG(downloads)),
   COUNT(*)
FROM fake_apps
GROUP BY price
HAVING COUNT(*) > 10;

COUNT(*) > 10 is the condition.

Because the condition has an aggregate function in it, we have to use HAVING instead of WHERE.

Question

Can a WHERE clause be applied with a HAVING statement in the same query?

Answer

Yes, you can absolutely apply a WHERE clause in a query that also utilizes a HAVING statement.

When you apply a WHERE clause in the same query, it must always be before any GROUP BY, which
in turn must be before any HAVING.

As a result, the data is essentially filtered on the WHERE condition first. Then, from this filtered data,
it is grouped by specified columns and then further filtered based on the HAVING condition.

Example

/*
This will first filter the movies with a box_office > 500000.
Then, it will group those results by genre, and finally restrict
the query to genres that have more than 5 movies.
*/
SELECT genre, ROUND(AVG(score))
FROM movies
WHERE box_office > 500000
GROUP BY genre
HAVING COUNT(*) > 5;
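The order of filtering matters, which the following sketch tries to make visible with Python's built-in sqlite3 module. The rows are invented, and the HAVING threshold is lowered to 1 to suit the tiny table. Both genres start with three movies, but only one still has more than one movie after the WHERE filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (genre TEXT, score REAL, box_office INTEGER)")
# Made-up sample rows, not the course dataset.
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("action", 7.0, 600000),
        ("action", 9.0, 700000),
        ("action", 3.0, 100000),
        ("drama", 6.0, 800000),
        ("drama", 8.0, 200000),
        ("drama", 4.0, 300000),
    ],
)

# WHERE removes low box-office rows first; HAVING then counts what is left.
filtered = conn.execute(
    """
    SELECT genre, ROUND(AVG(score))
    FROM movies
    WHERE box_office > 500000
    GROUP BY genre
    HAVING COUNT(*) > 1
    """
).fetchall()

print(filtered)
```

Drama is excluded even though the table holds three drama movies, because only one of them survives the WHERE condition. The action average is also computed from the two surviving rows only.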

AGGREGATE FUNCTIONS

Review

Congratulations!

You just learned how to use aggregate functions to perform calculations on your data. What can we
generalize so far?

COUNT(): count the number of rows

SUM(): the sum of the values in a column

MAX()/MIN(): the largest/smallest value

AVG(): the average of the values in a column

ROUND(): round the values in the column

Aggregate functions combine multiple rows together to form a single value of more meaningful
information.

GROUP BY is a clause used with aggregate functions to combine data from one or more columns.

HAVING limits the results of a query based on an aggregate property.

Project: Trends in Estimated Home Values

This project will take you off-platform and get you started in your own developer environment!
Never done that before? Not to worry - we’ve shared some resources to help you down below. This
project can be completed entirely on your own - or, you can join our Community Discord and
find someone to work with! Jump to the community support section to hear more about this.

This project is broken down into key questions that your client or company is looking to answer. As a
data scientist, you’ll often become a resource to help businesses answer the key questions about the
efficacy of existing or potential strategies & projects.

Overview

Objective

You are asked by a company to help them make more informed decisions on real estate
investments. Start by analyzing the data on median estimated values of single family homes by zip
codes from the past two decades.

Pre-requisites

In order to complete this project, we suggest that you have familiarity with the content in the
following courses or lessons on the Codecademy platform:

Queries

Aggregate Functions

Suggested Technologies

Depending on where you are on your Path, there may be multiple technology options you can use to
complete this project - we suggest the following:

DB Browser for SQLite

Project Tasks

Get started - hosting your project

DB Browser for SQLite is a visual tool for working with SQLite databases. Follow the link to
download the application for your computer.

SQLite can store an entire database in a single file, which usually has a .sqlite or .db extension. To
open a database, select Open Database at the top of the window and browse for the file.
Alternatively, you can choose to create a New Database by saving a file with
the .sqlite or .db extension.

To import data from a CSV file into a table, select “File > Import > Table from CSV file…” and browse
for the CSV file. (Note: All fields imported from the CSV file will have a data type of TEXT. Be sure to
convert fields to numeric type as needed. See here for how to do that.)

You can download the data you’ll be using for this specific project here.

There are several tabs near the top of the window for working with the data:

Database Structure: View the tables in your database and the columns they contain.

Browse Data: Browse the data for each table.

Execute SQL: Write and execute SQL queries.

Basic Requirements

Let’s break this project down into a couple different parts.

Exploration: Familiarize yourself with the dataset.

How many distinct zip codes are in this dataset?

How many zip codes are from each state?


What range of years are represented in the data?

Hint: The date column is in the format yyyy-mm. Try taking a look at using the substr() function to
help extract just the year.
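As a rough illustration of the substr() hint, the sketch below pulls the year out of a yyyy-mm string with Python's built-in sqlite3 module. The table name home_values, its columns other than date, and the three rows are all assumptions made for this example; the real dataset's schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; check the imported table for the real ones.
conn.execute(
    "CREATE TABLE home_values (zip_code TEXT, state TEXT, date TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO home_values VALUES (?, ?, ?, ?)",
    [
        ("10001", "NY", "1997-04", "150000"),
        ("10001", "NY", "2017-11", "900000"),
        ("73301", "TX", "2007-06", "120000"),
    ],
)

# substr(date, 1, 4) takes 4 characters starting at position 1, i.e. the yyyy part.
year_range = conn.execute(
    "SELECT MIN(substr(date, 1, 4)), MAX(substr(date, 1, 4)) FROM home_values"
).fetchone()

print(year_range)
```

SQLite string positions start at 1, not 0. The extracted years are still text, which compares correctly here only because every year has four digits.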

Using the most recent month of data available, what is the range of estimated home values across
the nation?

Note: When we imported the data from a CSV file, all fields were treated as strings. Make sure to
convert the value field into a numeric type if you will be ordering by that field. See here for a
hint.
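The sketch below shows what goes wrong if the conversion is skipped, and one way to do it with CAST, using Python's built-in sqlite3 module. The table name home_values and the three rows are made up for illustration; CAST is one possible approach and may not be the one the linked hint describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; value is stored as TEXT, as it would be after a CSV import.
conn.execute("CREATE TABLE home_values (zip_code TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO home_values VALUES (?, ?)",
    [("10001", "95000"), ("73301", "120000"), ("59001", "8500")],
)

# Text comparison goes character by character, so '120000' sorts before '8500'.
as_text = conn.execute(
    "SELECT MIN(value), MAX(value) FROM home_values"
).fetchone()

# CAST converts each string to an integer before MIN and MAX compare them.
as_number = conn.execute(
    """
    SELECT MIN(CAST(value AS INTEGER)), MAX(CAST(value AS INTEGER))
    FROM home_values
    """
).fetchone()

print(as_text)
print(as_number)
```

The text version reports 95000 as the largest value, which is wrong; the cast version gives the real range.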

Analysis: Explore how home values differ by region and change over time.

Using the most recent month of data available, which states have the highest average home values?
How about the lowest?

Which states have the highest/lowest average home values for the year of 2017? What about for the
year of 2007? 1997?

Additional Challenges

Intermediate Challenge

What is the percent change in average home values from 2007 to 2017 by state? How about
from 1997 to 2017?

Hint: We can use the WITH clause to create temporary tables containing the average home values
for each of those years, then join them together to compare the change over time.
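One possible shape for the WITH approach is sketched below with Python's built-in sqlite3 module. The table name home_values, its columns, and the five rows are assumptions made for this example, and the names avg_2007 and avg_2017 are simply labels chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and made-up values; the real dataset will differ.
conn.execute("CREATE TABLE home_values (state TEXT, date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO home_values VALUES (?, ?, ?)",
    [
        ("NY", "2007-01", 200000),
        ("NY", "2007-06", 220000),
        ("NY", "2017-01", 252000),
        ("TX", "2007-01", 100000),
        ("TX", "2017-01", 150000),
    ],
)

# Each WITH entry is a temporary table of per-state averages for one year.
percent_change = conn.execute(
    """
    WITH avg_2007 AS (
        SELECT state, AVG(value) AS avg_value
        FROM home_values
        WHERE substr(date, 1, 4) = '2007'
        GROUP BY state
    ),
    avg_2017 AS (
        SELECT state, AVG(value) AS avg_value
        FROM home_values
        WHERE substr(date, 1, 4) = '2017'
        GROUP BY state
    )
    SELECT avg_2007.state,
           ROUND(100.0 * (avg_2017.avg_value - avg_2007.avg_value)
                 / avg_2007.avg_value, 1)
    FROM avg_2007
    JOIN avg_2017 ON avg_2007.state = avg_2017.state
    ORDER BY avg_2007.state
    """
).fetchall()

for state, change in percent_change:
    print(state, change)
```

Percent change is computed as 100 * (new - old) / old. A state that is missing from either year drops out of the result, because the join only keeps states present in both temporary tables.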

How would you describe the trend in home values for each state from 1997 to 2017? How about
from 2007 to 2017? Which states would you recommend for making real estate investments?

Advanced Challenge

Join the house value data with the table of zip-code level census data. Do there seem to be any
correlations between the estimated house values and characteristics of the area, such as population
count or median household income?

Resources & Support

Project-specific resources

SQLite Documentation

SQLite Tutorial

SQLite substr() Function

Home Value Data

General Resources

How to get set-up for coding on your computer

What is a Relational Database Management System?

What you need to know about Git, GitHub & Coding in Teams

How developer teams work

First steps in tackling a group project

Resource on writing pseudocode to get started with off-platform projects

MULTIPLE TABLES

Introduction

In order to efficiently store data, we often spread related information across multiple tables.

For instance, imagine that we’re running a magazine company where users can have different types
of subscriptions to different products. Different subscriptions might have many different properties.
Each customer would also have lots of associated information.

We could have one table with all of the following information:

order_id

customer_id

customer_name

customer_address

subscription_id

subscription_description

subscription_monthly_price

subscription_length

purchase_date

However, a lot of this information would be repeated. If the same customer has multiple
subscriptions, that customer’s name and address will be reported multiple times. If the same
subscription type is ordered by multiple customers, then the subscription price and subscription
description will be repeated. This will make our table big and unmanageable.

So instead, we can split our data into three tables:

1. orders would contain just the information necessary to describe what was ordered:

order_id, customer_id, subscription_id, purchase_date

2. subscriptions would contain the information to describe each type of subscription:

subscription_id, description, price_per_month, subscription_length

3. customers would contain the information for each customer:

customer_id, customer_name, address

In this lesson, we’ll learn the SQL commands that will help us work with data that is stored in
multiple tables.

Instructions

1.

Examine these tables by pasting the following code into the editor:

SELECT *
FROM orders
LIMIT 5;

SELECT *
FROM subscriptions
LIMIT 5;

SELECT * 
FROM customers
LIMIT 5;

Question

In the context of this exercise, are there any guidelines and reasons for splitting the table as
shown in the example?

Answer

Yes, although there might not necessarily be a one-fits-all solution to every possible database
structure, there are some guidelines that can help you when you need to split a table into multiple
tables.

A concept that most programmers use to refer to restructuring a database in this manner
is “Database Normalization”. Very briefly, normalization aims to structure database tables so that
they avoid data redundancy and keep the data accurate and consistent. “Data redundancy” is when we
have a lot of repeated, or redundant, data.

In the example given in this exercise, it explains how we might split a table into multiple tables,
which is essentially doing normalization. Initially, the table has many columns, and as a result, some
of the values will most likely be repeated many times, which introduces data redundancy.

Say for example, we added 1 million rows to this table. Some values like customer_address might
end up being stored many thousands of times, when we really only need to store it once per
customer.

To avoid this issue, we would split the table so that the information is stored more concisely. If we
need the customer information, we can obtain it from the customers table, by their customer_id.
And, if we need the information for a subscription, all of its information is stored in
the subscriptions table. We only need the customer_id and subscription_id in the orders table, and
we can obtain their information from their respective tables.
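To make the lookup concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the three-table split described in this lesson, but every row is invented. The query uses JOIN to match rows by their ids, which has not been introduced at this point in the lesson, so treat it as a preview rather than the expected solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, address TEXT)"
)
conn.execute(
    "CREATE TABLE subscriptions (subscription_id INTEGER, description TEXT, "
    "price_per_month INTEGER, subscription_length INTEGER)"
)
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "subscription_id INTEGER, purchase_date TEXT)"
)

# Made-up rows: each customer and each subscription type is stored exactly once.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "1 Main St"), (2, "Bob", "2 Oak Ave")],
)
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?, ?)",
    [(10, "Politics Magazine", 5, 12), (11, "Sports Magazine", 4, 6)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (100, 1, 10, "2017-01-01"),
        (101, 1, 11, "2017-02-01"),
        (102, 2, 10, "2017-03-01"),
    ],
)

# The ids stored in orders are used to look up the details in the other tables.
order_details = conn.execute(
    """
    SELECT orders.order_id, customers.customer_name, subscriptions.description
    FROM orders
    JOIN customers ON orders.customer_id = customers.customer_id
    JOIN subscriptions ON orders.subscription_id = subscriptions.subscription_id
    ORDER BY orders.order_id
    """
).fetchall()

for row in order_details:
    print(row)
```

Ann has two orders, yet her name and address are stored only once in customers. If her address changes, a single row needs to be updated.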

Employee Table

ID | ENAME | DOJ | SALARY | BONUS | DEPT | DESIGNATION | MANAGER | COMPID
1 | James Potter | 01-Jun-14 | 75000 | 1000 | ICP | PM | NULL | 1001
2 | Ethan McCarty | 01-Feb-14 | 90000 | 1200 | ETA | PM | NULL | NULL
3 | Emily Rayner | 01-Jan-14 | 25000 | 100 | ETA | SE | 2 | 1002
4 | Jack Abraham | 01-Jul-14 | 30000 | NULL | ETA | SSE | 2 | NULL
5 | Ayaz Mohammad | 01-Apr-14 | 40000 | NULL | ICP | TA | 1 | 1003

Employee Table Structure

Column Name | Data Type | Null Definition | Constraint
ID | NUMBER | NOT NULL | PRIMARY KEY
ENAME | VARCHAR2(40) | NOT NULL |
DOJ | DATE | NOT NULL | Default SYSDATE
SALARY | NUMBER(9,2) | NOT NULL |
BONUS | NUMBER(9,2) | NULL |
DEPT | CHAR(3) | NOT NULL | check(Dept IN ('ICP','ETA','IVS'))
DESIGNATION | CHAR(3) | NOT NULL |
MANAGER | NUMBER | NULL | References EMPLOYEE(ID)
COMPID | NUMBER | NULL | UNIQUE, References COMPUTER(COMPID)


Computer Table

COMPID | MAKE | MODEL | MYEAR
1001 | Dell | Vostro | 2013
1002 | Dell | Precision | 2014
1003 | Lenovo | Edge | 2013
1004 | Lenovo | Horizon | 2014

-- COMPUTER is created first because EMPLOYEE.COMPID references it.
-- COMPID must be a primary key (or unique) to be the target of a foreign key.
create table computer (
COMPID number primary key,
MAKE char(10),
MODEL varchar2(20),
MYEAR number
);

insert into computer values(1001, 'Dell', 'Vostro', 2013);
insert into computer values(1002, 'Dell', 'Precision', 2014);
insert into computer values(1003, 'Lenovo', 'Edge', 2013);
insert into computer values(1004, 'Lenovo', 'Horizon', 2014);

-- Column types and constraints follow the Employee Table Structure above.
create table employee (
ID number constraint emp_id_pk primary key,
ENAME varchar2(40) not null,
DOJ date default SYSDATE not null,
SALARY number(9,2) not null,
BONUS number(9,2),
DEPT char(3) not null check(DEPT IN ('ICP','ETA','IVS')),
DESIGNATION char(3) not null,
MANAGER number references EMPLOYEE(ID),
COMPID number unique references COMPUTER(COMPID)
);

-- String literals use single quotes; double quotes denote identifiers.
insert into employee values(1, 'James Potter', '01-Jun-14', 75000, 1000, 'ICP', 'PM', NULL, 1001);
insert into employee values(2, 'Ethan McCarty', '01-Feb-14', 90000, 1200, 'ETA', 'PM', NULL, NULL);
insert into employee values(3, 'Emily Rayner', '01-Jan-14', 25000, 100, 'ETA', 'SE', 2, 1002);
insert into employee values(4, 'Jack Abraham', '01-Jul-14', 30000, NULL, 'ETA', 'SSE', 2, NULL);
insert into employee values(5, 'Ayaz Mohammad', '01-Apr-14', 40000, NULL, 'ICP', 'TA', 1, 1003);

Let us understand the concepts related to the CASE statement using some examples.

Equality

SELECT Id, EName, Designation, Salary,
   CASE Designation
      WHEN 'SE' THEN Salary * 1.2
      WHEN 'SSE' THEN Salary * 1.1
      ELSE Salary * 1.05
   END New_Salary
FROM Employee;
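The sketch below runs the same CASE expression against the five employees listed earlier, using Python's built-in sqlite3 module. It is an adaptation, not the original setup: the table is cut down to the four columns the query needs and uses SQLite types rather than the Oracle types shown in the table structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified SQLite version of the Employee table (only the columns used here).
conn.execute(
    "CREATE TABLE employee (id INTEGER, ename TEXT, designation TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "James Potter", "PM", 75000),
        (2, "Ethan McCarty", "PM", 90000),
        (3, "Emily Rayner", "SE", 25000),
        (4, "Jack Abraham", "SSE", 30000),
        (5, "Ayaz Mohammad", "TA", 40000),
    ],
)

# Simple CASE: designation is compared for equality against each WHEN value in turn.
new_salaries = conn.execute(
    """
    SELECT id, designation, salary,
           CASE designation
               WHEN 'SE' THEN salary * 1.2
               WHEN 'SSE' THEN salary * 1.1
               ELSE salary * 1.05
           END AS new_salary
    FROM employee
    ORDER BY id
    """
).fetchall()

for row in new_salaries:
    print(row)
```

Only the first matching WHEN branch applies to each row. PM and TA match neither branch, so they fall through to the ELSE and get the 5% increase.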

Command | Syntax | Description
SELECT | SELECT column1, column2, ... FROM table_name; | The SELECT statement is used to fetch data from a database.
WHERE | SELECT column1, column2, ... FROM table_name WHERE condition; | The WHERE clause is used to extract only those records that fulfill a specified …
COUNT | SELECT COUNT(*) FROM table_name; | COUNT is a function that takes the name of a column as argument and … is not NULL.
DISTINCT | SELECT DISTINCT columnname FROM table_name; | The DISTINCT keyword is used to specify that the statement is a query which …
LIMIT | SELECT * FROM table_name LIMIT number; | LIMIT is a clause to specify the maximum number of rows the result set …
INSERT | INSERT INTO table_name (column1, column2, column3, ...) VALUES (value1, value2, value3, ...); | INSERT is used to insert new rows in the table.
UPDATE | UPDATE table_name SET column1 = value1 WHERE condition; | UPDATE is used to update the rows in the table.
DELETE | DELETE FROM table_name WHERE condition; | The DELETE statement is used to remove rows from the table which are spe…
