04 - (Introducing Queries) Data
In many organizations, SQL is used as a complement to other tools such as spreadsheet applications. If the
data we're interested in can fit in a spreadsheet and does not have many relationships to other data of interest,
we can analyze it in a spreadsheet.
But for sprawling and diverse data such as the data related to a retail platform, organizing the data in a
database is best. Then, we use SQL queries to uncover trends in website traffic, customer reviews, and
product sales. Which products had the highest sales last week? Which products get the worst review scores
from customers? How did website traffic change when a feature was introduced? SQL shines when an
organization has lots of data with complex relationships.
Keywords
Two of the most common SQL keywords are SELECT and FROM. Perhaps we'd like a list of every patron our
library has. The SELECT keyword indicates which fields should be selected: in this case, the name field.
The FROM keyword indicates the table in which these fields are located: in this case, the patrons table.
Query
Here's how the query should be written. The SELECT clause appears first, followed on the next line by
the FROM clause:
SELECT name
FROM patrons;
It's best practice to end the query with a semicolon to indicate that the query is complete. By convention,
we also capitalize keywords while keeping table and field names all lowercase.
Now let's take a look at the results of our query, often called a result set. The result set lists all patron names,
just as we had hoped. Note that we have not changed our database by writing this query. The tables,
including the patrons table, are exactly the same as before we wrote the query.
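As a sketch of the query above, the Python snippet below runs it with the standard-library sqlite3 module against an in-memory database. The patrons table's columns and rows are hypothetical, invented only for illustration. It also reads the connection's total_changes counter before and after the query to show that a SELECT leaves the table untouched.

```python
import sqlite3

# In-memory database with a hypothetical patrons table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patrons (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO patrons (id, name) VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cleo")],
)

# total_changes counts rows inserted, updated, or deleted on this connection so far.
changes_before = conn.total_changes

# SELECT names the field, FROM names the table, and the semicolon ends the query.
query = """
SELECT name
FROM patrons;
"""
result_set = [row[0] for row in conn.execute(query)]
changes_after = conn.total_changes

print(result_set)
# A SELECT only reads data, so the change counter does not move.
print(changes_before == changes_after)  # True
```

SQL does not guarantee row order without ORDER BY, so the printed list may not follow insertion order on every database.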
Selecting multiple fields
To select multiple fields, we can list multiple field names after the SELECT keyword, separated by commas.
For example, to select id and name, we'd list both field names in the order we'd like them to appear in our
result set. Notice that this order does not have to match the order in which the fields appear in the table.
Consider the following table named people:
id   name     location
001  Fred     US
002  Tony     UK
003  Mike     Germany
004  Steward  Germany
005  Taylor   UK
To select a single column, specify its name after the SELECT keyword:
SELECT name
FROM people;
name
Fred
Tony
Mike
Steward
Taylor
To select multiple columns, list the column names after the SELECT keyword, separated by commas:
SELECT id, name
FROM people;
id   name
001  Fred
002  Tony
003  Mike
004  Steward
005  Taylor
To select every column in the table, use an asterisk (*) in place of a column list:
SELECT *
FROM people;
id   name     location
001  Fred     US
002  Tony     UK
003  Mike     Germany
004  Steward  Germany
005  Taylor   UK
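The column-selection rules above can be checked with a short Python sketch using the standard-library sqlite3 module. It loads the people table into an in-memory database and reads each result set's column headers from cursor.description. Storing id as TEXT, so that values like 001 keep their leading zeros, is an assumption about the column type.

```python
import sqlite3

PEOPLE = [
    ("001", "Fred", "US"),
    ("002", "Tony", "UK"),
    ("003", "Mike", "Germany"),
    ("004", "Steward", "Germany"),
    ("005", "Taylor", "UK"),
]

def run(conn, sql):
    """Execute a query and return (column headers, rows) for its result set."""
    cur = conn.execute(sql)
    headers = [col[0] for col in cur.description]
    return headers, cur.fetchall()

conn = sqlite3.connect(":memory:")
# id is stored as TEXT (an assumption) so values like "001" keep their leading zeros.
conn.execute("CREATE TABLE people (id TEXT, name TEXT, location TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", PEOPLE)

# Columns come back in the order listed after SELECT, not the table's order.
headers_id_name, rows_id_name = run(conn, "SELECT id, name FROM people;")
headers_name_id, _ = run(conn, "SELECT name, id FROM people;")
# * expands to every column, in the table's own order.
headers_all, rows_all = run(conn, "SELECT * FROM people;")

print(headers_id_name)  # ['id', 'name']
print(headers_name_id)  # ['name', 'id']
print(headers_all)      # ['id', 'name', 'location']
```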
Aliasing
Sometimes it can be helpful to rename columns in our result set, whether for clarity or brevity. We can do
this using aliasing. Perhaps we'd like to select the name and hire year for each record in the employees
table.
We can alias the name column as first_name by adding the AS keyword, followed by the alias first_name,
right after the name field in the SELECT clause. The result set now has first_name rather than name as the
column header. The alias only applies to the result of this particular query; in other words, the field name
in the employees table itself is still name rather than first_name.
Consider the following table named people:
id   name     location
001  Fred     US
002  Tony     UK
003  Mike     Germany
004  Steward  Germany
005  Taylor   UK
To give the name column an alias, add AS and the new name after the column name:
SELECT name AS first_name
FROM people;
first_name
Fred
Tony
Mike
Steward
Taylor
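A minimal Python sketch (standard-library sqlite3, people table loaded in memory, id stored as TEXT by assumption) shows both halves of the aliasing rule: the result set's header becomes first_name, while PRAGMA table_info, which lists the table's actual columns, still reports name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The people table from the text; id stored as TEXT (an assumption) to keep leading zeros.
conn.execute("CREATE TABLE people (id TEXT, name TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("001", "Fred", "US"), ("002", "Tony", "UK"), ("003", "Mike", "Germany"),
     ("004", "Steward", "Germany"), ("005", "Taylor", "UK")],
)

# AS renames the column in this query's result set only.
cur = conn.execute("SELECT name AS first_name FROM people;")
result_header = cur.description[0][0]
first_names = [row[0] for row in cur.fetchall()]

# PRAGMA table_info lists the table's real columns; field 1 of each row is the column name.
table_columns = [row[1] for row in conn.execute("PRAGMA table_info(people);")]

print(result_header)  # first_name
print(table_columns)  # ['id', 'name', 'location']
```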
DISTINCT keyword
Some SQL questions require a way to return a list of unique values. Let's imagine that we are interested in
getting a list of countries. If we select the location field from the employees table, the result set shows
several locations listed twice, which isn't what we are looking for. To get a list of locations with no repeat
values, we can add the DISTINCT keyword before the location field name in the SELECT statement. Now,
each location appears only once in the result set.
Consider the people table again:
id   name     location
001  Fred     US
002  Tony     UK
003  Mike     Germany
004  Steward  Germany
005  Taylor   UK
To return each location only once, add DISTINCT before the column name:
SELECT DISTINCT location
FROM people;
location
US
UK
Germany
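The effect of DISTINCT can be sketched in Python with the standard-library sqlite3 module, loading the people table in memory (id stored as TEXT by assumption). Because SQL does not promise a row order without ORDER BY, the sketch compares counts and sorted values rather than relying on the order shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The people table from the text; id stored as TEXT (an assumption) to keep leading zeros.
conn.execute("CREATE TABLE people (id TEXT, name TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("001", "Fred", "US"), ("002", "Tony", "UK"), ("003", "Mike", "Germany"),
     ("004", "Steward", "Germany"), ("005", "Taylor", "UK")],
)

# Without DISTINCT, every row's location is returned, repeats included.
all_locations = [row[0] for row in conn.execute("SELECT location FROM people;")]
# DISTINCT collapses repeated values, so each location appears once.
distinct_locations = [
    row[0] for row in conn.execute("SELECT DISTINCT location FROM people;")
]

print(len(all_locations))           # 5
print(len(distinct_locations))      # 3
print(sorted(distinct_locations))   # ['Germany', 'UK', 'US']
```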
DISTINCT with multiple fields
It's possible to return the unique combinations of multiple field values by listing multiple fields after the
DISTINCT keyword. Take a look at the employees table. Perhaps we'd like to know the years that different
departments hired employees. We could use this SQL query to look at this information, selecting the dept_id
and year_hired from the employees table. Looking at the results, we see that department 3 hired two
employees in 2021.
To avoid repeating this information, we could add the DISTINCT keyword before the fields to select. Notice
that the dept_id and year_hired fields still have repeated values individually, but no two records are the
same: each is a unique combination of the two fields.
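The sketch below illustrates DISTINCT over two fields using Python's standard-library sqlite3 module. The employees rows are hypothetical: the text only tells us that department 3 hired two employees in 2021, so the remaining values are invented to make the point. DISTINCT removes the duplicate (3, 2021) pair while dept_id and year_hired each still repeat on their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employees rows; only "department 3 hired two people in 2021" comes from the text.
conn.execute("CREATE TABLE employees (id INTEGER, dept_id INTEGER, year_hired INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 1, 2020), (2, 1, 2021), (3, 3, 2021), (4, 3, 2021), (5, 2, 2020)],
)

all_pairs = conn.execute("SELECT dept_id, year_hired FROM employees;").fetchall()
# DISTINCT applies to the whole (dept_id, year_hired) combination, not to each field separately.
distinct_pairs = conn.execute(
    "SELECT DISTINCT dept_id, year_hired FROM employees;"
).fetchall()

print(len(all_pairs))       # 5
print(len(distinct_pairs))  # 4
```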