4 SQL Select
It is worth noting that SQL is a meta-language, but it is certainly not a programming language and therefore
does not contain the typical control mechanisms of both procedural and object-oriented languages (Visual
Basic, Basic.Net, Delphi, Java, C#, Python, etc.) such as IF ... THEN, FOR ... NEXT, DO WHILE ... LOOP, etc.
RELATIONAL DATABASE AND SQL LANGUAGE
However, in practice it is sometimes useful to include SQL statements inside a conventional
programming language (most current programming languages allow this); in this case the use of
SQL is called embedded, while the programming language is called the host language.
Before proceeding, we summarize the notation (syntax) with which the SQL command
statements will be presented.
<> - Angle brackets delimit the names of key elements of the SQL language. For example, we will use expressions such
as: <command>, <expression>, <identifier>;
[] - Brackets denote an optional element;
... - Three dots (the ellipsis) indicate that the previous item can be repeated any number of times;
{} - Braces are used to group together multiple elements of a definition;
| - A vertical bar between two elements indicates that the preceding element is an alternative to the
following one (i.e., OR operator).
We also observe that an SQL command normally consists of an operation followed by one or more
clauses that specify the effect of the operation. Thus, the general command definition has the following form:
<Command>:: = <action> <clause> [<clause> ...]
As can be seen, the SELECT operator requires a Selection List that defines the fields (i.e., the
columns) to be returned and that will form the output table. In addition, the query must also indicate where
(i.e., in which Tables) the fields listed in the SELECT section must be searched. This part corresponds to the
<Table Expression>, which defines the subset of the tables on which the query must operate.
Of this part, only the FROM clause is mandatory; all the others are optional. Specifically, its syntax is defined as
follows:
Practically speaking, the FROM Expression must include at least the name of a single table, i.e.,
the table from which data must be collected. Obviously, to create complex queries operating on more than a
single Table, it will be necessary to include in the FROM Expression the names of all the tables that must
be considered during the search. As shown by the syntax, the names of the tables must be separated by commas.
In practice, the simplest form of a Select query consists of a list of fields (columns) that belong to the same
Table, as in the pseudo-code that follows:
<Selection List> :: =
SELECT <First field> [{, <Second field>} …]
<Table Expression> :: =
FROM <Input_Table>
It should be clear that this query performs a projection of the Input_Table: all the records of the
Input_Table will be returned, but only the fields that appear in the Selection List will be displayed.
Let us consider a simple CUSTOMERS table such as the one in Fig. 4.1:
CUSTOMERS
ID_Customer - Integer (PK)
Name – Char (252)
Surname – Char (252)
Date of Birth – Date
Nationality – Char (252)
Income – Currency
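The projection described above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for Access, and the two sample records are invented for illustration; only the table layout follows Fig. 4.1.

```python
import sqlite3

# In-memory SQLite database standing in for Access (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS ("
    "ID_Customer INTEGER PRIMARY KEY, Name TEXT, Surname TEXT, "
    "Date_of_Birth TEXT, Nationality TEXT, Income REAL)"
)
# Invented sample records.
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Anna", "Rossi", "1980-05-01", "Italian", 32000.0),
        (2, "John", "Smith", "1975-11-23", "American", 45000.0),
    ],
)

# Projection: all records are returned, but only the listed fields.
cursor = conn.execute("SELECT Surname, Name FROM CUSTOMERS")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

Both records come back, each reduced to the two fields named in the Selection List.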
Additional operators
It is interesting to note that the order of the fields in the output table depends exclusively on the order
used in the Selection List.
It is also possible to rename the fields using the AS operator, as shown below:
SELECT <Field name> AS <New name> [{, <Field name> AS <New name>} …]
FROM <Table Name>
Sometimes, it may be useful to select all the fields of a Table. In this case, instead of writing the name of each
field in the Selection List, it is sufficient to use the All operator, which is indicated with an asterisk (*).
The Select operator makes it possible not only to select some fields from a table, but also to create
new Calculated Fields. To this end, it is sufficient to include in the Selection List a simple mathematical
expression operating on one or more fields of the selected tables.
To clarify both concepts, let us consider a PRODUCTS table such as the one of Fig.2 where OO stands for On-Order,
OH for On-Hand and R is the reorder level (i.e., the level of the Inventory Position that triggers a new
replenishment order). Note that we assume that a single warehouse is used, so that the data
concerning the inventory can be conveniently placed in the PRODUCTS table.
Also note that there is a Foreign Key, namely Category_ID; thus it should be clear that this table is in a one-to-many (OTM)
relation with the “parent” table CATEGORIES.
PRODUCTS
ID_Product - Integer (PK)
Name – Char (252)
Shelf Life – Integer
OO – Integer
OH – Integer
R - Integer
Price – Currency
Category_ID (FK) - Integer
In case of a negative gap, a certain amount of product (greater than or equal to the absolute value of the Gap) should be ordered.
To this end, we could write a query such as the following one:
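As a hedged illustration of a calculated field, the sketch below assumes that the Gap is the Inventory Position (OO + OH) minus the reorder level R, which is how the RQ function shown later in this chapter computes it. It uses Python's sqlite3 module instead of Access; the Rose record is invented, while the Violet values are those of the sample row in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCTS ("
    "ID_Product INTEGER PRIMARY KEY, Name TEXT, OO INTEGER, OH INTEGER, R INTEGER)"
)
conn.executemany(
    "INSERT INTO PRODUCTS VALUES (?, ?, ?, ?, ?)",
    [(1, "Rose", 0, 10, 25), (2, "Violet", 50, 30, 40)],
)

# IP and Gap are calculated fields: they are not stored in the table,
# but computed from OO, OH and R for every record.
gaps = conn.execute(
    "SELECT Name, (OO + OH) AS IP, (OO + OH) - R AS Gap "
    "FROM PRODUCTS ORDER BY ID_Product"
).fetchall()
print(gaps)
```

With these values Rose has a negative gap of -15 (an order is needed), while Violet has a positive gap of 40.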
ID_Product Name Shelf Life OO OH R Price Category_ID IP Gap
… … … … … … … … … …
n Violet 5 50 30 40 5€ 1 80 40
As we have noted above, the table CATEGORIES has a one-to-many (OTM) relation with the table PRODUCTS, so it could
be interesting to see how many categories are actually used to classify the products contained in the
PRODUCTS table. To this end, we could make a projection on PRODUCTS using Category_ID as the only field
included in the Selection List. This corresponds to the following simple query:
SELECT Category_ID
FROM PRODUCTS
However, the obtained result (an example is given in Tab. 4.5) is not satisfactory, since what we get is a list of
Category_ID values, most of which appear more than once. This is to be expected: since the relation is of the OTM
form, the FK Category_ID is not unique.
1
2
3
1
1
…
2
…
5
If we do not want duplicated data (i.e., we want the minimum set of the data contained in the FK field), we
need to use the DISTINCT operator.
<Select Command> : : =
SELECT [ALL | DISTINCT] <Selection List>
<Table Expression>
[<Sorting Criteria>]
As we can see, after the keyword SELECT there are two operators (i.e., ALL and DISTINCT) divided by a vertical
bar and enclosed in a pair of brackets. The vertical bar means that ALL and DISTINCT are mutually
exclusive; the brackets mean that both are optional. Since ALL comes first, it is taken
as the default option; in other words, typing SELECT is equivalent to typing SELECT ALL.
What is the difference between SELECT ALL <field name> and SELECT DISTINCT <field name>? It is easy to guess
that, with respect to the selected field, ALL shows all the records, whereas DISTINCT shows each value
only once, discarding the duplicates. In other words, the DISTINCT operator is a first way to implement the
Selection operation (of relational algebra).
Given the above, we can reformulate the previous query as follows:
SELECT DISTINCT Category_ID
FROM PRODUCTS
A possible outcome is shown below:
Tab. 4.4 (a)
A Select query with the DISTINCT operator made on the FK
Category_ID
1
2
3
4
5
It is now clear that products have been grouped in five distinct categories.
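We conclude by observing that the DISTINCT operator can be used only once in a SELECT statement, immediately
after the SELECT keyword, and that it applies to the whole Selection List: when several fields are listed, two
records are considered duplicates only if they coincide on all of them.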
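The difference between ALL and DISTINCT on the foreign key can be checked with a short sketch. It assumes Python's sqlite3 module rather than Access, and five invented products spread over three categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCTS ("
    "ID_Product INTEGER PRIMARY KEY, Name TEXT, Category_ID INTEGER)"
)
# Invented sample data: category 1 is used by three products.
conn.executemany(
    "INSERT INTO PRODUCTS VALUES (?, ?, ?)",
    [(1, "Rose", 1), (2, "Tulip", 2), (3, "Basil", 3), (4, "Violet", 1), (5, "Lily", 1)],
)

# SELECT ALL (the default) returns one value per record, duplicates included.
all_ids = [row[0] for row in conn.execute("SELECT ALL Category_ID FROM PRODUCTS")]

# SELECT DISTINCT returns each value only once.
distinct_ids = [row[0] for row in conn.execute("SELECT DISTINCT Category_ID FROM PRODUCTS")]

print(len(all_ids), sorted(distinct_ids))
```

Five values are returned by the first query and only three by the second, one per category actually in use.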
Note that <condition> can be any (valid) logical condition obtained through comparison operators, logical
operators and any other conditional connectors.
Tab. 4.5.
Comparison Operators
Meaning Symbol
Equal =
Not equal <>
Greater than >
Less than <
Greater than or equal >=
Less than or equal <=
As a first example, we can consider Table 4.3 again. Now we want to see only the products that need a
replenishment order (i.e., those with a negative gap).
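A minimal sketch of such a filter follows. It assumes Python's sqlite3 module in place of Access, invented sample records, and a Gap computed as (OO + OH) - R. The expression is repeated in the WHERE clause instead of reusing the alias, since the alias of a calculated field is not generally available there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS (Name TEXT, OO INTEGER, OH INTEGER, R INTEGER)")
# Invented sample data: only Rose has a negative gap.
conn.executemany(
    "INSERT INTO PRODUCTS VALUES (?, ?, ?, ?)",
    [("Rose", 0, 10, 25), ("Violet", 50, 30, 40), ("Tulip", 5, 5, 10)],
)

# WHERE keeps only the records for which the condition is true (a 'selection').
to_reorder = conn.execute(
    "SELECT Name, (OO + OH) - R AS Gap "
    "FROM PRODUCTS "
    "WHERE (OO + OH) - R < 0"
).fetchall()
print(to_reorder)
```

Tulip, whose gap is exactly zero, is not returned because the condition uses the strict comparison operator <.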
1 It may seem confusing, but the SELECT operator performs a ‘projection’, whereas it is the WHERE operator that performs
a ‘selection’.
Let us go a step further. Suppose that the PRODUCTS table has an additional field called Minimum Ordering
Lot (MOL), which is the minimum quantity that can be ordered from a supplier. Also, orders must be integer
multiples of the MOL quantity; so, for instance, if the gap was -15 and MOL was 10, then an order equal to 20
should be issued.
In this case the reorder quantity calculated above is wrong and must be corrected. This time the calculation is a
little harder, so it may be wise to create a custom-made function, for example using VBA or another language,
and to call that function directly in the Selection List. Public functions, indeed, can be used directly inside
SQL statements. This is a very important feature!
First of all, let us see what the function that we need looks like:
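Public Function RQ(OO As Integer, OH As Integer, R As Integer, MOL As Integer) As Integer
    ' In VBA every parameter and variable needs its own type,
    ' otherwise it is silently declared as Variant
    Dim IP As Integer, Gap As Integer, N As Integer
    IP = OO + OH                ' Inventory Position
    Gap = IP - R
    N = 0                       ' No order by default
    If Gap < 0 Then             ' An order is needed
        Gap = -Gap              ' This is the theoretical quantity to be reordered
        N = Ceiling(Gap / MOL)  ' Ceiling is a custom function that rounds up a number
    End If
    RQ = MOL * N                ' The output (0 if no order is needed)
End Function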
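The same idea can be sketched outside Access. The example below is an analogy, not the author's implementation: it assumes Python's sqlite3 module, whose create_function method registers a Python function so that it can be called inside SQL, much as a public VBA function can be called from an Access query. The sample records are invented.

```python
import math
import sqlite3


def rq(oo: int, oh: int, r: int, mol: int) -> int:
    """Reorder quantity, rounded up to an integer multiple of the MOL."""
    gap = (oo + oh) - r
    if gap >= 0:
        return 0  # the inventory position covers the reorder level
    return mol * math.ceil(-gap / mol)


conn = sqlite3.connect(":memory:")
conn.create_function("RQ", 4, rq)  # make rq callable from SQL as RQ(...)
conn.execute(
    "CREATE TABLE PRODUCTS (Name TEXT, OO INTEGER, OH INTEGER, R INTEGER, MOL INTEGER)"
)
conn.executemany(
    "INSERT INTO PRODUCTS VALUES (?, ?, ?, ?, ?)",
    [("Rose", 0, 10, 25, 10), ("Violet", 50, 30, 40, 10)],
)

orders = conn.execute(
    "SELECT Name, RQ(OO, OH, R, MOL) AS Quantity FROM PRODUCTS ORDER BY Name"
).fetchall()
print(orders)
```

For Rose the gap is -15 and the MOL is 10, so the quantity is 20, as in the example discussed in the text; for Violet no order is needed.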
Obviously, it is also possible to build complex logical conditions by combining simple conditions by
means of logical connectors (i.e., AND, OR and NOT), according to the well-known De Morgan’s
laws (see footnote 3).
Let us consider a couple of examples to clarify these concepts.
SELECT Surname
FROM CUSTOMERS
WHERE NOT (Country = 'USA') AND City = 'Vancouver'
This query returns the last names of the customers (stored in the CUSTOMERS table) who do not live in the USA
and who reside in Vancouver (see footnote 4). Note that, in this case, the NOT condition is almost redundant:
Vancouver is a Canadian city, so most people living in Vancouver do not live in the USA. However, since there is
also a city named Vancouver in Washington State, the NOT condition may filter out some records.
2 All the VBA functions available in Access can be found at: https://support.office.com/en-us/article/Access-Functions-by-category-B8B136C3-2716-4D39-94A2-658CE330ED83
3 De Morgan’s laws can be found at: https://en.wikipedia.org/wiki/De_Morgan%27s_laws
4 Note that single quotation marks (' ') are used to delimit a string.
Now, what happens if we modify the query as follows?
SELECT Surname
FROM CUSTOMERS
WHERE NOT (Country = 'USA' AND City = 'Vancouver')
In this case the scenario is totally different. Since the NOT operator now applies to the conjunction of both
conditions, the filter is almost useless. Indeed, the query will return all the customers, except those few
living in Vancouver, Washington State.
Using De Morgan’s laws, we could also have written the previous query in this alternative way:
SELECT Surname
FROM CUSTOMERS
WHERE NOT (Country = 'USA') OR NOT (City = 'Vancouver')
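The three WHERE clauses above can be compared on a small data set. This is a sketch under stated assumptions: Python's sqlite3 module instead of Access, four invented customers, and no null values in Country or City (nulls would require separate treatment). The helper surnames is hypothetical and exists only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (Surname TEXT, Country TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [
        ("Smith", "USA", "Vancouver"),      # Vancouver, Washington State
        ("Tremblay", "Canada", "Vancouver"),
        ("Jones", "USA", "Chicago"),
        ("Rossi", "Italy", "Rome"),
    ],
)


def surnames(condition: str) -> list:
    """Return the sorted surnames of the customers satisfying the condition."""
    query = f"SELECT Surname FROM CUSTOMERS WHERE {condition} ORDER BY Surname"
    return [row[0] for row in conn.execute(query)]


first = surnames("NOT (Country = 'USA') AND City = 'Vancouver'")
second = surnames("NOT (Country = 'USA' AND City = 'Vancouver')")
third = surnames("NOT (Country = 'USA') OR NOT (City = 'Vancouver')")
print(first, second, third)
```

The first condition keeps only the Canadian customer living in Vancouver; the second and the third return the same three customers, as De Morgan's laws predict.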
The BETWEEN operator is used to check if a value belongs to a specific interval, bounds included. Its syntax is the following one:
<field> BETWEEN <low value> AND <high value>
11
RELATIONAL DATABASE AND SQL LANGUAGE
The IN operator is used to check if a value belongs to a list/set of values. Its syntax is the following one:
<field> IN (<'first value'> [{, <'other value'>} …])
The LIKE operator is used to check if a string has a predefined format; the format is defined using wildcard
(or “joker”) chars. Its syntax is the following one:
<field> LIKE <wildcard sequence>
The IS NULL operator is used to check if a field is null. IS NULL is commonly used together with the NOT
operator (i.e., IS NOT NULL), so as to return only the records with non-null fields.
Some examples follow.
The first one concerns the use of the IN operator.
SELECT Surname
FROM CUSTOMERS
WHERE City IN ('Vancouver', 'New York', 'Chicago') AND Country = 'USA'
In this case the query returns the American customers that live in one of the cities of the list.
Also note that an OR condition could be used instead of the IN operator:
SELECT Surname
FROM CUSTOMERS
WHERE (City = 'Vancouver' OR City = 'New York' OR City = 'Chicago') AND Country = 'USA'
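The equivalence between IN and a chain of OR conditions, together with the BETWEEN and IS NULL operators introduced earlier, can be checked with the sketch below. It assumes Python's sqlite3 module instead of Access and five invented customers, one of whom has a null Income.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (Surname TEXT, Country TEXT, City TEXT, Income REAL)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        ("Smith", "USA", "Vancouver", 30000.0),
        ("Jones", "USA", "Chicago", 52000.0),
        ("Tremblay", "Canada", "Vancouver", 41000.0),
        ("Brown", "USA", "Boston", None),  # Income is null
        ("Lee", "USA", "New York", 60000.0),
    ],
)


def surnames(condition: str) -> list:
    """Hypothetical helper: sorted surnames of the customers satisfying the condition."""
    query = f"SELECT Surname FROM CUSTOMERS WHERE {condition} ORDER BY Surname"
    return [row[0] for row in conn.execute(query)]


with_in = surnames("City IN ('Vancouver', 'New York', 'Chicago') AND Country = 'USA'")
with_or = surnames(
    "(City = 'Vancouver' OR City = 'New York' OR City = 'Chicago') AND Country = 'USA'"
)
in_range = surnames("Income BETWEEN 40000 AND 52000")  # both bounds are included
not_null = surnames("Income IS NOT NULL")
print(with_in, with_or, in_range, not_null)
```

The IN and OR versions return the same customers. Jones, whose Income equals the upper bound, is included by BETWEEN, while Brown is excluded both by BETWEEN and by IS NOT NULL.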
List [charlist] [charlist] One of the chars of the charlist must be found in the
specified position of the input string
Not in List [^charlist] [!charlist] None of the chars of the charlist must be found in the
specified position of the input string
For instance, LIKE 'c[h-m]?????' requires a string of seven chars starting with the letter 'c', followed by a letter
in the range [h-m] and then by any five chars. Note that, as written, this pattern is satisfied by both Chicago and
Chelsea, since both have seven chars and 'h' as the second letter.
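The behaviour of such a pattern can be explored with a small sketch. As an assumption, it uses Python's fnmatch module, whose wildcards (? for one char, * for any sequence, [h-m] for a range, [!h-m] for its negation) resemble the ones used here; fnmatch has no equivalent of the # wildcard for digits, and it is not the Access engine. The helper like and the list of city names are chosen only for illustration.

```python
from fnmatch import fnmatchcase


def like(text: str, pattern: str) -> bool:
    """Case-insensitive wildcard match (lower-casing both sides)."""
    return fnmatchcase(text.lower(), pattern.lower())


cities = ["Chicago", "Chelsea", "Cardiff", "Cleveland"]

# 'c', then one letter between h and m, then exactly five more chars.
in_list = [c for c in cities if like(c, "c[h-m]?????")]

# 'c', then one char NOT between h and m, then exactly five more chars.
not_in_list = [c for c in cities if like(c, "c[!h-m]?????")]

print(in_list, not_in_list)
```

Under these rules Chicago and Chelsea both satisfy the first pattern, Cardiff satisfies only the negated one, and Cleveland satisfies neither because it has nine chars.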
Although it is rare, it may happen that the input string (to be analyzed) contains the underscore (_), the
percent sign (%) or the hashtag (#). Let us suppose that some customers have been coded using one char for the
name, three chars for the surname and two chars for the nationality, and that the hashtag has been used to
separate these chars, as in F#ZMM#IT. What should we write if we wanted to identify all the Italian customers
(i.e., those having IT as the last two chars of their code)? The solution is shown below:
WHERE Code LIKE '?$#???$#IT' ESCAPE '$'
The dollar sign ($) is called the escape character and it tells the SQL engine that a wildcard char (in this case
the hashtag) has to be treated as a standard char. More precisely, any wildcard that is preceded by the
escape char will be considered as a normal char and not as a wildcard. So, in this case the sequence
'?$#???$#IT' indicates a string of eight chars (obviously the two escape characters are not counted) having the
following structure: the first char is any single char, the second is a hashtag, the following three chars are any
chars, the sixth char is a hashtag and the last two chars are IT. Please note that the escape char must not be a
wildcard and must not be part of the string to be analysed (exactly as the dollar sign $ in the present case).
However, formulated in this way, this is a parametric query and, therefore, when executed, the user will be
asked to fill in an input form, to specify the ID value to be searched.
To avoid this problem, it would be nice to modify the WHERE clause as follows:
WHERE ID = I
with ‘I’ being a public variable evaluated (at run time) by VBA code. Unfortunately, as mentioned above, even if
‘I’ were a global variable, this query would not work. In fact, a query can only call public functions; it cannot
read public variables. To overcome this limitation, we should write (in a public module) a trivial Get-type function that
reads and returns the public variable ‘I’.
Public Function Get_ID() As Integer
    Get_ID = I ' It reads and returns the value of the public variable I
End Function
In this way, we can now modify our query as follows:
SELECT Name, Surname, [...]
FROM REGISTRY
WHERE ID = Get_ID()
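The getter pattern can be reproduced outside Access as an analogy. The sketch below assumes Python's sqlite3 module: a module-level variable plays the role of the public variable I, and a function registered with create_function plays the role of Get_ID. The REGISTRY records are invented.

```python
import sqlite3

current_id = 2  # plays the role of the public variable I


def get_id() -> int:
    """Trivial getter: reads and returns the module-level variable."""
    return current_id


conn = sqlite3.connect(":memory:")
conn.create_function("Get_ID", 0, get_id)  # callable from SQL as Get_ID()
conn.execute("CREATE TABLE REGISTRY (ID INTEGER PRIMARY KEY, Name TEXT, Surname TEXT)")
conn.executemany(
    "INSERT INTO REGISTRY VALUES (?, ?, ?)",
    [(1, "Anna", "Rossi"), (2, "John", "Smith")],
)

query = "SELECT Name, Surname FROM REGISTRY WHERE ID = Get_ID()"
first_row = conn.execute(query).fetchone()

current_id = 1  # the variable changes, the query text does not
second_row = conn.execute(query).fetchone()
print(first_row, second_row)
```

The query text never mentions the variable directly: it only calls the function, which is evaluated each time the query runs.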
For some reason, although public variables are not allowed inside SQL code, references to the values of some
Components placed on a Form are admitted. For instance, suppose that the user can operate on a Form
(named F_Id) with a Combo Box (named Cbx_Id) populated with all the customers’ IDs from the REGISTRY table.
In this case, the previous query could also have been written as:
Unfortunately, neither of the two queries works. Indeed, it is not possible to include in the same Selection List
elements that operate on a single record (the Name field in this case) and functions that operate on groups of
records (the MIN operator in this case): this creates an incompatibility error.
Similarly, it is not possible to include a function operating on groups in the WHERE clause; this, too,
would generate an incompatibility error.
5 If the name of a field is made of two or more words separated by a space, the whole string must be placed in brackets.
For the sake of completeness, we anticipate that a workable solution is to use a combined query such as
the following one (see footnote 6):
SELECT Name
FROM PRODUCTS
WHERE [Unit Price] = (SELECT MIN([Unit Price]) FROM PRODUCTS WHERE [Category ID] = 'Herbs')
The inner query (the one used in the WHERE clause of the outer query) returns a scalar value, which is then
used as the comparison value of the outer query. This is perfectly legitimate, and the query does work.
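A runnable sketch of the combined query follows. It assumes Python's sqlite3 module (SQLite also accepts bracketed field names) and three invented products; it is meant as an illustration, not as the Access behaviour in every detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS (Name TEXT, [Unit Price] REAL, [Category ID] TEXT)")
# Invented sample data.
conn.executemany(
    "INSERT INTO PRODUCTS VALUES (?, ?, ?)",
    [("Basil", 2.0, "Herbs"), ("Mint", 1.5, "Herbs"), ("Rose", 4.0, "Flowers")],
)

# The inner query returns a single (scalar) value: the minimum price among the Herbs.
min_price = conn.execute(
    "SELECT MIN([Unit Price]) FROM PRODUCTS WHERE [Category ID] = 'Herbs'"
).fetchone()[0]

# The outer query compares each record with that scalar value.
cheapest = conn.execute(
    "SELECT Name FROM PRODUCTS "
    "WHERE [Unit Price] = "
    "(SELECT MIN([Unit Price]) FROM PRODUCTS WHERE [Category ID] = 'Herbs')"
).fetchall()
print(min_price, cheapest)
```

Note that the outer query is not restricted to the Herbs category, so a product of another category with exactly the same price would also be returned; adding AND [Category ID] = 'Herbs' to the outer WHERE clause would avoid this.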
6 This topic will be better explained in the following chapters.
It is important to note that, if a group operating function is used in the Selection list of a query, then all the other
elements included in the Selection list must be part of the GROUP BY list, too. For instance, in the query above,
[Category ID] appears both in the Selection and in the Group By list.
Also note that, in the query, an additional operator has been used. This is the ORDER BY operator, which is used
to sort (in Ascending ASC or Descending DESC order) the records returned by a query; its syntax is as follows:
<Ordering Statement> : : = ORDER BY <Ordering Condition> [{, <Ordering Condition>} …]
where:
<Ordering Condition> : : = {<Field name> | <Column Number>} [ASC | DESC]
Briefly, it is sufficient to indicate the field or the fields on which the ordering has to be based. Fields can be
indicated by name or by their position in the Selection List (i.e., column number). The field name is
the standard option. The sort direction can also be indicated: Ascending is the default option, so data are
sorted in ascending order even if ASC is not written explicitly.
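The GROUP BY and ORDER BY operators can be seen together in the following sketch. It assumes Python's sqlite3 module instead of Access, six invented products, and the hypothetical alias N_Prod for the counter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS (Name TEXT, Category_ID INTEGER)")
# Invented sample data: 2 products in category 1, 1 in category 2, 3 in category 3.
conn.executemany(
    "INSERT INTO PRODUCTS VALUES (?, ?)",
    [("Rose", 1), ("Violet", 1), ("Tulip", 2), ("Basil", 3), ("Mint", 3), ("Sage", 3)],
)

# Ordering by field name, in descending order.
by_category_desc = conn.execute(
    "SELECT Category_ID, COUNT(*) AS N_Prod FROM PRODUCTS "
    "GROUP BY Category_ID ORDER BY Category_ID DESC"
).fetchall()

# Ordering by column number: 2 is the second element of the Selection List (the counter).
# ASC is written explicitly here, although it is the default.
by_count = conn.execute(
    "SELECT Category_ID, COUNT(*) AS N_Prod FROM PRODUCTS "
    "GROUP BY Category_ID ORDER BY 2 ASC, 1 ASC"
).fetchall()
print(by_category_desc, by_count)
```

Category_ID appears both in the Selection List and in the GROUP BY list, as required when a group operating function such as COUNT is used.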
HAVING is the last interesting operator to be described. Used together with the GROUP BY operator,
HAVING makes it possible to add logical conditions on the group operating functions, exactly as the WHERE
operator makes it possible to define logical conditions on expressions that operate on single records.
As a simple example, we can consider the following case. We want to know how many products
belong to each category, but we want to limit this analysis to the categories that include at least five different
products. This is a typical case requiring the use of the HAVING condition; the query that we need is, in fact, the
following one:
Tab. 4.9
An example of the Having operator
Category ID #Prod
1 6
5 5
6 15
… …
N 7
Note that not all the Category IDs are shown in the table; indeed, only the categories that contain at least
five products are returned by the query.
Also note that the HAVING condition operates on a group operating function, as in the present
case where, in fact, we wrote: HAVING COUNT(ProductsID) >=5.
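A query of this kind can be sketched as follows. The example assumes Python's sqlite3 module instead of Access, invented product counts per category, and the alias N_Prod in place of #Prod (which is not a valid unquoted name in SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS (ProductsID INTEGER PRIMARY KEY, CategoryID INTEGER)")

# Invented data: number of products to create for each category.
products_per_category = {1: 6, 2: 3, 5: 5, 6: 15}
rows = [(cat,) for cat, n in products_per_category.items() for _ in range(n)]
conn.executemany("INSERT INTO PRODUCTS (CategoryID) VALUES (?)", rows)

# HAVING filters the groups produced by GROUP BY, using a group operating function.
large_categories = conn.execute(
    "SELECT CategoryID, COUNT(ProductsID) AS N_Prod "
    "FROM PRODUCTS "
    "GROUP BY CategoryID "
    "HAVING COUNT(ProductsID) >= 5 "
    "ORDER BY CategoryID"
).fetchall()
print(large_categories)
```

Category 2, which has only three products, is dropped; category 5, with exactly five products, is kept because the condition is >= 5.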
Conversely, as we have already noted, a group operating function must never be included in the WHERE clause.
For instance, the following query is wrong and does not work:
SELECT CategoryID, COUNT(ProductsID) AS #Prod
FROM PRODUCTS
GROUP BY CategoryID
WHERE COUNT(ProductsID) >=5
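That a group operating function is rejected in the WHERE clause can be observed with a short sketch. It assumes Python's sqlite3 module, so the error raised is SQLite's and not the Access message; the clauses are written in the usual order (WHERE before GROUP BY) so that the only problem left is the COUNT inside WHERE. The alias N_Prod and the sample data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS (ProductsID INTEGER PRIMARY KEY, CategoryID INTEGER)")
conn.executemany("INSERT INTO PRODUCTS (CategoryID) VALUES (?)", [(1,), (1,), (2,)])

error_message = None
try:
    conn.execute(
        "SELECT CategoryID, COUNT(ProductsID) AS N_Prod "
        "FROM PRODUCTS "
        "WHERE COUNT(ProductsID) >= 5 "
        "GROUP BY CategoryID"
    )
except sqlite3.Error as exc:
    # The engine refuses to evaluate a group function on single records.
    error_message = str(exc)

print(error_message)
```

Moving the condition from WHERE to a HAVING clause placed after GROUP BY makes the query valid.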
It is also worth noting that the HAVING operator can be based on any one of the comparison
operators (i.e., LIKE, IN and BETWEEN). Also, and perhaps more important, it is possible to use in the
same query both the WHERE and the HAVING operators:
• The WHERE clause is executed first and, in this way, all the records that do not fulfill the logical
condition (included in the WHERE clause) are eliminated.
• The HAVING operator acts on the remaining records, next.
For instance, let us consider the following query:
SELECT CategoryID, COUNT(ProductsID) AS #Prod
FROM PRODUCTS
WHERE CategoryID <= 6
GROUP BY CategoryID
HAVING COUNT(ProductsID) BETWEEN 5 AND 12
Tab. 4.10
Having operator used conjointly with WHERE
Category ID #Prod
1 6
5 5
As can be seen, only two records are returned. Indeed, all the records with a Category ID greater than six are
discarded by the WHERE condition. Next, the HAVING operator limits the analysis to those categories with a
number of products between 5 and 12. So, since the 6th category has 15 products, it is also discarded.
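The two-step filtering can be reproduced with the sketch below, which assumes Python's sqlite3 module instead of Access, the alias N_Prod in place of #Prod, and invented product counts chosen to be consistent with Tab. 4.10 (the counts for categories 2 and 7 are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS (ProductsID INTEGER PRIMARY KEY, CategoryID INTEGER)")

# Invented data: number of products to create for each category.
products_per_category = {1: 6, 2: 3, 5: 5, 6: 15, 7: 7}
rows = [(cat,) for cat, n in products_per_category.items() for _ in range(n)]
conn.executemany("INSERT INTO PRODUCTS (CategoryID) VALUES (?)", rows)

result = conn.execute(
    "SELECT CategoryID, COUNT(ProductsID) AS N_Prod "
    "FROM PRODUCTS "
    "WHERE CategoryID <= 6 "                        # step 1: filter single records
    "GROUP BY CategoryID "
    "HAVING COUNT(ProductsID) BETWEEN 5 AND 12 "    # step 2: filter groups
    "ORDER BY CategoryID"
).fetchall()
print(result)
```

Category 7 is removed by WHERE before grouping, while categories 2 (three products) and 6 (fifteen products) are removed by HAVING after grouping.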