Week 4 SQL
• Concatenate
• Trimming
• Changing Case
• Substring Functions
• Date and Time String
String Functions:
• Concatenate
• Substring
• Trim
• Upper
• Lower
Concatenate
SELECT Company_name,
       Contact_name,
       Company_name || ' (' || Contact_name || ')'
FROM Customers
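The concatenation above can be tried out with a small sketch that runs the SQL through Python's built-in sqlite3 module. The Customers rows here are made up for illustration, and the `label` alias is an assumption, not part of the original query. It also shows one thing worth knowing: in SQLite, concatenating with a NULL gives NULL.

```python
import sqlite3

# In-memory database with a hypothetical Customers table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Company_name TEXT, Contact_name TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Acme Foods", "Ana Lee"), ("Blue Cafe", None)],
)

# || joins strings; if any operand is NULL, the whole result is NULL.
concat_rows = conn.execute(
    """
    SELECT Company_name || ' (' || Contact_name || ')' AS label
    FROM Customers
    ORDER BY Company_name
    """
).fetchall()
print(concat_rows)
```

If a missing contact should not blank out the whole label, wrapping the column in COALESCE(Contact_name, '') is one common workaround.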
Trim
TRIM
RTRIM
LTRIM
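As a minimal sketch of how the three trim functions differ, the snippet below runs them on the same padded string through Python's sqlite3 module. The sample string is an assumption chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# TRIM strips both ends, RTRIM only the right end, LTRIM only the left end.
trim_row = conn.execute(
    """
    SELECT TRIM('   You the best.   '),
           RTRIM('   You the best.   '),
           LTRIM('   You the best.   ')
    """
).fetchone()
print(trim_row)
```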
Substring
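Substring is a useful function that lets you pull out just a portion of
the string you're looking at.
Applying SUBSTR(string, 2, 3), which starts at the second character and
takes three characters, to Alexander, Bruce and Valli would return lex,
ruc and all respectively.
Asking for more characters than remain still works: starting at the third
character of Nancy and asking for more characters than are left would
return ncy. It just gives back whatever characters are available.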
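The substring behavior described above can be checked with a short sketch using Python's sqlite3 module. The length of 10 in the second call is an arbitrary assumption, picked only to ask for more characters than any of the names has left.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
names = ["Alexander", "Bruce", "Valli", "Nancy"]

# First column: start at character 2, take 3 characters.
# Second column: start at character 3, ask for up to 10 characters.
substr_results = {
    name: conn.execute(
        "SELECT SUBSTR(?, 2, 3), SUBSTR(?, 3, 10)", (name, name)
    ).fetchone()
    for name in names
}
for name, parts in substr_results.items():
    print(name, parts)
```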
Date and Time Strings
There are too many different date and time formats to cover them all. Check
which database you are working with and which format it uses. The examples
here use SQLite.
• STRFTIME
• Compute the current date and compare it to a recorded date in your
data
• Use 'now' to get the current date and time
• Combine several date and time functions to manipulate
data
Example:
SELECT Birthdate,
       STRFTIME('%Y', Birthdate) AS Year,
       STRFTIME('%m', Birthdate) AS Month,
       STRFTIME('%d', Birthdate) AS Day
FROM employee
Another useful function finds the current date and time:
SELECT DATE('now')
SELECT STRFTIME('%Y %m %d', 'now')     -- 'now' stands for the current date and time
SELECT STRFTIME('%H %M %S %s', 'now')  -- %s is seconds since 1970-01-01
Example:
SELECT Birthdate,
       STRFTIME('%Y', Birthdate) AS Year,
       STRFTIME('%m', Birthdate) AS Month,
       STRFTIME('%d', Birthdate) AS Day,
       DATE('now') - Birthdate AS age
FROM employee
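Here is a minimal sketch of the same idea run through Python's sqlite3 module. It assumes a one-row employee table with an invented birthdate, and it swaps 'now' for a fixed reference date (`ref_date`, a hypothetical name) so the result is repeatable. The age expression is an alternative to subtracting the date strings directly, which in SQLite generally compares only the leading year digits. Here one year is subtracted when the birthday has not yet occurred in the reference year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (Birthdate TEXT)")
conn.execute("INSERT INTO employee VALUES ('1990-03-15')")  # invented sample row

ref_date = "2024-01-10"  # fixed stand-in for 'now' so the output is repeatable
date_row = conn.execute(
    """
    SELECT STRFTIME('%Y', Birthdate) AS Year,
           STRFTIME('%m', Birthdate) AS Month,
           STRFTIME('%d', Birthdate) AS Day,
           -- Year difference, minus 1 if the birthday has not happened yet.
           CAST(STRFTIME('%Y', :ref) AS INTEGER)
             - CAST(STRFTIME('%Y', Birthdate) AS INTEGER)
             - (STRFTIME('%m-%d', :ref) < STRFTIME('%m-%d', Birthdate)) AS age
    FROM employee
    """,
    {"ref": ref_date},
).fetchone()
print(date_row)
```

Passing 'now' instead of the fixed date gives the age as of today.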
CASE STATEMENT
CASE
WHEN C1 THEN E1
WHEN C2 THEN E2
…
ELSE [RESULT else]
END
CASE input_expression
WHEN when_expression THEN result_expression […N]
[ELSE else_result_expression]
END
Example:
SELECT EmployeeId,
       FirstName,
       LastName,
       City,
       CASE City
           WHEN 'Calgary' THEN 'Calgary'
           ELSE 'Other'
       END AS Calgary
FROM Employee
ORDER BY LastName, FirstName
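The two CASE syntaxes shown earlier (with and without an input expression) can be compared side by side in a small sketch. The table contents are invented sample rows, and the `simple_form` and `searched_form` aliases are assumptions for illustration. Both forms should label each row the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (LastName TEXT, FirstName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("Adams", "Andrew", "Edmonton"),
        ("Edwards", "Nancy", "Calgary"),
        ("Park", "Margaret", "Calgary"),
    ],
)

# Simple form compares one expression to values; searched form tests conditions.
city_rows = conn.execute(
    """
    SELECT LastName,
           CASE City WHEN 'Calgary' THEN 'Calgary' ELSE 'Other' END AS simple_form,
           CASE WHEN City = 'Calgary' THEN 'Calgary' ELSE 'Other' END AS searched_form
    FROM Employee
    ORDER BY LastName, FirstName
    """
).fetchall()
print(city_rows)
```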
For this example, I'm going to show you how to combine several WHEN
conditions in one CASE statement. Here, in the Chinook database, I want to
classify my tracks based on the number of bytes they have. We discussed a
little earlier in the course how we do this kind of binning for predictive
modeling or forecasting: we may want to bin all of our small sales customers
into one group and predict their future sales, large-scale customers into
another, and so on. In this case, I'm looking at the size of the tracks, so
I want to bin the bytes.
SELECT TrackId,
       Name,
       Bytes,
       CASE
           WHEN Bytes < 30000 THEN 'Small'
           WHEN Bytes >= 30000 AND Bytes <= 50000 THEN 'Medium'
           WHEN Bytes > 50000 THEN 'Large'
           ELSE 'Other'
       END AS bytes_category
FROM Tracks
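When binning with CASE, the boundary values and NULLs are the rows most worth checking. The sketch below uses a hypothetical Tracks table with invented rows that sit exactly on the edges of each bin, and it places 30000 and 50000 in the Medium bin. Because WHEN clauses are evaluated in order, the second condition only needs an upper bound.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tracks (TrackId INTEGER, Name TEXT, Bytes INTEGER)")
conn.executemany(
    "INSERT INTO Tracks VALUES (?, ?, ?)",
    [(1, "a", 29999), (2, "b", 30000), (3, "c", 50000), (4, "d", 50001), (5, "e", None)],
)

# The first matching WHEN wins; a NULL matches no comparison and falls to ELSE.
bin_rows = conn.execute(
    """
    SELECT TrackId,
           CASE
               WHEN Bytes < 30000 THEN 'Small'
               WHEN Bytes <= 50000 THEN 'Medium'
               WHEN Bytes > 50000 THEN 'Large'
               ELSE 'Other'
           END AS bytes_category
    FROM Tracks
    ORDER BY TrackId
    """
).fetchall()
print(bin_rows)
```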
VIEWS
We're always combining data from multiple sources or transforming it in
some way. As you know, things like the order of operations can get a little
tricky. Instead of creating a whole new table, sometimes we can create the
illusion of a table by using a view. A view is essentially a stored query,
and it helps us clean up our queries and simplify what we have to write.
With a view, you can add or remove columns without changing the schema.
You're not writing any data to the database; you're just storing the query
for later use. This is really helpful and pays off when we use it to
encapsulate queries.
CREATE [TEMP] VIEW [IF NOT EXISTS]
view_name (column_name_list)
AS
select_statement
Example:
Let's say I want to get a count of how many territories each employee
has. If you look at our diagram, this information is spread across separate
tables. I'm going to create a view, so that on that view I can just run a
simple count of the number of territories. So here I will create my
view, and I'm just going to call it my_view.
Now that I have my_view out there, I can run more queries on top of it. I
can take that view and select a count of the territory descriptions, grouped
by the employee's last name and first name. This gives me the total number
of territories each employee has. This would have been a little more complex
to do all at once, but creating a view made things really simple. The beauty
of a view is that it can be used like a table. But unlike a table, you don't
have to run ETL on any of the data. This helps a lot by encapsulating
complex queries or calculations, and it can really simplify them. It can
also be used in pretty much any database, except for stored
procedures.
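A minimal sketch of the my_view example, run through Python's sqlite3 module, might look like the following. The table and column names (Employees, Territories, EmployeeTerritories and their columns) and all sample rows are assumptions made for illustration; the actual schema in the course diagram may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema and sample rows.
    CREATE TABLE Employees (EmployeeID INTEGER, LastName TEXT, FirstName TEXT);
    CREATE TABLE Territories (TerritoryID INTEGER, TerritoryDescription TEXT);
    CREATE TABLE EmployeeTerritories (EmployeeID INTEGER, TerritoryID INTEGER);
    INSERT INTO Employees VALUES (1, 'Davolio', 'Nancy'), (2, 'Fuller', 'Andrew');
    INSERT INTO Territories VALUES (10, 'Boston'), (20, 'Cambridge'), (30, 'Seattle');
    INSERT INTO EmployeeTerritories VALUES (1, 10), (1, 20), (2, 30);

    -- The view stores the join itself; no rows are copied.
    CREATE VIEW IF NOT EXISTS my_view AS
    SELECT e.LastName, e.FirstName, t.TerritoryDescription
    FROM Employees AS e
    JOIN EmployeeTerritories AS et ON et.EmployeeID = e.EmployeeID
    JOIN Territories AS t ON t.TerritoryID = et.TerritoryID;
    """
)

# Query the view like a table: count territories per employee.
view_counts = conn.execute(
    """
    SELECT LastName, FirstName, COUNT(TerritoryDescription) AS territory_count
    FROM my_view
    GROUP BY LastName, FirstName
    ORDER BY LastName
    """
).fetchall()
print(view_counts)
```

Keeping the joins inside the view means the counting query only has to deal with one flat, table-like source.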
Views are most helpful when you need to join a set of tables and you're
having trouble getting calculations right, particularly complex ones where
the order of operations has to be correct to get the output you're looking
for. Another benefit of views relates to security and write permissions. We
talked about not being able to write data to an environment or to a
particular database. Views are helpful because you're creating a view of a
table but not actually writing data to that table. This is a way to get
around some of those database writing limitations.
Another thing views are helpful for is creating a stepping stone in
multilevel queries. For example, let's say you create a query that counts the
number of sales each person has made. You could then write a query that
groups the salespeople into particular groups, and then count the sales of
each group as well. This creates a multilevel dimension that would be hard
to build in a single query. It also means you're not transferring any data
through an ETL process.
The first step is definitely data understanding: asking yourself things
like, are there lots of NULL values in this? Is the data made up of string
values that were just entered free-form? Or is it concatenated dates and
times? But then there's also the concept of business understanding, meaning
how all these pieces and elements relate to each other. If you're new to the
subject area and you've never worked with the data before, it's going to
take you a little longer to write your queries, because it takes longer to
figure out how everything works together and how it joins or relates.
But it'll always be worth taking the time to understand your data as much as
you can before you really start to analyze it. It's important to really
understand the relationships and the dependencies. That leads us to our
second step, which is business, or subject area, understanding. As you
start to get familiar with your data, you'll run into questions about the
business problem you're trying to solve and the subject area you're looking
at.
One of the things to be careful of is what I call the unspoken need. You
may have a business problem where they say, for example, we want to
predict whether or not a customer is likely to buy our product. That seems
pretty straightforward and easy, right? But as you dive into the data more,
you may start to get questions like, well, which customers? Which
products? Some of the things that are unspoken are certain logical
exclusions. For example, are there certain customers that should be
excluded from this analysis? Are there certain cases where past sales
shouldn't be added into this model, or should they be counted? This is why
you have to walk back and forth between data understanding and business
understanding. You frequently need to look at the data to come up with
questions, and then go back to the business to understand the
problem better.
We're going to talk about those steps in this video. After this lesson, you
should be able to determine and map out the data elements needed for a
query, discuss some of the strategies to employ as you begin to write
more complex queries, and explain some common troubleshooting
techniques to try when your SQL code isn't giving you the results you
expect.
Okay, so to really understand a problem, you need to map out exactly
which data elements you need.
You need to know the data you're going to go after and understand some of
the issues with the data from the profiling you've done. So where do you
start with your data and query? If you're extracting data, it's always
going to start with the SELECT statement, so you're going to use
SELECT and FROM.
What I usually do is write out, okay, where is the data that I need? Then I
draw a diagram on paper of the different tables and the pieces of
information I need. Basically, I'm just creating my own data model and
map.
I start with the sources, and then for each source, I go down
and define the fields I need. From there, I also define how I'm
going to join those different sources together.
From that point, I decide if I need to do any calculations. It's just
a logical process that I go through. But again, you're
always going to start with SELECT. That's the great thing about SQL:
it's consistent in that way.
That leads us to our next tip, which is to test along the way. Don't wait to
test your query until you've combined multiple sources together and have all
your calculations finished. Think of this as little building blocks. If
you write a calculation of the average selling price of something, look at how
many values you're getting back for just that calculation from the table and
make sure that seems right. Then combine this result with another table and
test that. If you know your data, you can dive in a little bit
quicker. But this will really make sure that your order of operations is
correct. This is key because, as I have said before, it's easy to get results
back, but getting the right results back is a little bit harder.
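The building-block approach can be sketched as two small steps, each checked before moving on. The Products and Categories tables, their columns, and the sample rows are all hypothetical, invented only to illustrate the habit of testing a calculation alone and then again after a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables and sample rows.
    CREATE TABLE Products (ProductID INTEGER, CategoryID INTEGER, UnitPrice REAL);
    CREATE TABLE Categories (CategoryID INTEGER, CategoryName TEXT);
    INSERT INTO Products VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0);
    INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Produce');
    """
)

# Step 1: test the calculation on its own table first.
avg_by_category = conn.execute(
    """
    SELECT CategoryID, AVG(UnitPrice)
    FROM Products
    GROUP BY CategoryID
    ORDER BY CategoryID
    """
).fetchall()
print(avg_by_category)

# Step 2: add the join, then confirm the number of rows did not change.
joined = conn.execute(
    """
    SELECT c.CategoryName, AVG(p.UnitPrice)
    FROM Products AS p
    JOIN Categories AS c ON c.CategoryID = p.CategoryID
    GROUP BY c.CategoryName
    ORDER BY c.CategoryName
    """
).fetchall()
print(joined)

# A join that silently drops or duplicates rows would fail this check.
assert len(joined) == len(avg_by_category)
```

If the row counts disagree after the join, the join condition is the first place to look.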
It's helpful to start with the basic things first. Okay, I'm getting these
fields from this table. Does that work? Yes. Okay, now I'm getting these
fields from this table and from another table. What's my join like? Is this
working? Okay, yes it is. If something breaks, strip the query down and
slowly build it back up to figure out where things went wrong. Let's say at
this point you're working through a problem: you know your data, you have
profiled it, you've tested it, you have started simple, and you have your
query.
The next thing to look at when you are writing it is to make
sure you are formatting it correctly and commenting nicely.
I think clean code says a lot about you. Make sure that it's easy to read,
you're using proper indentation, you're commenting strategically where you
need to, etc. You never know when you're going to need to revisit your
query or hand it off to someone else who needs to
edit it from there. Just keep your code clean, and comment where
necessary and strategically.
Then, you want to make sure to review what you've done. A lot of times,
what happens is you'll write a query, you'll be using it for your model, and
you'll be looking at different stats and things like that. Then you need to go
back and edit that query. Always make sure you review the
query to see if anything has changed. Has the data changed? Are the
business rules different? Do you need to update the date
indicators? Does anything else need to be updated? In general, be really
careful when you're going back and using old queries.
Okay, that really takes you through a problem from beginning to end. Again,
it all starts with the data and problem understanding. Make sure you spend
the time there, really thinking about
what you are doing before you actually start writing the queries. I promise
it will save you time in the long run. Then go through and really understand
your data by profiling it. Make sure you're testing along the way and
keeping your code clean and commented.
Those are just a few little tips that I can give you. You're fully equipped now
to go and retrieve the data you need, which is exciting because the first step
in doing data science is to be able to get your data.
You now have that in your toolbox. Reflect on these steps and this framework
when you look at problems and start writing your queries. All right, go and
get your data and start analyzing it.