SQL Project File
Use auto_insurance_risk;
1. Write a query to calculate what % of the customers have made a claim in the current exposure period (i.e. in the given dataset).
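A minimal sketch for question 1. It assumes each row of auto_insurance_risk is one customer/policy; if a customer could hold several policies, distinct customer ids would need to be counted instead. It tests ClaimNb directly, so it does not depend on the claim_flag column created in question 2.

```sql
-- Percentage of policies (rows) with at least one claim.
-- Assumption: one row per customer/policy.
SELECT ROUND(100.0 * SUM(CASE WHEN ClaimNb > 0 THEN 1 ELSE 0 END) / COUNT(*), 2)
         AS pct_customers_with_claim
FROM auto_insurance_risk;
```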
2.1. Create a new column 'claim_flag' of integer datatype in the table 'auto_insurance_risk'.
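A sketch of the column creation in MySQL syntax; INT is assumed to be the intended integer type.

```sql
-- Add the 0/1 claim indicator column (populated in 2.2).
ALTER TABLE auto_insurance_risk
  ADD COLUMN claim_flag INT;
```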
2.2 Set the value to 1 when ClaimNb is greater than 0 and set the value to 0 otherwise.
UPDATE auto_insurance_risk
SET claim_flag =
CASE WHEN ClaimNb > 0 THEN 1
ELSE 0
END;
3.1. What is the average exposure period for those who have claimed?
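One way to get the figure, sketched under the assumption that claim_flag has been populated as in question 2.2. Grouping on the flag puts claimants (1) and non-claimants (0) side by side, so the two averages can be compared.

```sql
-- Average exposure for claimants (claim_flag = 1) vs non-claimants (claim_flag = 0).
SELECT claim_flag,
       ROUND(AVG(Exposure), 4) AS avg_exposure,
       COUNT(*)                AS n_policies
FROM auto_insurance_risk
GROUP BY claim_flag;
```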
Those who claimed have a higher average exposure than those who did not, so policyholders with longer exposure periods tend to claim more often than the rest.
4.1. If we create an exposure bucket where buckets are like below, what is the % of total claims by these buckets?
ALTER TABLE auto_insurance_risk ADD COLUMN ebucket VARCHAR(2);
UPDATE auto_insurance_risk
SET ebucket =
CASE
WHEN Exposure >= 0 and Exposure <= 0.25 THEN 'E1'
WHEN Exposure > 0.25 and Exposure <= 0.50 THEN 'E2'
-- > 0.50 (not >= 0.51) so values between 0.50 and 0.51 still get a bucket
WHEN Exposure > 0.50 and Exposure <= 0.75 THEN 'E3'
WHEN Exposure > 0.75 THEN 'E4'
END;
This study source was downloaded by 100000826983498 from CourseHero.com on 06-07-2022 06:46:43 GMT -05:00
https://www.coursehero.com/file/106702000/sql-project-filedocx/
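The UPDATE only creates the buckets; a possible query for the second half of the question (% of total claims per bucket) is sketched below. It assumes "claims" means the sum of ClaimNb; to measure the share of claimants instead, ClaimNb can be swapped for claim_flag.

```sql
-- Each bucket's share of all claims in the table.
SELECT ebucket,
       SUM(ClaimNb) AS total_claims,
       ROUND(100.0 * SUM(ClaimNb)
             / (SELECT SUM(ClaimNb) FROM auto_insurance_risk), 2) AS pct_of_total_claims
FROM auto_insurance_risk
GROUP BY ebucket
ORDER BY ebucket;
```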
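We can conclude that E1 and E4 account for the highest share of claims: together they make up almost two-thirds of total claims. These buckets therefore need to be looked into, as policyholders in E1/E4 appear to have a higher chance of claiming (although a bucket's share of total claims also depends on how many policies fall into it).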
5. Which area has the highest number of average claims? Show the data in percentage w.r.t. the number of policies in
corresponding Area.
select Area, count(ClaimNb)
from auto_insurance_risk
group by Area
order by count(ClaimNb) desc
limit 1;
Area C
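Note that COUNT(ClaimNb) counts non-NULL rows, i.e. the number of policies in each Area rather than the number of claims, so the query above ranks Areas by policy volume. A sketch that follows the question's wording (average claims, expressed per 100 policies in each Area) is below; its top Area may differ from the one reported above.

```sql
-- Average claims per policy in each Area, shown as claims per 100 policies.
SELECT Area,
       COUNT(*)                                AS n_policies,
       SUM(ClaimNb)                            AS total_claims,
       ROUND(100.0 * SUM(ClaimNb) / COUNT(*), 2) AS claims_per_100_policies
FROM auto_insurance_risk
GROUP BY Area
ORDER BY claims_per_100_policies DESC;
```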
6. If we use these exposure buckets along with Area, i.e. group Area and Exposure Buckets together and look at the claim rate, an interesting pattern could be seen in the data. What is that?
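select Area, ebucket,
-- claim rate within each Area x bucket group (share of its policies with a claim),
-- instead of dividing by a hard-coded constant
round(100 * avg(claim_flag), 2) as claim_rate_pct,
sum(ClaimNb) as total_claims
from auto_insurance_risk
group by Area, ebucket
order by sum(ClaimNb) desc;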
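To make the pattern easier to spot, the same claim rate can be laid out with one row per Area and one column per exposure bucket. This sketch assumes the bucket labels 'E1' to 'E4' from question 4.1, and relies on AVG ignoring the NULLs that CASE returns for rows outside each bucket.

```sql
-- Claim rate (%) per Area, one column per exposure bucket.
SELECT Area,
       ROUND(100 * AVG(CASE WHEN ebucket = 'E1' THEN claim_flag END), 2) AS e1_rate_pct,
       ROUND(100 * AVG(CASE WHEN ebucket = 'E2' THEN claim_flag END), 2) AS e2_rate_pct,
       ROUND(100 * AVG(CASE WHEN ebucket = 'E3' THEN claim_flag END), 2) AS e3_rate_pct,
       ROUND(100 * AVG(CASE WHEN ebucket = 'E4' THEN claim_flag END), 2) AS e4_rate_pct
FROM auto_insurance_risk
GROUP BY Area
ORDER BY Area;
```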
7.1. If we look at average Vehicle Age for those who claimed vs those who didn't claim, what do you see in the summary?
1.5 Marks for SQL and 1 for inference.
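A possible query behind this comparison, assuming claim_flag from question 2.2:

```sql
-- Average vehicle age for claimants (1) vs non-claimants (0).
SELECT claim_flag,
       ROUND(AVG(VehAge), 2) AS avg_veh_age,
       COUNT(*)              AS n_policies
FROM auto_insurance_risk
GROUP BY claim_flag;
```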
Those who did not claim have a higher average vehicle age than those who claimed.
7.2. Now if we calculate the average Vehicle Age for those who claimed and group them by Area, what do you see in the
summary? Any particular pattern you see in the data?
select Area, claim_flag, avg(VehAge)
from auto_insurance_risk
group by Area, claim_flag
order by Area, claim_flag;
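A pivoted variant puts the two averages for each Area next to each other, which makes it easier to see whether the claimed/not-claimed gap holds in every Area. It is a sketch assuming claim_flag holds only 0 or 1.

```sql
-- Average vehicle age per Area, claimants vs non-claimants in adjacent columns.
SELECT Area,
       ROUND(AVG(CASE WHEN claim_flag = 1 THEN VehAge END), 2) AS avg_age_claimed,
       ROUND(AVG(CASE WHEN claim_flag = 0 THEN VehAge END), 2) AS avg_age_not_claimed
FROM auto_insurance_risk
GROUP BY Area
ORDER BY Area;
```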
9.1. Create a Claim_Ct flag on the ClaimNb field as below, and take average of the BonusMalus by Claim_Ct.
ALTER TABLE auto_insurance_risk ADD COLUMN Claim_Ct VARCHAR(20);
UPDATE auto_insurance_risk
SET Claim_Ct =
CASE WHEN ClaimNb = 1 THEN '1 Claim'
WHEN ClaimNb > 1 THEN 'MT 1 Claims'
WHEN ClaimNb = 0 THEN 'No Claims'
END;
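The second part of the question (average BonusMalus by Claim_Ct) could be sketched as follows; the row count per category is included because averages over small categories are less reliable.

```sql
-- Average BonusMalus per claim-count category.
SELECT Claim_Ct,
       ROUND(AVG(BonusMalus), 2) AS avg_bonus_malus,
       COUNT(*)                  AS n_policies
FROM auto_insurance_risk
GROUP BY Claim_Ct
ORDER BY Claim_Ct;
```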
We can see that the average BonusMalus is almost the same across the categories, being slightly higher for those who have already claimed more than once.
10. Using the same Claim_Ct logic created above, if we aggregate the Density column (take average) by Claim_Ct, what inference can we make from the summary data?
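A sketch of the aggregation, reusing the Claim_Ct column from question 9.1:

```sql
-- Average population density per claim-count category.
SELECT Claim_Ct,
       ROUND(AVG(Density), 2) AS avg_density,
       COUNT(*)               AS n_policies
FROM auto_insurance_risk
GROUP BY Claim_Ct
ORDER BY Claim_Ct;
```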
11. Which Vehicle Brand & Vehicle Gas combination have the
highest number of Average Claims (use ClaimNb field for
aggregation)?
select VehBrand,VehGas,avg(ClaimNb)
from auto_insurance_risk
group by VehBrand,VehGas
order by avg(ClaimNb) desc;
Thus vehicle brand B12 with Regular fuel (VehGas = Regular) has the highest average number of claims.
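Averages over brand/fuel combinations with few policies can be unstable, so a variant that also reports the policy count and drops very small groups may be worth running. The cut-off of 100 policies is a hypothetical choice, not part of the question.

```sql
-- Same ranking with the group size shown; HAVING threshold is hypothetical.
SELECT VehBrand, VehGas,
       ROUND(AVG(ClaimNb), 4) AS avg_claims,
       COUNT(*)               AS n_policies
FROM auto_insurance_risk
GROUP BY VehBrand, VehGas
HAVING COUNT(*) >= 100
ORDER BY avg_claims DESC;
```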
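select Region, Exposure,
-- % of all policies, computed from the actual row count instead of a hard-coded constant
100 * count(claim_flag) / (select count(*) from auto_insurance_risk) as pct_of_policies
from auto_insurance_risk
group by Region, Exposure
order by pct_of_policies desc
limit 5;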
13.1. Are there any cases of illegal driving i.e. underaged folks
driving and committing accidents?
select claim_flag, count(claim_flag)
from auto_insurance_risk
where age = '1 - Beginner'
group by claim_flag;
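The query above looks at the '1 - Beginner' group, i.e. drivers aged exactly 18, which is not underage if the legal minimum driving age is 18; it also needs the age column from question 13.2 to exist first. Assuming a minimum age of 18 (adjust for the relevant jurisdiction), a more direct check works on DrivAge itself:

```sql
-- Policies and claims for drivers below the assumed legal minimum age of 18.
-- If no such rows exist, n_policies is 0 and total_claims is NULL.
SELECT COUNT(*)     AS n_policies,
       SUM(ClaimNb) AS total_claims
FROM auto_insurance_risk
WHERE DrivAge < 18;
```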
13.2 Create a bucket on DrivAge and then take average of BonusMalus by this Age Group Category. What do you infer from the summary?
DrivAge = 18 then 1-Beginner, DrivAge <= 30 then 2-Junior, DrivAge <= 45 then 3-Middle Age, DrivAge <= 60 then 4-Mid-Senior, DrivAge > 60 then 5-Senior.
2.5 Marks for SQL and 1.5 for inference.
ALTER TABLE auto_insurance_risk ADD COLUMN age VARCHAR(20);
UPDATE auto_insurance_risk
SET age =
CASE WHEN DrivAge = 18 THEN '1 - Beginner'
WHEN DrivAge > 18 and DrivAge <= 30 THEN '2 - Junior'
WHEN DrivAge > 30 and DrivAge <= 45 THEN '3 - Middle Age'
WHEN DrivAge > 45 and DrivAge <= 60 THEN '4 - Mid Senior'
WHEN DrivAge > 60 THEN '5 - Senior'
END;
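The averaging step of question 13.2 could be sketched as below. The labels sort in age order because each starts with its ordinal digit; any drivers under 18 would have a NULL age and show up as a separate NULL group.

```sql
-- Average BonusMalus per driver-age group.
SELECT age,
       ROUND(AVG(BonusMalus), 2) AS avg_bonus_malus,
       COUNT(*)                  AS n_policies
FROM auto_insurance_risk
GROUP BY age
ORDER BY age;
```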
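We can see that BonusMalus, which penalises drivers for making claims, decreases with age. This may be because older drivers have more driving experience than younger ones and can be expected to drive more cautiously. It also partly reflects how a bonus-malus coefficient typically works: it falls with each claim-free year, so drivers with longer histories tend to have lower values.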
14. Mention one major difference between a unique constraint and a primary key.
Primary key: only one primary key is allowed per table, and it is used to uniquely identify each record in the table. A primary key accepts neither duplicate nor NULL values.
Unique key: a column with a unique constraint can contain only distinct values, but unlike a primary key it can accept NULL values, and a table may have several unique constraints (or none at all).
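A small demo of the difference, using a hypothetical table key_demo that is not part of the project data. The statements that should fail are commented out so the script runs; uncommenting either one should raise an error. NULL handling in UNIQUE constraints varies by database: MySQL accepts several NULLs, while SQL Server allows only one.

```sql
DROP TABLE IF EXISTS key_demo;
CREATE TABLE key_demo (
  id   INT PRIMARY KEY,        -- at most one per table, implicitly NOT NULL
  code VARCHAR(10) UNIQUE      -- a table may have several; NULLs allowed
);

INSERT INTO key_demo (id, code) VALUES (1, 'A');
INSERT INTO key_demo (id, code) VALUES (2, NULL);
INSERT INTO key_demo (id, code) VALUES (3, NULL);   -- accepted in MySQL: several NULLs in a UNIQUE column
-- INSERT INTO key_demo (id, code) VALUES (4, 'A');     -- fails: duplicate UNIQUE value
-- INSERT INTO key_demo (id, code) VALUES (NULL, 'B');  -- fails: primary key cannot be NULL
```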
15. If there are 5 records in table A and 10 records in table B and we cross-join these two tables, how many records will be
there in the result set?
5*10 = 50
16. What is the difference between inner join and left outer join?
Inner join returns the combined rows from two or more tables only where the join condition is met, i.e. where the joined columns have matching values. Rows without a match are dropped, so if no rows match, the result is empty.
Left outer join returns all records from the left table (Table 1) together with the matching records from the right table (Table 2), even when the join condition fails; for left-table rows with no match, the right table's columns are filled with NULL.
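A self-contained sketch using two tiny inline tables built with UNION ALL, so nothing needs to exist in the database: a has keys 1, 2, 3 and b has keys 2, 3, 4.

```sql
-- Inner join: only matching keys survive.
SELECT a.k AS a_key, b.k AS b_key
FROM (SELECT 1 AS k UNION ALL SELECT 2 UNION ALL SELECT 3) AS a
INNER JOIN (SELECT 2 AS k UNION ALL SELECT 3 UNION ALL SELECT 4) AS b
  ON a.k = b.k
ORDER BY a.k;
-- rows: (2, 2), (3, 3)

-- Left outer join: every row of a is kept; unmatched b columns become NULL.
SELECT a.k AS a_key, b.k AS b_key
FROM (SELECT 1 AS k UNION ALL SELECT 2 UNION ALL SELECT 3) AS a
LEFT JOIN (SELECT 2 AS k UNION ALL SELECT 3 UNION ALL SELECT 4) AS b
  ON a.k = b.k
ORDER BY a.k;
-- rows: (1, NULL), (2, 2), (3, 3)
```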
17. Consider a scenario where Table A has 5 records and Table B has 5 records. Now while inner joining Table A and Table
B, there is one duplicate on the joining column in Table B (i.e. Table A has 5 unique records, but Table B has 4 unique
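values and one redundant value). What will be the record count of the output?
5 (assuming every joining value in Table B also appears in Table A). Because Table A's joining column is unique, each Table B row matches at most one Table A row: the duplicated value matches the same Table A row twice (2 rows), and the other 3 Table B rows match once each (3 rows). 25 would be the count for a cross join of the two tables.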
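The count can be checked with inline tables that match the scenario: a holds the unique keys 1 to 5, and b holds 1, 2, 3, 4, 4. The inner join returns 5 rows, whereas a cross join of the same tables returns 5 x 5 = 25.

```sql
-- Inner join on a unique key column (a) vs a column with one duplicate (b).
SELECT COUNT(*) AS inner_join_rows   -- 5
FROM (SELECT 1 AS k UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5) AS a
INNER JOIN (SELECT 1 AS k UNION ALL SELECT 2 UNION ALL SELECT 3
            UNION ALL SELECT 4 UNION ALL SELECT 4) AS b
  ON a.k = b.k;

-- Cross join of the same tables: every a row paired with every b row.
SELECT COUNT(*) AS cross_join_rows   -- 25
FROM (SELECT 1 AS k UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5) AS a
CROSS JOIN (SELECT 1 AS k UNION ALL SELECT 2 UNION ALL SELECT 3
            UNION ALL SELECT 4 UNION ALL SELECT 4) AS b;
```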
18. What is the difference between WHERE clause and HAVING clause?
The WHERE clause filters rows from the table based on the specified condition before any grouping, whereas the HAVING clause filters groups, after aggregation, based on the specified condition.
The WHERE clause can be used without a GROUP BY clause. HAVING is normally paired with GROUP BY; when used without it, the whole result set is treated as a single group.
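A sketch on the project table showing both clauses in one query; the DrivAge cut-off and the 10,000-policy threshold are hypothetical values chosen only for illustration. The second statement shows HAVING without GROUP BY, where MySQL treats the whole result as one group.

```sql
-- WHERE filters rows before grouping; HAVING filters the aggregated groups.
SELECT Area,
       COUNT(*)     AS n_policies,
       AVG(ClaimNb) AS avg_claims
FROM auto_insurance_risk
WHERE DrivAge >= 25            -- row-level filter (hypothetical cut-off)
GROUP BY Area
HAVING COUNT(*) > 10000        -- group-level filter (hypothetical threshold)
ORDER BY Area;

-- HAVING without GROUP BY: the whole table is a single group.
SELECT COUNT(*) AS n_policies
FROM auto_insurance_risk
HAVING COUNT(*) > 0;
```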