SQL Project File

This document contains the SQL code and analysis for an auto insurance risk dataset. Some key findings include: 1) 5.02% of customers (34060 of 678013) made a claim in the current exposure period. 2) Those who claimed had a higher average exposure than those who did not. 3) Exposure buckets E1 and E4 had the highest claim rates, comprising almost 2/3 of total claims. 4) Area C had the highest number of average claims as a percentage of total policies. 5) Average vehicle age was lower for those who claimed compared to those who didn't. BonusMalus decreases with older driver age groups. 6) Vehicle brand B12 with regular gas had the highest average claims.

Uploaded by

Harsh Singh

SQL Graded Project

Utkarsh Atri – Edart C-III


Dataset - Auto_Insurance_Risk

Use auto_insurance_risk;

1. Write a query to calculate what % of the customers have made a claim in the current exposure period [i.e. in the given dataset]? [2 marks]

select count(*) from Auto_insurance_risk
where ClaimNb > 0;

(34060/678013)*100 = 5.02%
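
The count and the division can also be done in a single query. The following is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, and a tiny made-up table that has only the ClaimNb column, so the 25.0 it prints describes the toy rows and not the real dataset.

```python
import sqlite3

# Toy stand-in for Auto_insurance_risk (made-up rows, not the real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Auto_insurance_risk (ClaimNb INTEGER)")
conn.executemany(
    "INSERT INTO Auto_insurance_risk VALUES (?)",
    [(0,), (0,), (0,), (1,), (0,), (2,), (0,), (0,)],
)

# (ClaimNb > 0) evaluates to 1 or 0, so its sum is the number of claimants.
pct_claimed = conn.execute(
    "SELECT 100.0 * SUM(ClaimNb > 0) / COUNT(*) FROM Auto_insurance_risk"
).fetchone()[0]
print(pct_claimed)  # 25.0 for the toy rows above

# The same arithmetic applied to the two counts quoted in the answer.
reported_pct = round(34060 / 678013 * 100, 2)
print(reported_pct)  # 5.02
```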

2.1. Create a new column as 'claim_flag' in the table 'auto_insurance_risk' as integer datatype. [1.5 marks]

alter table Auto_insurance_risk
add column claim_flag int;

2.2 Set the value to 1 when ClaimNb is greater than 0 and set the value to 0 otherwise. [1.5 marks]

UPDATE Auto_insurance_risk
SET claim_flag =
CASE WHEN ClaimNb > 0 THEN 1
ELSE 0
END;
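
The two steps above can be tried end to end on a throwaway table. This is only a sketch: it runs on SQLite through Python's sqlite3 module rather than on MySQL, the three rows are made up, and policy_id is a hypothetical column added just to make the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# policy_id is a hypothetical identifier; only ClaimNb comes from the project.
conn.execute("CREATE TABLE Auto_insurance_risk (policy_id INTEGER, ClaimNb INTEGER)")
conn.executemany(
    "INSERT INTO Auto_insurance_risk VALUES (?, ?)", [(1, 0), (2, 1), (3, 3)]
)

# Step 2.1: add the integer flag column.
conn.execute("ALTER TABLE Auto_insurance_risk ADD COLUMN claim_flag INT")
# Step 2.2: 1 when at least one claim was made, 0 otherwise.
conn.execute(
    "UPDATE Auto_insurance_risk "
    "SET claim_flag = CASE WHEN ClaimNb > 0 THEN 1 ELSE 0 END"
)

flags = conn.execute(
    "SELECT policy_id, claim_flag FROM Auto_insurance_risk ORDER BY policy_id"
).fetchall()
print(flags)  # [(1, 0), (2, 1), (3, 1)]
```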

3.1. What is the average exposure period for those who have claimed? [1 mark]

select avg(Exposure) from Auto_insurance_risk
where claim_flag = 1;

3.2 What do you infer from the result? Use claim_flag variable to group the data. [1 mark]

select claim_flag, avg(Exposure) from Auto_insurance_risk
group by claim_flag;

Thus those who claimed have a higher average Exposure, i.e. policies with longer exposure tend to claim more often than the rest.

4.1. If we create an exposure bucket where buckets are like below, what is the % of total claims by these buckets? [2 marks]

UPDATE Auto_insurance_risk
SET ebucket =
CASE
WHEN Exposure >= 0 and Exposure <= 0.25 THEN 'E1'
WHEN Exposure > 0.25 and Exposure <= 0.50 THEN 'E2'
WHEN Exposure > 0.50 and Exposure <= 0.75 THEN 'E3'
WHEN Exposure > 0.75 THEN 'E4'
END;

For percent claims per bucket

select ebucket,
count(ClaimNb) as policies,
count(ClaimNb)/6780.13 as pct_of_policies,
sum(ClaimNb)*100/(select sum(ClaimNb) from Auto_insurance_risk) as pct_of_claims
from Auto_insurance_risk
group by ebucket;

4.2 What do you infer from the summary? [1 mark]

We can conclude that E1 and E4 have the higher claims (together they comprise almost 2/3 of the
total claims). These buckets need to be looked into, as any person within E1/E4 has a high
chance of making a claim.
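
To illustrate how a bucket's share of total claims can be computed, here is a small sketch. It assumes SQLite (via Python's sqlite3) instead of MySQL, six made-up rows, and an inline CASE expression rather than a stored ebucket column; it sums ClaimNb per bucket and divides by the overall sum, so the percentages refer to claims rather than to policy rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Auto_insurance_risk (Exposure REAL, ClaimNb INTEGER)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO Auto_insurance_risk VALUES (?, ?)",
    [(0.10, 1), (0.20, 0), (0.40, 0), (0.60, 1), (0.90, 2), (1.00, 0)],
)

# CASE branches are tested in order, so each branch only needs its upper bound.
bucket_share = conn.execute(
    """
    SELECT CASE
             WHEN Exposure <= 0.25 THEN 'E1'
             WHEN Exposure <= 0.50 THEN 'E2'
             WHEN Exposure <= 0.75 THEN 'E3'
             ELSE 'E4'
           END AS ebucket,
           SUM(ClaimNb) AS claims,
           100.0 * SUM(ClaimNb)
             / (SELECT SUM(ClaimNb) FROM Auto_insurance_risk) AS pct_of_claims
    FROM Auto_insurance_risk
    GROUP BY ebucket
    ORDER BY ebucket
    """
).fetchall()
print(bucket_share)
# [('E1', 1, 25.0), ('E2', 0, 0.0), ('E3', 1, 25.0), ('E4', 2, 50.0)]
```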

5. Which area has the highest number of average claims? Show the data in percentage w.r.t. the number of policies in the corresponding Area. [2 marks]

select area, count(ClaimNb)
from Auto_insurance_risk
group by area
order by count(ClaimNb) desc
limit 1;

Area C (the area with the largest count(ClaimNb))
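
Note that count(ClaimNb) counts policy rows, so it ranks areas by number of policies. A claims percentage relative to the policies in each Area could be sketched as below. The assumptions are: SQLite through Python's sqlite3 instead of MySQL, and eight made-up rows, so the ranking printed here says nothing about the real areas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Auto_insurance_risk (Area TEXT, ClaimNb INTEGER)")
# Made-up rows: area C has the most policies but the lowest claims percentage.
conn.executemany(
    "INSERT INTO Auto_insurance_risk VALUES (?, ?)",
    [("A", 0), ("A", 1), ("C", 0), ("C", 0), ("C", 0), ("C", 1), ("F", 2), ("F", 0)],
)

# Claims per 100 policies within each area, highest first.
area_rates = conn.execute(
    """
    SELECT Area,
           COUNT(*) AS policies,
           100.0 * SUM(ClaimNb) / COUNT(*) AS claims_pct
    FROM Auto_insurance_risk
    GROUP BY Area
    ORDER BY claims_pct DESC
    """
).fetchall()
print(area_rates)  # [('F', 2, 100.0), ('A', 2, 50.0), ('C', 4, 25.0)]
```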

6. If we use these exposure buckets along with Area, i.e. group Area and Exposure Buckets together and look at the claim rate, an interesting pattern could be seen in the data. What is that? [3 marks]

select Area, ebucket, sum(claim_flag)*100/count(*) as claim_rate, sum(ClaimNb)
from Auto_insurance_risk
group by Area, ebucket
order by sum(ClaimNb) desc;

As mentioned earlier, E4 and E1 have the higher claims.
So the average trend is E4 > E1 > E2 > E3.

7.1. If we look at average Vehicle Age for those who claimed vs those who didn't claim, what do you see in the summary? 1.5 Marks for SQL and 1 for inference. [2.5 marks]

select claim_flag, avg(VehAge) from Auto_insurance_risk
group by claim_flag;

Those who did not claim have a higher average vehicle age than those who claimed.

7.2. Now if we calculate the average Vehicle Age for those who claimed and group them by Area, what do you see in the summary? Any particular pattern you see in the data? [2.5 marks]

select Area, avg(VehAge)
from Auto_insurance_risk
where claim_flag = 1
group by Area;

8. If we calculate the average vehicle age by exposure bucket (as mentioned above), we see an interesting trend between those who claimed vs those who didn't. What is that? [3 marks]

select ebucket, avg(VehAge), claim_flag
from Auto_insurance_risk
group by ebucket, claim_flag;

9.1. Create a Claim_Ct flag on the ClaimNb field as below, and take average of the BonusMalus by Claim_Ct. [2 marks]

UPDATE Auto_insurance_risk
SET Claim_Ct =
CASE WHEN ClaimNb = 1 THEN '1 Claim'
WHEN ClaimNb > 1 THEN 'MT 1 Claims'
WHEN ClaimNb = 0 THEN 'No Claims'
END;

select Claim_Ct, avg(BonusMalus) from Auto_insurance_risk
group by Claim_Ct;

9.2 What is the inference from the summary? [1 mark]

We can see that the average BonusMalus is almost the same across the categories, being slightly
higher for those who have claimed more than once.

10. Using the same Claim_Ct logic created above, if we aggregate the Density column (take average) by Claim_Ct, what inference can we make from the summary data? [4 marks]

select Claim_Ct, avg(Density) from Auto_insurance_risk
group by Claim_Ct;

Average Density increases with the number of claims: it is higher for those who have claimed
and highest for those with more than one claim.

11. Which Vehicle Brand & Vehicle Gas combination have the highest number of Average Claims (use ClaimNb field for aggregation)? [2 marks]

select VehBrand, VehGas, avg(ClaimNb)
from Auto_insurance_risk
group by VehBrand, VehGas
order by avg(ClaimNb) desc;

Thus the combination of Vehicle Brand B12 with Regular gas has the highest average claims.

12. List the Top 5 Regions & Exposure [use the buckets created above] Combination from Claim Rate's perspective. Use claim_flag to calculate the claim rate. [3 marks]

select Region, ebucket, sum(claim_flag)*100/count(*) as claim_rate
from Auto_insurance_risk
group by Region, ebucket
order by claim_rate desc
limit 5;

13.1. Are there any cases of illegal driving i.e. underaged folks driving and committing accidents? [1 mark]

-- age is the DrivAge bucket column created in 13.2
select claim_flag, count(claim_flag)
from Auto_insurance_risk
where age = '1 - Beginner'
group by claim_flag;

Yes, there are a total of 61 cases of illegal driving.

13.2 Create a bucket on DrivAge and then take average of BonusMalus by this Age Group Category. What do you infer from the summary?
DrivAge=18 then 1-Beginner, DrivAge<=30 then 2-Junior, DrivAge<=45 then 3-Middle Age, DrivAge<=60 then 4-Mid-Senior, DrivAge>60 then 5-Senior. 2.5 Marks for SQL and 1.5 for inference. [4 marks]

UPDATE Auto_insurance_risk
SET age =
CASE WHEN DrivAge = 18 THEN '1 - Beginner'
WHEN DrivAge > 18 and DrivAge <= 30 THEN '2 - Junior'
WHEN DrivAge > 30 and DrivAge <= 45 THEN '3 - Middle Age'
WHEN DrivAge > 45 and DrivAge <= 60 THEN '4 - Mid Senior'
WHEN DrivAge > 60 THEN '5 - Senior'
END;

select age as Age_Category, avg(BonusMalus)
from Auto_insurance_risk
group by age;

We can see that BonusMalus, which penalises drivers for making claims, decreases with age.
This may be because older people have much more driving experience than younger ones,
so they are expected to drive more cautiously.

14. Mention one major difference between unique constraint and primary key? [2 marks]
Primary Key - Only one primary key is allowed in a table; it is used to uniquely identify each
record in the table. The primary key does not accept duplicate or NULL values.

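Unique key - A column with a unique key constraint can only contain unique values, but unlike a
primary key it can accept NULL values, and a table can have more than one unique constraint. It is
not compulsory to have a unique key in a table.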

15. If there are 5 records in table A and 10 records in table B and we cross-join these two tables, how many records will be there in the result set? [2 marks]

5*10 = 50
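
The product rule can be checked on two toy tables. This sketch assumes SQLite via Python's sqlite3 and a single made-up id column per table; only the row counts (5 and 10) come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER)")
conn.execute("CREATE TABLE B (id INTEGER)")
conn.executemany("INSERT INTO A VALUES (?)", [(i,) for i in range(5)])
conn.executemany("INSERT INTO B VALUES (?)", [(i,) for i in range(10)])

# A cross join pairs every row of A with every row of B.
cross_count = conn.execute("SELECT COUNT(*) FROM A CROSS JOIN B").fetchone()[0]
print(cross_count)  # 50
```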

16. What is the difference between inner join and left outer join? [2 marks]

Inner join returns only the combined rows from two or more tables for which the join condition is
satisfied. If no rows satisfy the condition, it returns nothing.

Left outer join returns all records from the left table (Table 1) together with the matching records
from the right table (Table 2). A left-table row is returned even when the join condition fails for it,
with NULLs in the right table's columns.
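
A small side-by-side run may make the difference concrete. The tables t1 and t2, their columns and their rows are all hypothetical, and the sketch uses SQLite through Python's sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE t2 (id INTEGER, city TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(1, "x"), (3, "z")])

# Inner join: only ids present in both tables survive.
inner_rows = conn.execute(
    "SELECT t1.id, t2.city FROM t1 INNER JOIN t2 ON t1.id = t2.id ORDER BY t1.id"
).fetchall()
# Left outer join: every t1 row survives; unmatched ones get NULL (None) for city.
left_rows = conn.execute(
    "SELECT t1.id, t2.city FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id ORDER BY t1.id"
).fetchall()

print(inner_rows)  # [(1, 'x'), (3, 'z')]
print(left_rows)   # [(1, 'x'), (2, None), (3, 'z')]
```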

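17. Consider a scenario where Table A has 5 records and Table B has 5 records. Now while inner joining Table A and Table B, there is one duplicate on the joining column in Table B (i.e. Table A has 5 unique records, but Table B has 4 unique values and one redundant value). What will be record count of the output? [2 marks]

5, assuming every joining value in Table B also exists in Table A: each of the 5 rows in Table B
matches exactly one row in Table A, because the values in Table A are unique. (25 would be the
cross-join count.)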
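
This scenario can be reproduced on toy data. The sketch assumes that every joining value in Table B also occurs in Table A; the key column k and the values 1 to 5 are made up, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (k INTEGER)")
conn.execute("CREATE TABLE B (k INTEGER)")
# A: 5 unique keys. B: 4 unique keys, with the key 4 repeated once.
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,), (4,), (5,)])
conn.executemany("INSERT INTO B VALUES (?)", [(1,), (2,), (3,), (4,), (4,)])

# Each B row matches exactly one A row; key 5 in A has no partner in B.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM A INNER JOIN B ON A.k = B.k"
).fetchone()[0]
print(inner_count)  # 5
```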

18. What is the difference between WHERE clause and HAVING clause?

WHERE Clause filters individual records before they are grouped, whereas HAVING Clause filters
the groups after aggregation, based on the specified condition.
WHERE Clause cannot contain aggregate functions, but HAVING Clause can. WHERE Clause can be
used without GROUP BY Clause; HAVING Clause is normally used together with GROUP BY Clause.
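
Both clauses can appear in one query, which shows the order in which they act. This is a sketch on made-up rows, run on SQLite through Python's sqlite3; the threshold of 3 years is arbitrary and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Auto_insurance_risk (Area TEXT, VehAge INTEGER, claim_flag INTEGER)"
)
conn.executemany(
    "INSERT INTO Auto_insurance_risk VALUES (?, ?, ?)",
    [("A", 2, 1), ("A", 10, 0), ("B", 4, 1), ("B", 8, 1), ("C", 12, 0)],
)

# WHERE drops the non-claim rows first; HAVING then keeps only the groups
# whose aggregated average passes the condition.
rows = conn.execute(
    """
    SELECT Area, AVG(VehAge) AS avg_veh_age
    FROM Auto_insurance_risk
    WHERE claim_flag = 1
    GROUP BY Area
    HAVING AVG(VehAge) > 3
    ORDER BY Area
    """
).fetchall()
print(rows)  # [('B', 6.0)]
```

Area A is removed by HAVING (its only claim row has VehAge 2), while Area C never forms a group because WHERE has already removed its single row.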

This study source was downloaded by 100000826983498 from CourseHero.com on 06-07-2022 06:46:43 GMT -05:00

https://www.coursehero.com/file/106702000/sql-project-filedocx/