Cyber Security Breaches Coding

Uploaded by

rasheedmumuni


In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import sys
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
Dataset 1

In [2]:
df = pd.read_csv("/kaggle/input/data-breaches-a-comprehensive-list/df_1.csv")
In [3]:
df.head()
Out[3]:

   Unnamed: 0  Entity                      Year  Records    Organization type  Method         Sources
0  0           21st Century Oncology       2016  2200000    healthcare         hacked         [5][6]
1  1           500px                       2020  14870304   social networking  hacked         [7]
2  2           Accendo Insurance Co.       2020  175350     healthcare         poor security  [8][9]
3  3           Adobe Systems Incorporated  2013  152000000  tech               hacked         [10]
4  4           Adobe Inc.                  2019  7500000    tech               poor security  [11][12]

In [4]:
df.shape
Out[4]:
(352, 7)
In [5]:
df.dtypes
Out[5]:
Unnamed: 0 int64
Entity object
Year object
Records object
Organization type object
Method object
Sources object
dtype: object
In [6]:
df.drop(['Sources'], axis=1, inplace=True)
In [7]:
df.isnull().any()
Out[7]:
Unnamed: 0 False
Entity False
Year False
Records True
Organization type False
Method True
dtype: bool
In [8]:
df.isnull().sum()
Out[8]:
Unnamed: 0 0
Entity 0
Year 0
Records 2
Organization type 0
Method 1
dtype: int64
In [9]:
df.columns = ['id', 'Entity', 'Year', 'Records', 'Organization type',
'Method']
In [10]:
df.head(10)
Out[10]:

   id  Entity                          Year  Records    Organization type   Method
0  0   21st Century Oncology           2016  2200000    healthcare          hacked
1  1   500px                           2020  14870304   social networking   hacked
2  2   Accendo Insurance Co.           2020  175350     healthcare          poor security
3  3   Adobe Systems Incorporated      2013  152000000  tech                hacked
4  4   Adobe Inc.                      2019  7500000    tech                poor security
5  5   Advocate Medical Group          2017  4000000    healthcare          lost / stolen media
6  6   AerServ (subsidiary of InMobi)  2018  75000      advertising         hacked
7  7   Affinity Health Plan, Inc.      2013  344579     healthcare          lost / stolen media
8  8   Airtel                          2019  320000000  telecommunications  poor security
9  9   Air Canada                      2018  20000      transport           hacked

In [11]:
table_year_df = df['Year'].value_counts()
table_year_df
Out[11]:
2011 34
2020 31
2019 30
2015 28
2013 28
2018 26
2014 25
2012 23
2016 22
2010 19
2008 16
2021 13
2009 13
2007 12
2017 9
2006 7
2005 6
2022 5
2004 2
2019-2020 1
2018-2019 1
2014 and 2015 1
Name: Year, dtype: int64
In [12]:
df['Year'] = df['Year'].astype(str)
df['Year'] = df['Year'].str[:4]
df['Year'] = df['Year'].astype(int)
In [13]:
df.dtypes
Out[13]:
id int64
Entity object
Year int64
Records object
Organization type object
Method object
dtype: object
In [14]:
table_year_df = df['Year'].value_counts()
table_year_df
Out[14]:
2011 34
2019 31
2020 31
2015 28
2013 28
2018 27
2014 26
2012 23
2016 22
2010 19
2008 16
2021 13
2009 13
2007 12
2017 9
2006 7
2005 6
2022 5
2004 2
Name: Year, dtype: int64
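Cell 12 keeps only the first four characters of each Year value, which is why the range entries from Out[11] ('2019-2020', '2018-2019', '2014 and 2015') are now counted under their first year. The sketch below reproduces that effect on a few made-up values (the `sample_years` series is hypothetical, not taken from the dataset). It uses `str.extract` with a four-digit pattern as one possible alternative to slicing; it is an illustration, not the notebook's own method.

```python
import pandas as pd

# Hypothetical sample mimicking the mixed Year formats seen in Out[11].
sample_years = pd.Series(["2016", "2019-2020", "2018-2019", "2014 and 2015"])

# Take the first run of four digits in each value and convert it to int.
first_year = sample_years.str.extract(r"(\d{4})", expand=False).astype(int)

print(first_year.tolist())
```

Either approach assigns a multi-year breach to its starting year only, so counts for the later year are slightly understated.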
In [15]:
sns.countplot(x='Year', data=df);
plt.title('Data Breaches per Year')
plt.xticks(rotation=90);

In [16]:
sns.countplot(x='Year', data=df, order=table_year_df.index.values);
plt.title('Data Breaches per Year in order')
plt.xticks(rotation=90);

In [17]:
table1 = df['Method'].value_counts()
table1
Out[17]:
hacked 192
poor security 43
lost / stolen media 33
accidentally published 21
inside job 19
lost / stolen computer 16
unknown 7
improper setting, hacked 2
poor security/inside job 2
intentionally lost 1
accidentally exposed 1
publicly accessible Amazon Web Services (AWS) server 1
hacked/misconfiguration 1
rogue contractor 1
ransomware hacked 1
misconfiguration/poor security 1
unprotected api 1
zero-day vulnerabilities 1
data exposed by misconfiguration 1
Poor security 1
poor security / hacked 1
accidentally uploaded 1
unsecured S3 bucket 1
inside job, hacked 1
social engineering 1
Name: Method, dtype: int64
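Out[17] lists 'poor security' (43) and 'Poor security' (1) as separate categories because the labels differ only in capitalisation. A minimal sketch of how such variants could be merged before counting, using a hypothetical `methods` series rather than the real column:

```python
import pandas as pd

# Hypothetical sample of Method labels, including a capitalisation variant
# and a label with trailing whitespace.
methods = pd.Series(["hacked", "poor security", "Poor security", "hacked ", "inside job"])

# Lowercase and strip whitespace so spelling variants fall into one category.
normalized = methods.str.lower().str.strip()
counts = normalized.value_counts()

print(counts)
```

Combined labels such as 'poor security / hacked' would still form their own categories; this sketch only addresses case and whitespace.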
In [18]:
sns.countplot(x='Method', data=df, order = table1.index.values);
plt.title('Method')
plt.xticks(rotation=90);

In [19]:
df_nothacked = df.loc[df['Method'] != 'hacked']
df_nothacked.head()
Out[19]:

   id  Entity                      Year  Records    Organization type   Method
2  2   Accendo Insurance Co.       2020  175350     healthcare          poor security
4  4   Adobe Inc.                  2019  7500000    tech                poor security
5  5   Advocate Medical Group      2017  4000000    healthcare          lost / stolen media
7  7   Affinity Health Plan, Inc.  2013  344579     healthcare          lost / stolen media
8  8   Airtel                      2019  320000000  telecommunications  poor security

In [20]:
sns.countplot(x='Method', data=df_nothacked, order =
df_nothacked['Method'].value_counts().index);
plt.title('Method')
plt.xticks(rotation=90);

In [21]:
table2 = df['Organization type'].value_counts()
table2.head(23)
Out[21]:
web 53
healthcare 47
financial 38
government 30
retail 27
tech 19
academic 13
telecoms 12
gaming 12
social network 8
hotel 8
transport 7
military 7
energy 4
restaurant 3
media 3
mobile carrier 2
social media 2
government, military 2
telecom 2
tech, retail 2
government, healthcare 2
telecommunications 2
Name: Organization type, dtype: int64
In [22]:
org_counts = df['Organization type'].value_counts().rename('org_counts')

df_org = df.merge(org_counts.to_frame(),
left_on='Organization type',
right_index=True)
In [23]:
org_counts.head()
Out[23]:
web 53
healthcare 47
financial 38
government 30
retail 27
Name: org_counts, dtype: int64
In [24]:
df_org.head()
Out[24]:

    id  Entity                                  Year  Records  Organization type  Method               org_counts
0   0   21st Century Oncology                   2016  2200000  healthcare         hacked               47
2   2   Accendo Insurance Co.                   2020  175350   healthcare         poor security        47
5   5   Advocate Medical Group                  2017  4000000  healthcare         lost / stolen media  47
7   7   Affinity Health Plan, Inc.              2013  344579   healthcare         lost / stolen media  47
14  14  Ankle & Foot Center of Tampa Bay, Inc.  2021  156000   healthcare         hacked               47
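Cells 22 to 24 attach a per-category count to every row by merging the frame with its own `value_counts()`. A possible alternative, sketched here on a hypothetical toy frame, is `groupby(...).transform('size')`, which yields the same counts without a merge and keeps the original row order:

```python
import pandas as pd

# Hypothetical miniature frame reusing the notebook's column name.
toy = pd.DataFrame({
    "Entity": ["A", "B", "C", "D"],
    "Organization type": ["web", "healthcare", "web", "tech"],
})

# Attach to each row the number of rows sharing its organization type.
toy["org_counts"] = (
    toy.groupby("Organization type")["Organization type"].transform("size")
)

print(toy)
```

Filtering such as `toy[toy.org_counts > 1]` then works the same way as the `df_org.org_counts > 2` filter used in the notebook.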

In [25]:
df_org_upper = df_org[df_org.org_counts > 2]
In [26]:
sns.countplot(x='Organization type', data=df_org_upper, order =
df_org_upper['Organization type'].value_counts().index);
plt.title('Data Breaches by Organisations')
plt.xticks(rotation=90);

In [27]:
sns.histplot(x='Organization type', stat='percent', data=df_org_upper);
plt.title('Data Breaches by Organisations')
plt.xticks(rotation=90);

In [28]:
# Records values that are not plain numbers (free text, data sizes, store counts).
non_numeric_records = [
    'unknown', 'G20 world leaders', 'tens of thousands', '19 years of data',
    '63 stores', 'over 5,000,000', 'unknown (client list)', 'millions',
    '235 GB', '350 clients emails',
    '9,000,000 (approx) - basic booking, 2208 (credit card details)',
    'Unknown', '2.5GB', '250 locations', '500 locations', '54 locations',
    '51 locations', '10 locations', '8 locations', '93 stores', '200 stores',
    'undisclosed', 'Source Code Compromised', '100 terabytes', 'TBC',
]
# drop(..., inplace=True) returns None, so its result is not assigned.
# The rows are removed from df itself; later cells that use df see the reduced frame.
df.drop(df[df.Records.isin(non_numeric_records)].index, inplace=True)
# Remove the rows whose Records value is missing.
df_cleaned_records = df.dropna(subset=['Records'])
In [29]:
df_cleaned_records.shape
Out[29]:
(305, 6)
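Cell 28 removes non-numeric Records entries by listing each offending string. One alternative, shown here only as a sketch on a hypothetical toy frame, is to let `pd.to_numeric(..., errors='coerce')` turn anything that is not a number into NaN and then drop those rows. Taking an explicit `.copy()` of the filtered frame also avoids the SettingWithCopyWarning that a later column assignment can raise. Whether this yields exactly the same rows as the notebook's list has not been checked against the real data.

```python
import pandas as pd

# Hypothetical sample mixing numeric strings with free-text entries.
toy = pd.DataFrame({"Records": ["2200000", "unknown", "63 stores", None, "175350"]})

# Non-numeric strings become NaN instead of raising an error.
toy["Records"] = pd.to_numeric(toy["Records"], errors="coerce")

# .copy() gives an independent frame, so later assignments do not
# trigger SettingWithCopyWarning.
toy_clean = toy.dropna(subset=["Records"]).copy()

print(toy_clean)
```

Entries such as 'over 5,000,000' are coerced to NaN as well, so approximate figures are discarded rather than parsed.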
In [30]:
df_cleaned_records['Records'] = df_cleaned_records['Records'].astype(float)
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
In [31]:
df_total_records = (df_cleaned_records.groupby('Year', sort=False)["Records"]
                    .sum().reset_index(name='Total Records'))
In [32]:
df_total_records
Out[32]:

    Year  Total Records
0   2016  5.405824e+08
1   2020  1.251422e+09
2   2013  3.469435e+09
3   2019  3.824901e+09
4   2017  2.547669e+08
5   2018  1.531850e+09
6   2005  4.682500e+07
7   2021  6.139627e+07
8   2015  2.016545e+08
9   2004  9.251000e+07
10  2006  7.126000e+07
11  2014  8.513410e+08
12  2008  6.906650e+07
13  2010  1.598048e+07
14  2009  2.554680e+08
15  2011  2.277881e+08
16  2012  4.288396e+08
17  2007  1.532864e+08
18  2022  9.958922e+06

In [33]:
plt.figure(figsize=(10,5))
sns.lineplot(data=df_total_records, x='Year', y='Total Records')
plt.xticks([2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018, 2020, 2022]);

In [34]:
df_total_records_org = df_cleaned_records.groupby('Organization type',
sort=False)["Records"].sum().reset_index(name ='Total Records')
In [35]:
df_total_records_org = df_total_records_org.sort_values('Total Records',
ascending=False, ignore_index=True)
df_total_records_org_clean = df_total_records_org.drop(df_total_records_org.index[21:])
In [36]:
df_total_records_org_clean
Out[36]:

    Organization type                                  Total Records
0   web                                                5.203696e+09
1   social network                                     1.238000e+09
2   tech                                               1.000898e+09
3   financial service company                          8.850000e+08
4   financial                                          8.185971e+08
5   hotel                                              5.055630e+08
6   telecommunications                                 4.200000e+08
7   retail                                             3.721407e+08
8   data broker                                        3.400000e+08
9   Telephone directory                                2.990550e+08
10  government                                         2.023509e+08
11  personal and demographic data about residents ...  2.010000e+08
12  gaming                                             1.726240e+08
13  healthcare                                         1.711426e+08
14  financial, credit reporting                        1.631190e+08
15  messaging app                                      1.620000e+08
16  Consumer Goods                                     1.500000e+08
17  market analysis                                    1.200000e+08
18  Question & Answer                                  1.000000e+08
19  local search                                       1.000000e+08
20  genealogy                                          9.228389e+07

In [37]:
sns.catplot(data=df_total_records_org_clean, x='Organization type',
            y='Total Records')
plt.title('Total Records per Organisation Type')
plt.xticks(rotation=90);

In [38]:
df_total_records_org_clean = df_total_records_org_clean.drop(df_total_records_org_clean.index[:1])
In [39]:
sns.catplot(data=df_total_records_org_clean, x='Organization type',
            y='Total Records')
plt.title('Total Records per Organisation Type')
plt.xticks(rotation=90);

In [40]:
df_total_records_Method = (df_cleaned_records.groupby('Method', sort=False)["Records"]
                           .sum().reset_index(name='Total Records'))
In [41]:
df_total_records_Method = df_total_records_Method.sort_values(
    'Total Records', ascending=False, ignore_index=True)
df_total_records_Method_clean = df_total_records_Method.drop(df_total_records_Method.index[21:])
df_total_records_Method_clean
Out[41]:

    Method                                             Total Records
0   hacked                                             7.404780e+09
1   poor security                                      3.610143e+09
2   unknown                                            4.482339e+08
3   poor security / hacked                             4.122143e+08
4   accidentally published                             2.699175e+08
5   data exposed by misconfiguration                   2.500000e+08
6   Poor security                                      2.010000e+08
7   lost / stolen media                                1.704345e+08
8   unsecured S3 bucket                                1.060000e+08
9   unprotected api                                    1.000000e+08
10  misconfiguration/poor security                     1.000000e+08
11  inside job, hacked                                 9.200000e+07
12  inside job                                         7.642610e+07
13  lost / stolen computer                             4.139767e+07
14  publicly accessible Amazon Web Services (AWS) ...  3.800000e+07
15  improper setting, hacked                           2.145775e+07
16  social engineering                                 6.054459e+06
17  poor security/inside job                           5.214200e+06
18  ransomware hacked                                  1.648922e+06
19  accidentally uploaded                              1.500000e+06
20  intentionally lost                                 9.600000e+05

In [42]:
sns.catplot(data=df_total_records_Method_clean, x='Method',
            y='Total Records')
plt.title('Total Records per Method')
plt.xticks(rotation=90);

In [43]:
Method_counts = df['Method'].value_counts().rename('Method_counts')

df_Method = df_total_records_Method_clean.merge(Method_counts.to_frame(),
left_on='Method',
right_index=True)
In [44]:
df_Method['relative'] = df_Method['Total Records']/df_Method['Method_counts']
In [45]:
df_Method = df_Method.sort_values('relative', ascending=False,
ignore_index=True)
In [46]:
df_Method
Out[46]:

    Method                                             Total Records  Method_counts  relative
0   poor security / hacked                             4.122143e+08   1              4.122143e+08
1   data exposed by misconfiguration                   2.500000e+08   1              2.500000e+08
2   Poor security                                      2.010000e+08   1              2.010000e+08
3   unsecured S3 bucket                                1.060000e+08   1              1.060000e+08
4   misconfiguration/poor security                     1.000000e+08   1              1.000000e+08
5   unprotected api                                    1.000000e+08   1              1.000000e+08
6   poor security                                      3.610143e+09   39             9.256777e+07
7   inside job, hacked                                 9.200000e+07   1              9.200000e+07
8   unknown                                            4.482339e+08   5              8.964678e+07
9   hacked                                             7.404780e+09   160            4.627988e+07
10  publicly accessible Amazon Web Services (AWS) ...  3.800000e+07   1              3.800000e+07
11  accidentally published                             2.699175e+08   19             1.420618e+07
12  improper setting, hacked                           2.145775e+07   2              1.072888e+07
13  social engineering                                 6.054459e+06   1              6.054459e+06
14  lost / stolen media                                1.704345e+08   32             5.326079e+06
15  inside job                                         7.642610e+07   18             4.245895e+06
16  lost / stolen computer                             4.139767e+07   15             2.759844e+06
17  poor security/inside job                           5.214200e+06   2              2.607100e+06
18  ransomware hacked                                  1.648922e+06   1              1.648922e+06
19  accidentally uploaded                              1.500000e+06   1              1.500000e+06
20  intentionally lost                                 9.600000e+05   1              9.600000e+05
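The `relative` column is the total number of records for a method divided by the number of breaches with that method, so methods with a single very large breach rise to the top. A compact way to obtain sum, count and mean in one step is `groupby(...).agg(...)`, sketched here on a hypothetical toy frame. Note that `count` in this sketch only counts rows with a numeric Records value, which may differ from the `Method_counts` taken from `df` in cell 43.

```python
import pandas as pd

# Hypothetical toy data: two 'hacked' breaches and one each of two other methods.
toy = pd.DataFrame({
    "Method": ["hacked", "hacked", "poor security", "inside job"],
    "Records": [100.0, 300.0, 50.0, 10.0],
})

# Total, number of breaches and average records per breach for each method.
summary = (
    toy.groupby("Method")["Records"]
    .agg(["sum", "count", "mean"])
    .sort_values("mean", ascending=False)
)

print(summary)
```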

In [47]:
sns.catplot(data=df_Method, x='Method', y='relative')
plt.title('Average Records per Breach by Method')
plt.xticks(rotation=90);

In [48]:
df_heatmap = df.copy(deep=True)
In [49]:
le = LabelEncoder()
df_heatmap['Records'] = le.fit_transform(df_heatmap['Records'])
df_heatmap['Entity'] = le.fit_transform(df_heatmap['Entity'])
df_heatmap['Organization type'] = le.fit_transform(
    df_heatmap['Organization type'])
df_heatmap['Method'] = le.fit_transform(df_heatmap['Method'])
In [50]:
plt.figure(figsize=(10,5))
sns.heatmap(df_heatmap[['Year', 'Records', 'Organization type',
'Method']].corr(), cmap='Spectral_r', annot=True);

Dataset 2

In [51]:
df2 = pd.read_csv(
    "/kaggle/input/cyber-security-breaches-data/Cyber Security Breaches.csv")
In [52]:
df2.head()
Out[52]:

   Unnamed: 0  Number  Name_of_Covered_Entity                              State  Business_Associate_Involved  Individuals_Affected  Date_of_Breach  Type_of_Breach  \
0  1           0       Brooke Army Medical Center                          TX     NaN                          1000                  10/16/2009      Theft
1  2           1       Mid America Kidney Stone Association, LLC           MO     NaN                          1000                  9/22/2009       Theft
2  3           2       Alaska Department of Health and Social Services     AK     NaN                          501                   10/12/2009      Theft
3  4           3       Health Services for Children with Special Need...  DC     NaN                          3800                  10/9/2009       Loss
4  5           4       L. Douglas Carlson, M.D.                            CA     NaN                          5257                  9/27/2009       Theft

   Location_of_Breached_Information        Date_Posted_or_Updated  Summary                                             breach_start  breach_end  year
0  Paper                                   2014-06-30              A binder containing the protected health infor...  2009-10-16    NaN         2009
1  Network Server                          2014-05-30              Five desktop computers containing unencrypted ...  2009-09-22    NaN         2009
2  Other Portable Electronic Device, Other 2014-01-23              NaN                                                 2009-10-12    NaN         2009
3  Laptop                                  2014-01-23              A laptop was lost by an employee while in tran...  2009-10-09    NaN         2009
4  Desktop Computer                        2014-01-23              A shared Computer that was used for backup was...  2009-09-27    NaN         2009

In [53]:
df2.columns = ['id', 'Number', 'Entity', 'State',
'Business_Associate_Involved', 'Individuals_Affected', 'Date_of_Breach',
'Type_of_Breach', 'Location_of_Breached_Information',
'Date_Posted_or_Updated', 'Summary', 'breach_start',
'breach_end', 'year']
In [54]:
df2.shape
Out[54]:
(1055, 14)
In [55]:
df2.dtypes
Out[55]:
id int64
Number int64
Entity object
State object
Business_Associate_Involved object
Individuals_Affected int64
Date_of_Breach object
Type_of_Breach object
Location_of_Breached_Information object
Date_Posted_or_Updated object
Summary object
breach_start object
breach_end object
year int64
dtype: object
In [56]:
df2.isnull().sum()
Out[56]:
id 0
Number 0
Entity 0
State 0
Business_Associate_Involved 784
Individuals_Affected 0
Date_of_Breach 0
Type_of_Breach 0
Location_of_Breached_Information 0
Date_Posted_or_Updated 0
Summary 913
breach_start 0
breach_end 910
year 0
dtype: int64
In [57]:
df2.head()
Out[57]:
   id  Number  Entity                                              State  Business_Associate_Involved  Individuals_Affected  Date_of_Breach  Type_of_Breach  \
0  1   0       Brooke Army Medical Center                          TX     NaN                          1000                  10/16/2009      Theft
1  2   1       Mid America Kidney Stone Association, LLC           MO     NaN                          1000                  9/22/2009       Theft
2  3   2       Alaska Department of Health and Social Services     AK     NaN                          501                   10/12/2009      Theft
3  4   3       Health Services for Children with Special Need...  DC     NaN                          3800                  10/9/2009       Loss
4  5   4       L. Douglas Carlson, M.D.                            CA     NaN                          5257                  9/27/2009       Theft

   Location_of_Breached_Information        Date_Posted_or_Updated  Summary                                             breach_start  breach_end  year
0  Paper                                   2014-06-30              A binder containing the protected health infor...  2009-10-16    NaN         2009
1  Network Server                          2014-05-30              Five desktop computers containing unencrypted ...  2009-09-22    NaN         2009
2  Other Portable Electronic Device, Other 2014-01-23              NaN                                                 2009-10-12    NaN         2009
3  Laptop                                  2014-01-23              A laptop was lost by an employee while in tran...  2009-10-09    NaN         2009
4  Desktop Computer                        2014-01-23              A shared Computer that was used for backup was...  2009-09-27    NaN         2009

In [58]:
df2.drop(['Number','Summary', 'Date_Posted_or_Updated', 'breach_start',
'breach_end', 'Business_Associate_Involved'], axis=1, inplace=True)
In [59]:
df2.head()
Out[59]:
   id  Entity                                              State  Individuals_Affected  Date_of_Breach  Type_of_Breach  Location_of_Breached_Information         year
0  1   Brooke Army Medical Center                          TX     1000                  10/16/2009      Theft           Paper                                    2009
1  2   Mid America Kidney Stone Association, LLC           MO     1000                  9/22/2009       Theft           Network Server                           2009
2  3   Alaska Department of Health and Social Services     AK     501                   10/12/2009      Theft           Other Portable Electronic Device, Other  2009
3  4   Health Services for Children with Special Need...  DC     3800                  10/9/2009       Loss            Laptop                                   2009
4  5   L. Douglas Carlson, M.D.                            CA     5257                  9/27/2009       Theft           Desktop Computer                         2009

In [60]:
df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 1055 non-null int64
1 Entity 1055 non-null object
2 State 1055 non-null object
3 Individuals_Affected 1055 non-null int64
4 Date_of_Breach 1055 non-null object
5 Type_of_Breach 1055 non-null object
6 Location_of_Breached_Information 1055 non-null object
7 year 1055 non-null int64
dtypes: int64(3), object(5)
memory usage: 66.1+ KB
In [61]:
sns.countplot(data=df2, x='year');
plt.title('Data Breaches per Year')
plt.xticks(rotation=90);

In [62]:
sns.countplot(data=df2, x='year', order = df2['year'].value_counts().index);
plt.title('Data Breaches per Year in order')
plt.xticks(rotation=90);

In [63]:
table_year_df2 = df2['year'].value_counts()
table_year_df2
Out[63]:
2013 254
2011 229
2012 227
2010 211
2009 56
2014 56
2008 13
2004 2
2005 2
1997 1
2003 1
2007 1
2006 1
2002 1
Name: year, dtype: int64
In [64]:
sns.countplot(data=df2, x='Type_of_Breach', order =
df2['Type_of_Breach'].value_counts().index);
plt.title('Method')
plt.xticks(rotation=90);

In [65]:
table3 = df2['Type_of_Breach'].value_counts()
table3.head(14)
Out[65]:
Theft 516
Unauthorized Access/Disclosure 148
Other 91
Loss 85
Hacking/IT Incident 75
Improper Disposal 38
Theft, Unauthorized Access/Disclosure 26
Theft, Loss 15
Unknown 10
Unauthorized Access/Disclosure, Hacking/IT Incident 9
Unauthorized Access/Disclosure, Other 8
Loss, Unauthorized Access/Disclosure 5
Theft, Other 5
Theft, Unauthorized Access/Disclosure, Hacking/IT Incident 3
Name: Type_of_Breach, dtype: int64
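Several rows in Out[65] combine more than one breach type in a single comma-separated label (for example 'Theft, Loss'). If the goal were to count how often each individual type occurs, one option is to split the labels and expand them into one row per type. The sketch below does this on a hypothetical sample series; it changes the unit of counting from breaches to type mentions, so the totals would no longer add up to the number of breaches.

```python
import pandas as pd

# Hypothetical sample of Type_of_Breach labels, some of them combined.
breach_types = pd.Series([
    "Theft",
    "Theft, Loss",
    "Unauthorized Access/Disclosure, Hacking/IT Incident",
    "Loss",
])

# Split on ", " and expand each list element into its own row.
single_types = breach_types.str.split(", ").explode()
type_counts = single_types.value_counts()

print(type_counts)
```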
In [66]:
Type_of_Breach_counts = df2['Type_of_Breach'].value_counts().rename('Type_of_Breach_counts')

df2_Type_of_Breach = df2.merge(Type_of_Breach_counts.to_frame(),
left_on='Type_of_Breach',
right_index=True)
In [67]:
df2_Type_of_Breach_upper = df2_Type_of_Breach[df2_Type_of_Breach.Type_of_Breach_counts > 4]
In [68]:
sns.countplot(data=df2_Type_of_Breach_upper, x='Type_of_Breach',
order =
df2_Type_of_Breach_upper['Type_of_Breach'].value_counts().index);
plt.title('Method')
plt.xticks(rotation=90);

In [69]:
table4 = df2['State'].value_counts()
table4
Out[69]:
CA 113
TX 83
FL 66
NY 58
IL 49
PA 40
IN 40
OH 33
TN 32
NC 32
MA 32
PR 31
GA 30
KY 26
MI 26
MO 25
WA 25
AZ 21
MN 21
NJ 20
CO 18
VA 18
MD 18
CT 17
OR 15
WI 14
SC 13
AL 12
AR 11
NM 10
NE 9
UT 9
DC 9
IA 8
LA 7
RI 7
KS 7
OK 6
WV 5
MS 5
NV 5
AK 5
WY 4
NH 4
MT 4
DE 3
ND 3
ID 2
HI 1
SD 1
ME 1
VT 1
Name: State, dtype: int64
In [70]:
State_counts = df2['State'].value_counts().rename('State_counts')

df2_State = df2.merge(State_counts.to_frame(),
left_on='State',
right_index=True)
In [71]:
df2_State_upper = df2_State[df2_State.State_counts >= 15]
In [72]:
plt.figure(figsize=(18,8))
sns.countplot(data=df2_State_upper, x='State', order =
df2_State_upper['State'].value_counts().index);
plt.title('Data Breaches per State')
plt.xticks(rotation=90);
In [73]:
df2_2006 = df2.loc[df2['year']>2006]
In [74]:
plt.figure(figsize=(10,5))
plt.scatter(data = df2_2006, y = 'Individuals_Affected', x = 'year',
alpha=1/2);

In [75]:
plt.figure(figsize=(10,5))
plt.hist2d(data = df2_2006, y = 'Individuals_Affected', x = 'year',
cmin=0.5, cmap = 'icefire')
plt.colorbar();

In [76]:
df2_2006_breach = df2_2006.loc[
    df2_2006['Type_of_Breach'].isin(df2_2006['Type_of_Breach'].value_counts().index[:11])]
plt.figure(figsize=(15,8))
sns.countplot(data = df2_2006_breach, x = 'year', hue = 'Type_of_Breach');

In [78]:
plt.figure(figsize=(10,5))
sns.boxplot(data=df2_2006_breach, y = 'Type_of_Breach', x = 'year');

In [79]:
df2_heatmap = df2.copy(deep=True)
In [80]:
le = LabelEncoder()

df2_heatmap['State'] = le.fit_transform(df2_heatmap['State'])
df2_heatmap['Date_of_Breach'] = le.fit_transform(df2_heatmap['Date_of_Breach'])
df2_heatmap['Type_of_Breach'] = le.fit_transform(df2_heatmap['Type_of_Breach'])
df2_heatmap['Location_of_Breached_Information'] = le.fit_transform(
    df2_heatmap['Location_of_Breached_Information'])
In [81]:
plt.figure(figsize=(10,5))
sns.heatmap(df2_heatmap[['State', 'Individuals_Affected', 'Date_of_Breach',
'Type_of_Breach',
'Location_of_Breached_Information', 'year']].corr(),
cmap='Spectral_r', annot=True);
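Cell 80 label-encodes the Date_of_Breach strings. LabelEncoder assigns integer codes in sorted order of the distinct values, so the codes follow the text ordering of the date strings ('10/16/2009' sorts before '9/22/2009') rather than the calendar. If a chronological variable were wanted for the correlation, the strings could be parsed first. A minimal sketch on hypothetical sample dates, assuming the month/day/year layout visible in Out[59]:

```python
import pandas as pd

# Hypothetical sample in the month/day/year layout shown in Out[59].
dates = pd.Series(["10/16/2009", "9/22/2009", "10/9/2009"])

# Parse the strings into real timestamps so that ordering is chronological.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")

print(parsed.dt.year.tolist())
print(parsed.dt.month.tolist())
```

The other label-encoded columns (State, Type_of_Breach, Location_of_Breached_Information) are nominal, so their correlation coefficients in the heatmap depend on the alphabetical order of the labels and should be read with caution.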
