Data Preparation-All Pds
Data Preparation
• Guide to Encoding Categorical Values in Python
2/20/2022
Data Preparation
• As with many other aspects of the Data Science world, there is no
single answer on how to approach this problem
>> import pandas as pd
>> # Define the headers since the data does not have any
>> headers = ["symboling", "normalized_losses", "make", "fuel_type",
              "aspiration", "num_doors", "body_style", "drive_wheels",
              "engine_location", "wheel_base", "length", "width", "height",
              "curb_weight", "engine_type", "num_cylinders", "engine_size",
              "fuel_system", "bore", "stroke", "compression_ratio",
              "horsepower", "peak_rpm", "city_mpg", "highway_mpg",
              "price"]
>> # Read the file, treating "?" as a missing value
>> df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
                    header=None, names=headers, na_values="?")
>> df.head()
>> df.dtypes
>> obj_df.dtypes
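The name obj_df is not defined anywhere in the surviving slide text. A minimal sketch, assuming it holds only the text (object) columns of df and is built with select_dtypes. The small stand-in frame is hypothetical and replaces the downloaded data set so the snippet runs offline:

```python
import pandas as pd

# Hypothetical stand-in for the automobile data (values are illustrative only).
# The text columns are given dtype="object" explicitly so the example behaves
# the same way regardless of the pandas version's default string dtype.
df = pd.DataFrame({
    "make": pd.Series(["audi", "bmw", "volvo"], dtype="object"),
    "body_style": pd.Series(["sedan", "sedan", "wagon"], dtype="object"),
    "num_doors": pd.Series(["four", "two", "four"], dtype="object"),
    "engine_size": [109, 108, 141],
    "price": [13950.0, 16430.0, 16845.0],
})

# Assumption: obj_df keeps just the object-typed columns, copied so that
# later edits do not touch the original frame.
obj_df = df.select_dtypes(include=["object"]).copy()

print(obj_df.dtypes)
```

Working on a copy leaves df unchanged while different encodings are tried out on obj_df.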
• One-hot encoding has the benefit of not weighting a value improperly, but it
has the downside of adding more columns to the data set.
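The trade-off can be seen in a small sketch using pd.get_dummies, which turns each distinct category value into its own indicator column. The sample frame is hypothetical and only borrows column names from the data set:

```python
import pandas as pd

# Hypothetical sample; values are illustrative only
obj_df = pd.DataFrame({
    "make": ["audi", "bmw", "volvo", "audi"],
    "drive_wheels": ["fwd", "rwd", "rwd", "4wd"],
})

# One indicator column is created per distinct drive_wheels value,
# and the original drive_wheels column is dropped.
encoded = pd.get_dummies(obj_df, columns=["drive_wheels"])

print(encoded.columns.tolist())
print(encoded.shape)
```

One column with three distinct values becomes three columns, so no value is ranked above another, but the frame grows wider with every additional category.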
• Proper naming will make the rest of the analysis just a little bit
easier.
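Assuming the naming point refers to the labels of the generated indicator columns, the prefix argument of pd.get_dummies is one way to control them. A sketch with a hypothetical sample frame and prefixes chosen purely for illustration:

```python
import pandas as pd

# Hypothetical sample; values are illustrative only
obj_df = pd.DataFrame({
    "body_style": ["sedan", "wagon", "sedan"],
    "drive_wheels": ["fwd", "rwd", "4wd"],
})

# Each entry in prefix is paired with the column at the same position
# in columns, giving shorter labels for the new indicator columns.
encoded = pd.get_dummies(
    obj_df,
    columns=["body_style", "drive_wheels"],
    prefix=["body", "drive"],
)

print(encoded.columns.tolist())
```

Short, consistent prefixes keep the widened frame readable when many columns are encoded at once.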