Session 4
Session 4: Pandas
Agenda
➔ Pandas
➔ Data Types
➔ Mini project
1. Pandas
Pandas:
➔ Pandas provides data structures and functionality to quickly manipulate
and analyze data.
➔ The key to understanding Pandas for machine learning is understanding
the Series and DataFrame data structures.
Pandas Series:
➔ A Series is a one-dimensional array whose rows can be labeled through
an index.
➔ You can access the data in a series like a NumPy array and like a
dictionary.
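
The two access styles can be sketched as follows. The values and labels here are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical prices with hypothetical row labels (the index).
prices = pd.Series([149, 225, 150], index=["a", "b", "c"])

# NumPy-style access: select by integer position.
first_by_position = prices.iloc[0]

# Dictionary-style access: select by label.
first_by_label = prices["a"]

print(first_by_position, first_by_label)
```

Both lookups return the same element because position 0 carries the label "a".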
Pandas DataFrame:
➔ A DataFrame is a two-dimensional table where both the rows and the
columns can be labeled.
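
A minimal sketch of a labeled table, assuming made-up column names and row labels:

```python
import pandas as pd

# Hypothetical data: column labels come from the dict keys,
# row labels come from the index argument.
listings = pd.DataFrame(
    {"price": [149, 225], "minimum_nights": [1, 3]},
    index=["listing_a", "listing_b"],
)

# Selecting one column returns a Series.
price_column = listings["price"]

# .loc selects by row label and column label.
one_value = listings.loc["listing_b", "price"]

print(listings)
```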
Data Loading:
➔ The most common format for machine learning data is CSV files.
➔ There are a number of considerations when loading your machine
learning data from CSV files.
○ File Headers. Does your data have a file header? If not, you may need
to name your attributes manually.
○ Delimiter. The standard delimiter that separates values in fields is
the comma (,). Your file could use a different delimiter, like tab or
white space, in which case you must specify it explicitly.
○ Quotes. Sometimes field values can have spaces. In these CSV files
the values are often quoted.
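
The three considerations above map onto parameters of pandas.read_csv(). The sketch below uses a small in-memory sample instead of a real file; the sample text and the column names are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: no header row, semicolon delimiter,
# and one quoted value that itself contains the delimiter.
raw = 'Cozy room;Brooklyn;89\n"Loft; near park";Manhattan;200\n'

df = pd.read_csv(
    io.StringIO(raw),
    header=None,                         # the data has no header row
    names=["name", "borough", "price"],  # so name the attributes manually
    sep=";",                             # non-default delimiter
    quotechar='"',                       # character that wraps quoted values
)

print(df)
```

Because the second name is quoted, the semicolon inside it is treated as part of the value rather than as a field separator.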
Data Loading:
➔ We will use the New York City Airbnb Open Data dataset for practicing.
➔ Download the CSV file to the folder that holds your scripts.
➔ You can load the data into your script using:
○ Python Standard Library.
○ NumPy
○ Pandas
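
A rough side-by-side sketch of the three options. It reads a tiny in-memory sample with made-up columns rather than the downloaded file, so the data here is an assumption.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical numeric sample with a header row.
raw = "price,minimum_nights\n149,1\n225,3\n"

# 1. Python Standard Library: every row is a list of strings.
rows = list(csv.reader(io.StringIO(raw)))

# 2. NumPy: suited to purely numeric data; skip the header row.
array = np.loadtxt(io.StringIO(raw), delimiter=",", skiprows=1)

# 3. Pandas: the header becomes column labels, types are inferred.
df = pd.read_csv(io.StringIO(raw))

print(rows[0], array.shape, list(df.columns))
```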
Data Loading using pandas:
➔ You can load your CSV data using Pandas and the pandas.read_csv()
function.
➔ The function returns a pandas.DataFrame that you can immediately start
summarizing and plotting.
Data Exploration:
➔ The first step is to see how your data is structured. You can view the
first few rows of a DataFrame using the df.head() method. It returns 5 rows
by default and accepts the number of rows you want to see: df.head(7)
retrieves the first 7 rows.
➔ df.tail() works like df.head(), except that it returns the last few rows.
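
A short sketch of both methods on a hypothetical ten-row DataFrame:

```python
import pandas as pd

# Hypothetical frame with ten rows, numbered 1 to 10.
df = pd.DataFrame({"listing_id": range(1, 11)})

first_default = df.head()   # no argument: first 5 rows
first_seven = df.head(7)    # first 7 rows
last_three = df.tail(3)     # last 3 rows

print(last_three)
```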
Data Exploration:
➔ df.info() gives a summary of the data, such as the number of non-null
values in each column. Columns with a lower non-null count than the
others contain null values.
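
As an illustration, the sketch below builds a hypothetical DataFrame with one missing value, prints the summary, and then counts the nulls per column directly.

```python
import pandas as pd

# Hypothetical data: one missing value in reviews_per_month.
df = pd.DataFrame(
    {
        "price": [149, 225, 150],
        "reviews_per_month": [0.21, None, 4.64],
    }
)

# Prints the column names, non-null counts and dtypes.
df.info()

# Number of null values per column.
missing = df.isna().sum()
print(missing)
```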
Data Exploration:
➔ df.describe() returns summary statistics for the numeric columns.
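
A small sketch with made-up prices. The result is itself a DataFrame, so individual statistics can be looked up by name.

```python
import pandas as pd

# Hypothetical prices.
df = pd.DataFrame({"price": [100, 200, 300]})

# Rows of the result: count, mean, std, min, 25%, 50%, 75%, max.
stats = df.describe()

mean_price = stats.loc["mean", "price"]
print(stats)
```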
Data Exploration:
➔ df.columns returns the column names (features).
Data Exploration:
➔ df.dtypes returns the data type of each column.
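
Both df.columns and df.dtypes are attributes, so they are used without parentheses. A minimal sketch on a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical data with a text, an integer and a float column.
df = pd.DataFrame(
    {
        "name": ["Cozy room", "Loft"],
        "price": [89, 200],
        "reviews_per_month": [0.5, 1.25],
    }
)

column_names = list(df.columns)  # column labels as a plain list
column_types = df.dtypes         # Series mapping column name to dtype

print(column_names)
print(column_types)
```

How text columns are reported depends on the pandas version, so the printed dtype of name may differ between installations.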
2. Data Types
Data Types:
➔ Quantitative data is numerical, like the number of dogs.
➔ Qualitative data is descriptive text, like the breed of dogs.
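
In a DataFrame, this split roughly corresponds to numeric versus non-numeric columns. The sketch below separates the two with select_dtypes(); the data is hypothetical, and treating every non-numeric column as qualitative is a simplifying assumption.

```python
import pandas as pd

# Hypothetical data: one qualitative and one quantitative column.
dogs = pd.DataFrame(
    {
        "breed": ["Beagle", "Poodle", "Beagle"],
        "number_of_dogs": [2, 1, 4],
    }
)

quantitative = dogs.select_dtypes(include="number")
qualitative = dogs.select_dtypes(exclude="number")

print(list(quantitative.columns), list(qualitative.columns))
```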
Data Types:
➔ Discrete data is numerical data made up of whole, distinct values that
are determined by counting, like the number of dogs.
➔ Continuous data can take any value within a range, including fractional
values, and is determined by measuring rather than counting, like
temperature readings.
Data Types:
➔ Nominal data simply names or labels something without implying any
order among the categories, like colors.
➔ Ordinal data, unlike nominal data, has a meaningful order, like grades.
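
One way to represent this distinction in Pandas is with unordered and ordered categoricals. The values below are made up, and the grade ordering (C lowest, A highest) is an assumption.

```python
import pandas as pd

# Nominal: categories with no order.
colors = pd.Categorical(["red", "blue", "red"])

# Ordinal: categories listed from lowest to highest.
grades = pd.Categorical(
    ["B", "A", "C"],
    categories=["C", "B", "A"],
    ordered=True,
)

best_grade = grades.max()            # only meaningful for ordered data
above_c = (grades > "C").tolist()    # element-wise comparison uses the order

print(best_grade, above_c)
```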
3. Mini project
Any Questions?
THANK YOU!