
Session 4

The document provides an agenda for a machine learning diploma session on Pandas. It discusses Pandas data structures like Series and DataFrames, data types like quantitative, qualitative, discrete and continuous. It also covers data loading from CSV files into Pandas, exploring the data through methods like head(), tail(), info(), describe(), and checking columns and data types. The last section mentions a mini project to practice these Pandas concepts on New York City Airbnb open data.

Machine Learning Diploma

Session 4: Pandas
Agenda
➔ Pandas
➔ Data Types
➔ Mini project

1. Pandas

Pandas:
➔ Pandas provides data structures and functionality to quickly manipulate
and analyze data.
➔ The key to understanding Pandas for machine learning is understanding
the Series and DataFrame data structures.

Pandas Series:
➔ A Series is a one-dimensional array whose elements are labeled by an
index; the Series itself can also carry a name.
➔ You can access the data in a series like a NumPy array and like a
dictionary.
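A minimal sketch of both access styles, assuming a small hand-made Series (the labels, values and the name "dogs" are illustrative only). `.iloc` is used for positional access because it stays unambiguous when the index labels are not integers.

```python
import pandas as pd

# Illustrative Series: three values labeled "a", "b", "c".
s = pd.Series([3, 1, 4], index=["a", "b", "c"], name="dogs")

# NumPy-style access: by position and by slice (.iloc is explicitly positional).
first = s.iloc[0]        # 3
first_two = s.iloc[:2]   # the elements labeled "a" and "b"

# Dictionary-style access: by label, and `in` checks the index labels like dict keys.
by_label = s["b"]        # 1
has_c = "c" in s         # True

# Vectorized arithmetic works element-wise, as with a NumPy array.
doubled = s * 2
print(doubled.tolist())  # [6, 2, 8]
```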

Pandas DataFrame:
➔ A DataFrame is a two-dimensional, table-like array where both the rows and
the columns can be labeled. Each column is a Series.
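A short sketch of a DataFrame with labeled rows and columns. The listing ids and column names are hypothetical toy data, not taken from the Airbnb file.

```python
import pandas as pd

# Hypothetical toy data: rows labeled by listing id, columns by attribute name.
df = pd.DataFrame(
    {"price": [150, 80, 200],
     "room_type": ["Entire home", "Private room", "Entire home"]},
    index=["L1", "L2", "L3"],
)

print(df.shape)               # (3, 2): 3 rows, 2 columns
print(df.loc["L2", "price"])  # label-based access: row label, column label
print(df.iloc[0, 1])          # position-based access: first row, second column
print(type(df["price"]))      # selecting one column gives a Series
```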

Data Loading:
➔ The most common format for machine learning data is CSV (comma-separated
values).
➔ There are a number of considerations when loading your machine
learning data from CSV files.
○ File Headers. Does your data have a file header? If not, you may need to
name your attributes manually.
○ Delimiter. The standard delimiter that separates values in fields is
the comma (,). Your file could use a different delimiter, like tab or
white space, in which case you must specify it explicitly.
○ Quotes. Sometimes field values contain spaces (or even the delimiter
character). In these CSV files the values are often quoted.
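The three considerations map onto `pandas.read_csv` parameters. This is a sketch under assumed file contents: `io.StringIO` stands in for a file on disk, and the column names are hypothetical. For tab-separated files you would pass `sep="\t"`, and `sep=r"\s+"` handles runs of white space.

```python
import io
import pandas as pd

# Hypothetical file contents: no header row, semicolon-delimited, and one
# quoted field that contains both a space and the delimiter character.
raw = 'Cozy loft;Manhattan;150\n"Quiet room; near park";Brooklyn;80\n'

df = pd.read_csv(
    io.StringIO(raw),                    # stands in for a path to a real file
    header=None,                         # the file has no header line...
    names=["name", "borough", "price"],  # ...so name the attributes manually
    sep=";",                             # a non-comma delimiter must be given explicitly
    quotechar='"',                       # the default, shown for clarity
)
print(df)
```

The quoted field is read as a single value even though it contains the delimiter.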

Data Loading:
➔ We will use the New York City Airbnb Open Data dataset for practice.
➔ Download the CSV file to the folder that contains your scripts.
➔ You can load the data into your script using:
○ Python Standard Library.
○ NumPy
○ Pandas
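A side-by-side sketch of the three options, assuming a small all-numeric CSV held in memory (in a real script you would open the downloaded file instead). Note that `np.loadtxt` as used here expects purely numeric fields, so for a file with text columns, such as the Airbnb data, Pandas is usually the more convenient choice.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical numeric CSV with a header row; io.StringIO stands in for a file.
text = "price,minimum_nights\n150,3\n80,1\n200,2\n"

# 1) Python standard library: csv.reader yields each row as a list of strings.
rows = list(csv.reader(io.StringIO(text)))
header, data_rows = rows[0], rows[1:]

# 2) NumPy: loadtxt parses the numbers into a float ndarray (skip the header line).
arr = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

# 3) Pandas: read_csv uses the first line as the header and infers column types.
df = pd.read_csv(io.StringIO(text))

print(header, data_rows[0])  # ['price', 'minimum_nights'] ['150', '3']
print(arr.shape)             # (3, 2)
print(df.dtypes)
```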

Data Loading using pandas:
➔ You can load your CSV data using Pandas and the pandas.read_csv()
function.
➔ The function returns a pandas.DataFrame that you can immediately start
summarizing and plotting.
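A minimal sketch of loading and immediately summarizing. The three-row excerpt and its column names are assumptions for illustration; with the real download you would pass the file's path to `pd.read_csv` instead of the `io.StringIO` object. (Plotting, e.g. `df["price"].plot()`, additionally requires matplotlib.)

```python
import io
import pandas as pd

# Hypothetical excerpt; replace io.StringIO(text) with the path of your CSV file.
text = (
    "id,neighbourhood_group,price\n"
    "1,Brooklyn,149\n"
    "2,Manhattan,225\n"
    "3,Manhattan,150\n"
)
df = pd.read_csv(io.StringIO(text))

print(type(df))  # a pandas DataFrame, ready to summarize

# Example summary: average price per group.
avg_price = df.groupby("neighbourhood_group")["price"].mean()
print(avg_price)  # Brooklyn 149.0, Manhattan 187.5
```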

Data Exploration:
➔ The first step is to see how your data is structured. You can view the first few
rows of a DataFrame using the df.head() method, which shows 5 rows by default.
It takes the number of rows you want to see: df.head(7) returns the first 7 rows.
➔ df.tail() works the same way as df.head(), but returns the last few rows instead.
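A quick sketch on a hypothetical 10-row DataFrame standing in for the loaded CSV:

```python
import pandas as pd

# Hypothetical 10-row DataFrame; ids run from 1 to 10.
df = pd.DataFrame({"id": range(1, 11), "price": [100 + 10 * i for i in range(10)]})

print(df.head())   # first 5 rows (default n=5)
print(df.head(7))  # first 7 rows
print(df.tail(3))  # last 3 rows
```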

Data Exploration:
➔ df.info() prints a concise summary of
the data, such as each column's data type
and its number of non-null values.
If some columns have a lower non-null
count than the total number of rows,
they contain null (missing) values.
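A sketch of spotting missing values, assuming a hypothetical `reviews_per_month` column with two gaps. `info()` prints to standard output by default; its `buf` parameter is used here only to capture the text. The null counts are then derived numerically, which matches `df.isnull().sum()`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: "reviews_per_month" is missing for two of four rows.
df = pd.DataFrame({
    "price": [150, 80, 200, 95],
    "reviews_per_month": [0.5, np.nan, 1.2, np.nan],
})

# Capture the info() summary instead of printing it directly.
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
print(summary)  # reviews_per_month shows 2 non-null values out of 4 entries

# Total rows minus non-null count per column = number of nulls per column.
null_counts = len(df) - df.count()
print(null_counts)
```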

Data Exploration:
➔ df.describe() returns summary statistics (count, mean, standard deviation,
minimum, quartiles and maximum) for the numeric columns.
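A small sketch with hypothetical data. By default only numeric columns are summarized; `include="all"` also adds count, unique, top and freq rows for non-numeric columns.

```python
import pandas as pd

# Hypothetical data with one numeric and one text column.
df = pd.DataFrame({"price": [100, 200, 300, 400], "room_type": ["a", "b", "a", "a"]})

stats = df.describe()  # numeric columns only
print(stats)

stats_all = df.describe(include="all")  # adds unique/top/freq for the text column
print(stats_all)
```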

Data Exploration:
➔ df.columns returns the column names (features).
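A brief sketch with hypothetical column names. `df.columns` is an `Index` object; `.tolist()` converts it to a plain Python list.

```python
import pandas as pd

# Hypothetical one-row DataFrame.
df = pd.DataFrame({"name": ["Loft"], "price": [150], "minimum_nights": [3]})

print(df.columns)              # an Index holding the column labels
features = df.columns.tolist()
print(features)                # ['name', 'price', 'minimum_nights']
print("price" in df.columns)   # True
```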

Data Exploration:
➔ df.dtypes returns the data type of each column.
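A sketch with hypothetical columns. The exact label shown for the text column can differ between pandas versions ("object" or a dedicated string dtype), so the checks below use the helpers in `pandas.api.types` rather than comparing dtype names directly.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, is_numeric_dtype

# Hypothetical data: text, whole numbers, and decimals.
df = pd.DataFrame({
    "name": ["Loft", "Room"],
    "price": [150, 80],
    "rating": [4.5, 4.0],
})

print(df.dtypes)  # one dtype per column

# Version-robust checks on a column's type.
print(is_integer_dtype(df["price"]))  # True
print(is_float_dtype(df["rating"]))   # True
print(is_numeric_dtype(df["name"]))   # False
```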

2. Data Types

Data Types:
➔ Quantitative data is numerical, like the number of dogs.
➔ Qualitative data is categorical (often stored as text), like the breed of a dog.
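One rough way to separate the two kinds of columns in Pandas is by dtype, sketched below with hypothetical columns. This is only a heuristic: numeric codes such as zip codes are stored as numbers but are qualitative.

```python
import pandas as pd

# Hypothetical data: one quantitative and one qualitative column.
df = pd.DataFrame({
    "number_of_dogs": [2, 1, 3],             # quantitative
    "breed": ["Beagle", "Poodle", "Husky"],  # qualitative
})

# Heuristic split by dtype: numeric columns vs everything else.
quantitative = df.select_dtypes(include="number").columns.tolist()
qualitative = df.select_dtypes(exclude="number").columns.tolist()

print(quantitative)  # ['number_of_dogs']
print(qualitative)   # ['breed']
```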

Data Types:
➔ Discrete data is numerical data made of whole, countable values: it can only
take specific, fixed values determined by counting. Like the number of dogs.
➔ Continuous data is numerical data that can take any value within a range,
including fractions and decimals, and is obtained by measuring rather than
counting. Like temperature readings.
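The practical difference shows up in how the values are summarized. In this sketch with hypothetical readings, discrete counts repeat exactly, so tallying each value with `value_counts()` is meaningful. Continuous measurements rarely repeat, so they are typically grouped into ranges first, here with `pd.cut` (the bin edges are arbitrary choices).

```python
import pandas as pd

# Hypothetical data: counted values vs measured values.
df = pd.DataFrame({
    "number_of_dogs": [1, 2, 2, 3, 1, 2],                     # discrete
    "temperature_c": [21.4, 22.05, 19.8, 23.6, 20.15, 22.9],  # continuous
})

# Discrete: tally each exact value.
dog_counts = df["number_of_dogs"].value_counts().sort_index()
print(dog_counts)  # 1 -> 2, 2 -> 3, 3 -> 1

# Continuous: group into ranges (18, 20], (20, 22], (22, 24] before counting.
temp_bins = pd.cut(df["temperature_c"], bins=[18, 20, 22, 24])
bin_counts = temp_bins.value_counts().sort_index()
print(bin_counts)  # 1, 2, 3 readings per range
```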

Data Types:

Data Types:
➔ Nominal data simply names or labels categories without placing them in any
order relative to one another. Like colors.
➔ Ordinal data, unlike nominal data, has a meaningful order. Like grades.
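Pandas can represent both with `pd.Categorical`. This is a sketch with illustrative values: an unordered categorical fits nominal data, while `ordered=True` with an explicit category order fits ordinal data and makes comparisons, sorting and `max()` follow that order. The grade scale "C" < "B" < "A" is an assumption for the example.

```python
import pandas as pd

# Nominal: categories with no inherent order.
colors = pd.Series(pd.Categorical(["red", "blue", "red", "green"]))

# Ordinal: categories with a declared order (lowest to highest, illustrative).
grade_order = ["C", "B", "A"]
grades = pd.Series(pd.Categorical(["B", "A", "C", "A"],
                                  categories=grade_order, ordered=True))

print(colors.cat.ordered)              # False
print(grades.max())                    # A
print((grades >= "B").tolist())        # [True, True, False, True]
print(grades.sort_values().tolist())   # ['C', 'B', 'A', 'A']
```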

3. Mini project

Any Questions?

THANK YOU!
