Introduction To Data Science Using Python Part2

Uploaded by

salahmohamed38
Copyright: © All Rights Reserved
Introduction to Data Science using Python Part 2


Pandas
Reading in Data From Excel
I have the following data saved in the file “Grades_Short.csv”:

Let’s see how we read this data into pandas:

• Before you use pandas you must import it. Anytime you use pandas, put the import line at the top of your code.

• The built-in read_csv method takes the path to the file. Here the data is read into a variable called df_grades.
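A minimal sketch of importing pandas and reading a file with read_csv. The real contents of Grades_Short.csv are not reproduced here, so the example writes a small made-up file first; the column names Name, Mini_Exam1 and Final and all scores are placeholders.

```python
import os
import tempfile

import pandas as pd  # put this line at the top of any code that uses pandas

# Made-up stand-in for the contents of Grades_Short.csv (placeholder names and scores).
sample_csv = "Name,Mini_Exam1,Final\nStudent A,9.5,30\nStudent B,7.0,24\nStudent C,8.0,33\n"

# Write the sample into a temporary folder so the example runs anywhere.
tmp_dir = tempfile.mkdtemp()
path_to_file = os.path.join(tmp_dir, "Grades_Short.csv")
with open(path_to_file, "w") as f:
    f.write(sample_csv)

# read_csv takes the path to the file and returns the data as a dataframe.
df_grades = pd.read_csv(path_to_file)
print(df_grades)
```

With the real file sitting next to the notebook, the call would simply be pd.read_csv("Grades_Short.csv").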


So, what is df_grades and how does it store the data?

Typing the name of any variable at the end of a code cell will display the contents of the variable.

• df_grades is a pandas dataframe.

• The data is stored in a tabular format very similar to Excel.
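To check this for yourself, something like the following should work. The dataframe is built by hand from placeholder values so the snippet is self-contained; in the notebook df_grades would come from read_csv instead.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Mini_Exam1": [9.5, 7.0, 8.0, 7.0],
    "Final": [30, 24, 33, 27],
})

# type() tells us what kind of object the variable holds.
print(type(df_grades))

# In a Jupyter cell, writing just df_grades on the last line shows the table;
# in a plain script we print it instead.
print(df_grades)
```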


The path depends on where the data file sits relative to the Jupyter notebook:

• If the data file is in the same folder as the Jupyter notebook, the file name alone is the path.

• If Grades_Short.csv is in the Data folder, put the folder name in the path. “/” separates directories.

• If Grades_Short.csv is in the Data folder and the Jupyter notebook is in the Notebooks folder, start the path with “..”, which means go back one directory.
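The folder cases can be tried out with a throwaway project. This is only a sketch: the Data and Notebooks folders are created in a temporary directory, the file contents are made up, and os.chdir stands in for "where the notebook lives".

```python
import os
import tempfile

import pandas as pd

# Build a temporary project with a Data folder and a Notebooks folder.
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "Data"))
os.makedirs(os.path.join(project, "Notebooks"))
with open(os.path.join(project, "Data", "Grades_Short.csv"), "w") as f:
    f.write("Name,Final\nStudent A,30\nStudent B,24\n")  # made-up rows

start_dir = os.getcwd()

# Notebook next to the Data folder: name the folder, then the file.
os.chdir(project)
df_from_subfolder = pd.read_csv("Data/Grades_Short.csv")

# Notebook inside Notebooks: ".." goes back one directory first.
os.chdir(os.path.join(project, "Notebooks"))
df_from_sibling = pd.read_csv("../Data/Grades_Short.csv")

os.chdir(start_dir)  # return to where we started
print(df_from_sibling)
```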
The head() Method
Using the head() method

• If the data is really large you don’t want to print out the entire dataframe to your
output.

• The head(n) method outputs the first n rows of the data frame. If n is not supplied,
the default is the first 5 rows.

• I like to run the head() method after I read in the dataframe to check that everything
got read in correctly.

• There is also a tail(n) method that returns the last n rows of the dataframe.
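A short sketch of head() and tail() on an eight-row dataframe. The names and scores are invented for the example.

```python
import pandas as pd

# Hypothetical eight-row dataframe (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": [f"Student {i}" for i in range(1, 9)],
    "Final": [30, 24, 33, 27, 18, 36, 21, 29],
})

first_five = df_grades.head()    # no n supplied, so the first 5 rows
first_two = df_grades.head(2)    # the first 2 rows
last_three = df_grades.tail(3)   # the last 3 rows

print(first_five)
print(last_three)
```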
Basic Features

Think of this as a list

• object = string
• float64 = decimal
• int64 = integer
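The type labels above can be inspected through the dtypes attribute. A small sketch with placeholder data follows; note that, depending on the pandas version, a text column may be reported with a dedicated string type rather than object.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Mini_Exam1": [9.5, 7.0, 8.0, 7.0],
    "Final": [30, 24, 33, 27],
})

# dtypes lists the type of each column.
print(df_grades.dtypes)

final_type = str(df_grades["Final"].dtype)        # whole numbers
mini_type = str(df_grades["Mini_Exam1"].dtype)    # decimals
name_is_text = pd.api.types.is_string_dtype(df_grades["Name"])  # text column
```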
column names

row names = index

• Pandas defaults to using the row number as the index, and it automatically recognizes that the first row of the file holds the column names.

• Next we discuss how to pick out various pieces of the dataframe.
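The column names and the index can be looked at directly. A sketch with placeholder data:

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Mini_Exam1": [9.5, 7.0, 8.0, 7.0],
    "Final": [30, 24, 33, 27],
})

column_names = list(df_grades.columns)  # the column names
row_names = list(df_grades.index)       # the default index counts rows from 0

print(column_names)
print(row_names)
```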


Selecting a Single Column

• Between square brackets, the column name must be given as a string.

• This outputs the column as a series.
• A series is like a one-dimensional dataframe. More on this in the slicing section.
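A sketch of picking out one column with square brackets, using placeholder data:

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Final": [30, 24, 33, 27],
})

# The column name goes between square brackets as a string.
names = df_grades["Name"]

print(type(names))  # a pandas Series
print(names)
```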
• Dot notation (df_grades.Name) is an exactly equivalent way to get the Name column.

• +: don’t have to type brackets or quotes
• -: won’t generalize to selecting multiple columns, won’t work if column names have spaces, can’t create new columns this way
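A sketch comparing the two ways of getting a column. The data is made up, and the column called "Mini Exam 1" is included only to show a name with spaces.

```python
import pandas as pd

# Hypothetical data; "Mini Exam 1" is a placeholder column name with spaces.
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B"],
    "Mini Exam 1": [9.5, 7.0],
})

by_brackets = df_grades["Name"]
by_dot = df_grades.Name  # same column, no brackets or quotes

print(by_dot.equals(by_brackets))

# A name with spaces only works with brackets;
# df_grades.Mini Exam 1 would not even be valid Python.
spaced = df_grades["Mini Exam 1"]
```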
Selecting Multiple Columns

• Pass a list of strings, which correspond to column names.
• You can select as many columns as you want.
• Columns don’t have to be contiguous.
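A sketch of selecting two non-adjacent columns with a list of names. The data and column names are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Mini_Exam1": [9.5, 7.0, 8.0, 7.0],
    "Final": [30, 24, 33, 27],
})

# Note the double brackets: the inner pair is the list of column names.
subset = df_grades[["Name", "Final"]]

print(subset)
```

Selecting with a list returns a dataframe, even when the list has only one name in it.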
Storing Result

Why store a slice?

• We might want/have to do our analysis in steps.
• Less error prone
• More readable

The variable name stores a series.
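Storing a column in a variable and then working with that variable might look like this. The data is made up, and first_two_names is just an illustrative follow-up step.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Final": [30, 24, 33, 27],
})

# Step 1: store the slice in a variable.
name = df_grades["Name"]

# Step 2: keep working with the stored series.
first_two_names = name.head(2)

print(first_two_names)
```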
Slicing a Series

Slice/index through the index, which is usually numbers:

• Picking out a single element
• Contiguous slice (the end point is non-inclusive)
• Arbitrary slice
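A sketch of the three kinds of series slicing, assuming the default index of row numbers and placeholder data:

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
})
name = df_grades["Name"]

single = name[0]             # single element
contiguous = name[0:2]       # rows 0 and 1; the end point 2 is not included
arbitrary = name[[0, 2, 3]]  # any rows we like, given as a list

print(single)
print(contiguous)
print(arbitrary)
```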
Slicing a Data Frame

• There are a few ways to slice a data frame; we will use the .loc method.

• Access elements through the index labels and column names.

• We will see how to change both of these labels later on.

• Pick a single value out: give the index label (number) and the column name (string).

• Pick out an entire row: “pick out all columns”. first_row is a series.

• Pick out a contiguous chunk: endpoints are inclusive!

• Pick out an arbitrary chunk.
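The four .loc cases above can be sketched as follows, with placeholder data and column names:

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Mini_Exam1": [9.5, 7.0, 8.0, 7.0],
    "Final": [30, 24, 33, 27],
})

# Single value: index label first, then column name.
single_value = df_grades.loc[0, "Name"]

# Entire row: ":" means "all columns". The result is a series.
first_row = df_grades.loc[0, :]

# Contiguous chunk: rows 0 through 2 and columns Name through Mini_Exam1,
# with both endpoints included.
chunk = df_grades.loc[0:2, "Name":"Mini_Exam1"]

# Arbitrary chunk: lists of index labels and column names.
arbitrary = df_grades.loc[[0, 3], ["Name", "Final"]]

print(chunk)
print(arbitrary)
```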


Built in Functions

How do I compute the average score on the final? With the built-in mean() method.

How do I compute the highest Mini Exam 1 score?
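A sketch answering both questions on placeholder data. Using max() for the highest score is an assumption about what the original slide showed; it is the standard pandas method for this.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Mini_Exam1": [9.5, 7.0, 8.0, 7.0],
    "Final": [30, 24, 33, 27],
})

average_final = df_grades["Final"].mean()      # average score on the final
highest_mini = df_grades["Mini_Exam1"].max()   # highest Mini Exam 1 score

print(average_final)
print(highest_mini)
```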


I can actually get all key stats for numeric columns at once with the describe() method:

• summary_df is a dataframe!
• Notice here the index is not row numbers…
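A sketch of describe() on placeholder data, showing that the result is itself a dataframe whose index holds the names of the statistics:

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Mini_Exam1": [9.5, 7.0, 8.0, 7.0],
    "Final": [30, 24, 33, 27],
})

summary_df = df_grades.describe()  # stats for the numeric columns only

print(summary_df)
print(list(summary_df.index))  # statistic names instead of row numbers

# Because summary_df is a dataframe, .loc works on it too.
mean_final = summary_df.loc["mean", "Final"]
```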
Other useful built-in methods:

value_counts(): Gives a count of the number of times each unique value appears in the column. Returns a series where the indices are the unique column values.

unique(): Returns an array of all of the unique values.
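A sketch of both methods on a made-up column of scores:

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder scores).
df_grades = pd.DataFrame({
    "Mini_Exam1": [9.5, 7.0, 8.0, 7.0],
})

# How many times does each score appear? The scores become the index.
counts = df_grades["Mini_Exam1"].value_counts()

# Which distinct scores are there? Returned in order of first appearance.
unique_scores = df_grades["Mini_Exam1"].unique()

print(counts)
print(unique_scores)
```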


Attributes vs. Methods

When do I put a ()?

• dataframe attributes: features of the dataframe, so no ()
• dataframe methods: require computation for output, so they take a ()
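A few examples of each kind, on placeholder data. The particular attributes and methods picked here (shape, columns, head, mean) are my choice of illustration, not necessarily the ones on the original slide.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Final": [30, 24, 33, 27],
})

# Attributes: features of the dataframe, no ().
n_rows_and_cols = df_grades.shape
column_names = df_grades.columns

# Methods: something gets computed, so they need ().
top_rows = df_grades.head(2)
average_final = df_grades["Final"].mean()

print(n_rows_and_cols)
print(average_final)
```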
Creating New Columns

Let’s create a useless new column of all 1s:
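A sketch on placeholder data; the new column name Ones is made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Final": [30, 24, 33, 27],
})

# Assigning a single value to a new column name fills every row with it.
df_grades["Ones"] = 1

print(df_grades)
```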


We can also create a column as a function of another column. The Final was worth 36 points, so let’s create a column for each student’s percentage.
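A sketch of the percentage column on placeholder scores. The column name Final_Percentage is made up; only the 36-point total comes from the text.

```python
import pandas as pd

# Hypothetical stand-in for the grades data (placeholder names and scores).
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Final": [30, 24, 33, 27],
})

# The arithmetic is applied to every row of the Final column at once.
df_grades["Final_Percentage"] = df_grades["Final"] / 36 * 100

print(df_grades)
```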
Deleting Columns
Deleting Columns
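One common way to delete a column is the drop method, sketched below on placeholder data. This is an assumption about the approach intended here, and the Ones column is made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for the grades data, with an extra column to delete.
df_grades = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "Final": [30, 24, 33, 27],
    "Ones": [1, 1, 1, 1],
})

# drop returns a new dataframe without the listed columns,
# so we store the result back in the same variable.
df_grades = df_grades.drop(columns=["Ones"])

print(df_grades)
```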
