Pandas Functions

The document explains the use of loc and iloc methods in pandas for selecting data from DataFrames, highlighting their differences in label-based and integer-based indexing. It also covers methods for filling missing values (bfill and ffill), various pandas functions for data inspection, selection, manipulation, handling missing data, visualization, I/O operations, statistical functions, and merging/joining DataFrames. Overall, it serves as a comprehensive guide to essential pandas functionalities.

loc and iloc are two indexers in pandas, a popular Python library for data manipulation. Both are used to select rows and columns from a DataFrame, but they differ in how they reference and access data:

loc:
Stands for "location" and is primarily label-based.

It is used for selecting data by specifying row and column labels or boolean conditions.

The syntax is typically df.loc[row_label, column_label] or df.loc[boolean_condition].

iloc:
Stands for "integer location" and is primarily integer-based.

It is used for selecting data by specifying the integer positions of rows and columns.

The syntax is typically df.iloc[row_index, column_index].

Here's an example to illustrate the difference:

import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data, index=['x', 'y', 'z'])

# Using loc to select data by label
result_loc = df.loc['x', 'A']  # Selects the value at row 'x' and column 'A'

# Using iloc to select data by integer location
result_iloc = df.iloc[0, 0]  # Selects the value at the first row and first column

print(result_loc)   # Output: 1
print(result_iloc)  # Output: 1
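Slices and boolean conditions behave differently from single-value lookups. The sketch below reuses the same small DataFrame (the variable names are illustrative only) to show that a loc label slice includes both endpoints, while an iloc position slice excludes the stop position.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

# Label slice: both endpoints are included, so rows 'x' and 'y' are returned.
by_label = df.loc['x':'y', 'A']

# Position slice: the stop position is excluded, so only row 0 is returned.
by_position = df.iloc[0:1, 0]

# Boolean condition: keep the rows where column 'A' is greater than 1.
filtered = df.loc[df['A'] > 1]

print(by_label.tolist())        # [1, 2]
print(by_position.tolist())     # [1]
print(filtered.index.tolist())  # ['y', 'z']
```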

bfill and ffill
bfill and ffill are pandas methods that fill missing values in a DataFrame or Series with values from nearby rows. They are often used in data preprocessing when dealing with missing data.

bfill stands for "backward fill." It fills each missing value with the next valid value below it, which is not necessarily the immediately following row. The value is propagated backward (upward) into the gap.

ffill stands for "forward fill." It fills each missing value with the last valid value above it, which is not necessarily the immediately preceding row. The value is propagated forward (downward) into the gap.

A gap with no valid value after it (for bfill) or before it (for ffill) stays missing.
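As a minimal sketch of the two fill directions, assuming a made-up Series with gaps in the middle and at the end:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()   # each gap takes the last valid value above it
backward = s.bfill()  # each gap takes the next valid value below it

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0, nan]
```

The trailing gap stays missing after bfill because there is no later valid value to copy from.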

Pandas functions
Data Inspection:
1. df.head(n): Return the first n rows of a DataFrame (5 by default).
2. df.tail(n): Return the last n rows of a DataFrame (5 by default).
3. df.shape: Attribute holding the number of rows and columns as a tuple.
4. df.info(): Print a summary of the DataFrame, including column data types and non-null counts.
5. df.describe(): Generate descriptive statistics (numeric columns only by default).
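The inspection calls above can be tried on a small DataFrame. This is only a sketch; the column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cy', 'Di'],
    'age': [31, 25, None, 42],  # None becomes a missing value (NaN)
})

first_two = df.head(2)     # first 2 rows
last_one = df.tail(1)      # last row
n_rows, n_cols = df.shape  # shape is an attribute, not a method

df.info()                  # prints dtypes and non-null counts per column
summary = df.describe()    # statistics for the numeric column 'age' only

print(n_rows, n_cols)               # 4 2
print(summary.loc['count', 'age'])  # 3.0
```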

Selection and Filtering:

1. df['column_name']: Select a single column by name (returns a Series).
2. df[['col1', 'col2']]: Select multiple columns (returns a DataFrame).
3. df.loc[rows, columns]: Select rows and columns by label.
4. df.iloc[rows, columns]: Select rows and columns by integer position.
5. df[df['column'] > value]: Filter rows based on a condition.
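A short sketch of column selection and row filtering, assuming a hypothetical weather table (the column names are not from the original):

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Oslo', 'Lima', 'Pune'],
    'temp': [4, 19, 31],
    'rain': [60, 5, 120],
})

one_col = df['temp']             # a Series
two_cols = df[['city', 'temp']]  # a DataFrame (note the double brackets)
warm = df[df['temp'] > 10]       # rows where the condition is True

print(type(one_col).__name__)  # Series
print(two_cols.shape)          # (3, 2)
print(warm['city'].tolist())   # ['Lima', 'Pune']
```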

Data Manipulation:

1. df.drop(columns=['col1', 'col2']): Return a copy with the specified columns removed.
2. df.rename(columns={'old_name': 'new_name'}): Rename columns.
3. df.sort_values(by='column_name'): Sort the DataFrame by a column (ascending by default).
4. df.groupby('column_name').agg(func): Group rows by a column's values and apply an aggregation function to each group.
5. df.pivot_table(): Create pivot tables.
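The manipulation calls can be chained on one table. The sketch below assumes an invented scores table; note that pivot_table averages values by default, so its result is a float.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['a', 'b', 'a', 'b'],
    'year': [2023, 2023, 2024, 2024],
    'score': [10, 7, 12, 9],
    'note': ['', '', '', ''],
})

df = df.drop(columns=['note'])               # drop returns a new DataFrame
df = df.rename(columns={'score': 'points'})
ranked = df.sort_values(by='points', ascending=False)
totals = df.groupby('team').agg({'points': 'sum'})
wide = df.pivot_table(index='team', columns='year', values='points')

print(ranked['points'].tolist())   # [12, 10, 9, 7]
print(totals['points'].to_dict())  # {'a': 22, 'b': 16}
print(wide.loc['a', 2024])         # 12.0
```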

Handling Missing Data:

1. df.isnull(): Return a boolean mask that is True where values are missing.
2. df.dropna(): Remove rows that contain any missing value.
3. df.fillna(value): Fill missing values with a specific value.
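A minimal sketch of the three calls on a made-up table. Passing a dict to fillna, as done here, sets a different fill value per column; a single scalar also works.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', None]})

mask = df.isnull()            # boolean DataFrame, True where a value is missing
missing_per_col = mask.sum()  # count of missing values in each column
complete = df.dropna()        # keeps only rows with no missing values
filled = df.fillna({'a': 0.0, 'b': 'unknown'})  # per-column fill values

print(missing_per_col.to_dict())  # {'a': 1, 'b': 1}
print(len(complete))              # 1
print(filled['a'].tolist())       # [1.0, 0.0, 3.0]
```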

Data Visualization:
1. df.plot(): Create basic plots using Matplotlib.

I/O Operations:
pd.read_csv('file.csv'): Read data from a CSV file.
df.to_csv('file.csv'): Write DataFrame to a CSV file.
Similar functions exist for other file formats like Excel, SQL databases, etc.
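As a sketch of a CSV round trip that does not touch the disk, the example below writes to an in-memory io.StringIO buffer instead of 'file.csv'. The buffer is an assumption made to keep the example self-contained; a file path works the same way.

```python
import io

import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'name': ['Ann', 'Bob']})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves the row index out of the output
csv_text = buffer.getvalue()

buffer.seek(0)                  # rewind before reading the text back
restored = pd.read_csv(buffer)

print(csv_text)
print(restored['name'].tolist())  # ['Ann', 'Bob']
```

Without index=False, to_csv also writes the row index as an unnamed first column, which read_csv would then load as an extra column.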

Statistical Functions:
df.mean(), df.median(), df.std(), etc.: Calculate basic statistics for columns.
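A small sketch on invented numeric data; each call returns one value per column, and std computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10.0, 20.0, 30.0, 100.0]})

means = df.mean()      # one value per column
medians = df.median()
stds = df.std()        # sample standard deviation (ddof=1) by default

print(means.to_dict())    # {'x': 2.5, 'y': 40.0}
print(medians.to_dict())  # {'x': 2.5, 'y': 25.0}
```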

Merging and Joining Data:

pd.concat([df1, df2]): Concatenate DataFrames (row-wise by default).
pd.merge(df1, df2, on='key'): Perform SQL-like joins (an inner join by default).
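The difference between stacking and joining can be sketched with two tiny, made-up tables that share a 'key' column:

```python
import pandas as pd

df1 = pd.DataFrame({'key': [1, 2], 'left_val': ['a', 'b']})
df2 = pd.DataFrame({'key': [2, 3], 'right_val': ['c', 'd']})

stacked = pd.concat([df1, df1], ignore_index=True)  # rows appended, index renumbered
inner = pd.merge(df1, df2, on='key')                # how='inner' is the default
left = pd.merge(df1, df2, on='key', how='left')     # keep every row of df1

print(len(stacked))           # 4
print(inner['key'].tolist())  # [2]
print(left['key'].tolist())   # [1, 2]
```

In the left join, the row with key 1 has no match in df2, so its right_val is missing.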
