SQL To Pandas - Group Aggregations
Group Aggregations
SQL
SELECT column1
, AVG(column2)
FROM table
GROUP BY column1;
Pandas
df.groupby('column1')['column2'].mean()
Explanation
Grouping splits a dataset into categories so that each category can be
summarized on its own, for example with a count, a sum, or an average. In SQL,
the GROUP BY clause groups rows by one or more columns, and aggregate functions
such as AVG are then applied to each group. In Pandas, the .groupby() method
plays the same role: it groups rows by one or more columns so that aggregation
methods such as .mean() can be applied to each group.
Imagine you are analyzing a dataset of companies. You might want to know the
average number of employees in each industry. Grouping the data by industry and
then calculating the average employee count for each group provides this insight.
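One way to check that the two approaches agree is to run both on the same rows. The sketch below is only an illustration: the company names and employee counts are hypothetical sample values, and an in-memory SQLite database stands in for whatever SQL engine you actually use.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows, for illustration only.
df = pd.DataFrame(
    {
        "company": ["A", "B", "C", "D"],
        "industry": ["Technology", "Technology", "Automotive", "Automotive"],
        "employee_count": [100, 300, 50, 150],
    }
)

# Load the same rows into an in-memory SQLite table so both sides see identical data.
conn = sqlite3.connect(":memory:")
df.to_sql("companies", conn, index=False)

# SQL side: average employee count per industry.
sql_result = pd.read_sql_query(
    "SELECT industry, AVG(employee_count) AS avg_employee_count "
    "FROM companies GROUP BY industry ORDER BY industry;",
    conn,
)
conn.close()

# Pandas side: group by industry, then average employee_count within each group.
pandas_result = df.groupby("industry")["employee_count"].mean()

print(sql_result)
print(pandas_result)
```

Both results hold one average per industry. The SQL result is a table with two columns, while the Pandas result is a Series indexed by industry.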
Key Concepts
.groupby() Method: In Pandas, the .groupby() method groups data by specified
columns and allows you to apply aggregation functions to these groups.
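As a rough sketch of how this works in practice, .groupby() on its own only defines the groups, and the values are computed when an aggregation is applied. The data below is hypothetical, and the choice of mean, max, and count is just an example of aggregation functions you might apply.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only.
df = pd.DataFrame(
    {
        "industry": ["Technology", "Technology", "Automotive"],
        "employee_count": [100, 300, 50],
    }
)

# .groupby() alone only defines the groups; no aggregate is computed yet.
grouped = df.groupby("industry")["employee_count"]

# Single aggregations, each returning one value per industry.
group_means = grouped.mean()
group_sizes = grouped.count()

# Several aggregations at once: the result has one column per function.
summary = grouped.agg(["mean", "max", "count"])
print(summary)
```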
SQL Data
Pandas Data
import pandas as pd
# data holds the company records, with at least the industry and employee_count columns
df = pd.DataFrame(data)
SQL Input
SELECT industry
, AVG(employee_count) AS avg_employee_count
FROM companies
GROUP BY industry;
Pandas Input
df.groupby('industry')['employee_count'].mean()
Pandas Output
industry
Automotive 7.075700e+04
E-commerce 1.335000e+06
Entertainment 8.008500e+03
Technology 8.877283e+04
Name: employee_count, dtype: float64
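The Pandas output above is a Series named employee_count, whereas the SQL query returns a table with a column aliased as avg_employee_count. If you want the Pandas result to look like the SQL result, one option is named aggregation combined with as_index=False. This is a minimal sketch on hypothetical sample rows, not the dataset behind the output above.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only.
df = pd.DataFrame(
    {
        "industry": ["Technology", "Technology", "Automotive"],
        "employee_count": [100, 300, 50],
    }
)

# as_index=False keeps industry as a regular column, like a SQL result set.
# The keyword name sets the output column name, like AS in SQL.
result = df.groupby("industry", as_index=False).agg(
    avg_employee_count=("employee_count", "mean")
)
print(result)
```

The result is a DataFrame with the columns industry and avg_employee_count, one row per industry.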