SQL To Pandas - Group Aggregations

SQL to pandas

Uploaded by

sagarvshinde
SQL to Pandas:

Group Aggregations

SQL

SELECT column1
, AVG(column2)
FROM table
GROUP BY column1;

Pandas

df.groupby('column1')['column2'].mean()

#SQL_to_Pandas with Shane Butler



Explanation
Grouping lets you aggregate and analyze data by category, summarizing many rows
into one result per group. In SQL, the GROUP BY clause groups rows by one or more
columns so that aggregate functions such as AVG, SUM, or COUNT can be applied to
each group. In Pandas, the .groupby() method plays the same role: it groups a
DataFrame by one or more columns and applies aggregation functions to each group.

Imagine you are analyzing a dataset of companies. You might want to know the
average number of employees in each industry. Grouping the data by industry and
then calculating the average employee count for each group provides this insight.
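
Grouping by more than one column works the same way. The sketch below is only an illustration: it uses a hypothetical four-row subset of the companies data from this document, and the variable name avg_by_industry_location is an assumption, not part of the original slides.

```python
import pandas as pd

# Hypothetical four-row subset of the companies data used in this document
df = pd.DataFrame({
    'industry': ['Technology', 'Technology', 'Technology', 'Entertainment'],
    'location': ['San Francisco', 'San Francisco', 'Redmond', 'Los Gatos'],
    'employee_count': [22600, 6132, 163000, 9400],
})

# Pass a list of columns to group by several keys,
# like GROUP BY industry, location in SQL
avg_by_industry_location = (
    df.groupby(['industry', 'location'])['employee_count'].mean()
)
print(avg_by_industry_location)
```

The result is a Series indexed by (industry, location) pairs, with one row per distinct combination.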

Key Concepts
.groupby() Method: In Pandas, the .groupby() method groups data by specified
columns and allows you to apply aggregation functions to these groups.
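
As a rough sketch of applying several aggregation functions at once, the example below uses the .agg() method with named aggregation. The three rows are taken from the companies data in this document; the output column names avg_employee_count and total_revenue are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical three-row subset of the companies data used in this document
df = pd.DataFrame({
    'industry': ['Automotive', 'Entertainment', 'Entertainment'],
    'employee_count': [70757, 9400, 6617],
    'revenue': [31536000, 25099999, 7800000],
})

# Named aggregation: each keyword becomes an output column,
# similar to SELECT AVG(employee_count) AS ..., SUM(revenue) AS ... in SQL
summary = df.groupby('industry').agg(
    avg_employee_count=('employee_count', 'mean'),
    total_revenue=('revenue', 'sum'),
)
print(summary)
```

Each keyword argument maps an output column name to a (source column, function) pair, so the result reads much like a SELECT list with aliases.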

SQL Data

-- Create the companies table
CREATE TABLE companies (
    company_id INT,
    company_name VARCHAR(50),
    industry VARCHAR(50),
    location VARCHAR(50),
    employee_count INT,
    revenue INT,
    year_founded INT
);

-- Insert data into the companies table
INSERT INTO companies (company_id, company_name, industry, location, employee_count, revenue, year_founded) VALUES
    (1, 'Apple', 'Technology', 'Cupertino', 147000, 274515000, 1976),
    (2, 'Google', 'Technology', 'Mountain View', 135301, 182527000, 1998),
    (3, 'Microsoft', 'Technology', 'Redmond', 163000, 143015000, 1975),
    (4, 'Amazon', 'E-commerce', 'Seattle', 1335000, 386064000, 1994),
    (5, 'Meta', 'Technology', 'Menlo Park', 58604, 85965000, 2004),
    (6, 'Tesla', 'Automotive', 'Palo Alto', 70757, 31536000, 2003),
    (7, 'Uber', 'Technology', 'San Francisco', 22600, 11427000, 2009),
    (8, 'Airbnb', 'Technology', 'San Francisco', 6132, 3376000, 2008),
    (9, 'Netflix', 'Entertainment', 'Los Gatos', 9400, 25099999, 1997),
    (10, 'Spotify', 'Entertainment', 'Stockholm', 6617, 7800000, 2006);

Pandas Data
import pandas as pd

# Create a DataFrame (like a table in SQL)
data = {
    'company_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'company_name': ['Apple', 'Google', 'Microsoft', 'Amazon', 'Meta', 'Tesla', 'Uber', 'Airbnb',
                     'Netflix', 'Spotify'],
    'industry': ['Technology', 'Technology', 'Technology', 'E-commerce', 'Technology', 'Automotive',
                 'Technology', 'Technology', 'Entertainment', 'Entertainment'],
    'location': ['Cupertino', 'Mountain View', 'Redmond', 'Seattle', 'Menlo Park', 'Palo Alto',
                 'San Francisco', 'San Francisco', 'Los Gatos', 'Stockholm'],
    'employee_count': [147000, 135301, 163000, 1335000, 58604, 70757, 22600, 6132, 9400, 6617],
    'revenue': [274515000, 182527000, 143015000, 386064000, 85965000, 31536000, 11427000, 3376000,
                25099999, 7800000],
    'year_founded': [1976, 1998, 1975, 1994, 2004, 2003, 2009, 2008, 1997, 2006]
}

df = pd.DataFrame(data)

SQL Input

SELECT industry
, AVG(employee_count) AS avg_employee_count
FROM companies
GROUP BY industry;

Pandas Input

# Calculate the avg employee count by industry (like GROUP BY in SQL)


grouped_df = df.groupby('industry')['employee_count'].mean()
print(grouped_df)

Pandas Output

industry
Automotive       7.075700e+04
E-commerce       1.335000e+06
Entertainment    8.008500e+03
Technology       8.877283e+04
Name: employee_count, dtype: float64
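
The output above is a Series indexed by industry, whereas the SQL query returns a two-column table with the alias avg_employee_count. One possible way to get the same shape in Pandas is sketched below. It rebuilds only the two columns it needs from the companies data, and the variable name result is an assumption for illustration.

```python
import pandas as pd

# Only the two columns needed for this query, copied from the companies data
df = pd.DataFrame({
    'industry': ['Technology', 'Technology', 'Technology', 'E-commerce', 'Technology',
                 'Automotive', 'Technology', 'Technology', 'Entertainment', 'Entertainment'],
    'employee_count': [147000, 135301, 163000, 1335000, 58604, 70757, 22600, 6132, 9400, 6617],
})

# reset_index(name=...) turns the grouped Series into a DataFrame and names
# the value column, mirroring AVG(employee_count) AS avg_employee_count
result = (
    df.groupby('industry')['employee_count']
      .mean()
      .reset_index(name='avg_employee_count')
)
print(result)
```

Here industry becomes an ordinary column rather than the index, which is closer to what a SQL client would display.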
