Pandas_Interview_Questions_PowerBI_SQL
Q: You're working with a CSV dataset that contains sales data. Some rows have missing values in the ProductName
and Revenue columns. Before importing into Power BI, how would you handle this using Pandas?
A:
import pandas as pd
df = pd.read_csv('sales.csv')
df = df[df['ProductName'].notna()]
df['Revenue'] = df['Revenue'].fillna(0)
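As a quick check, the same two steps can be run on a small in-memory frame. The sample rows below are made up for illustration, and filling Revenue with 0 is an assumption that suits simple totals; a median fill or a separate "missing" flag may fit other reports better.

```python
import pandas as pd
import numpy as np

# Hypothetical sample data standing in for sales.csv
df = pd.DataFrame({
    'ProductName': ['Widget', None, 'Gadget', 'Gizmo'],
    'Revenue': [100.0, 250.0, np.nan, 75.0],
})

# Drop rows with no product name; .copy() avoids a SettingWithCopyWarning
# on the assignment that follows
clean = df[df['ProductName'].notna()].copy()

# Treat missing revenue as zero (assumption, see note above)
clean['Revenue'] = clean['Revenue'].fillna(0)

# The cleaned frame could then be written out for Power BI to import, e.g.
# clean.to_csv('sales_clean.csv', index=False)   # hypothetical output file name
```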
Q: How would you write the following SQL query in Pandas?
SELECT Region, SUM(SalesAmount) FROM Sales GROUP BY Region HAVING SUM(SalesAmount) > 10000
A:
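df_grouped = df.groupby('Region')['SalesAmount'].sum().reset_index()
df_grouped = df_grouped[df_grouped['SalesAmount'] > 10000]  # HAVING: filter on the aggregated total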
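The key point is that HAVING filters after aggregation, so in Pandas the comparison is applied to the grouped totals, not to the raw rows. A small sketch with made-up rows (the values are hypothetical) shows the difference:

```python
import pandas as pd

# Hypothetical sample rows for the Sales table
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'North'],
    'SalesAmount': [6000, 5000, 9000, 12000],
})

# GROUP BY Region with SUM(SalesAmount)
totals = df.groupby('Region', as_index=False)['SalesAmount'].sum()

# HAVING SUM(SalesAmount) > 10000: filter the aggregated totals
result = totals[totals['SalesAmount'] > 10000].reset_index(drop=True)
```

East qualifies here even though neither of its individual rows exceeds 10000, which is exactly what HAVING does and what a row-level WHERE would not.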
Q: You have two dataframes: orders and customers. How would you perform a LEFT JOIN in Pandas?
A:
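A left join keeps every row of orders and attaches the matching customer columns, leaving NaN where no customer matches. A minimal sketch, assuming the two frames share a CustomerID key (the column names and sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical frames; CustomerID as the join key is an assumption
orders = pd.DataFrame({'OrderID': [1, 2, 3], 'CustomerID': [10, 20, 99]})
customers = pd.DataFrame({'CustomerID': [10, 20], 'CustomerName': ['Ana', 'Ben']})

# how='left' keeps all orders; order 3 has no matching customer, so its
# CustomerName is NaN
merged = orders.merge(customers, on='CustomerID', how='left')
```

If the key columns are named differently in the two frames, left_on and right_on can be used in place of on.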
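Q: How would you extract the year and month from an order date and get prior-year sales for a year-over-year comparison?
A:
df['OrderDate'] = pd.to_datetime(df['OrderDate'])
df['Year'] = df['OrderDate'].dt.year
df['Month'] = df['OrderDate'].dt.month
monthly_sales = df.groupby(['Year', 'Month'], as_index=False)['Sales'].sum()  # assumes a 'Sales' column
monthly_sales['Sales_PY'] = monthly_sales['Sales'].shift(12)  # valid only if no month is missing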
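Note that shift(12) only lines up with the same month of the prior year when the monthly table is sorted and has exactly one row per month with no gaps. A minimal sketch under those assumptions, using made-up data and an illustrative YoY_Pct column that is not part of the original answer:

```python
import pandas as pd

# Hypothetical data: one order per month for 24 consecutive months
df = pd.DataFrame({
    'OrderDate': pd.date_range('2022-01-01', periods=24, freq='MS'),
    'Sales': list(range(100, 124)),
})
df['Year'] = df['OrderDate'].dt.year
df['Month'] = df['OrderDate'].dt.month

# One row per (Year, Month), sorted chronologically
monthly_sales = (
    df.groupby(['Year', 'Month'], as_index=False)['Sales'].sum()
      .sort_values(['Year', 'Month'])
      .reset_index(drop=True)
)

# Same month, previous year: 12 rows back
monthly_sales['Sales_PY'] = monthly_sales['Sales'].shift(12)

# Illustrative year-over-year percentage change
monthly_sales['YoY_Pct'] = (
    (monthly_sales['Sales'] - monthly_sales['Sales_PY']) / monthly_sales['Sales_PY'] * 100
)
```

The first 12 rows have no prior year, so Sales_PY and YoY_Pct are NaN there.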
Q: You have a large dataset (10 million rows) in CSV format. What Pandas techniques would you use?
A:
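filtered_chunks = []
# Read in chunks to bound memory; file name, chunk size, and filter are placeholders
for chunk in pd.read_csv('large_file.csv', chunksize=100_000):
    chunk = chunk[chunk['SalesAmount'] > 0]  # placeholder filter: keep only the rows you need
    filtered_chunks.append(chunk)
df_filtered = pd.concat(filtered_chunks, ignore_index=True)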
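The chunked pattern combines well with two other memory savers: loading only the needed columns (usecols) and choosing compact dtypes up front (dtype). A self-contained sketch, using a tiny in-memory CSV in place of the real file; the column names, filter, and chunk size are assumptions:

```python
import io
import pandas as pd

# Small in-memory CSV standing in for the 10-million-row file
csv_text = "Region,SalesAmount,Notes\n" + "\n".join(
    f"{'East' if i % 2 else 'West'},{i},n/a" for i in range(10)
)

filtered_chunks = []
reader = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['Region', 'SalesAmount'],  # skip columns the report does not need
    dtype={'SalesAmount': 'int32'},     # smaller than the default int64
    chunksize=4,                        # tiny here; far larger for a real file
)
for chunk in reader:
    kept = chunk[chunk['SalesAmount'] >= 5]  # placeholder filter
    if not kept.empty:                       # skip chunks with no matching rows
        filtered_chunks.append(kept)

df_filtered = pd.concat(filtered_chunks, ignore_index=True)
```

Only the filtered rows are ever held together in memory, so the peak footprint is roughly one chunk plus the accumulated result.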
Q: How would you clean and transform data from Excel sheets and push to SQL?
A:
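from sqlalchemy import create_engine
combined = pd.concat(pd.read_excel('sales.xlsx', sheet_name=None), ignore_index=True)  # placeholder file; reads every sheet
combined['Revenue'] = combined['Revenue'].fillna(0)
combined['Date'] = pd.to_datetime(combined['Date'])
engine = create_engine('mssql+pyodbc://server/db?driver=SQL+Server')
combined.to_sql('Sales', engine, if_exists='append', index=False)  # placeholder table name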
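The whole flow (combine sheets, clean, load) can be tried end to end without a SQL Server instance. The sketch below swaps in an in-memory SQLite database from the standard library in place of the SQL Server engine, and uses a dict of frames as a stand-in for what pd.read_excel(path, sheet_name=None) returns; the sheet names, values, and the table name sales are all hypothetical.

```python
import sqlite3
import pandas as pd

# Stand-in for pd.read_excel('workbook.xlsx', sheet_name=None),
# which returns a dict of {sheet_name: DataFrame}
sheets = {
    'Jan': pd.DataFrame({'Date': ['2024-01-05', '2024-01-20'], 'Revenue': [100.0, None]}),
    'Feb': pd.DataFrame({'Date': ['2024-02-03'], 'Revenue': [250.0]}),
}

# Stack all sheets into one frame
combined = pd.concat(list(sheets.values()), ignore_index=True)

# Clean: missing revenue becomes 0, dates become real datetimes
combined['Revenue'] = combined['Revenue'].fillna(0)
combined['Date'] = pd.to_datetime(combined['Date'])

# Load: SQLite here; with SQL Server, pass the SQLAlchemy engine instead of conn
conn = sqlite3.connect(':memory:')
combined.to_sql('sales', conn, if_exists='replace', index=False)

# Read back a summary to confirm the load
check = pd.read_sql_query('SELECT COUNT(*) AS n, SUM(Revenue) AS total FROM sales', conn)
conn.close()
```

For large loads to SQL Server, the chunksize argument of to_sql is worth considering so rows are written in batches.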