Top 50 Industry-Relevant Data Analyst Interview Q - A
o INNER JOIN: Returns records with matching values in both tables. Use when you need only
common records.
o LEFT JOIN: Returns all records from the left table and matched records from the right. Use when
you want all records from the left.
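o FULL OUTER JOIN: Returns all records from both tables, pairing rows where a match exists and filling NULLs where it does not. Use when you need every record from either side.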
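The difference between INNER JOIN and LEFT JOIN can be seen in a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are made up for illustration. FULL OUTER JOIN is left out because older SQLite builds do not support it.

```python
import sqlite3

# In-memory database with two small, hypothetical tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 50.0), (1, 20.0), (3, 75.0);
""")

# INNER JOIN: only customers that have at least one order
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.amount
""").fetchall()

# LEFT JOIN: every customer; amount is NULL (None) when there is no order
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.amount
""").fetchall()

print(inner_rows)
print(left_rows)
```

Ben has no orders, so he is missing from the INNER JOIN result but appears in the LEFT JOIN result with a None amount.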
3. How would you find the second highest salary from an Employee table?
SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);
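As a quick check of the query above, the sketch below runs it against a small in-memory SQLite table with made-up rows, including a tie for the top salary. It also shows one common alternative (DISTINCT with LIMIT/OFFSET); that alternative uses SQLite syntax and may need adapting for other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('A', 90000), ('B', 120000), ('C', 120000), ('D', 75000);
""")

# Subquery approach: the largest salary strictly below the maximum
second_highest = conn.execute("""
    SELECT MAX(salary) FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]

# Alternative: skip the top distinct salary and take the next one
second_highest_alt = conn.execute("""
    SELECT DISTINCT salary FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
""").fetchone()[0]

print(second_highest, second_highest_alt)
```

Both approaches ignore the duplicate top salary. The subquery version returns NULL when no second value exists, while the LIMIT/OFFSET version returns no row.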
o Window functions perform calculations across a set of rows related to the current row without collapsing those rows into one. Example: RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
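To illustrate the RANK() example, here is a minimal sketch on an in-memory SQLite database. The table contents are hypothetical, and it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 'Sales', 70000), ('Ben', 'Sales', 70000), ('Cy', 'Sales', 50000),
        ('Di', 'IT', 90000), ('Ed', 'IT', 80000);
""")

# Rank salaries within each department; every input row is kept
rows = conn.execute("""
    SELECT name, dept, salary,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY dept, salary_rank, name
""").fetchall()

ranks = {name: salary_rank for name, _dept, _salary, salary_rank in rows}
for row in rows:
    print(row)
```

Ana and Ben tie at rank 1 in Sales, so Cy gets rank 3. RANK() leaves a gap after ties, whereas DENSE_RANK() would give Cy rank 2.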
o Use indexes, avoid SELECT *, limit subqueries, use EXPLAIN PLAN, and check joins and filters.
o CTEs improve readability, can be recursive, and can be reused within the same query.
SELECT col1, COUNT(*) FROM table_name GROUP BY col1 HAVING COUNT(*) > 1;
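The GROUP BY / HAVING pattern for finding duplicates can be tried on a small in-memory SQLite table. The `contacts` table and its `email` column are hypothetical names used only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?)",
    [("a@x.com",), ("b@x.com",), ("a@x.com",),
     ("c@x.com",), ("c@x.com",), ("c@x.com",)],
)

# Keep only the values that occur more than once, with their counts
duplicates = conn.execute("""
    SELECT email, COUNT(*) AS occurrences
    FROM contacts
    GROUP BY email
    HAVING COUNT(*) > 1
    ORDER BY email
""").fetchall()

print(duplicates)
```

To check for duplicates across several columns, list all of them in both the SELECT and the GROUP BY clause.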
• SELECT date, AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
FROM sales_data;
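The moving-average query above can be checked with a minimal sketch on SQLite (3.25 or newer is assumed for window-function support). The eight daily sales figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (date TEXT, sales REAL)")

# Eight consecutive days with sales of 10, 20, ..., 80
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [(f"2024-01-{day:02d}", day * 10.0) for day in range(1, 9)],
)

# 7-row moving average: the current row plus the six rows before it
rows = conn.execute("""
    SELECT date,
           AVG(sales) OVER (
               ORDER BY date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM sales_data
    ORDER BY date
""").fetchall()

for date, moving_avg in rows:
    print(date, moving_avg)
```

The first six rows average over fewer than seven values because the frame is cut off at the start of the data. Note also that ROWS counts rows, not calendar days, so missing dates would stretch the window over more than a week.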
15. How would you detect and analyze outliers in transactional data?
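• Use statistical functions like AVG and STDDEV to flag values far from the mean (for example, more than 3 standard deviations), or create IQR filters using subqueries (values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR).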
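The IQR rule mentioned above can be sketched in plain Python with the standard library. The transaction amounts are made up, and the 1.5 multiplier is the conventional choice rather than anything prescribed by the source.

```python
import statistics

# Hypothetical transaction amounts with one obviously extreme value
amounts = [12, 15, 14, 13, 16, 15, 14, 13, 250]

# Quartiles: quantiles(n=4) returns the three cut points Q1, Q2, Q3
q1, _median, q3 = statistics.quantiles(amounts, n=4)
iqr = q3 - q1

# Conventional IQR fences
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

outliers = [x for x in amounts if x < lower_bound or x > upper_bound]
print(lower_bound, upper_bound, outliers)
```

The same fences can be computed in SQL with percentile functions where the database provides them, then applied in a WHERE clause.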
Python (12 Questions)
1. How do you handle large datasets in Python?
o apply(): Works on both Series (element-wise) and DataFrames (along rows or columns).
import pandas as pd
import sqlalchemy
engine = sqlalchemy.create_engine('db_string')
o Drop or impute nulls, fix data types, remove duplicates, detect outliers, and encode categorical variables.
6. Role of NumPy?
8. loc[] vs iloc[]?
• def calc_corr(df):
      return df.corr()
• Use unittest, pytest, and mock I/O or small test data sets.
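As a sketch of testing with mocked I/O and small test data, the example below uses unittest with io.StringIO standing in for a real file. The `total_sales` function and the CSV layout are hypothetical and exist only to give the tests something to exercise.

```python
import csv
import io
import unittest


def total_sales(file_obj):
    """Sum the 'amount' column of a CSV file-like object."""
    reader = csv.DictReader(file_obj)
    return sum(float(row["amount"]) for row in reader)


class TotalSalesTest(unittest.TestCase):
    def test_sums_amount_column(self):
        # io.StringIO replaces a file on disk with a tiny in-memory data set
        fake_file = io.StringIO("region,amount\nEast,10.5\nWest,4.5\n")
        self.assertEqual(total_sales(fake_file), 15.0)

    def test_empty_file_returns_zero(self):
        fake_file = io.StringIO("region,amount\n")
        self.assertEqual(total_sales(fake_file), 0)


# Run the tests programmatically so the script works without a test runner
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TotalSalesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Passing a file-like object instead of a path keeps the function easy to test. The same test class can also be collected and run by pytest without changes.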
Excel (13 Questions)
1. Use Pivot Tables?
o Summarize, filter, and analyze data interactively with rows, columns, and values.
2. VLOOKUP vs INDEX-MATCH?
5. Use of INDIRECT()?
6. Create dashboards?
7. Array Formulas?
9. SUMIFS vs SUMPRODUCT?
• Use Data > Get External Data > From Web/SQL Server/API.
Power BI (10 Questions)
1. Calculated columns vs measures?
2. Optimize performance?
o Use star schema, reduce columns, optimize DAX, disable auto date/time.
6. Row-level security?
o Define roles under Manage roles on the Modeling tab and assign DAX filter expressions to limit which rows each role can see.
o Used for ETL steps like cleaning, merging, transforming datasets before modeling.
o Use Power BI Service with workspaces, or Git with .pbix files for versioning.