Top 100 Data Analyst Questions 1 to 60
Top 100 Data Analyst Questions 1 to 60
SQL (Structured Query Language) is a standard programming language used to manage and query data in
relational databases. It helps data analysts retrieve, manipulate, and analyze structured data stored in
databases.
Example:
WHERE filters rows before aggregation, while HAVING filters groups after aggregation. Use WHERE when
filtering individual records and HAVING with GROUP BY to filter aggregated results.
Example:
Joins combine rows from two or more tables based on a related column.
Example:
SELECT c.name, o.order_id FROM customers c INNER JOIN orders o ON c.id = o.customer_id;
Use GROUP BY, SUM() to aggregate sales, ORDER BY to sort, and LIMIT to get top results.
Example:
Normalization reduces redundancy and ensures data integrity by organizing data into related tables. It
involves breaking large tables into smaller ones and linking them via keys.
Example: Separate customer and order data into 'Customers' and 'Orders' tables linked by customer_id.
Pivot Tables summarize and analyze data interactively. You can group data, compute aggregates (sum, avg),
Example: Summarize total sales per region using Region, Sales, and Product columns.
VLOOKUP searches for a value in the first column of a range and returns a value in the same row from
another column.
Example:
=VLOOKUP(101, A2:C10, 3, FALSE) returns the 3rd column value for ID 101.
Power BI Desktop is for offline report creation; Power BI Service is a cloud platform for sharing and
collaborating on dashboards.
DAX (Data Analysis Expressions) is a formula language in Power BI for custom calculations and measures.
Example:
Techniques include IFERROR, filtering blanks, replacing with average/median, or using interpolation.
Example:
11. What are Pandas and why is it important for data analysts?
Pandas is a Python library that provides data structures and functions for working with structured data. It
Example:
import pandas as pd
df = pd.read_csv("data.csv")
df.groupby("Region")["Sales"].sum()
Example:
Example:
Example:
Example:
A database is used for recording day-to-day transactions (OLTP), while a data warehouse is used for
analytical processing (OLAP). Databases are optimized for CRUD operations, whereas data warehouses are
Example: A retail POS system uses a database to store real-time sales, while a data warehouse aggregates
Data cleaning is the process of identifying and correcting (or removing) errors and inconsistencies in data. It
Example: Removing duplicates, filling missing values, correcting typos in city names like 'Banglore' to
'Bangalore'.
- BOOLEAN: True/False
Example:
Use ROW_NUMBER or GROUP BY to identify duplicates. Then filter where row number > 1.
Example:
WITH temp AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS rn FROM
20. What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?
Window functions perform calculations across a set of rows related to the current row without collapsing rows
Example:
SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) as Rank FROM Employees;
GROUP BY groups rows with the same values in specified columns so aggregate functions like SUM,
Example:
Use a subquery or window function to calculate total, then divide individual value by total.
Example:
GROUP BY Department;
Top 100 Data Analyst Interview Questions with Detailed Answers
Power BI supports:
- Bar/Column Charts
- Pie/Donut Charts
- Line Charts
- Area Charts
- Maps
- Tree Maps
- Scatter plots
Example: Use a pie chart to show sales by region or a line chart for trend analysis over time.
Query Editor is used to clean, transform, and shape data before it's loaded into Power BI. You can filter rows,
Example: Replace missing values with 'Unknown' or remove extra spaces in column names.
27. What is the difference between INNER JOIN and LEFT JOIN?
- LEFT JOIN: Returns all rows from the left table, and matching rows from the right (NULL if no match).
Example:
- Profit Margin
29. What are the basic steps of the data analysis process?
5. Interpret results
6. Communicate findings
Example: For declining sales, analyze trends, product lines, and customer feedback.
Correlation measures the strength and direction of a relationship between two variables, ranging from -1 to 1.
- 1: Perfect positive
- 0: No correlation
SQL (Structured Query Language) is a standard programming language used to manage and query data in
relational databases. It helps data analysts retrieve, manipulate, and analyze structured data stored in
databases.
Example:
WHERE filters rows before aggregation, while HAVING filters groups after aggregation. Use WHERE when
filtering individual records and HAVING with GROUP BY to filter aggregated results.
Example:
Joins combine rows from two or more tables based on a related column.
Example:
SELECT c.name, o.order_id FROM customers c INNER JOIN orders o ON c.id = o.customer_id;
Use GROUP BY, SUM() to aggregate sales, ORDER BY to sort, and LIMIT to get top results.
Example:
Normalization reduces redundancy and ensures data integrity by organizing data into related tables. It
involves breaking large tables into smaller ones and linking them via keys.
Example: Separate customer and order data into 'Customers' and 'Orders' tables linked by customer_id.
Pivot Tables summarize and analyze data interactively. You can group data, compute aggregates (sum, avg),
Example: Summarize total sales per region using Region, Sales, and Product columns.
VLOOKUP searches for a value in the first column of a range and returns a value in the same row from
another column.
Example:
=VLOOKUP(101, A2:C10, 3, FALSE) returns the 3rd column value for ID 101.
Power BI Desktop is for offline report creation; Power BI Service is a cloud platform for sharing and
collaborating on dashboards.
DAX (Data Analysis Expressions) is a formula language in Power BI for custom calculations and measures.
Example:
Techniques include IFERROR, filtering blanks, replacing with average/median, or using interpolation.
Example:
11. What are Pandas and why is it important for data analysts?
Pandas is a Python library that provides data structures and functions for working with structured data. It
Example:
import pandas as pd
df = pd.read_csv("data.csv")
df.groupby("Region")["Sales"].sum()
Example:
Example:
Example:
Example:
A database is used for recording day-to-day transactions (OLTP), while a data warehouse is used for
analytical processing (OLAP). Databases are optimized for CRUD operations, whereas data warehouses are
Example: A retail POS system uses a database to store real-time sales, while a data warehouse aggregates
Data cleaning is the process of identifying and correcting (or removing) errors and inconsistencies in data. It
Example: Removing duplicates, filling missing values, correcting typos in city names like 'Banglore' to
'Bangalore'.
- BOOLEAN: True/False
Example:
Use ROW_NUMBER or GROUP BY to identify duplicates. Then filter where row number > 1.
Example:
WITH temp AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS rn FROM
20. What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?
Window functions perform calculations across a set of rows related to the current row without collapsing rows
Example:
SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) as Rank FROM Employees;
GROUP BY groups rows with the same values in specified columns so aggregate functions like SUM,
Example:
Use a subquery or window function to calculate total, then divide individual value by total.
Example:
GROUP BY Department;