Top 100 Data Analyst Interview Questions with Detailed Answers

1. What is SQL and why is it important for Data Analysts?

SQL (Structured Query Language) is the standard language for managing and querying data in relational databases. It helps data analysts retrieve, manipulate, and analyze structured data stored in databases.

Example:

SELECT * FROM Orders WHERE Order_Amount > 10000;

2. What is the difference between the WHERE and HAVING clauses?

WHERE filters individual rows before aggregation, while HAVING filters groups after aggregation. Use WHERE for conditions on raw column values and HAVING, together with GROUP BY, for conditions on aggregated results.

Example:

WHERE age > 30 keeps only the rows for individuals older than 30.

HAVING COUNT(*) > 2 keeps only the groups with more than 2 members.
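
The two filters can be combined in one query. The sketch below runs it against an in-memory SQLite database through Python's sqlite3 module; the people table and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical table: one row per person, with a city and an age.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Asha", "Pune", 34),
        ("Ravi", "Pune", 41),
        ("Meera", "Pune", 52),
        ("Kiran", "Pune", 25),
        ("Dev", "Delhi", 38),
        ("Nina", "Delhi", 45),
    ],
)

# WHERE drops rows with age <= 30 before grouping;
# HAVING then keeps only cities with more than 2 remaining rows.
rows = conn.execute(
    """
    SELECT city, COUNT(*) AS members
    FROM people
    WHERE age > 30
    GROUP BY city
    HAVING COUNT(*) > 2
    """
).fetchall()
print(rows)
```

Pune keeps three rows after the WHERE filter and passes the HAVING test; Delhi has only two rows and is filtered out as a group.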

3. What are JOINS in SQL? Explain types with examples.

Joins combine rows from two or more tables based on a related column.

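- INNER JOIN: Returns only the rows that have a match in both tables.

- LEFT JOIN: Returns all rows from the left table plus matching rows from the right (NULL where there is no match).

- RIGHT JOIN: Returns all rows from the right table plus matching rows from the left (NULL where there is no match).

- FULL JOIN: Returns all rows from both tables, matched where possible and filled with NULL elsewhere.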

Example:

SELECT c.name, o.order_id FROM customers c INNER JOIN orders o ON c.id = o.customer_id;
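
To see how the join type changes the result, here is a small sketch using Python's sqlite3 module. The table contents are made up for the example, and only INNER and LEFT joins are shown because older SQLite versions do not support RIGHT or FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 2);
    """
)

# INNER JOIN: customers without orders (Meera) do not appear.
inner_rows = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers c
    INNER JOIN orders o ON c.id = o.customer_id
    ORDER BY c.id, o.order_id
    """
).fetchall()

# LEFT JOIN: every customer appears; missing orders come back as NULL (None).
left_rows = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    ORDER BY c.id, o.order_id
    """
).fetchall()

print(inner_rows)
print(left_rows)
```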

4. How do you get the top 5 customers by total sales?

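Use GROUP BY with SUM() to aggregate sales per customer, ORDER BY to sort the totals in descending order, and LIMIT to keep the top rows. LIMIT works in MySQL, PostgreSQL, and SQLite; SQL Server uses SELECT TOP 5 instead.

Example:

SELECT customer_id, SUM(order_value) AS total_sales FROM orders GROUP BY customer_id ORDER BY total_sales DESC LIMIT 5;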
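
A runnable version of this query, sketched with Python's sqlite3 module. The seven customers and their order values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_value REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, 500.0), (1, 250.0),
        (2, 900.0),
        (3, 100.0), (3, 150.0),
        (4, 700.0),
        (5, 300.0),
        (6, 50.0),
        (7, 400.0), (7, 400.0),
    ],
)

# Aggregate per customer, sort by the total, keep the first five rows.
top5 = conn.execute(
    """
    SELECT customer_id, SUM(order_value) AS total_sales
    FROM orders
    GROUP BY customer_id
    ORDER BY total_sales DESC
    LIMIT 5
    """
).fetchall()
print(top5)
```

If two customers can tie on total sales, adding a second sort key such as customer_id makes the cut-off deterministic.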


5. Explain the concept of normalization in databases.

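Normalization reduces redundancy and protects data integrity by organizing data into related tables. It involves breaking large tables into smaller ones and linking them via keys.

- 1NF: Make every column hold atomic values and remove repeating groups.

- 2NF: Remove partial dependencies (every non-key column depends on the whole primary key).

- 3NF: Remove transitive dependencies (non-key columns depend only on the key, not on other non-key columns).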

Example: Separate customer and order data into 'Customers' and 'Orders' tables linked by customer_id.
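
The split into 'Customers' and 'Orders' might look like the sketch below, written with Python's sqlite3 module. The column names (name, city, amount) and the sample rows are assumptions made for the example.

```python
import sqlite3

# Denormalized input: the customer's name and city repeat on every order row.
flat_rows = [
    (1, "Asha", "Pune", 100, 250.0),
    (1, "Asha", "Pune", 101, 900.0),
    (2, "Ravi", "Delhi", 102, 120.0),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE Orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customers(customer_id),
        amount REAL NOT NULL
    );
    """
)

# Each customer is stored once; INSERT OR IGNORE skips repeated customer rows.
conn.executemany(
    "INSERT OR IGNORE INTO Customers VALUES (?, ?, ?)",
    [(cid, name, city) for cid, name, city, _, _ in flat_rows],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(oid, cid, amount) for cid, _, _, oid, amount in flat_rows],
)

customer_count = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]

# Joining on the key reproduces the original flat view without stored redundancy.
rebuilt = conn.execute(
    """
    SELECT c.customer_id, c.name, c.city, o.order_id, o.amount
    FROM Orders o
    JOIN Customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
    """
).fetchall()
print(customer_count, rebuilt)
```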

6. What are Pivot Tables in Excel? How do you use them?

Pivot Tables summarize and analyze data interactively. You can group data, compute aggregates (sum, average), and generate dynamic reports.

Example: Summarize total sales per region by placing Region in Rows and Sales in Values; add Product to Columns to break the totals down by product.

7. What is VLOOKUP and when would you use it?

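VLOOKUP searches for a value in the first column of a range and returns a value in the same row from another column. Use it to pull a related field, such as a name or price, from a reference table by its ID.

Example:

=VLOOKUP(101, A2:C10, 3, FALSE) returns the value in the 3rd column of A2:C10 for the row whose first column is exactly 101.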

8. What is the difference between Power BI Desktop and Power BI Service?

Power BI Desktop is a Windows application for building data models and reports locally; Power BI Service is a cloud platform for publishing, sharing, and collaborating on reports and dashboards.

Example: Build a report in Desktop, then publish it to the Power BI Service for sharing.

9. What is DAX in Power BI?

DAX (Data Analysis Expressions) is a formula language in Power BI for custom calculations and measures.

Example:

TotalSales = SUM(Sales[Amount]) sums all sales.


10. How would you handle missing values in Excel?

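Techniques include filtering or highlighting blanks, flagging them with IF and ISBLANK, replacing them with the average or median, or interpolating between neighboring values. IFERROR handles formula errors such as #N/A rather than blank cells.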

Example:

=IF(ISBLANK(A2), "Missing", A2) replaces blank with 'Missing'.

11. What is Pandas and why is it important for data analysts?

Pandas is a Python library that provides data structures and functions for working with structured data. It makes data cleaning, transformation, and analysis efficient.

Example:

import pandas as pd

df = pd.read_csv("data.csv")

df.groupby("Region")["Sales"].sum()

12. How do you handle missing values in Pandas?

Use isnull(), dropna(), and fillna() functions.

Example:

df['column'] = df['column'].fillna(df['column'].mean()) replaces nulls with the column mean.

13. Explain the difference between .loc[] and .iloc[].

loc[] selects by label; iloc[] selects by integer position.

Example:

df.loc[2, "Name"] accesses the row labeled 2 and the column named "Name".

df.iloc[2, 1] accesses the third row and second column by position.

14. How do you merge two datasets in Python?

Use pd.merge() for SQL-style joins or pd.concat() for stacking data.

Example:

pd.merge(customers, orders, on="Customer_ID", how="left")


15. What is a lambda function in Python?

Lambda functions are anonymous, single-expression functions defined with the lambda keyword.

Example:

df['New'] = df['Sales'].apply(lambda x: x * 1.1) applies a 10% increase to every value in Sales.
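
The same idea works without Pandas. The sketch below uses plain Python lists with made-up sales figures, and rounds the results only to keep the printed values tidy.

```python
# Hypothetical sales figures.
sales = [100.0, 250.0, 80.0]

# A lambda passed inline to map(): apply a 10% increase to each value.
increased = list(map(lambda x: round(x * 1.1, 2), sales))

# Lambdas are also commonly used as a sort key.
records = [("north", 250.0), ("south", 80.0), ("east", 100.0)]
by_sales_desc = sorted(records, key=lambda r: r[1], reverse=True)

print(increased)
print(by_sales_desc)
```

When the logic needs more than one expression, or is reused in several places, a named function defined with def is usually the clearer choice.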

16. What is the difference between a database and a data warehouse?

A database is used for recording day-to-day transactions (OLTP), while a data warehouse is used for analytical processing (OLAP). Databases are optimized for CRUD operations, whereas data warehouses are optimized for read-heavy queries.

Example: A retail POS system uses a database to store real-time sales, while a data warehouse aggregates monthly sales for reporting.

17. What is data cleaning and why is it important?

Data cleaning is the process of identifying and correcting (or removing) errors and inconsistencies in data. It improves data quality and ensures accurate analysis.

Example: Removing duplicates, filling missing values, correcting typos in city names like 'Banglore' to 'Bangalore'.
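
These three steps can be sketched in a few lines of plain Python. The records, the CITY_FIXES mapping, and the 'Unknown' placeholder are assumptions for illustration; real projects usually maintain a reviewed correction list.

```python
# Hypothetical raw records: (customer, city).
raw = [
    ("Asha", " Banglore "),
    ("Asha", "Bangalore"),
    ("Ravi", "bangalore"),
    ("Meera", None),
]

# Known misspellings and case variants mapped to the canonical name (assumed list).
CITY_FIXES = {"banglore": "Bangalore", "bangalore": "Bangalore"}


def clean_city(value):
    """Trim whitespace, fix known typos, and fill missing values."""
    if value is None or not value.strip():
        return "Unknown"
    key = value.strip().lower()
    return CITY_FIXES.get(key, value.strip().title())


cleaned = []
seen = set()
for name, city in raw:
    record = (name, clean_city(city))
    if record not in seen:  # drop exact duplicates after normalization
        seen.add(record)
        cleaned.append(record)

print(cleaned)
```

Normalizing before deduplicating matters here: the two Asha rows only become identical once the typo and the stray spaces are fixed.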

18. What are the different data types in SQL?

Common SQL data types include:

- INT: Integer values

- VARCHAR: Text strings

- DATE: Date values

- FLOAT/DECIMAL: Decimal numbers

- BOOLEAN: True/False

Example:

CREATE TABLE Students (ID INT, Name VARCHAR(50), GPA DECIMAL(3,2));

19. How do you identify and remove duplicate rows in SQL?


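Use ROW_NUMBER() partitioned by the columns that define a duplicate, or GROUP BY with HAVING COUNT(*) > 1, to identify duplicates. Rows with a row number greater than 1 are the duplicates; keep the rows where it equals 1, or delete the rest.

Example:

WITH temp AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS rn FROM Students) SELECT * FROM temp WHERE rn = 1;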
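
The query above only selects the de-duplicated rows. A sketch of identifying and then deleting the duplicates follows, using Python's sqlite3 module; it assumes SQLite 3.25 or later (for window functions) and that ID is unique. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi"), (3, "Asha"), (4, "Asha")],
)

# Rows numbered 2, 3, ... within each Name are the duplicates.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        """
        WITH ranked AS (
            SELECT ID, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS rn
            FROM Students
        )
        SELECT ID FROM ranked WHERE rn > 1 ORDER BY ID
        """
    )
]

# Delete the duplicates, keeping the lowest ID for each Name.
conn.execute(
    """
    DELETE FROM Students
    WHERE ID IN (
        SELECT ID FROM (
            SELECT ID, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS rn
            FROM Students
        )
        WHERE rn > 1
    )
    """
)
remaining = conn.execute("SELECT ID, Name FROM Students ORDER BY ID").fetchall()
print(duplicate_ids, remaining)
```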

20. What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?

- COUNT(*) counts all rows, including rows that contain NULLs.

- COUNT(column) counts non-null values.

- COUNT(DISTINCT column) counts unique non-null values.

Example: COUNT(DISTINCT Region) returns the number of unique regions.
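
The three variants can be compared side by side on one small table. This is a sketch with Python's sqlite3 module; the sales table and its rows, including one NULL region, are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, Region TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "North"), (2, "North"), (3, "South"), (4, None), (5, "East")],
)

# 5 rows in total, 4 non-NULL regions, 3 distinct non-NULL regions.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(Region), COUNT(DISTINCT Region) FROM sales"
).fetchone()
print(counts)
```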

21. What are window functions in SQL?

Window functions perform calculations across a set of rows related to the current row without collapsing those rows the way GROUP BY does.

Example:

SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) AS salary_rank FROM Employees;
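
A runnable sketch of the ranking query, using Python's sqlite3 module and assuming SQLite 3.25 or later. The employees are hypothetical, and the alias salary_rank is a naming choice for this example that avoids the RANK keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("Asha", 90000), ("Ravi", 75000), ("Meera", 90000), ("Dev", 60000)],
)

# Every input row is kept; each one gets a rank computed over the whole table.
ranked = conn.execute(
    """
    SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) AS salary_rank
    FROM Employees
    ORDER BY salary_rank, Name
    """
).fetchall()
print(ranked)
```

The two tied salaries share rank 1 and the next rank is 3, which is how RANK() differs from DENSE_RANK().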

22. What is the use of GROUP BY in SQL?

GROUP BY groups rows that share the same values in the specified columns so that aggregate functions such as SUM, COUNT, and AVG can be applied to each group.

Example:

SELECT Department, AVG(Salary) FROM Employees GROUP BY Department;
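
For a concrete result, the sketch below runs a similar query with Python's sqlite3 module and adds a COUNT(*) per group. The employee rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Asha", "Sales", 50000),
        ("Ravi", "Sales", 70000),
        ("Meera", "IT", 80000),
        ("Dev", "IT", 100000),
        ("Nina", "HR", 45000),
    ],
)

# One output row per department, with aggregates computed inside each group.
by_department = conn.execute(
    """
    SELECT Department, AVG(Salary) AS avg_salary, COUNT(*) AS headcount
    FROM Employees
    GROUP BY Department
    ORDER BY Department
    """
).fetchall()
print(by_department)
```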

23. How do you calculate percentage contribution in SQL?

Use a subquery or a window function to compute the overall total, then divide each group's value by that total and multiply by 100.

Example:

SELECT Department, SUM(Salary)*100.0 / SUM(SUM(Salary)) OVER () AS pct_of_total FROM Employees GROUP BY Department;
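
The subquery approach mentioned above can be sketched as follows with Python's sqlite3 module. It is shown instead of the window-function form because it runs on any SQLite version; the employee data and the alias pct_of_total are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Asha", "Sales", 30000),
        ("Ravi", "Sales", 20000),
        ("Meera", "IT", 100000),
        ("Nina", "HR", 50000),
    ],
)

# The scalar subquery computes the grand total once; multiplying by 100.0
# forces floating-point division so the percentage is not truncated.
shares = conn.execute(
    """
    SELECT Department,
           SUM(Salary) * 100.0 / (SELECT SUM(Salary) FROM Employees) AS pct_of_total
    FROM Employees
    GROUP BY Department
    ORDER BY Department
    """
).fetchall()
print(shares)
```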

24. What are the main types of charts in Power BI?

Power BI supports:

- Bar/Column Charts

- Pie/Donut Charts

- Line Charts

- Area Charts

- Maps

- Tree Maps

- Scatter plots

Example: Use a pie chart to show sales by region or a line chart for trend analysis over time.

25. What is the Query Editor in Power BI used for?

The Query Editor (Power Query Editor) is used to clean, transform, and shape data before it is loaded into the Power BI data model. You can filter rows, change data types, merge queries, create custom columns, and more.

Example: Replace missing values with 'Unknown' or remove extra spaces in column names.

26. Explain calculated column vs measure in Power BI.

- Calculated Column: Computed row by row when the data is refreshed and stored in the table.

- Measure: Evaluated at query time for the current filter context and not stored.

Example: Total = Sales[Quantity] * Sales[Price] (column), TotalSales = SUM(Sales[Amount]) (measure).

27. What is the difference between INNER JOIN and LEFT JOIN?

- INNER JOIN: Returns only matching rows from both tables.

- LEFT JOIN: Returns all rows from the left table, and matching rows from the right (NULL if no match).

Example:

SELECT * FROM A LEFT JOIN B ON A.id = B.id;

28. What are some common KPIs for a data analyst?


Key Performance Indicators depend on business goals. Common ones include:

- Total Sales / Revenue

- Profit Margin

- Customer Retention Rate

- Website Conversion Rate

- Average Order Value

Example: Monthly Active Users (MAU) for an app.

29. What are the basic steps of the data analysis process?

The steps are:

1. Define the problem

2. Collect the data

3. Clean the data

4. Explore and analyze

5. Interpret results

6. Communicate findings

Example: For declining sales, analyze trends, product lines, and customer feedback.

30. What is correlation in statistics and how do you interpret it?

Correlation measures the strength and direction of the linear relationship between two variables. The correlation coefficient ranges from -1 to 1.

- 1: Perfect positive

- 0: No linear correlation

- -1: Perfect negative

Example: Height and weight typically have a positive correlation (~0.7).
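
To make the three reference values concrete, here is a minimal sketch that computes the Pearson correlation coefficient in plain Python. The function name pearson_r and the sample data are illustrative choices, not part of any library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])   # y rises in step with x
perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])   # y falls in step with x
no_linear = pearson_r(x, [4, 1, 0, 1, 4])           # y = (x - 3)^2, a symmetric curve

print(perfect_positive, perfect_negative, no_linear)
```

The last case shows why a coefficient of 0 means no linear relationship rather than no relationship at all: y is fully determined by x, yet the coefficient is zero.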

