Top 50 Industry-Relevant Data Analyst Interview Q - A

SQL (15 Questions)

1. Explain different types of JOINs and when to use each.

o INNER JOIN: Returns records with matching values in both tables. Use when you need only
common records.

o LEFT JOIN: Returns all records from the left table and matched records from the right. Use when
you want all records from the left.

o RIGHT JOIN: Returns all records from the right table and matched records from the left; the mirror image of LEFT JOIN.

o FULL OUTER JOIN: Returns all records from both tables, pairing rows where there is a match and filling the unmatched side with NULLs.

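o CROSS JOIN: Returns the Cartesian product (every row of one table paired with every row of the other). Use carefully, since the row count is the product of the two table sizes.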
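The join types above can be compared on a tiny dataset. This is a minimal sketch using Python's built-in sqlite3 module; the employees and departments tables and their rows are made up for illustration. RIGHT and FULL OUTER JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

# Hypothetical sample data: one employee has no department,
# and one department has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES ('Ana', 1), ('Ben', 2), ('Cy', NULL);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Ops'), (3, 'HR');
""")

# INNER JOIN keeps only employees with a matching department.
inner_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()

# LEFT JOIN keeps every employee; unmatched ones get NULL (None in Python).
left_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()

# CROSS JOIN pairs every employee with every department: 3 * 3 rows.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN departments"
).fetchone()[0]

print(inner_rows)   # [('Ana', 'Sales'), ('Ben', 'Ops')]
print(left_rows)    # [('Ana', 'Sales'), ('Ben', 'Ops'), ('Cy', None)]
print(cross_count)  # 9
```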

2. What is the difference between WHERE and HAVING clauses?

o WHERE filters rows before aggregation; HAVING filters after aggregation.
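A short sketch of the difference, again with sqlite3 and a hypothetical orders table: WHERE removes cancelled rows before grouping, and HAVING then keeps only the groups whose total passes the threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        ('a', 100, 'paid'), ('a', 50, 'paid'), ('a', 500, 'cancelled'),
        ('b', 30, 'paid'), ('b', 20, 'paid');
""")

rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'        -- row filter, applied before GROUP BY
    GROUP BY customer
    HAVING SUM(amount) > 100     -- group filter, applied after aggregation
""").fetchall()

print(rows)  # [('a', 150.0)]
```

The cancelled 500 order never reaches the SUM, so customer 'a' totals 150 rather than 650.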

3. How would you find the second highest salary from an Employee table?

SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);
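To see how the query above behaves with ties, here is a small sqlite3 sketch on made-up rows. Because the subquery compares against the maximum value, two employees sharing the top salary do not hide the runner-up, and the query returns NULL when there is no second distinct salary.

```python
import sqlite3

QUERY = """
    SELECT MAX(salary) FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ana', 90), ('Ben', 90), ('Cy', 70), ('Di', 50);
""")

# Two employees tie at 90, so the second-highest distinct salary is 70.
second = conn.execute(QUERY).fetchone()[0]
print(second)  # 70

# With only one distinct salary left, there is no second-highest value.
conn.execute("DELETE FROM employees WHERE salary < 90")
no_second = conn.execute(QUERY).fetchone()[0]
print(no_second)  # None
```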

4. What are Window Functions? Give practical examples.

o Used to perform calculations across rows related to the current row. Example: RANK() OVER
(PARTITION BY dept ORDER BY salary DESC)

5. How do you optimize slow-running SQL queries?

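o Index the columns used in filters and joins, avoid SELECT *, limit subqueries (especially correlated ones), read the execution plan with EXPLAIN / EXPLAIN PLAN, and check join conditions and filters.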

6. What’s the use of CTEs over subqueries?

o CTEs improve readability, can be recursive, and can be reused within the same query.

7. How can you detect duplicate records in a large dataset?

SELECT col1, COUNT(*) FROM table_name GROUP BY col1 HAVING COUNT(*) > 1;
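The same GROUP BY / HAVING pattern, run with sqlite3 against a hypothetical customers table where one email appears three times:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (email TEXT);
    INSERT INTO customers VALUES
        ('a@x.com'), ('b@x.com'), ('a@x.com'), ('a@x.com');
""")

# Group on the column(s) that define a duplicate and keep groups with > 1 row.
dupes = conn.execute("""
    SELECT email, COUNT(*) AS n
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()

print(dupes)  # [('a@x.com', 3)]
```

To treat a combination of columns as the duplicate key, list all of them in both SELECT and GROUP BY.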

8. How do you handle NULLs in SQL joins and aggregations?

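o Test with IS NULL / IS NOT NULL, and substitute defaults with COALESCE() (standard SQL) or IFNULL() (MySQL, SQLite). NULL never equals NULL, so NULL keys do not match in joins, and aggregates such as SUM() and AVG() skip NULLs.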
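A small sqlite3 sketch of how NULLs affect aggregates and joins; the sales table and its values are hypothetical. It shows that COUNT(column) and AVG() skip NULLs, that COALESCE() changes the average by counting the missing value as 0, and that a NULL join key matches nothing, not even itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('north', 10), ('north', NULL), (NULL, 5);
""")

count_all, count_amount, avg_amount, avg_zero_filled = conn.execute("""
    SELECT COUNT(*),                  -- counts every row
           COUNT(amount),             -- skips NULL amounts
           AVG(amount),               -- averages the non-NULL amounts only
           AVG(COALESCE(amount, 0))   -- treats NULL as 0 before averaging
    FROM sales
""").fetchone()
print(count_all, count_amount, avg_amount, avg_zero_filled)  # 3 2 7.5 5.0

# Self-join on region: the two 'north' rows pair up (2 * 2 = 4 matches),
# while the NULL-region row matches nothing because NULL = NULL is not true.
self_join_matches = conn.execute("""
    SELECT COUNT(*) FROM sales a JOIN sales b ON a.region = b.region
""").fetchone()[0]
print(self_join_matches)  # 4
```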

9. Explain RANK(), DENSE_RANK(), and ROW_NUMBER().

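o ROW_NUMBER() gives every row a unique sequential number, even among ties.

o RANK() gives tied rows the same rank and skips the ranks that follow (1, 1, 3).

o DENSE_RANK() gives tied rows the same rank without leaving gaps (1, 1, 2).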
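The three functions can be put side by side on a tie. This sketch assumes a SQLite build with window-function support (3.25 or later, which recent Python releases bundle); the employees rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ana', 90), ('Ben', 90), ('Cy', 70);
""")

rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
    FROM employees
    ORDER BY row_num
""").fetchall()

for row in rows:
    print(row)
# ('Ana', 1, 1, 1)
# ('Ben', 2, 1, 1)
# ('Cy', 3, 3, 2)
```

The extra `name` key in the ROW_NUMBER() ordering is only there to make the numbering of tied rows deterministic.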

10. What is normalization? What are 1NF, 2NF, and 3NF?

• Normalization organizes data to reduce redundancy.

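• 1NF: Each column holds atomic values, with no repeating groups.

• 2NF: 1NF plus no partial dependency (no non-key column depends on only part of a composite key).

• 3NF: 2NF plus no transitive dependency (no non-key column depends on another non-key column).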


11. How would you calculate a rolling average over a 7-day period?

• SELECT date, AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
FROM sales_data;
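The query above can be checked on a few rows. This is a sketch with sqlite3 (window functions need SQLite 3.25 or later) and invented sales figures. Note that a ROWS frame counts rows, not calendar days, so it equals a 7-day average only when the table has exactly one row per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (date TEXT, sales REAL)")
# Hypothetical data: eight consecutive days with sales 10, 20, ..., 80.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [(f"2024-01-{day:02d}", 10.0 * day) for day in range(1, 9)],
)

rows = conn.execute("""
    SELECT date,
           AVG(sales) OVER (
               ORDER BY date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS rolling_avg
    FROM sales_data
    ORDER BY date
""").fetchall()

print(rows[0])  # ('2024-01-01', 10.0)  only one row in the window so far
print(rows[6])  # ('2024-01-07', 40.0)  average of 10..70
print(rows[7])  # ('2024-01-08', 50.0)  average of 20..80
```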

12. Explain ACID properties in databases.

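• Atomicity (a transaction applies fully or not at all), Consistency (constraints hold before and after), Isolation (concurrent transactions do not interfere), Durability (committed changes survive failures). Together they ensure reliable transactions.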

13. How do you track changes in a table?

• Use triggers, audit tables, or change tracking/versioning columns (created_at, updated_at).

14. Difference between clustered and non-clustered index?

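• Clustered: defines the physical order of the table rows, so there can be only one per table.

• Non-clustered: a separate index structure that stores key values with pointers to the rows; a table can have several.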

15. How would you detect and analyze outliers in transactional data?

• Flag values far from the mean using AVG and STDDEV (for example beyond 3 standard deviations), or outside Q1 - 1.5*IQR to Q3 + 1.5*IQR using percentile subqueries.
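The IQR rule can be sketched outside the database with Python's standard library. The transaction amounts are invented, and the 1.5 multiplier is the common convention rather than anything the answer prescribes.

```python
import statistics

# Hypothetical transaction amounts with one obviously extreme value.
amounts = [12, 15, 14, 13, 16, 15, 14, 120]

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, _median, q3 = statistics.quantiles(amounts, n=4)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = [x for x in amounts if x < lower or x > upper]
print(outliers)  # [120]
```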

Python (12 Questions)

1. How do you handle large datasets in Python?

o Use Dask, PySpark, chunking in pandas.read_csv(), or optimized data types.
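Chunked reading can be sketched as follows, assuming pandas is installed. An in-memory CSV string stands in for a large file, and the column names and chunk size are illustrative; the point is that only one chunk is held in memory at a time while running totals accumulate.

```python
import io

import pandas as pd

# Stand-in for a large file on disk: ten rows of hypothetical data.
csv_text = "region,amount\n" + "\n".join(f"r{i % 3},{i}" for i in range(10))

total = 0
n_rows = 0
# chunksize makes read_csv yield DataFrames of at most 4 rows each;
# a narrower dtype (int32 instead of the default int64) saves memory.
for chunk in pd.read_csv(
    io.StringIO(csv_text), chunksize=4, dtype={"amount": "int32"}
):
    total += int(chunk["amount"].sum())
    n_rows += len(chunk)

print(total, n_rows)  # 45 10
```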

2. Difference between apply(), map(), and applymap()?

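o map(): Element-wise on a Series (function, dict, or Series lookup).

o apply(): Element-wise on a Series, or along rows or columns of a DataFrame.

o applymap(): Element-wise function across a whole DataFrame (renamed DataFrame.map() in pandas 2.1, where applymap() is deprecated).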
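A compact comparison, assuming pandas is installed; the DataFrame is a toy example. The element-wise call picks DataFrame.map when it exists and falls back to applymap on older pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.map: element-wise on a single column.
doubled = df["a"].map(lambda x: x * 2)

# DataFrame.apply: the function receives a whole column (axis=0, the default)
# or a whole row (axis=1).
col_sums = df.apply(lambda col: col.sum())
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Element-wise over every cell: DataFrame.map on pandas >= 2.1,
# DataFrame.applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)

print(doubled.tolist())       # [2, 4, 6]
print(col_sums.to_dict())     # {'a': 6, 'b': 60}
print(row_sums.tolist())      # [11, 22, 33]
print(squared["b"].tolist())  # [100, 400, 900]
```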

3. Connect to SQL database and run a query?

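import pandas as pd
import sqlalchemy

# 'db_string' is a placeholder for a SQLAlchemy connection URL
engine = sqlalchemy.create_engine('db_string')

# Load the query result into a DataFrame ('table' stands in for a real table name)
df = pd.read_sql('SELECT * FROM table', engine)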

4. Strategies for data cleaning?

o Drop/Impute nulls, fix data types, remove duplicates, outlier detection, encoding.

5. Detect and treat missing values?

o Use df.isnull().sum(), fillna(), dropna(), or imputation strategies.

6. Role of NumPy?

o Efficient array operations, broadcasting, and base for pandas.

7. Handle categorical variables?

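o Use pd.get_dummies() or scikit-learn's OneHotEncoder for nominal features; LabelEncoder is intended for target labels (OrdinalEncoder is its feature-side counterpart).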

8. loc[] vs iloc[]?

o loc[] for label-based access, iloc[] for integer position-based access.
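The difference is easiest to see when the index labels are not 0, 1, 2. A minimal sketch, assuming pandas is installed and using a made-up frame:

```python
import pandas as pd

# Index labels 10, 20, 30 deliberately differ from positions 0, 1, 2.
df = pd.DataFrame({"sales": [100, 200, 300]}, index=[10, 20, 30])

by_label = df.loc[10, "sales"]   # row labelled 10
by_position = df.iloc[0, 0]      # first row, first column

label_slice = df.loc[10:20]      # label slices include the end label
position_slice = df.iloc[0:1]    # position slices exclude the end position

print(by_label, by_position)                   # 100 100
print(len(label_slice), len(position_slice))   # 2 1
```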

9. Automate data cleaning pipeline?

o Define small cleaning functions → chain them with DataFrame.pipe() (or apply() for column-level steps) → schedule the pipeline with Airflow or Luigi.
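The "define functions, then chain them" step might look like the sketch below, assuming pandas is installed. The function names, columns, and fill value are hypothetical; scheduling with Airflow or Luigi is out of scope here.

```python
import pandas as pd


def drop_duplicate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Remove fully duplicated rows, keeping the first occurrence."""
    return df.drop_duplicates()


def fill_missing(df: pd.DataFrame, column: str, value: float) -> pd.DataFrame:
    """Return a copy with missing values in one column replaced."""
    return df.assign(**{column: df[column].fillna(value)})


raw = pd.DataFrame({"id": [1, 1, 2], "amount": [5.0, 5.0, None]})

# pipe() passes the DataFrame as the first argument of each step,
# so the cleaning steps read top to bottom in execution order.
clean = (
    raw
    .pipe(drop_duplicate_rows)
    .pipe(fill_missing, "amount", 0.0)
)

print(clean["amount"].tolist())  # [5.0, 0.0]
```

Keeping each step a pure function that returns a new DataFrame also makes the steps easy to unit test in isolation.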

10. Calculate correlation function?

def calc_corr(df):
    # Pairwise Pearson correlation of the numeric columns only
    return df.corr(numeric_only=True)

11. Parse nested JSON file?

• Use json module and pd.json_normalize().
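A minimal sketch of that combination, assuming pandas is installed. The JSON payload is invented, and a string stands in for a file read with json.load().

```python
import json

import pandas as pd

# Hypothetical nested payload; in practice this would come from a file.
raw = """
[
  {"id": 1, "user": {"name": "Ana", "address": {"city": "Lima"}}},
  {"id": 2, "user": {"name": "Ben", "address": {"city": "Oslo"}}}
]
"""

records = json.loads(raw)

# json_normalize flattens nested dicts into dot-separated column names.
flat = pd.json_normalize(records)

print(sorted(flat.columns))  # ['id', 'user.address.city', 'user.name']
print(flat["user.address.city"].tolist())  # ['Lima', 'Oslo']
```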

12. Write unit tests for pipelines?

• Use unittest, pytest, and mock I/O or small test data sets.
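As an illustration with the standard-library unittest module, here is a tiny test for a hypothetical cleaning step. The function clean_amounts and its behaviour are assumptions made for the example; the tests use small in-memory rows rather than real I/O.

```python
import unittest


def clean_amounts(rows):
    """Drop rows with a missing amount and cast the remaining amounts to float."""
    return [
        {**row, "amount": float(row["amount"])}
        for row in rows
        if row.get("amount") not in (None, "")
    ]


class CleanAmountsTest(unittest.TestCase):
    def test_drops_missing_and_casts(self):
        rows = [{"id": 1, "amount": "3.5"}, {"id": 2, "amount": None}]
        self.assertEqual(clean_amounts(rows), [{"id": 1, "amount": 3.5}])

    def test_empty_input(self):
        self.assertEqual(clean_amounts([]), [])


# Run the suite programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanAmountsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

With pytest the same checks could be written as plain functions using assert statements.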

Excel (13 Questions)

1. Use Pivot Tables?

o Summarize, filter, and analyze data interactively with rows, columns, and values.

2. VLOOKUP vs INDEX-MATCH?

o INDEX-MATCH is more flexible (it can look to the left and survives inserted columns) and is often faster with large datasets.

3. Dynamic Named Ranges?

o Use OFFSET() + COUNTA() in Name Manager.

4. Automate reports with VBA?

o Write macros to update filters, charts, pivot tables on click.

5. Use of INDIRECT()?

o Refers to a cell by text/string reference.

6. Create dashboards?

o Combine PivotCharts, slicers, KPIs, and conditional formatting.

7. Array Formulas?

o Perform calculations over whole ranges in a single formula, like {=SUM(A1:A5*B1:B5)} (entered with Ctrl+Shift+Enter in versions without dynamic arrays).

8. Clean data in Excel?

o Use TRIM(), CLEAN(), Text to Columns, and Remove Duplicates.

9. SUMIFS vs SUMPRODUCT?

o SUMIFS: conditional summing. SUMPRODUCT: flexible, array-based math.

10. Sales trend analysis?

• Use pivot tables, slicers, trendlines, and moving average charts.

11. Outlier detection?

• Use Z-Score, conditional formatting, box plots.

12. Performance comparison YoY?

• Create YoY % change = (Current - Last Year)/Last Year in pivot.
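The YoY formula above, written out as a small Python function for checking the arithmetic. The function name and the choice to return None when last year's value is zero are assumptions for this sketch.

```python
def yoy_pct_change(current, last_year):
    """Year-over-year change in percent: (Current - Last Year) / Last Year * 100."""
    if last_year == 0:
        return None  # undefined when there is no prior-year base
    return (current - last_year) * 100 / last_year


growth = yoy_pct_change(120, 100)
decline = yoy_pct_change(90, 120)
print(growth)   # 20.0
print(decline)  # -25.0
```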

13. Live data feeds?

• Use Data > Get Data (Get External Data in older versions) > From Web/SQL Server/API, and set the query to refresh on a schedule or on open.

Power BI (10 Questions)

1. Calculated columns vs measures?

o Calculated columns are evaluated row by row at refresh and stored in the model; measures are evaluated at query time over the current filter context (context-dependent aggregates).

2. Optimize performance?

o Use star schema, reduce columns, optimize DAX, disable auto date/time.

3. Relationships & many-to-many?

o Use bridge tables or composite models to handle many-to-many joins.

4. Slicers and filters?

o Use slicers for visual interaction, filters for scoped control.

5. Power BI Desktop vs Service?

o Desktop for report creation, Service for sharing/collaboration.

6. Row-level security?

o Define roles in the Modeling tab (Manage roles), attach DAX filters to limit data access, then assign users to the roles in the Power BI Service.

7. KPI dashboard creation?

o Use card visuals, conditional formatting, KPI visuals, and alerts.

8. Time intelligence calculations?

o Use DAX like TOTALYTD(), SAMEPERIODLASTYEAR().

9. Power Query usage?

o Used for ETL steps like cleaning, merging, transforming datasets before modeling.

10. Version control and collaboration?

o Use Power BI Service with workspaces, or Git with .pbix files for versioning (.pbix is a binary format, so Git tracks versions but cannot show meaningful diffs).
