EXL Data Analyst Interview Questions
SQL
Question 1: Write a query to retrieve the top 3 revenue-generating products within each category.
Input Table: Products
MySQL Query:
SELECT
Category,
ProductName,
Revenue,
ProductRank
FROM (
SELECT
Category,
ProductName,
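Revenue,
-- ROW_NUMBER() breaks ties arbitrarily; RANK() or DENSE_RANK() would keep tied products together
ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Revenue DESC) AS ProductRank
FROM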
Products
) AS RankedProducts
WHERE
ProductRank <= 3
ORDER BY
Category, ProductRank;
Output Table:
MySQL Query:
SELECT
ProductName,
Revenue AS TotalProductRevenue
FROM
Products
WHERE
Revenue > (SELECT AVG(Revenue) FROM Products);
Output Table:
ProductName TotalProductRevenue
Gaming PC 180000
Smartphone 120000
Note: The overall average revenue for the given Products table is approx. 100625.
1 C1 2023-01-10 100
2 C1 2023-01-25 50
3 C2 2023-01-15 200
4 C1 2023-02-05 180
5 C2 2023-02-20 150
6 C3 2023-02-10 300
7 C1 2023-03-01 250
8 C2 2023-03-10 220
9 C3 2023-03-15 280
MySQL Query:
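WITH MonthlySpending AS (
SELECT
CustomerID,
DATE_FORMAT(TransactionDate, '%Y-%m') AS TransactionMonth, -- date column name assumed
SUM(Amount) AS MonthlyTotalSpending
FROM
Transactions
GROUP BY
CustomerID, TransactionMonth
),
LaggedSpending AS (
SELECT
CustomerID,
TransactionMonth,
MonthlyTotalSpending,
LAG(MonthlyTotalSpending) OVER (PARTITION BY CustomerID ORDER BY TransactionMonth) AS PreviousMonthSpending
FROM
MonthlySpending
)
SELECT
CustomerID,
TransactionMonth AS Month,
MonthlyTotalSpending AS CurrentMonthSpending,
PreviousMonthSpending,
CASE
WHEN MonthlyTotalSpending > PreviousMonthSpending THEN 'Increase'
WHEN MonthlyTotalSpending < PreviousMonthSpending THEN 'Decrease' -- label assumed
ELSE 'No Change' -- label assumed
END AS SpendingTrend
FROM
LaggedSpending
WHERE
PreviousMonthSpending IS NOT NULL -- assumed filter: skips each customer's first month
ORDER BY
CustomerID, TransactionMonth;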
Output Table:
CustomerID Month CurrentMonthSpending PreviousMonthSpending SpendingTrend
C1 2023-03 250 180 Increase
C2 2023-03 220 150 Increase
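The month-over-month comparison can be cross-checked outside the database. The following is a minimal Python sketch, not the original solution: it uses the nine sample rows shown above and assumes the columns are (TransactionID, CustomerID, TransactionDate, Amount). The labels 'Decrease' and 'No Change' and the helper name monthly_trends are illustrative assumptions.

```python
from collections import defaultdict

# Sample rows from the table above; column meanings are assumed to be
# (TransactionID, CustomerID, TransactionDate, Amount).
transactions = [
    (1, "C1", "2023-01-10", 100),
    (2, "C1", "2023-01-25", 50),
    (3, "C2", "2023-01-15", 200),
    (4, "C1", "2023-02-05", 180),
    (5, "C2", "2023-02-20", 150),
    (6, "C3", "2023-02-10", 300),
    (7, "C1", "2023-03-01", 250),
    (8, "C2", "2023-03-10", 220),
    (9, "C3", "2023-03-15", 280),
]


def monthly_trends(rows):
    """Return (customer, month, current, previous, trend) for every month
    after a customer's first month with data (hypothetical helper)."""
    totals = defaultdict(int)
    for _, customer, tx_date, amount in rows:
        totals[(customer, tx_date[:7])] += amount  # tx_date[:7] is "YYYY-MM"

    results = []
    previous = {}  # most recent monthly total seen per customer
    for customer, month in sorted(totals):
        current = totals[(customer, month)]
        if customer in previous:
            prior = previous[customer]
            if current > prior:
                trend = "Increase"
            elif current < prior:
                trend = "Decrease"
            else:
                trend = "No Change"
            results.append((customer, month, current, prior, trend))
        previous[customer] = current
    return results


trends = monthly_trends(transactions)
for row in trends:
    print(row)
```

Like LAG, this compares each month with the previous month that has data for that customer, which is not necessarily the previous calendar month. The two March rows shown in the output table appear among the results.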
101 U1 2023-01-05 50
103 U1 2023-01-15 75
105 U2 2023-01-22 90
109 U1 2023-02-25 80
MySQL Query:
SELECT
TransactionID,
UserID,
TransactionDate,
Amount,
FROM (
SELECT
TransactionID,
UserID,
TransactionDate,
Amount,
FROM
UserTransactions
) AS RankedTransactions
ORDER BY
UserID, TransactionDate;
Output Table:
103 U1 2023-01-15 75 No No
106 U1 2023-02-01 100 No No
105 U2 2023-01-22 90 No No
1 Alice 10 60000
2 Bob 10 70000
3 Charlie 11 65000
4 David 10 60000
5 Eve 11 65000
6 Frank 12 80000
7 Grace 10 70000
8 Heidi 11 60000
MySQL Query:
SELECT
e1.EmployeeName AS Employee1,
e2.EmployeeName AS Employee2,
e1.ManagerID,
e1.Salary
FROM
Employees e1
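JOIN
Employees e2
ON e1.ManagerID = e2.ManagerID
AND e1.Salary = e2.Salary
AND e1.EmployeeID < e2.EmployeeID -- ID column name assumed; lists each pair once
ORDER BY
e1.ManagerID, e1.Salary; -- ordering assumed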
Output Table:
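The selected columns (Employee1, Employee2, ManagerID, Salary) suggest the task is to pair employees who share both a manager and a salary. Under that assumption, a minimal Python sketch over the sample rows gives the pairs to expect. The column order (EmployeeID, EmployeeName, ManagerID, Salary) and the helper name are assumptions.

```python
from itertools import combinations

# Sample rows from the table above; columns assumed to be
# (EmployeeID, EmployeeName, ManagerID, Salary).
employees = [
    (1, "Alice", 10, 60000),
    (2, "Bob", 10, 70000),
    (3, "Charlie", 11, 65000),
    (4, "David", 10, 60000),
    (5, "Eve", 11, 65000),
    (6, "Frank", 12, 80000),
    (7, "Grace", 10, 70000),
    (8, "Heidi", 11, 60000),
]


def same_manager_same_salary(rows):
    """Return (name1, name2, manager_id, salary) for each unordered pair
    of employees sharing a manager and a salary (hypothetical helper)."""
    pairs = []
    for e1, e2 in combinations(rows, 2):  # visits each unordered pair once
        if e1[2] == e2[2] and e1[3] == e2[3]:
            pairs.append((e1[1], e2[1], e1[2], e1[3]))
    return pairs


pairs = same_manager_same_salary(employees)
for pair in pairs:
    print(pair)
```

Using combinations plays the same role as an "ID less than" condition in a self-join: it prevents an employee from pairing with themselves and avoids listing each pair twice.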
PYTHON
Python Code:
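def reverse_list_manual(input_list):
    """
    Reverses a list in place using two pointers, without built-in reversal helpers.
    Args:
        input_list: The list to reverse.
    Returns:
        The same list object, now reversed.
    """
    left = 0
    right = len(input_list) - 1
    # Swap the outermost pair, then move both pointers inward until they meet.
    while left < right:
        input_list[left], input_list[right] = input_list[right], input_list[left]
        left += 1
        right -= 1
    return input_list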
# Example Usage:
my_list = [1, 2, 3, 4, 5, 6]
reversed_list = reverse_list_manual(my_list)
reversed_string_list = reverse_list_manual(my_string_list)
empty_list = []
reversed_empty_list = reverse_list_manual(empty_list)
single_element_list = [10]
reversed_single_element_list = reverse_list_manual(single_element_list)
Output:
Python Code:
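def to_uppercase_manual(input_string):
    """
    Converts lowercase ASCII letters to uppercase without calling str.upper().
    Args:
        input_string: The string to convert.
    Returns:
        A new string with 'a'-'z' replaced by 'A'-'Z'.
    """
    result_string = ""
    for char in input_string:
        if 'a' <= char <= 'z':
            # Uppercase ASCII letters sit 32 code points below their lowercase forms.
            result_string += chr(ord(char) - 32)
        else:
            result_string += char
    return result_string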
# Example Usage:
s1 = "abc123xyz"
converted_s1 = to_uppercase_manual(s1)
converted_s2 = to_uppercase_manual(s2)
s3 = ""
converted_s3 = to_uppercase_manual(s3)
Output:
1. Is the value an integer? This is important to avoid errors if values are strings, floats,
etc.
2. Is the integer value even? We can check this using the modulo operator (%). If value
% 2 == 0, it's an even number.
If both conditions are true, we print the corresponding key.
Python Code:
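def print_keys_with_even_values(input_dict):
    """
    Prints all keys from a dictionary that are associated with even-numbered values.
    Args:
        input_dict: The dictionary to inspect.
    """
    found_even_value_key = False
    for key, value in input_dict.items():
        # Only integers can be even; strings, floats, and other types are skipped.
        if isinstance(value, int) and value % 2 == 0:
            print(key)
            found_even_value_key = True
    if not found_even_value_key:
        print("No keys with even values found.")  # message text is illustrative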
# Example Usage:
my_dict1 = {
    'apple': 1,
    'banana': 2,
    'cherry': 3,
    'date': 4,
    'elderberry': 5,
    'fig': 6
}
print(f"Dictionary 1: {my_dict1}")
print_keys_with_even_values(my_dict1)
print("-" * 30)
my_dict2 = {
    'itemA': 100,
    'itemB': 201,
    'itemC': 300,
    'itemD': 403,
    'itemE': 500
}
print(f"Dictionary 2: {my_dict2}")
print_keys_with_even_values(my_dict2)
print("-" * 30)
my_dict3 = {
    'name': 'Alice',
    'age': 30,
    'score': 95,
    'year': 2024
}
print(f"Dictionary 3 (mixed types): {my_dict3}")
print_keys_with_even_values(my_dict3)
print("-" * 30)
my_dict4 = {
    'one': 1,
    'three': 3,
    'five': 'not_a_number'
}
print_keys_with_even_values(my_dict4)
print("-" * 30)
empty_dict = {}
print(f"Empty dictionary: {empty_dict}")
print_keys_with_even_values(empty_dict)
print("-" * 30)
Output:
Dictionary 1: {'apple': 1, 'banana': 2, 'cherry': 3, 'date': 4, 'elderberry': 5, 'fig': 6}
banana
date
fig
------------------------------
Dictionary 2: {'itemA': 100, 'itemB': 201, 'itemC': 300, 'itemD': 403, 'itemE': 500}
itemA
itemC
itemE
------------------------------
Dictionary 3 (mixed types): {'name': 'Alice', 'age': 30, 'score': 95, 'year': 2024}
age
year
------------------------------
------------------------------
Empty dictionary: {}
------------------------------
Question 4: Define a function to check if two strings are
anagrams of each other.
Approach: Two strings are anagrams if they contain the same characters with the same
frequencies, regardless of order. A robust way to check this is to:
1. Normalize Strings: Convert both strings to lowercase and remove any non-alphabetic
characters (like spaces, punctuation) to ensure a fair comparison.
2. Count Frequencies: Build a dictionary of character counts for each normalized
string.
3. Compare Frequencies: If the two frequency dictionaries are identical, then the
strings are anagrams.
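Python Code:
def are_anagrams(str1, str2):  # function name is illustrative
    """
    Checks whether two strings are anagrams of each other.
    Args:
        str1: The first string.
        str2: The second string.
    Returns:
        True if the strings are anagrams, False otherwise.
    """
    # Normalize strings: convert to lowercase and filter out non-alphabetic characters
    def normalize_string(s):
        return "".join(char for char in s.lower() if char.isalpha())
    normalized_str1 = normalize_string(str1)
    normalized_str2 = normalize_string(str2)
    if len(normalized_str1) != len(normalized_str2):
        return False
    # Option 1: Sort the normalized strings and compare (simpler, but potentially less
    # efficient for very long strings)
    # return sorted(normalized_str1) == sorted(normalized_str2)
    # Option 2: Use frequency counters (more explicit for "same characters with same
    # frequencies")
    def count_characters(s):
        counts = {}
        for char in s:
            counts[char] = counts.get(char, 0) + 1
        return counts
    return count_characters(normalized_str1) == count_characters(normalized_str2)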
# Example Usage:
Output:
• If the character is not yet a key, we add it to the dictionary with a value (count) of 1.
Python's collections.Counter provides a highly optimized way to do this, but the question
implies a manual implementation.
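For comparison, here is a brief sketch of the collections.Counter shortcut mentioned above. It is shown only to illustrate the standard-library alternative; the sample string is an arbitrary choice.

```python
from collections import Counter

# Counter builds the character-frequency mapping in a single call.
counts = Counter("hello world")

print(counts["l"])   # frequency of 'l'
print(counts["z"])   # missing characters report 0 instead of raising KeyError
print(dict(counts))  # a plain dict of character -> frequency
```

Counter is a dict subclass, so the result can be compared directly with a manually built frequency dictionary.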
Python Code:
def build_frequency_dictionary(input_string):
    """
    Builds a dictionary of character frequencies for a string.
    Args:
        input_string: The string to analyze.
    Returns:
        A dictionary where keys are characters and values are their frequencies.
    """
    frequency_dict = {}
    for char in input_string:
        if char in frequency_dict:
            frequency_dict[char] += 1
        else:
            frequency_dict[char] = 1
    return frequency_dict
# Example Usage:
s1 = "hello world"
freq_dict1 = build_frequency_dictionary(s1)
print(f"String: '{s1}'")
print(f"Frequency Dictionary: {freq_dict1}")
print("-" * 30)
s2 = "programming"
freq_dict2 = build_frequency_dictionary(s2)
print(f"String: '{s2}'")
print(f"Frequency Dictionary: {freq_dict2}")
print("-" * 30)
s3 = "Python is fun!"
freq_dict3 = build_frequency_dictionary(s3)
print(f"String: '{s3}'")
print(f"Frequency Dictionary: {freq_dict3}")
print("-" * 30)
s4 = ""
freq_dict4 = build_frequency_dictionary(s4)
print(f"String: '{s4}'")
print(f"Frequency Dictionary: {freq_dict4}")
print("-" * 30)
s5 = "aaaaabbbccc"
freq_dict5 = build_frequency_dictionary(s5)
print(f"String: '{s5}'")
print(f"Frequency Dictionary: {freq_dict5}")
print("-" * 30)
Output:
String: 'hello world'
Frequency Dictionary: {'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1}
------------------------------
String: 'programming'
Frequency Dictionary: {'p': 1, 'r': 2, 'o': 1, 'g': 2, 'a': 1, 'm': 2, 'i': 1, 'n': 1}
------------------------------
String: 'Python is fun!'
Frequency Dictionary: {'P': 1, 'y': 1, 't': 1, 'h': 1, 'o': 1, 'n': 2, ' ': 2, 'i': 1, 's': 1, 'f': 1, 'u': 1, '!': 1}
------------------------------
String: ''
Frequency Dictionary: {}
------------------------------
String: 'aaaaabbbccc'
Frequency Dictionary: {'a': 5, 'b': 3, 'c': 3}
------------------------------
MS EXCEL
Question 1: Explain the difference between COUNT, COUNTA,
and COUNTBLANK with practical examples.
These three functions are used to count cells in a range, but they differ in what types of
cells they count.
• COUNT: Counts the number of cells in a range that contain numbers. It ignores
blank cells, text, error values, and boolean values.
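• COUNTA: Counts the number of cells in a range that are not empty. It counts
numbers, text, logical values (TRUE/FALSE), and error values. It only ignores truly
empty cells.
• COUNTBLANK: Counts the number of blank cells in a range. Cells whose formula
returns an empty string ("") are counted as blank.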
Practical Example:
Cell Content
A1 100
A2 Hello
A3 (formula returning an empty string, "")
A4 50
A5 TRUE
A6 #N/A
A7 (empty cell)
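• COUNT:
o Formula: =COUNT(A1:A7)
o Output: 2 (counts A1 and A4, the only cells that contain numbers)
• COUNTA:
o Formula: =COUNTA(A1:A7)
o Output: 6 (counts every cell except A7; A3 is counted because it contains a
formula, even though it displays nothing)
• COUNTBLANK:
o Formula: =COUNTBLANK(A1:A7)
o Output: 2 (counts A3, which contains an empty string, and A7, which is truly
empty)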
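The counting rules can be mirrored in a short Python sketch. This is an approximation of the behavior described above, not Excel's implementation: None stands for a truly empty cell, "" for a formula that returns an empty string, and the ExcelError class is a hypothetical stand-in for an error value.

```python
class ExcelError:
    """Hypothetical stand-in for an Excel error value such as #N/A."""

    def __init__(self, code):
        self.code = code


# A1:A7 from the example: number, text, empty-string formula result,
# number, logical value, error value, truly empty cell.
cells = [100, "Hello", "", 50, True, ExcelError("#N/A"), None]


def count(values):
    """Like COUNT on a range: numbers only (bool is excluded explicitly,
    because bool is a subclass of int in Python)."""
    return sum(
        1 for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    )


def counta(values):
    """Like COUNTA: everything except truly empty cells."""
    return sum(1 for v in values if v is not None)


def countblank(values):
    """Like COUNTBLANK: truly empty cells plus empty-string results."""
    return sum(1 for v in values if v is None or v == "")


print(count(cells), counta(cells), countblank(cells))
```

Note that the empty-string cell is counted by both counta and countblank, which is why the two results do not add up to the number of cells.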
By nesting MATCH inside INDEX, MATCH determines the row (or column) number, and
INDEX then retrieves the value from that position in a specified column (or row).
Practical Example:
101 Alice HR
103 Charlie IT
You want to find the EmployeeName (column B) given an EmployeeID (column A). Here,
EmployeeName is to the right of EmployeeID, so this is an ordinary INDEX+MATCH lookup.
Finding EmployeeID from a value in a column to its right (such as EmployeeName or
Department) is a "left" lookup. The first formula finds the EmployeeName (column B) for
a given EmployeeID (column A); the second performs a left lookup of EmployeeID based
on Department.
Formula: =INDEX(B:B, MATCH(E1, A:A, 0))
• Explanation:
o MATCH(E1, A:A, 0): Looks for 103 in column A and returns its position (e.g., 4
if 103 is in A4).
o INDEX(B:B, 4): Retrieves the value from the 4th row of column B.
• Output: Charlie
Formula: =INDEX(A:A, MATCH(E2, C:C, 0))
• Explanation:
o MATCH(E2, C:C, 0): Looks for IT in column C and returns its position (e.g., 4 if
IT is in C4).
o INDEX(A:A, 4): Retrieves the value from the 4th row of column A.
• Output: 103
This demonstrates that INDEX and MATCH can retrieve values from any column in the
INDEX range, based on a lookup in any other column in the MATCH range, regardless of
their relative positions.
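The two-step mechanics (MATCH finds a position, INDEX reads from that position) can be sketched in Python. This is an illustration only: the function names match and index are hypothetical helpers, and the lists hold just the two example rows shown above, so the positions differ from the worksheet row numbers in the explanation.

```python
def match(lookup_value, lookup_range):
    """Exact match (match_type 0): returns a 1-based position, as Excel does.
    Raises ValueError if the value is absent (Excel would return #N/A)."""
    return lookup_range.index(lookup_value) + 1


def index(value_range, position):
    """Returns the item at a 1-based position."""
    return value_range[position - 1]


# Columns A, B, and C of the example, limited to the rows shown above.
employee_ids = [101, 103]
employee_names = ["Alice", "Charlie"]
departments = ["HR", "IT"]

# Lookup to the right: EmployeeID -> EmployeeName
print(index(employee_names, match(103, employee_ids)))

# "Left" lookup: Department -> EmployeeID
print(index(employee_ids, match("IT", departments)))
```

Because the lookup column and the return column are passed separately, their relative order does not matter, which is the property the formulas rely on.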
• IFERROR(value, value_if_error):
o value: The formula or expression that you want to check for an error.
o value_if_error: The value to return if the formula evaluates to an error.
Practical Example:
Let's combine IFERROR with VLOOKUP. Assume you have a list of Product IDs and Prices,
and you want to look up a product. If the product ID is not found, VLOOKUP would return
#N/A. We can use IFERROR to display a more user-friendly message.
ProductID Price
P001 150
P002 200
P003 120
Scenario:
• In cell D1, you enter P002. In cell E1, you want to find its price.
o Formula: =IFERROR(VLOOKUP(D1, A1:B4, 2, FALSE), "Product Not Found")
o Output: 200
• In cell D2, you enter P005 (which does not exist). In cell E2, you want to find its price.
o Formula: =IFERROR(VLOOKUP(D2, A1:B4, 2, FALSE), "Product Not Found")
o Output: Product Not Found
o Explanation: VLOOKUP(D2, A1:B4, 2, FALSE) tries to find 'P005', but it's not
in the list, so it returns #N/A. IFERROR detects this error and returns the
value_if_error, which is "Product Not Found".
IFERROR makes your spreadsheets more robust and user-friendly by preventing unsightly
error messages from appearing.
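The same "try the lookup, fall back on error" idea looks like this in Python. It is a loose analogy rather than a model of Excel: a dictionary lookup stands in for an exact-match VLOOKUP, a KeyError plays the role of #N/A, and the helper names are hypothetical.

```python
def vlookup_exact(key, table):
    """Stand-in for VLOOKUP(..., FALSE): raises KeyError when the key is absent."""
    return table[key]


def iferror(compute, value_if_error):
    """Evaluate compute(); return value_if_error if it raises any exception,
    much as IFERROR catches any error type."""
    try:
        return compute()
    except Exception:
        return value_if_error


prices = {"P001": 150, "P002": 200, "P003": 120}

print(iferror(lambda: vlookup_exact("P002", prices), "Product Not Found"))
print(iferror(lambda: vlookup_exact("P005", prices), "Product Not Found"))
```

As with IFERROR, the fallback hides every kind of failure, so a typo in the lookup logic would also produce "Product Not Found". That trade-off is worth keeping in mind when choosing the fallback message.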
Question 4: Create a formula to highlight duplicate values
excluding their first instance.
This task is typically done using Excel's Conditional Formatting with a custom formula. The
idea is to apply a format (like a fill color) to cells that are duplicates but are not the first
occurrence of that value in the list.
Approach: We will use the COUNTIF function combined with a specific range reference
that expands as the formula is applied down the column.
• COUNTIF(range, criteria): Counts the number of cells within a range that meet the
specified criteria.
Practical Example:
Cell Content
A1 Apple
A2 Banana
A3 Apple
A4 Cherry
A5 Banana
A6 Apple
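1. Select the range you want to apply the formatting to (e.g., A1:A6).
2. On the Home tab, click Conditional Formatting > New Rule.
3. Select "Use a formula to determine which cells to format".
4. Enter the following formula in the "Format values where this formula is true:" box:
5. =COUNTIF($A$1:A1,A1)>1
o Explanation:
▪ $A$1: The starting point is absolute, so the counted range always
begins at the first cell of the list.
▪ A1: The ending point is relative to the current cell in the selection.
When Excel applies this formula to A2, it becomes
COUNTIF($A$1:A2,A2). For A3, it's COUNTIF($A$1:A3,A3), and so on.
▪ >1: The count exceeds 1 only when the value has already appeared
earlier in the list, so the first instance is never highlighted.
6. Click the Format... button to choose your desired formatting (e.g., a light red fill).
7. Click OK twice.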
Output:
Cells A3, A5, and A6 would be highlighted, as they are duplicate occurrences of 'Apple' and
'Banana' after their first instance.
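The expanding-range logic is equivalent to asking, for each cell, "has this value appeared above me?". A minimal Python sketch of that check follows, using the example values; the function name is a hypothetical helper. One difference to keep in mind: COUNTIF ignores letter case when comparing text, while this sketch compares values exactly.

```python
def repeats_after_first(values):
    """Return one flag per value: True if the value already occurred earlier
    in the list, i.e. the equivalent of COUNTIF($A$1:A1, A1) > 1."""
    seen = set()
    flags = []
    for value in values:
        flags.append(value in seen)  # True only from the second occurrence on
        seen.add(value)
    return flags


column_a = ["Apple", "Banana", "Apple", "Cherry", "Banana", "Apple"]
flags = repeats_after_first(column_a)

for row_number, (value, flagged) in enumerate(zip(column_a, flags), start=1):
    print(f"A{row_number}", value, "highlight" if flagged else "")
```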
Practical Example:
South Mar 80
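Formula:
=SUMIFS(C:C, A:A, "West", B:B, "Jan")
• Explanation:
o C:C: This is the sum_range. Excel will sum values from column C.
o A:A: This is criteria_range1, the column tested against criteria1.
o "West": This is criteria1. Excel will only consider rows where the value in
column A is "West".
o B:B: This is criteria_range2, the column tested against criteria2.
o "Jan": This is criteria2. Excel will only consider rows where the value in
column B is "Jan".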
The function will look for rows where BOTH Region is "West" AND Month is "Jan", and then
sum the corresponding "Sales" values.
Output:
• In the example data, the rows that match both criteria are:
• Output: 270
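The AND logic of SUMIFS can be sketched in Python. Only the South/Mar/80 row below comes from the example table; every other row is hypothetical, chosen so that the West/Jan total matches the 270 quoted above. The function name sumifs is also just an illustrative helper.

```python
def sumifs(rows, region, month):
    """Sum Sales over rows where BOTH Region and Month match."""
    return sum(
        sales for row_region, row_month, sales in rows
        if row_region == region and row_month == month
    )


# (Region, Month, Sales)
sales_rows = [
    ("West", "Jan", 150),   # hypothetical row
    ("East", "Jan", 90),    # hypothetical row
    ("West", "Feb", 60),    # hypothetical row
    ("West", "Jan", 120),   # hypothetical row
    ("South", "Mar", 80),   # row shown in the example table
]

print(sumifs(sales_rows, "West", "Jan"))
```

A row that matches only one criterion, such as East/Jan or West/Feb, contributes nothing to the total.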
POWER BI
Calculated Column:
• Definition: A new column added to an existing table in your data model. Its values
are computed row by row based on a DAX expression.
• Calculation Time: Values are calculated and stored in the data model at the time of
data refresh.
• Storage: Consumes memory and increases the size of your data model, as the
calculated values are physically stored.
• Context: Calculated in row context. This means the calculation for each row is
independent of filters applied to the report; it only considers the values in that
specific row.
• Usage: Can be used like any other column in your tables – for filtering, slicing,
grouping, or as part of other calculations. They are suitable for segmenting data or
adding new attributes to rows.
• Example:
o OrderYear = YEAR([OrderDate])
Measure:
• Definition: A dynamic calculation that aggregates data based on the current filter
context in a report. Measures are not stored physically in the data model.
• Calculation Time: Values are computed on-the-fly when they are used in a visual
(e.g., a table, chart, or card) and respond to interactions (like slicers or cross-
filtering).
• Storage: Does not consume significant memory, as only the DAX expression is
stored, not the results of the calculation.
• Context: Primarily evaluated in filter context. This means the calculation considers
all filters applied to the visual, page, or report (e.g., selections from slicers,
interactions with other visuals).
• Usage: Used for aggregation and performing calculations across multiple rows,
such as sums, averages, counts, ratios, or complex business logic. They are the
backbone of analytical reports.
• Example Use Case: Calculating Total Sales, Average Order Value, or Total Profit.
• Use a Calculated Column when you need a new column that adds characteristics
to individual rows, which can then be used for filtering, grouping, or as input for
other calculations.
• Use a Measure when you need to perform aggregations or calculations that change
based on user interactions, filters, or the visual's context. Measures are for
analytical results, not for adding new dimensions to your data.
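The stored-versus-computed distinction can be illustrated outside Power BI. The Python sketch below is an analogy, not DAX: the order rows are hypothetical, the OrderYear field plays the role of a calculated column (computed once per row and stored), and total_sales plays the role of a measure (evaluated on demand against whatever rows the current filters leave visible).

```python
from datetime import date

# Hypothetical order rows.
orders = [
    {"OrderDate": date(2023, 5, 1), "Region": "West", "SalesAmount": 100},
    {"OrderDate": date(2024, 1, 15), "Region": "East", "SalesAmount": 250},
    {"OrderDate": date(2024, 3, 9), "Region": "West", "SalesAmount": 75},
]

# "Calculated column": evaluated row by row and stored with the data.
for row in orders:
    row["OrderYear"] = row["OrderDate"].year


def total_sales(rows, **filters):
    """ "Measure": nothing is stored; the sum is recomputed for the rows
    that satisfy the current filters."""
    visible = [r for r in rows if all(r[k] == v for k, v in filters.items())]
    return sum(r["SalesAmount"] for r in visible)


print(total_sales(orders))                  # no filters
print(total_sales(orders, Region="West"))   # like a slicer on Region
print(total_sales(orders, OrderYear=2024))  # filtering by the stored column
```

The last call shows the two working together: the stored column supplies something to filter by, and the measure supplies the aggregated result.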
The recommended and most robust way to manage many-to-many relationships in Power
BI is by using a bridge table (also known as a "junction table" or "factless fact table").
1. Identify the two "many" tables: These are the tables that have a many-to-many
relationship (e.g., Students and Courses).
2. Create a Bridge Table: This table acts as an intermediary. It typically contains only
the unique keys from the two "many" tables, representing each unique combination
that connects them.
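o Example: For Students and Courses, the bridge table would be Enrollment,
containing StudentID and CourseID as foreign keys.
3. Create Relationships: Create a one-to-many relationship from each "many" table to
the bridge table (e.g., Students[StudentID] to Enrollment[StudentID], and
Courses[CourseID] to Enrollment[CourseID]).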
4. Use the Bridge Table for Filtering and Aggregation: When you want to filter one
"many" table based on values from the other, or to aggregate data that spans both,
the filter context flows through the bridge table.
• Table 1: Students
o StudentID (PK)
o StudentName
• Table 2: Courses
o CourseID (PK)
o CourseName
• Table 3 (Bridge): Enrollment
o StudentID (FK)
o CourseID (FK)
Why this works: When you select a StudentName from the Students table, the filter
propagates to the Enrollment table, showing only the courses that student is enrolled in.
Similarly, when you select a CourseName from the Courses table, the filter propagates to
the Enrollment table, showing only the students enrolled in that course. Measures can then
correctly aggregate data based on these filters without ambiguity.
This bridge table pattern ensures that relationships are explicit and filter propagation is
clear, leading to accurate calculations and analyses.
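The role of the bridge table can be seen in a small Python sketch. The student and course names and the enrollment pairs are hypothetical sample data, and the two helper functions only imitate how a selection on one side narrows the Enrollment rows; they are not Power BI's engine.

```python
# Hypothetical sample data for the three tables.
students = {1: "Ana", 2: "Ben"}           # StudentID -> StudentName
courses = {10: "Math", 20: "Physics"}     # CourseID -> CourseName
enrollment = [(1, 10), (1, 20), (2, 10)]  # bridge rows: (StudentID, CourseID)


def courses_for_student(student_id):
    """Filter the bridge table by student, then look up course names."""
    return sorted(
        courses[course_id]
        for sid, course_id in enrollment
        if sid == student_id
    )


def students_in_course(course_id):
    """Filter the bridge table by course, then look up student names."""
    return sorted(
        students[student_id]
        for student_id, cid in enrollment
        if cid == course_id
    )


print(courses_for_student(1))
print(students_in_course(10))
```

Each bridge row records exactly one student-course combination, so neither Students nor Courses has to repeat rows to express the many-to-many relationship.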
Question 3: Write a DAX measure to compute the cumulative sales
(running total) per customer.
To calculate the cumulative sales (running total) per customer, we need to sum sales up to
the current date for each customer independently. This involves using CALCULATE to
modify the filter context and FILTER with ALL to consider all dates up to the current one for
each customer.
Assume you have a Sales table with [CustomerKey], [OrderDate], and [SalesAmount], and
a DimDate table marked as a date table, connected to Sales[OrderDate].
DAX Measure:
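Cumulative Sales = -- measure name assumed
CALCULATE (
SUM ( Sales[SalesAmount] ),
FILTER (
ALL ( DimDate[Date] ), -- a. Clear all date filters from the date table
DimDate[Date] <= MAX ( DimDate[Date] ) -- b. Keep dates up to the current one
)
)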
When you place CustomerKey and OrderDate (e.g., by month) in a table visual along with
this measure, for each customer and each month, the measure will calculate the sum of
sales for that customer from their first sale up to the end of that specific month.
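To make the expected numbers concrete, here is a minimal Python sketch of a per-customer running total. The sales rows are hypothetical and the function name is an illustrative helper; the sketch shows the arithmetic the measure is meant to produce, not how DAX evaluates it.

```python
from collections import defaultdict

# Hypothetical rows: (CustomerKey, OrderDate, SalesAmount)
sales = [
    ("C1", "2023-01-10", 100),
    ("C2", "2023-01-15", 200),
    ("C1", "2023-02-05", 180),
    ("C1", "2023-03-01", 250),
    ("C2", "2023-03-10", 220),
]


def cumulative_sales(rows):
    """Return (customer, order_date, running_total), where the running total
    restarts for each customer and accumulates in date order."""
    running = defaultdict(int)
    result = []
    # ISO-formatted dates sort correctly as plain strings.
    for customer, order_date, amount in sorted(rows, key=lambda r: (r[0], r[1])):
        running[customer] += amount
        result.append((customer, order_date, running[customer]))
    return result


for row in cumulative_sales(sales):
    print(row)
```

Each customer's total accumulates independently, which corresponds to keeping the customer filter in place while widening only the date filter.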
2. Propagation: This filter context is then propagated throughout the data model,
influencing all visuals on the report page that are connected to the slicer's
underlying data.
1. Add a Slicer Visual: From the Visualizations pane, select the "Slicer" icon.
2. Drag a Field to the Slicer: Drag the field you want to filter by (e.g., Region, Year,
Product Category) from the Fields pane to the "Field" well of the slicer visual.
o Selection:
When you have multiple slicers on a page, they interact with each other and with other
visuals. Power BI allows you to control these interactions:
1. Default Behavior: By default, all slicers on a page will cross-filter each other and all
other visuals on that page.
2. Edit Interactions:
▪ Filter Icon: (Default) The selected slicer will filter the other visual.
▪ Highlight Icon: The selected slicer will highlight (rather than filter)
data in the other visual (e.g., showing a subset of bars in a bar chart
without removing others).
▪ None Icon: The selected slicer will have no effect on the other visual.
o To apply a slicer's selection across multiple report pages, use the Sync
Slicers pane (View tab > Show panes > Sync slicers).
o When you select a slicer, the Sync Slicers pane shows which pages it's visible
on and which pages it's synced to.
▪ Sync its selections across specific pages (meaning if you filter on Page
1, the same filter is applied to Page 2).
• Placement: Place slicers prominently where users can easily find and interact with
them (e.g., top, left sidebar).
• Hierarchy Slicers: Use hierarchical slicers (e.g., Year > Quarter > Month) for drill-
down filtering.
• Clear Labels: Ensure slicer headers are clear and indicate what they filter.
Assume you have a Sales table with a [SalesAmount] column and a DimDate table marked
as a date table, connected to your Sales table on a date column.
This measure will show the sales for the latest month selected (or the current month if no
filter is applied).
CALCULATE (
LASTNONBLANK ( -- Find the last non-blank date in the current filter context
DimDate[Date],
),
)
Refined Current Month Sales (more robust if filtering to single month):
RETURN
CALCULATE(
[Total Sales],
FILTER(
ALL(DimDate[Date]),
This measure calculates the sales for the month immediately preceding the current month.
CALCULATE (
DimDate[Date],
[Total Sales]
),
Alternative Previous Month Sales (if Current Month Sales uses VAR MaxDateSelected):
RETURN
CALCULATE(
[Total Sales],
FILTER(
ALL(DimDate[Date]),
(The DATEADD version is generally more robust for time intelligence functions).
1. Add a KPI Visual: From the Visualizations pane, select the "KPI" icon.
2. Assign Fields:
o Indicator: Drag the Current Month Sales measure to the "Indicator" field
well.
o Trend axis: Drag the Date hierarchy (e.g., Year, Month) from your DimDate
table to the "Trend axis" field well. This allows the visual to show the trend of
sales over time.
o Target goals: Drag the Previous Month Sales measure to the "Target goals"
field well.
3. Format the KPI Visual (Format Pane):
o Goal: Enable "Show goals" and customize goal labels and colors.
o Target Goal Formatting: Choose whether a higher or lower value counts as "good"
performance.
o Display Units: Format the display units for sales amounts (e.g., 'K' for
thousands).
By following these steps, your KPI visual will effectively compare the current month's sales
against the previous month's sales, providing a clear visual representation of performance.
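The comparison the KPI visual displays can be sanity-checked with a short Python sketch. The sales rows and helper names are hypothetical; the sketch takes the latest month present in the data as the "current" month and the calendar month before it as the target.

```python
from collections import defaultdict

# Hypothetical rows: (date as "YYYY-MM-DD", SalesAmount)
sales = [
    ("2024-02-03", 400),
    ("2024-02-20", 600),
    ("2024-03-05", 700),
    ("2024-03-18", 500),
]


def previous_month(month):
    """Return the calendar month before a "YYYY-MM" string."""
    year, month_number = int(month[:4]), int(month[5:7])
    if month_number == 1:
        return f"{year - 1}-12"
    return f"{year}-{month_number - 1:02d}"


def kpi_current_vs_previous(rows):
    """Return (current_month_sales, previous_month_sales, percent_change).
    percent_change is None when the previous month has no sales."""
    by_month = defaultdict(int)
    for sale_date, amount in rows:
        by_month[sale_date[:7]] += amount
    latest = max(by_month)
    current = by_month[latest]
    previous = by_month.get(previous_month(latest), 0)
    change = (current - previous) * 100 / previous if previous else None
    return current, previous, change


print(kpi_current_vs_previous(sales))
```

Here the indicator would read 1200 against a target of 1000, a 20% improvement over the previous month.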