American Express Data Analyst DSA Interview Questions

The document outlines a series of interview questions and solutions for a Data Analyst position at American Express, focusing on data structures and algorithms. Key topics include finding the longest subarray with equal numbers of 0s and 1s, longest increasing subsequence, longest consecutive sequence, longest common subsequence, and various graph algorithms. Each problem is accompanied by a solution outline, Python code examples, and time and space complexity analyses.

American Express Data Analyst

Interview Questions
(0-3 Years)
17-19 lpa
DSA Questions
1. Longest Subarray with Equal Number of 0s and 1s (Array / Prefix Sum + Hash Map)
Problem:
Given a binary array (containing only 0s and 1s), find the length of the longest contiguous subarray
with an equal number of 0s and 1s.
Solution:
1. Transform the array: treat each 0 as -1 and keep each 1 as +1.
2. Compute prefix sums, and store each sum's first occurrence index in a hash map.
3. As you scan:
o If the same prefix sum reappears, the subarray between the two occurrences sums to zero, so it has equal 0s and 1s.
o Length = current_index − first_occurrence_index.
4. Answer = maximum such length found.

def findMaxLength(nums):
    prefix_sum = 0
    first_idx = {0: -1}  # prefix sum 0 occurs "before" index 0
    max_len = 0
    for i, x in enumerate(nums):
        prefix_sum += (1 if x == 1 else -1)
        if prefix_sum in first_idx:
            max_len = max(max_len, i - first_idx[prefix_sum])
        else:
            first_idx[prefix_sum] = i
    return max_len
• Time Complexity: O(n)
• Space Complexity: O(n)
2. Longest Increasing Subsequence (LIS) (Array / Dynamic Programming + Binary Search)
Problem:
Given an integer array nums, return the length of the longest strictly increasing subsequence.

Solution Outline:
1. Dynamic Programming (O(n²))
o Define dp[i] = length of the longest increasing subsequence ending at index i.
o Initialize every dp[i] = 1, then compare each nums[i] with all previous nums[j] where j < i and nums[j] < nums[i]:

dp[i] = max(dp[i], dp[j] + 1)

o Answer is max(dp).
2. Optimized with Binary Search (O(n log n))
o Maintain an array tails[], where tails[k] = smallest possible tail value of an increasing subsequence of length k+1.
o For each x in nums:
▪ Use binary search to find the leftmost position pos with tails[pos] ≥ x. Replace tails[pos] with x, or append x if no such position exists.
o Length of tails at end = length of LIS.

Python Code Example (Binary Search approach):


import bisect

def lengthOfLIS(nums):
    tails = []
    for x in nums:
        # leftmost slot whose tail is >= x (keeps the subsequence strictly increasing)
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

• Time Complexity:
o DP approach: O(n²)
o Optimized: O(n log n)
• Space Complexity: O(n)
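The O(n²) dynamic-programming approach from the outline has no code above, so here is a minimal sketch of it. The function name length_of_lis_dp is an assumption chosen for illustration; it is not part of the original material.

```python
def length_of_lis_dp(nums):
    """O(n^2) DP: dp[i] is the length of the longest strictly
    increasing subsequence that ends exactly at index i."""
    if not nums:
        return 0
    dp = [1] * len(nums)  # each element alone is a subsequence of length 1
    for i in range(1, len(nums)):
        for j in range(i):
            if nums[j] < nums[i]:  # strict inequality: equal values do not extend
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)
```

This version is easier to derive in an interview; the binary-search version is the one to reach for when n is large.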
3. Longest Consecutive Sequence (Array / Hashing)
Problem:
Given an unsorted integer array nums, find the length of the longest consecutive elements
sequence. The solution must run in O(n) time.

Solution Outline:
1. Use a HashSet to store all numbers for O(1) presence checks.
2. For each number num:
o If num - 1 is not in the set (indicating num is the start of a sequence):
▪ Initialize current_num = num and current_streak = 1.
▪ While current_num + 1 exists in the set:
▪ Increment current_num and current_streak.
o Update max_streak = max(max_streak, current_streak).
3. Return max_streak.

Iterating over the set (rather than the original list) means each distinct number is visited at most
twice: once as a potential sequence start and once while counting. This gives O(n) time and O(n) space.

def longestConsecutive(nums):
    num_set = set(nums)
    max_streak = 0

    # Iterate over the set so duplicate values do not restart the same scan
    for num in num_set:
        # start of a sequence
        if num - 1 not in num_set:
            current_num = num
            current_streak = 1

            while current_num + 1 in num_set:
                current_num += 1
                current_streak += 1

            max_streak = max(max_streak, current_streak)

    return max_streak

Time Complexity: O(n)

Space Complexity: O(n)


4. Longest Common Subsequence (LCS) (Strings / Dynamic Programming)
Problem:
Given two strings text1 and text2, return the length of their longest common subsequence
(not necessarily contiguous). If no common subsequence exists, return 0.

Solution Outline:
1. Dynamic Programming Table
o Let dp[i][j] represent the length of the LCS between text1[0..i-1] and text2[0..j-1].
o Initialize an (n+1)×(m+1) table filled with zeros.
2. Recurrence Relation
o If text1[i-1] == text2[j-1], then:

dp[i][j] = dp[i-1][j-1] + 1
o Otherwise:

dp[i][j] = max(dp[i-1][j], dp[i][j-1])

3. Result
o The desired length is dp[n][m], where n = len(text1), m = len(text2)
o This fills the table in O(n⋅m) time and uses O(n⋅m) space.

def longestCommonSubsequence(text1, text2):
    n, m = len(text1), len(text2)
    dp = [[0] * (m+1) for _ in range(n+1)]

    for i in range(1, n+1):
        for j in range(1, m+1):
            if text1[i-1] == text2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])

    return dp[n][m]
Time Complexity: O(n × m)

Space Complexity: O(n × m)

You can optimize space to O(min(n, m)) by keeping only two rows at a time and letting the shorter string index the columns.
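The two-row space optimization can be sketched as follows. The function name lcs_length_two_rows is an assumption for illustration; the recurrence is the same as in the full-table version.

```python
def lcs_length_two_rows(text1, text2):
    # Make text2 the shorter string so each row has min(n, m) + 1 entries.
    if len(text2) > len(text1):
        text1, text2 = text2, text1

    prev = [0] * (len(text2) + 1)  # row i-1 of the full table
    for ch1 in text1:
        curr = [0] * (len(text2) + 1)  # row i of the full table
        for j, ch2 in enumerate(text2, start=1):
            if ch1 == ch2:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]
```

The trade-off is that only the length survives: reconstructing the subsequence itself still needs the full table.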
5. 3Sum (Array / Two Pointers)
Problem:
Given an integer array nums, find all unique triplets (a, b, c) such that a + b + c = 0. Return the
list of triplets, without duplicates.

Solution Outline:
1. Sort the array so that duplicates are adjacent and the two-pointer technique applies.
2. Iterate i from 0 to n−3:
o Skip duplicates for nums[i].
o Set two pointers: left = i + 1, right = n − 1.
o While left < right:
▪ Compute s = nums[i] + nums[left] + nums[right].
▪ If s == 0, record the triplet, move both pointers, skipping duplicates.
▪ If s < 0, move left to the right.
▪ If s > 0, move right to the left.
3. Continue until pointers meet.
This gives O(n²) time and O(log n) or O(n) extra space depending on sort implementation.

def threeSum(nums):
    nums.sort()
    res = []
    n = len(nums)
    for i in range(n - 2):
        # skip duplicate values for the first element
        if i > 0 and nums[i] == nums[i-1]:
            continue
        left, right = i + 1, n - 1
        while left < right:
            s = nums[i] + nums[left] + nums[right]
            if s == 0:
                res.append([nums[i], nums[left], nums[right]])
                left += 1
                right -= 1
                # skip duplicates on both sides
                while left < right and nums[left] == nums[left-1]:
                    left += 1
                while left < right and nums[right] == nums[right+1]:
                    right -= 1
            elif s < 0:
                left += 1
            else:
                right -= 1
    return res

Time Complexity: O(n²)


Space Complexity: O(log n) to O(n) for the sort (implementation-dependent), plus the output list
6. Detect a Cycle in a Directed Graph (Graph / DFS + Recursion Stack)
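Problem:
Given a directed graph with V vertices (numbered 0 to V − 1) and a list of directed edges (u, v), determine whether the graph contains a cycle.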

Solution Outline:
Use Depth-First Search (DFS) with a recursion stack (path_visited) to detect back edges:
1. Build adjacency list from edge list.
2. Initialize two boolean arrays of size V:
o visited[] — tracks globally visited nodes.
o path_visited[] — tracks nodes in the current DFS path.
3. For each vertex i from 0 to V-1:
o If not visited, call dfs(i).
4. dfs(node):
o Mark visited[node] = True and path_visited[node] = True.
o For each neighbor nbr:
▪ If nbr is unvisited, recursively call dfs(nbr); if True, return True.
▪ Else if path_visited[nbr] is True, we've found a back edge, so return True.
o On exit, set path_visited[node] = False and return False.
5. If any DFS returns True, a cycle exists; otherwise, no cycle.
This runs in O(V + E) time and O(V + E) space (for adjacency list and recursion tracking)

def hasCycle(V, edges):
    adj = [[] for _ in range(V)]
    for u, v in edges:
        adj[u].append(v)

    visited = [False] * V
    path_vis = [False] * V

    def dfs(u):
        visited[u] = True
        path_vis[u] = True
        for v in adj[u]:
            if not visited[v]:
                if dfs(v):
                    return True
            elif path_vis[v]:
                return True  # back edge to a node on the current path
        path_vis[u] = False
        return False

    for i in range(V):
        if not visited[i]:
            if dfs(i):
                return True
    return False

• Time Complexity: O(V + E)
• Space Complexity: O(V + E) for adjacency list + O(V) recursion stack.
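A recursive DFS can exceed Python's default recursion limit (1000 frames in CPython) on long chains of vertices. As an alternative, here is a sketch that uses a different technique, Kahn's topological sort, rather than the DFS described above. The function name has_cycle_kahn is an assumption for illustration.

```python
from collections import deque

def has_cycle_kahn(V, edges):
    """Kahn's algorithm: repeatedly remove vertices with in-degree 0.
    If some vertices can never be removed, they lie on or behind a cycle."""
    adj = [[] for _ in range(V)]
    indegree = [0] * V
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1

    queue = deque(i for i in range(V) if indegree[i] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Every vertex is removed exactly when the graph is acyclic.
    return removed < V
```

It has the same O(V + E) time and space bounds and needs no recursion.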
7. Longest Palindromic Substring (Strings / Expand Around Center or DP)

Problem:
Given a string s, find the longest substring that reads the same forwards and
backwards.

Solution Outline (Expand Around Center – O(n²) time, O(1) space):


1. For each index i in the string, expand around:
o One center: both left and right pointers start at i (odd-length palindromes).
o Two centers: left = i, right = i+1 (even-length palindromes).
2. Define a helper to expand and return the longest palindrome around the given left/right.
3. Update the global maximum substring when a longer palindrome is found

def longestPalindrome(s):
    if not s:
        return ""
    start, end = 0, 0

    def expand(l, r):
        # grow outward while the characters match, then step back one
        while l >= 0 and r < len(s) and s[l] == s[r]:
            l -= 1
            r += 1
        return l + 1, r - 1

    for i in range(len(s)):
        l1, r1 = expand(i, i)      # odd-length center
        l2, r2 = expand(i, i + 1)  # even-length center
        if r1 - l1 > end - start:
            start, end = l1, r1
        if r2 - l2 > end - start:
            start, end = l2, r2

    return s[start:end + 1]

Time Complexity: O(n²) (due to nested expansion)


Space Complexity: O(1)
Expand-around-center is usually the simplest approach to present: it matches the O(n²) time of the DP solution without needing an O(n²) table.
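The heading also mentions a DP approach, which has no code here. A minimal sketch, assuming a boolean table is_pal where is_pal[i][j] means s[i..j] is a palindrome (the function and table names are illustrative, not from the original):

```python
def longest_palindrome_dp(s):
    n = len(s)
    if n == 0:
        return ""
    # is_pal[i][j] is True when s[i..j] (inclusive) is a palindrome.
    is_pal = [[False] * n for _ in range(n)]
    start, best_len = 0, 1
    # Walk i downward so that is_pal[i + 1][j - 1] is already filled in.
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            if s[i] == s[j] and (j - i < 2 or is_pal[i + 1][j - 1]):
                is_pal[i][j] = True
                if j - i + 1 > best_len:
                    start, best_len = i, j - i + 1
    return s[start:start + best_len]
```

This uses O(n²) time and O(n²) space. When several palindromes tie for the longest, it may return a different one than the expand-around-center version.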
8. Single-Source Shortest Paths: Dijkstra’s Algorithm (Graph / Greedy + Min-Heap)
Problem:
Given a weighted graph with non-negative edge weights and a source node src, find the shortest
path (minimum total weight) from src to all other vertices.

Solution Outline:
1. Initialization:
o Set dist[v] = ∞ for all vertices, except dist[src] = 0.
o Create a min-heap (priority queue) and push (0, src).
2. Main Loop (Greedy Relaxation):
o While the heap is not empty:
1. Pop (d, u) — the vertex u with the minimum tentative distance.
2. If d > dist[u], skip (it's an outdated entry).
3. For each neighbor v of u with edge weight w:
▪ If dist[u] + w < dist[v]:
▪ Update dist[v] = dist[u] + w
▪ Push (dist[v], v) into the heap.
o Continue until all reachable nodes are processed.
3. Result:
o dist[v] holds the shortest path weight from src to each vertex v.
This runs in O((V + E) log V) time using a binary heap (or O(E + V log V) with Fibonacci heap)

import heapq

def dijkstra(adj, src):
    # adj: dict mapping u -> list of (v, w) pairs
    dist = {u: float('inf') for u in adj}
    dist[src] = 0
    pq = [(0, src)]

    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale heap entry
        # .get() handles vertices that appear only as edge targets
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

• Time Complexity: O((V + E) log V) with a binary heap, which is O(E log V) for a connected graph


• Space Complexity: O(V + E) for adjacency list and heap
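The code above returns only distances. If the interviewer asks for the path itself, one common extension is to record each vertex's predecessor. The sketch below assumes the same adj format (dict of u -> list of (v, w)) and comparable vertex labels; dijkstra_with_paths and build_path are hypothetical names. Unlike the version above, dist here contains only reachable vertices.

```python
import heapq

def dijkstra_with_paths(adj, src):
    dist = {src: 0}
    parent = {src: None}  # predecessor of each vertex on its shortest path
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(pq, (nd, v))
    return dist, parent

def build_path(parent, target):
    """Follow predecessor links back from target; [] if unreachable."""
    if target not in parent:
        return []
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
dist, parent = dijkstra_with_paths(graph, "A")
print(dist["D"], build_path(parent, "D"))
```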
9. Implement Trie (Prefix Tree) (Trie / N-ary Tree)

Problem:
Design a Trie with the following operations:
• insert(word): Inserts the string word.
• search(word): Returns true if word is in the Trie.
• startsWith(prefix): Returns true if any word in the Trie starts with prefix.
Solution Outline:
1. Trie Node Structure:
o Each node has:
▪ children: A map (or array of size 26) pointing to next chars.
▪ isWord: A boolean flag indicating end of a valid word.
2. Insert Operation (O(L)):
o Start at the root.
o For each character c in word:
▪ If c not in current node’s children → create a new node.
▪ Move to the child node.
o After the last char, mark isWord = True.
3. Search Word (O(L)):
o Traverse nodes following each character.
o If a character isn't found → return false.
o At end, check isWord flag.
4. Search Prefix (O(L)):
o Similar traversal without checking isWord.
o If traversal completes, return true.

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.isWord = False  # True if a word ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.isWord = True

    def search(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                return False
            node = node.children[c]
        return node.isWord

    def startsWith(self, prefix):
        node = self.root
        for c in prefix:
            if c not in node.children:
                return False
            node = node.children[c]
        return True
• Time Complexity: O(L) per operation, where L = length of input string
• Space Complexity: O(N × L) in worst case, N = number of inserted words
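The outline mentions an array of size 26 as an alternative to the map. A sketch of that variant follows, under the assumption that all input consists of lowercase letters a to z (other characters would index incorrectly). The class and method names are illustrative.

```python
class ArrayTrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = [None] * 26  # one slot per letter 'a'..'z'
        self.is_word = False

class ArrayTrie:
    """Assumes every character is a lowercase ASCII letter."""

    def __init__(self):
        self.root = ArrayTrieNode()

    @staticmethod
    def _index(c):
        return ord(c) - ord("a")

    def insert(self, word):
        node = self.root
        for c in word:
            i = self._index(c)
            if node.children[i] is None:
                node.children[i] = ArrayTrieNode()
            node = node.children[i]
        node.is_word = True

    def _walk(self, text):
        # Return the node reached by following text, or None if the path breaks.
        node = self.root
        for c in text:
            node = node.children[self._index(c)]
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None
```

The array gives constant-time child lookup without hashing, but it allocates 26 slots per node even when most are empty.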
10. Container With Most Water (Array / Two-Pointer)

Problem:
Given n non-negative integers height[0..n−1], where each value represents a vertical line
at coordinate i with height = height[i], find two lines that, together with the x-axis, form a
container such that the container holds the most water. Return the maximum area.

Solution Outline:
1. Initialize two pointers: left = 0 and right = n − 1.
2. Loop while left < right:
o Compute area = (right − left) * min(height[left], height[right]), update max_area.
o Move the pointer with the shorter line inward, since moving the taller one cannot
increase area (width shrinks but height limited by shorter line).
3. Continue until pointers meet. Return max_area.
This gives O(n) time and O(1) space.

def maxArea(height):
    left, right = 0, len(height) - 1
    max_area = 0
    while left < right:
        width = right - left
        h = min(height[left], height[right])
        max_area = max(max_area, width * h)
        # move the pointer at the shorter line inward
        if height[left] < height[right]:
            left += 1
        else:
            right -= 1
    return max_area

Time Complexity: O(n)


Space Complexity: O(1)

11. Find the First and Last Position of Element in Sorted Array (Array / Binary Search)
Problem:
Given a sorted array nums and a target value target, return the starting and ending index of the
target in the array. If the target is not found, return [-1, -1]. Your solution must run in O(log n)
time.

Solution Outline:
Use binary search twice:
1. First Occurrence:
o Customize binary search to look for the first index where nums[mid] == target.
o If found, record mid and continue searching left (setting right = mid - 1) to find earlier occurrences.
2. Last Occurrence:
o Similar, but search right (setting left = mid + 1) after finding target to locate the last index.
Since each search runs in O(log n), the total time is O(log n).
• Binary Search Details: halving the search range at each step gives O(log n) lookups on sorted arrays.

def searchRange(nums, target):
    def findBound(isFirst):
        left, right = 0, len(nums) - 1
        bound = -1
        while left <= right:
            mid = (left + right) // 2
            if nums[mid] == target:
                bound = mid
                if isFirst:
                    right = mid - 1  # keep looking for an earlier match
                else:
                    left = mid + 1   # keep looking for a later match
            elif nums[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return bound

    first = findBound(True)
    if first == -1:
        return [-1, -1]
    last = findBound(False)
    return [first, last]

Time Complexity: O(log n)


Space Complexity: O(1)
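In Python, the same two boundary searches can be expressed with the standard-library bisect module. This is a sketch of an equivalent formulation, not a replacement for writing the binary search by hand when the interviewer asks for it; the function name search_range_bisect is an assumption.

```python
from bisect import bisect_left, bisect_right

def search_range_bisect(nums, target):
    # bisect_left returns the first index whose value is >= target.
    lo = bisect_left(nums, target)
    if lo == len(nums) or nums[lo] != target:
        return [-1, -1]
    # bisect_right returns the first index whose value is > target.
    hi = bisect_right(nums, target) - 1
    return [lo, hi]
```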
12. Longest Substring Without Repeating Characters (Strings / Sliding Window)

Problem:
Given a string s, return the length of the longest substring that contains no repeated
characters.

Solution Outline (Optimal O(n) time, O(min(n, m)) space):


1. Use a sliding-window with two pointers, left and right, to maintain the current
substring window.
2. Create a dictionary/hash map (last_seen) to store the last index of each character
encountered.
3. Move right over the string:
o If s[right] was seen and its last index ≥ left, slide left to last_seen[s[right]] + 1
to skip the repeating character.
o Update last_seen[s[right]] = right.
o Update max_len = max(max_len, right - left + 1).
4. At the end, max_len holds the length of the longest non-repeating-character substring.
This runs in O(n) time and uses O(min(n, m)) space, where m is the character set size

def lengthOfLongestSubstring(s):
    last_seen = {}  # char -> last index where it appeared
    left = 0
    max_len = 0

    for right, c in enumerate(s):
        # only move left forward if the repeat is inside the current window
        if c in last_seen and last_seen[c] >= left:
            left = last_seen[c] + 1
        last_seen[c] = right
        max_len = max(max_len, right - left + 1)

    return max_len

Time Complexity: O(n)


Space Complexity: O(min(n, m)) — for the hash map storing character indices

13. Median of Two Sorted Arrays (Array / Binary Search + Divide and Conquer)
Problem:
Given two sorted arrays nums1 and nums2 of sizes m and n, return the median of the combined
array in O(log (m+n)) time.

Solution Outline (Optimal O(log(min(m,n))) time, O(1) extra space):


1. Ensure nums1 is the smaller array (m ≤ n).
2. Use binary search on nums1 (range [0, m]), choosing a partition i. Let j = ⌊(m + n + 1) / 2⌋ − i, so that the two left parts together hold half of all elements (one extra when m + n is odd).
3. Define:
o maxLeft1 = (i == 0) ? -∞ : nums1[i-1]
o minRight1 = (i == m) ? +∞ : nums1[i]
o Similarly for nums2 with j.
4. Check partitions:
o If maxLeft1 ≤ minRight2 and maxLeft2 ≤ minRight1, we've partitioned correctly:
▪ If (m + n) is odd, median = max(maxLeft1, maxLeft2)
▪ If even, median = (max(maxLeft1, maxLeft2) + min(minRight1, minRight2)) / 2
o Else:
▪ If maxLeft1 > minRight2, decrease i (move the partition left in nums1).
▪ Otherwise, increase i (move the partition right in nums1).
5. Return the median once valid partition is found.
This method efficiently finds the median without merging arrays, achieving the required
logarithmic runtime

def findMedianSortedArrays(nums1, nums2):
    # Ensure nums1 is smaller
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    low, high = 0, m

    while True:
        i = (low + high) // 2
        j = (m + n + 1) // 2 - i

        maxL1 = -float('inf') if i == 0 else nums1[i-1]
        minR1 = float('inf') if i == m else nums1[i]
        maxL2 = -float('inf') if j == 0 else nums2[j-1]
        minR2 = float('inf') if j == n else nums2[j]

        if maxL1 <= minR2 and maxL2 <= minR1:
            # Valid partition
            if (m + n) % 2:
                return max(maxL1, maxL2)
            return (max(maxL1, maxL2) + min(minR1, minR2)) / 2
        elif maxL1 > minR2:
            high = i - 1
        else:
            low = i + 1

Time Complexity: O(log(min(m, n)))


Space Complexity: O(1)
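When practicing, it helps to cross-check the partition-based solution against a straightforward reference. The sketch below merges the two arrays in O(m + n) time, so it does not meet the logarithmic bound and serves only as a correctness check. The function name median_by_merging is an assumption.

```python
import heapq

def median_by_merging(nums1, nums2):
    # heapq.merge lazily merges already-sorted inputs in linear time.
    merged = list(heapq.merge(nums1, nums2))
    total = len(merged)
    if total == 0:
        raise ValueError("both arrays are empty")
    mid = total // 2
    if total % 2:
        return merged[mid]
    return (merged[mid - 1] + merged[mid]) / 2
```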

14. Minimum Spanning Tree via Kruskal’s Algorithm (Graph / Greedy + Union-Find)
Problem:
Given a connected, undirected, and weighted graph with V nodes and an edge list (u, v, w)
(where w is weight), find a Minimum Spanning Tree (MST) — a subset of edges that connects all
vertices with the minimum total weight and no cycles.

Solution Outline:
1. Sort all edges in non-decreasing order by weight.
2. Initialize a Union-Find (Disjoint Set) structure to track connected components.
3. Iterate through sorted edges:
o For each edge (u, v, w):
▪ If u and v are in different sets, add the edge to MST and union their sets.
▪ Otherwise, skip (to avoid cycle).
o Continue until MST has (V − 1) edges.
4. Total weight of selected edges is the MST’s cost.

This runs in O(E log E) due to edge sorting plus nearly O(E α(V)) for Union-Find, effectively
O(E log V)

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0]*n

    def find(self, x):
        # path compression
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already connected; adding the edge would form a cycle
        # union by rank
        if self.rank[rx] < self.rank[ry]:
            self.parent[rx] = ry
        elif self.rank[ry] < self.rank[rx]:
            self.parent[ry] = rx
        else:
            self.parent[ry] = rx
            self.rank[rx] += 1
        return True

def kruskalMST(V, edges):
    edges.sort(key=lambda x: x[2])  # note: sorts the caller's list in place
    dsu = DSU(V)
    mst_cost = 0
    mst_edges = []
    for u, v, w in edges:
        if dsu.union(u, v):
            mst_cost += w
            mst_edges.append((u, v, w))
            if len(mst_edges) == V - 1:
                break
    return mst_cost, mst_edges

Time Complexity:
• Sorting: O(E log E) (~ O(E log V))
• Union-Find: O(E α(V)) ≈ O(E)
Space Complexity: O(V + E)
