American Express Data Analyst DSA Interview Questions
(0-3 Years)
17-19 lpa
DSA Questions
1. Longest Subarray with Equal Number of 0s and 1s (Array / Prefix Sum + Hash Map)
Problem:
Given a binary array (containing only 0s and 1s), find the length of the longest contiguous subarray
with an equal number of 0s and 1s.
Solution:
1. Transform the array: treat each 0 as -1 and each 1 as +1.
2. Compute prefix sums, and store each sum's first occurrence index in a hash map.
3. As you scan:
o If a prefix sum reappears, the subarray between its first occurrence and the current index
sums to zero, so it contains equal numbers of 0s and 1s.
o Length = current_index − first_occurrence_index.
4. Answer = maximum such length found.
def findMaxLength(nums):
    prefix_sum = 0
    first_idx = {0: -1}  # sum 0 before starting
    max_len = 0
    for i, x in enumerate(nums):
        prefix_sum += (1 if x == 1 else -1)
        if prefix_sum in first_idx:
            max_len = max(max_len, i - first_idx[prefix_sum])
        else:
            first_idx[prefix_sum] = i
    return max_len
• Time Complexity: O(n)
• Space Complexity: O(n)
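The same first-occurrence map also gives the bounds of a longest balanced subarray, not just its length: the subarray starts one position after the first index where the repeated prefix sum was seen. A minimal sketch, where the name longest_balanced_bounds and the (start, end) inclusive return format are hypothetical choices:

```python
def longest_balanced_bounds(nums):
    """Return (start, end), inclusive, of a longest subarray with equal 0s and 1s,
    or None if none exists. Hypothetical helper built on the prefix-sum idea."""
    prefix_sum = 0
    first_idx = {0: -1}  # prefix sum 0 is "seen" just before index 0
    best, best_len = None, 0
    for i, x in enumerate(nums):
        prefix_sum += 1 if x == 1 else -1
        if prefix_sum in first_idx:
            start = first_idx[prefix_sum] + 1  # subarray begins after the first occurrence
            if i - start + 1 > best_len:
                best_len = i - start + 1
                best = (start, i)
        else:
            first_idx[prefix_sum] = i  # only the first occurrence is kept
    return best


print(longest_balanced_bounds([0, 1, 0]))  # (0, 1)
```

Keeping only the first occurrence of each prefix sum is what makes the recorded subarray the longest one ending at i.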
2. Longest Increasing Subsequence (LIS) (Array / Dynamic Programming + Binary Search)
Problem:
Given an integer array nums, return the length of the longest strictly increasing subsequence.
Solution Outline:
1. Dynamic Programming (O(n²))
o Define dp[i] = length of the longest increasing subsequence ending at index i, initialized to 1.
o For each i, compare nums[i] with every earlier nums[j] (j < i); whenever nums[j] < nums[i]:
dp[i] = max(dp[i], dp[j] + 1)
o Answer is max(dp).
2. Optimized with Binary Search (O(n log n))
o Maintain an array tails[], where tails[k] = smallest possible tail value of an increasing
subsequence of length k+1.
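o For each x in nums:
▪ Use binary search (bisect_left) to find the leftmost position pos with tails[pos] >= x;
replace tails[pos] with x, or append x if pos == len(tails).
o The length of tails at the end is the length of the LIS (tails itself is not necessarily a
valid subsequence).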
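import bisect

def lengthOfLIS(nums):
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k+1
    for x in nums:
        pos = bisect.bisect_left(tails, x)  # leftmost index with tails[pos] >= x
        if pos == len(tails):
            tails.append(x)  # x extends the longest subsequence found so far
        else:
            tails[pos] = x   # x becomes a smaller tail for length pos+1
    return len(tails)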
• Time Complexity:
o DP approach: O(n²)
o Optimized: O(n log n)
• Space Complexity: O(n)
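The O(n²) dynamic-programming approach from step 1 of the outline is described but not coded above. A minimal sketch of it (the name lengthOfLIS_dp is hypothetical), useful as a simpler reference to check the binary-search version against:

```python
def lengthOfLIS_dp(nums):
    """O(n^2) DP: dp[i] = length of the longest increasing subsequence ending at i."""
    if not nums:
        return 0
    dp = [1] * len(nums)  # each element alone is an increasing subsequence
    for i in range(len(nums)):
        for j in range(i):
            if nums[j] < nums[i]:  # strictly increasing
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)


print(lengthOfLIS_dp([10, 9, 2, 5, 3, 7, 101, 18]))  # 4
```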
3. Longest Consecutive Sequence (Array / Hashing)
Problem:
Given an unsorted integer array nums, find the length of the longest consecutive elements
sequence. The solution must run in O(n) time.
Solution Outline:
1. Use a HashSet to store all numbers for O(1) presence checks.
2. For each number num:
o If num - 1 is not in the set (indicating num is the start of a sequence):
▪ Initialize current_num = num and current_streak = 1.
▪ While current_num + 1 exists in the set:
▪ Increment current_num and current_streak.
o Update max_streak = max(max_streak, current_streak).
3. Return max_streak.
This ensures each number is visited at most twice (once by the outer loop and at most once while
extending the streak from its sequence's start), giving O(n) time and O(n) space.
def longestConsecutive(nums):
    num_set = set(nums)
    max_streak = 0
    # The loop over sequence starts (see the outline) is not included in this snippet
    return max_streak
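The snippet above only sets up num_set and max_streak. A complete sketch that follows the outline step by step; it iterates over num_set rather than nums so duplicate values are not re-examined:

```python
def longestConsecutive(nums):
    num_set = set(nums)  # O(1) membership checks
    max_streak = 0
    for num in num_set:
        if num - 1 not in num_set:  # num is the start of a sequence
            current_num, current_streak = num, 1
            while current_num + 1 in num_set:
                current_num += 1
                current_streak += 1
            max_streak = max(max_streak, current_streak)
    return max_streak


print(longestConsecutive([100, 4, 200, 1, 3, 2]))  # 4  (1, 2, 3, 4)
```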
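4. Longest Common Subsequence (Strings / Dynamic Programming)
Problem:
Given two strings text1 and text2, return the length of their longest common subsequence.
Solution Outline: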
1. Dynamic Programming Table
o Let dp[i][j] represent the length of the LCS between text1[0..i-1] and text2[0..j-1].
o Initialize a (n+1)×(m+1) table filled with zeros.
2. Recurrence Relation
o If text1[i-1] == text2[j-1], then:
dp[i][j] = dp[i-1][j-1] + 1
o Otherwise:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
3. Result
o The desired length is dp[n][m], where n = len(text1), m = len(text2)
o This fills the table in O(n⋅m) time and uses O(n⋅m) space.
return dp[n][m]
• Time Complexity: O(n × m)
• Space can be reduced to O(min(n, m)) by keeping only two rows at a time, with the shorter
string indexing the columns.
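A minimal sketch of the table-filling solution, plus the two-row variant mentioned in the space note. The name lcs_two_rows is hypothetical; it swaps the inputs so the shorter string indexes the columns, which is valid because LCS length is symmetric in its two arguments:

```python
def longestCommonSubsequence(text1, text2):
    n, m = len(text1), len(text2)
    # dp[i][j] = LCS length of text1[:i] and text2[:j]; row/column 0 are empty prefixes
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcs_two_rows(text1, text2):
    # Keep the shorter string on the column axis so each row has min(n, m) + 1 cells
    if len(text2) > len(text1):
        text1, text2 = text2, text1
    prev = [0] * (len(text2) + 1)  # row i-1 of the full table
    for a in text1:
        curr = [0] * (len(text2) + 1)  # row i
        for j, b in enumerate(text2, start=1):
            curr[j] = prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


print(longestCommonSubsequence("abcde", "ace"), lcs_two_rows("abcde", "ace"))  # 3 3
```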
5. 3Sum (Array / Two Pointers)
Problem:
Given an integer array nums, find all unique triplets (a, b, c) such that a + b + c = 0. Return the
list of triplets, without duplicates.
Solution Outline:
1. Sort the array so duplicates sit next to each other (easy to skip) and the two-pointer scan works.
2. Iterate i from 0 to n−3:
o Skip duplicates for nums[i].
o Set two pointers: left = i + 1, right = n − 1.
o While left < right:
▪ Compute s = nums[i] + nums[left] + nums[right].
▪ If s == 0, record the triplet, move both pointers, skipping duplicates.
▪ If s < 0, move left to the right.
▪ If s > 0, move right to the left.
3. Continue until pointers meet.
This gives O(n²) time and O(log n) or O(n) extra space, depending on the sort implementation.
def threeSum(nums):
    nums.sort()
    res = []
    n = len(nums)
    for i in range(n - 2):
        if i > 0 and nums[i] == nums[i-1]:
            continue
        left, right = i + 1, n - 1
        while left < right:
            s = nums[i] + nums[left] + nums[right]
            if s == 0:
                res.append([nums[i], nums[left], nums[right]])
                left += 1
                right -= 1
                while left < right and nums[left] == nums[left-1]:
                    left += 1
                while left < right and nums[right] == nums[right+1]:
                    right -= 1
            elif s < 0:
                left += 1
            else:
                right -= 1
    return res
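6. Detect Cycle in a Directed Graph (Graph / DFS)
Problem:
Given a directed graph with V vertices (0 to V-1) and a list of edges, determine whether it
contains a cycle.
Solution Outline: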
Use Depth-First Search (DFS) with a recursion stack (path_visited) to detect back edges:
1. Build adjacency list from edge list.
2. Initialize two boolean arrays of size V:
o visited[] — tracks globally visited nodes.
o path_visited[] — tracks nodes in the current DFS path.
3. For each vertex i from 0 to V-1:
o If not visited, call dfs(i).
4. dfs(node):
o Mark visited[node] = True and path_visited[node] = True.
o For each neighbor nbr:
▪ If nbr is unvisited, recursively call dfs(nbr); if True, return True.
▪ Else if path_visited[nbr] is True, we've found a back edge, so return True.
o On exit, set path_visited[node] = False and return False.
5. If any DFS returns True, a cycle exists; otherwise, no cycle.
This runs in O(V + E) time and O(V + E) space (for the adjacency list and recursion tracking).
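from collections import defaultdict

def isCyclic(V, edges):
    # isCyclic is a hypothetical name; assumed input: vertices 0..V-1, edges as (u, v) pairs for u -> v
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    visited = [False] * V
    path_vis = [False] * V  # True while the node is on the current DFS path

    def dfs(u):
        visited[u] = True
        path_vis[u] = True
        for v in adj[u]:
            if not visited[v]:
                if dfs(v):
                    return True
            elif path_vis[v]:
                return True  # back edge: v is an ancestor on the current path
        path_vis[u] = False  # leaving u: remove it from the current path
        return False

    for i in range(V):
        if not visited[i]:
            if dfs(i):
                return True
    return False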
• Time Complexity: O(V + E)
• Space Complexity: O(V + E) for the adjacency list + O(V) for the recursion stack.
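Python's default recursion limit (1000 frames, see sys.getrecursionlimit()) can be exceeded by the recursive DFS on long paths. A minimal sketch of the same visited / path_visited idea with an explicit stack, assuming the same input format as above (vertices 0..V-1, edges as (u, v) pairs); the name has_cycle_iterative is hypothetical:

```python
from collections import defaultdict


def has_cycle_iterative(V, edges):
    """Directed-graph cycle check using an explicit stack instead of recursion."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    visited = [False] * V
    path_vis = [False] * V  # True while the node is on the current DFS path
    for start in range(V):
        if visited[start]:
            continue
        visited[start] = path_vis[start] = True
        # Each stack entry is (node, iterator over its remaining neighbors)
        stack = [(start, iter(adj[start]))]
        while stack:
            node, neighbors = stack[-1]
            advanced = False
            for nxt in neighbors:
                if not visited[nxt]:
                    visited[nxt] = path_vis[nxt] = True
                    stack.append((nxt, iter(adj[nxt])))
                    advanced = True
                    break  # descend; this node's iterator resumes later
                if path_vis[nxt]:  # back edge to a node on the current path
                    return True
            if not advanced:
                path_vis[node] = False  # all neighbors done: leave the current path
                stack.pop()
    return False


print(has_cycle_iterative(3, [(0, 1), (1, 2), (2, 0)]))  # True
```

Storing the neighbor iterator on the stack lets each node resume where it left off, which mirrors how the recursive version returns to its for-loop after a child call.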
7. Longest Palindromic Substring (Strings / Expand Around Center or DP)
Problem:
Given a string s, find the longest substring that reads the same forwards and
backwards.
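Solution Outline:
Treat every index (odd-length palindromes) and every gap between adjacent characters
(even-length palindromes) as a center, expand outward while the characters on both sides match,
and keep the longest span found. This runs in O(n²) time and O(1) extra space.
def longestPalindrome(s):
    if not s:
        return ""

    def expand(left, right):
        # Grow outward while the ends match; return bounds of the last valid palindrome
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        return left + 1, right - 1

    start, end = 0, 0
    for i in range(len(s)):
        l1, r1 = expand(i, i)      # odd-length center
        l2, r2 = expand(i, i + 1)  # even-length center
        if r1 - l1 > end - start:
            start, end = l1, r1
        if r2 - l2 > end - start:
            start, end = l2, r2
    return s[start:end + 1]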
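The heading also mentions a DP approach. A minimal sketch of it (the name longestPalindrome_dp is hypothetical): is_pal[i][j] is True when s[i..j] is a palindrome, filled by increasing length so the inner substring s[i+1..j-1] is always known first. It uses O(n²) time and O(n²) space, so expand-around-center is usually preferred in interviews:

```python
def longestPalindrome_dp(s):
    """O(n^2) time and space: is_pal[i][j] is True when s[i..j] is a palindrome."""
    n = len(s)
    if n == 0:
        return ""
    is_pal = [[False] * n for _ in range(n)]
    start, best_len = 0, 1
    for i in range(n):
        is_pal[i][i] = True  # every single character is a palindrome
    # Fill by increasing substring length so inner substrings are computed first
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j] and (length == 2 or is_pal[i + 1][j - 1]):
                is_pal[i][j] = True
                if length > best_len:
                    start, best_len = i, length
    return s[start:start + best_len]


print(longestPalindrome_dp("cbbd"))  # bb
```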
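8. Dijkstra's Shortest Path Algorithm (Graph / Priority Queue)
Problem:
Given a weighted graph with non-negative edge weights and a source vertex src, find the shortest
distance from src to every reachable vertex.
Solution Outline: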
1. Initialization:
o Set dist[v] = ∞ for all vertices, except dist[src] = 0.
o Create a min-heap (priority queue) and push (0, src).
2. Main Loop (Greedy Relaxation):
o While the heap is not empty:
1. Pop (d, u) — the vertex u with the minimum tentative distance.
2. If d > dist[u], skip (it's an outdated entry).
3. For each neighbor v of u with edge weight w:
▪ If dist[u] + w < dist[v]:
▪ Update dist[v] = dist[u] + w
▪ Push (dist[v], v) into the heap.
o Continue until all reachable nodes are processed.
3. Result:
o dist[v] holds the shortest path weight from src to each vertex v.
This runs in O((V + E) log V) time using a binary heap (or O(E + V log V) with Fibonacci heap)
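import heapq

def dijkstra(adj, src):
    # Assumed input: adj maps vertex -> list of (neighbor, weight) pairs, weights non-negative
    dist = {src: 0}  # a missing key means the distance is still infinity
    pq = [(0, src)]  # min-heap of (tentative distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # outdated heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist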
9. Implement Trie (Prefix Tree) (Trie / Design)
Problem:
Design a Trie with the following operations:
• insert(word): Inserts the string word.
• search(word): Returns true if word is in the Trie.
• startsWith(prefix): Returns true if any word in the Trie starts with prefix.
Solution Outline:
1. Trie Node Structure:
o Each node has:
▪ children: A map (or array of size 26) pointing to next chars.
▪ isWord: A boolean flag indicating end of a valid word.
2. Insert Operation (O(L), where L is the length of the word or prefix):
o Start at the root.
o For each character c in word:
▪ If c not in current node’s children → create a new node.
▪ Move to the child node.
o After the last char, mark isWord = True.
3. Search Word (O(L)):
o Traverse nodes following each character.
o If a character isn't found → return false.
o At end, check isWord flag.
4. Search Prefix (O(L)):
o Similar traversal without checking isWord.
o If traversal completes, return true.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.isWord = False

class Trie:
    def __init__(self):
        self.root = TrieNode()
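The class definitions above stop at the constructors. A fuller sketch adding the three operations from the outline; the shared _walk helper is a hypothetical addition that both search and startsWith use, since they differ only in whether isWord is checked at the end:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.isWord = False  # True if a word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for c in word:
            if c not in node.children:  # create the missing child
                node.children[c] = TrieNode()
            node = node.children[c]
        node.isWord = True  # mark the end of a valid word

    def _walk(self, s):
        """Follow s from the root; return the final node, or None if a char is missing."""
        node = self.root
        for c in s:
            node = node.children.get(c)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.isWord

    def startsWith(self, prefix):
        return self._walk(prefix) is not None


t = Trie()
t.insert("apple")
print(t.search("apple"), t.search("app"), t.startsWith("app"))  # True False True
```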
10. Container With Most Water (Array / Two Pointers)
Problem:
Given n non-negative integers height[0..n−1], where each value represents a vertical line
at coordinate i with height = height[i], find two lines that, together with the x-axis, form a
container such that the container holds the most water. Return the maximum area.
Solution Outline:
1. Initialize two pointers: left = 0 and right = n − 1.
2. Loop while left < right:
o Compute area = (right − left) * min(height[left], height[right]), update max_area.
o Move the pointer with the shorter line inward, since moving the taller one can never increase
the area: the width shrinks while the height stays capped by the shorter line.
3. Continue until pointers meet. Return max_area.
This gives O(n) time and O(1) space.
def maxArea(height):
    left, right = 0, len(height) - 1
    max_area = 0
    while left < right:
        width = right - left
        h = min(height[left], height[right])
        max_area = max(max_area, width * h)
        if height[left] < height[right]:
            left += 1
        else:
            right -= 1
    return max_area
11. Find the First and Last Position of Element in Sorted Array (Array / Binary Search)
Problem:
Given a sorted array nums and a target value target, return the starting and ending index of the
target in the array. If the target is not found, return [-1, -1]. Your solution must run in O(log n)
time.
Solution Outline:
Use binary search twice:
1. First Occurrence:
o Customize binary search to look for the first index where nums[mid] == target.
o If found, continue searching left (setting high = mid - 1) to find earlier occurrences.
2. Last Occurrence:
o Similar, but search right (setting low = mid + 1) when you find target to locate the last
index.
Since each search runs in O(log n), the total time is O(log n).
• Binary Search Details: binary search achieves O(log n) lookups on sorted arrays by halving the
search range at each step.
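def searchRange(nums, target):
    # searchRange is a hypothetical name for the enclosing function;
    # findBound(is_first) is the binary-search helper from the outline (not defined in this snippet)
    first = findBound(True)
    if first == -1:
        return [-1, -1]
    last = findBound(False)
    return [first, last]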
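The snippet above calls findBound without defining it. One way to write it, as a nested helper that follows the outline: on a match it records mid and keeps searching left (is_first=True) or right (is_first=False). The enclosing name searchRange is a hypothetical choice:

```python
def searchRange(nums, target):
    def findBound(is_first):
        low, high = 0, len(nums) - 1
        bound = -1
        while low <= high:
            mid = (low + high) // 2
            if nums[mid] == target:
                bound = mid  # record the match, then keep narrowing
                if is_first:
                    high = mid - 1  # look further left for an earlier occurrence
                else:
                    low = mid + 1   # look further right for a later occurrence
            elif nums[mid] < target:
                low = mid + 1
            else:
                high = mid - 1
        return bound

    first = findBound(True)
    if first == -1:
        return [-1, -1]
    last = findBound(False)
    return [first, last]


print(searchRange([5, 7, 7, 8, 8, 10], 8))  # [3, 4]
```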
12. Longest Substring Without Repeating Characters (Strings / Sliding Window)
Problem:
Given a string s, return the length of the longest substring that contains no repeated
characters.
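def lengthOfLongestSubstring(s):
    last_seen = {}  # char -> most recent index
    left = 0        # start of the current window
    max_len = 0
    for right, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= left:
            left = last_seen[ch] + 1  # jump past the previous occurrence
        last_seen[ch] = right
        max_len = max(max_len, right - left + 1)
    return max_len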
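13. Median of Two Sorted Arrays (Array / Binary Search)
Problem:
Given two sorted arrays nums1 and nums2 of sizes m and n, return the median of their combined values.
Core of the partition search (i elements from nums1 and j from nums2 form the left half):
while True:
    i = (low + high) // 2          # partition index in nums1
    j = (m + n + 1) // 2 - i       # partition index in nums2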
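The two partition lines above are the core of the standard binary search over partitions for the median of two sorted arrays. A complete sketch under these assumptions: the function name findMedianSortedArrays is hypothetical, the search runs over the shorter array (so j always stays in range and the loop is O(log(min(m, n)))), and empty combined input raises ValueError:

```python
def findMedianSortedArrays(nums1, nums2):
    if len(nums1) > len(nums2):  # binary-search the shorter array
        nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    if m + n == 0:
        raise ValueError("both arrays are empty")
    low, high = 0, m
    while True:
        i = (low + high) // 2      # elements taken from nums1 for the left half
        j = (m + n + 1) // 2 - i   # elements taken from nums2 for the left half
        left1 = nums1[i - 1] if i > 0 else float('-inf')
        right1 = nums1[i] if i < m else float('inf')
        left2 = nums2[j - 1] if j > 0 else float('-inf')
        right2 = nums2[j] if j < n else float('inf')
        if left1 <= right2 and left2 <= right1:  # every left value <= every right value
            if (m + n) % 2 == 1:
                return float(max(left1, left2))
            return (max(left1, left2) + min(right1, right2)) / 2
        elif left1 > right2:
            high = i - 1  # took too many from nums1
        else:
            low = i + 1   # took too few from nums1


print(findMedianSortedArrays([1, 2], [3, 4]))  # 2.5
```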
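14. Minimum Spanning Tree Using Kruskal's Algorithm (Graph / Union-Find)
Problem:
Given a connected, undirected, weighted graph with V vertices and a list of edges (u, v, w), find the
total weight of its minimum spanning tree (MST).
Solution Outline: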
1. Sort all edges in non-decreasing order by weight.
2. Initialize a Union-Find (Disjoint Set) structure to track connected components.
3. Iterate through sorted edges:
o For each edge (u, v, w):
▪ If u and v are in different sets, add the edge to MST and union their sets.
▪ Otherwise, skip (to avoid cycle).
o Continue until MST has (V − 1) edges.
4. Total weight of selected edges is the MST’s cost.
This runs in O(E log E) time for sorting plus O(E α(V)) for the Union-Find operations, which is
effectively O(E log V) overall (since E ≤ V², log E ≤ 2 log V).
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
• Time Complexity:
o Sorting: O(E log E) (~ O(E log V))
o Union-Find: O(E α(V)) ≈ O(E)
• Space Complexity: O(V + E)
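The DSU class above only has a constructor. A fuller sketch adding find (with path halving), union by rank, and a driver that follows the outline; the driver name kruskal, the (u, v, w) edge format with vertices 0..V-1, and raising ValueError for a disconnected graph are assumptions:

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))  # each vertex starts as its own root
        self.rank = [0] * n           # upper bound on tree height, for union by rank

    def find(self, x):
        # Path halving: point nodes closer to the root while walking up
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected: this edge would form a cycle
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the lower-rank tree under the higher-rank one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def kruskal(V, edges):
    """Return the total MST weight; raises ValueError if the graph is disconnected."""
    dsu = DSU(V)
    total, used = 0, 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # non-decreasing weight
        if dsu.union(u, v):
            total += w
            used += 1
            if used == V - 1:  # MST complete
                break
    if used != V - 1:
        raise ValueError("graph is not connected")
    return total


print(kruskal(4, [(0, 1, 10), (0, 2, 6), (0, 3, 5), (1, 3, 15), (2, 3, 4)]))  # 19
```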