| 1 | +--- |
| 2 | +title: Virtual Trees |
| 3 | +tags: |
| 4 | + - Original |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +This article explores the concept of *virtual trees*, proves their key properties, describes an efficient algorithm for their construction, and then applies the method to solve a specific problem using dynamic programming (DP) on trees. |
| 9 | + |
| 10 | +## Introduction |
| 11 | + |
| 12 | +Many tree problems require working with a subset of vertices while preserving the tree structure induced by their pairwise lowest common ancestors (LCAs). A *virtual tree* lets us replace the original tree $T$ with $N$ vertices by a smaller tree whose size depends linearly on the size of the selected set $X$. This significantly speeds up algorithms, particularly DP computations. |
| 13 | + |
| 14 | +## Definition of a Virtual Tree |
| 15 | + |
| 16 | +Let $T$ be a given rooted tree, and let $X$ be a subset of its vertices. The vertex set of the **virtual tree** $A(X)$ is defined as follows: |
| 17 | + |
| 18 | +$$ A(X) = \{ \operatorname{lca}(x,y) \mid x,y \in X \},$$ |
| 19 | + |
| 20 | +where $\operatorname{lca}(x,y)$ denotes the lowest common ancestor of vertices $x$ and $y$ in tree $T$. Since $\operatorname{lca}(x,x) = x$, we have $X \subseteq A(X)$. In this tree, each vertex that has a proper ancestor in $A(X)$ is joined by an edge to the nearest such ancestor in $T$. |
| 21 | + |
| 22 | +This definition ensures that the structure includes all vertices "important" for analyzing $X$ (i.e., all LCAs for pairs of vertices from $X$), and connectivity is inherited from the original tree. |
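As a small illustration (a hypothetical tree chosen for this example, not taken from the problem discussed later), let $T$ be rooted at vertex $1$ with edges $(1,2)$, $(1,3)$, $(2,4)$, $(2,5)$, $(5,6)$, and let $X = \{3, 4, 6\}$. Then

$$
\operatorname{lca}(4,6) = 2, \qquad \operatorname{lca}(4,3) = \operatorname{lca}(6,3) = 1, \qquad A(X) = \{1, 2, 3, 4, 6\}.
$$

Vertex $5$ is not an LCA of any pair from $X$, so it does not appear in $A(X)$; the virtual tree joins $2$ and $6$ directly, compressing the path $2, 5, 6$ of $T$ into a single edge.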
| 23 | + |
| 24 | +## Key Properties and Their Proofs |
| 25 | + |
| 26 | +### Property 1: Size Estimation |
| 27 | + |
| 28 | +**Statement.** |
| 29 | +For any set $X$ of vertices of tree $T$, the following inequality holds: |
| 30 | + |
| 31 | +$$ |
| 32 | +\vert A(X) \vert \le 2 \vert X \vert - 1. |
| 33 | +$$ |
| 34 | + |
| 35 | +**Proof.** |
| 36 | +Let's choose a depth-first search (DFS) traversal order and consider the vertices of set $X$ arranged in traversal order: |
| 37 | + |
| 38 | +$$ x_1, x_2, \dots, x_m. $$ |
| 39 | + |
| 40 | +Denote |
| 41 | + |
| 42 | +$$ l = \operatorname{lca}(x_1, x_2). $$ |
| 43 | + |
| 44 | +We first prove the following lemma. |
| 45 | + |
| 46 | +!!! note "Lemma" |
| 47 | + For any integer $n$ with $3 \le n \le m$, if $\operatorname{lca}(x_1,x_n) \neq l$, then $\operatorname{lca}(x_1,x_n) = \operatorname{lca}(x_2,x_n).$ |
| 48 | + |
| 49 | +??? hint "Proof" |
| 50 | + Suppose that for some $n \ge 3$ we have both $\operatorname{lca}(x_1,x_n) \neq l$ and $\operatorname{lca}(x_1,x_n) \neq \operatorname{lca}(x_2,x_n)$. Let $k = \operatorname{lca}(x_1,x_n)$. Both $k$ and $l$ are ancestors of $x_1$, so one of them is an ancestor of the other. |
| 51 | + If $k$ is a proper ancestor of $l$, then $x_1$ and $x_2$ both lie in the subtree of a single child of $k$, while $x_n$ lies outside that child's subtree (otherwise $\operatorname{lca}(x_1,x_n)$ would be deeper than $k$). Hence $\operatorname{lca}(x_2,x_n) = k$, which contradicts our assumption. Therefore, $k$ must be a proper descendant of $l$. The subtree of $k$ occupies a contiguous segment of the DFS order that contains $x_1$ and $x_n$ but not $x_2$ (otherwise $l$ would lie in the subtree of $k$), so $x_2$ cannot appear between $x_1$ and $x_n$, which contradicts our chosen traversal order. |
| 52 | + This contradiction completes the proof of the lemma. |
| 53 | + |
| 54 | +Returning to the proof of the property, the lemma shows that every LCA of the form $\operatorname{lca}(x_1,x_n)$ with $1 \le n \le m$ equals one of: |
| 55 | +1. the vertex $x_1$ itself (when $n = 1$), |
| 56 | +2. $\operatorname{lca}(x_1,x_2)$, or |
| 57 | +3. $\operatorname{lca}(x_2,x_n)$ for some $n \ge 3$, which already belongs to $A(\{x_2,\dots,x_m\})$. |
| 58 | + |
| 59 | +Thus, we can recursively write: |
| 60 | + |
| 61 | +$$ |
| 62 | +A(\{x_1,\dots,x_m\}) = A(\{x_2,\dots,x_m\}) \cup \{ x_1, \operatorname{lca}(x_1,x_2) \}. |
| 63 | +$$ |
| 64 | + |
| 65 | +From this, it follows that |
| 66 | + |
| 67 | +$$ |
| 68 | +\vert A(X) \vert \le \vert A(\{x_2,\dots,x_m\}) \vert + 2. |
| 69 | +$$ |
| 70 | + |
| 71 | +Since $\vert A(\{x_m\}) \vert = 1$, induction on the size of set $X$ gives the final estimate: |
| 72 | + |
| 73 | +$$ |
| 74 | +\vert A(X) \vert \le 2 \vert X \vert - 1. |
| 75 | +$$ |
| 76 | + |
| 77 | +Moreover, the bound is tight: if $T$ is a perfect binary tree and $X$ is the set of its leaves, then $A(X)$ contains every vertex of $T$, so $\vert A(X) \vert = 2 \vert X \vert - 1$. |
| 78 | + |
| 79 | +### Property 2: Representation through Sequential LCAs |
| 80 | + |
| 81 | +**Statement.** |
| 82 | +Let the vertices of $X$ be ordered by DFS traversal order: $x_1, x_2, \dots, x_m$. Then |
| 83 | + |
| 84 | +$$ |
| 85 | +A(X) = X \cup \{ \operatorname{lca}(x_i, x_{i+1}) \mid 1 \le i < m \}. |
| 86 | +$$ |
| 87 | + |
| 88 | +**Proof.** |
| 89 | +We start with the recursive representation obtained in the proof of Property 1: |
| 90 | + |
| 91 | +$$ |
| 92 | +A(\{x_1,\dots,x_m\}) = A(\{x_2,\dots,x_m\}) \cup \{ x_1, \operatorname{lca}(x_1,x_2) \}. |
| 93 | +$$ |
| 94 | + |
| 95 | +Let's expand it recursively: |
| 96 | + |
| 97 | +$$ |
| 98 | +\begin{aligned} |
| 99 | +A(\{x_1,\dots,x_m\}) &= A(\{x_2,\dots,x_m\}) \cup \{ x_1, \operatorname{lca}(x_1,x_2) \} \\ |
| 100 | +&= A(\{x_3,\dots,x_m\}) \cup \{ x_1, \operatorname{lca}(x_1,x_2), x_2, \operatorname{lca}(x_2,x_3) \} \\ |
| 101 | +&\quad \vdots \\ |
| 102 | +&= \{ x_1, \operatorname{lca}(x_1,x_2), x_2, \dots, x_{m-1}, \operatorname{lca}(x_{m-1},x_m), x_m \}. |
| 103 | +\end{aligned} |
| 104 | +$$ |
| 105 | + |
| 106 | +Thus, we obtain the required representation. |
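Property 2 can be checked on a small example. The sketch below is a minimal illustration, not part of the solution given later in the article: `ToyTree`, `closure_all_pairs`, and `closure_adjacent` are hypothetical names, the tree is assumed to be rooted at vertex 0 and given by a parent array, and LCAs are found naively by walking up, which is only suitable for tiny inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy tree rooted at vertex 0, described by a parent array
// (par[0] == 0). This layout is an assumption made for the sketch.
struct ToyTree {
    std::vector<int> parent, depth, tin;
    std::vector<std::vector<int>> children;

    explicit ToyTree(const std::vector<int>& par)
        : parent(par), depth(par.size(), 0), tin(par.size(), 0),
          children(par.size()) {
        assert(!par.empty() && par[0] == 0);
        for (int v = 1; v < (int)par.size(); ++v) children[par[v]].push_back(v);
        int timer = 0;
        dfs(0, timer);
    }

    // Assigns DFS entry times and depths.
    void dfs(int v, int& timer) {
        tin[v] = timer++;
        for (int to : children[v]) {
            depth[to] = depth[v] + 1;
            dfs(to, timer);
        }
    }

    // Naive LCA: repeatedly lift the deeper vertex until both coincide.
    int lca(int x, int y) const {
        while (x != y) {
            if (depth[x] < depth[y]) std::swap(x, y);
            x = parent[x];
        }
        return x;
    }
};

// A(X) straight from the definition: LCAs of all pairs (including x with itself).
std::set<int> closure_all_pairs(const ToyTree& t, const std::vector<int>& xs) {
    std::set<int> res;
    for (int a : xs)
        for (int b : xs) res.insert(t.lca(a, b));
    return res;
}

// A(X) via Property 2: X plus the LCAs of neighbours in DFS order.
std::set<int> closure_adjacent(const ToyTree& t, std::vector<int> xs) {
    std::sort(xs.begin(), xs.end(),
              [&](int a, int b) { return t.tin[a] < t.tin[b]; });
    std::set<int> res(xs.begin(), xs.end());
    for (std::size_t i = 0; i + 1 < xs.size(); ++i)
        res.insert(t.lca(xs[i], xs[i + 1]));
    return res;
}
```

For the perfect binary tree with parent array `{0, 0, 0, 1, 1, 2, 2}` and $X = \{3, 4, 5, 6\}$ (its leaves), both functions return all seven vertices, which also matches the bound $2 \vert X \vert - 1$ from Property 1.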
| 107 | + |
| 108 | +## Construction of a Virtual Tree |
| 109 | + |
| 110 | +Given a set of vertices $X$, the virtual tree $A(X)$ can be constructed in time |
| 111 | + |
| 112 | +$$ |
| 113 | +O\left(|X| (\log |X| + \log N)\right), |
| 114 | +$$ |
| 115 | + |
| 116 | +with preprocessing of the original tree $T$ in $O(N \log N)$ to ensure fast LCA queries. |
| 117 | + |
| 118 | +**Main construction steps:** |
| 119 | + |
| 120 | +1. **Preprocessing.** |
| 121 | + We run a depth-first search (DFS) to compute the entry time of every vertex and build a structure for LCA queries, such as binary lifting or HLD. |
| 122 | + |
| 123 | +2. **Vertex Sorting.** |
| 124 | + We sort the vertices of set $X$ in the order of their appearance during DFS traversal (using entry times). |
| 125 | + |
| 126 | +3. **Adding LCAs.** |
| 127 | + For consecutive vertices $x_i$ and $x_{i+1}$, we compute $\operatorname{lca}(x_i,x_{i+1})$. The union of $X$ and these LCAs gives the set of vertices $A(X)$ (according to Property 2). |
| 128 | + |
| 129 | +4. **Tree Construction.** |
| 130 | + Having obtained the vertices of $A(X)$ sorted by DFS order (with duplicates removed), we scan the sequence while maintaining a stack that holds the path from the root of the virtual tree to the most recently processed vertex. If the new vertex is not a descendant of the vertex at the top of the stack, we pop elements until its ancestor is on top; we then connect the new vertex to that ancestor and push it onto the stack. |
| 131 | + |
| 132 | +By Property 1, the resulting virtual tree contains $O(|X|)$ vertices, which allows DP to be applied efficiently. |
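The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the solution later in the article: `RootedTree` and `build_virtual_tree` are hypothetical names, the tree is assumed to be rooted at vertex 0 and given by a parent array, and the LCA is computed naively (a real solution would use binary lifting or a similar structure). Ancestor tests use entry and exit times.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: a tree rooted at vertex 0, described by a parent array
// (par[0] == 0). Names and layout are assumptions made for this sketch.
struct RootedTree {
    std::vector<int> parent, depth, tin, tout;
    std::vector<std::vector<int>> children;

    explicit RootedTree(const std::vector<int>& par)
        : parent(par), depth(par.size(), 0), tin(par.size(), 0),
          tout(par.size(), 0), children(par.size()) {
        assert(!par.empty() && par[0] == 0);
        for (int v = 1; v < (int)par.size(); ++v) children[par[v]].push_back(v);
        int timer = 0;
        dfs(0, timer);
    }

    // Step 1: entry/exit times and depths.
    void dfs(int v, int& timer) {
        tin[v] = timer++;
        for (int to : children[v]) {
            depth[to] = depth[v] + 1;
            dfs(to, timer);
        }
        tout[v] = timer;
    }

    // True if a is an ancestor of b (a vertex is its own ancestor here).
    bool is_ancestor(int a, int b) const {
        return tin[a] <= tin[b] && tout[b] <= tout[a];
    }

    // Naive LCA, standing in for binary lifting.
    int lca(int x, int y) const {
        while (x != y) {
            if (depth[x] < depth[y]) std::swap(x, y);
            x = parent[x];
        }
        return x;
    }
};

// Returns the edges (parent, child) of the virtual tree built on xs.
std::vector<std::pair<int, int>> build_virtual_tree(const RootedTree& t,
                                                    std::vector<int> xs) {
    auto by_tin = [&](int a, int b) { return t.tin[a] < t.tin[b]; };
    // Step 2: sort by entry time.
    std::sort(xs.begin(), xs.end(), by_tin);
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
    // Step 3: add LCAs of neighbours, then sort again and drop duplicates.
    const std::size_t m = xs.size();
    for (std::size_t i = 0; i + 1 < m; ++i) {
        int l = t.lca(xs[i], xs[i + 1]);
        xs.push_back(l);
    }
    std::sort(xs.begin(), xs.end(), by_tin);
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
    // Step 4: the stack holds the path from the virtual root to the last vertex.
    std::vector<std::pair<int, int>> edges;
    std::vector<int> stack;
    for (int v : xs) {
        while (!stack.empty() && !t.is_ancestor(stack.back(), v)) stack.pop_back();
        if (!stack.empty()) edges.emplace_back(stack.back(), v);
        stack.push_back(v);
    }
    return edges;
}
```

On the parent array `{0, 0, 1, 2, 3, 3}` (a chain 0, 1, 2, 3 whose last vertex has children 4 and 5) with $X = \{1, 4\}$, the function returns the single edge `(1, 4)`: the intermediate vertices 2 and 3 are compressed away. Sorting twice keeps the sketch short; the solution later in the article instead inserts LCAs on the fly while maintaining the stack.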
| 133 | + |
| 134 | +## Application: Counting Subtrees in a Colored Tree |
| 135 | + |
| 136 | +### Example [Problem](https://atcoder.jp/contests/abc340/tasks/abc340_g) Statement |
| 137 | + |
| 138 | +You are given a tree $T$ with $N$ vertices numbered from $1$ to $N$. Edge $i$ connects vertices $u[i]$ and $v[i]$, and vertex $i$ is colored with color $A[i]$. |
| 139 | +Find the number, modulo $998244353$, of non-empty subsets $S$ of vertices of tree $T$ that satisfy both conditions: |
| 140 | + |
| 141 | +- The induced graph $G[S]$ is a tree. |
| 142 | +- All vertices of $G[S]$ with degree 1 have the same color. |
| 143 | + |
| 144 | +### Main Solution Idea |
| 145 | + |
| 146 | +The main idea is to break down the problem by colors. For each color $c$, we consider the set of vertices $X$ colored with $c$ and build a virtual tree for $X$. Then, on the resulting tree, we perform DP to count valid subtrees where all vertices with degree 1 have color $c$. The final answer is obtained by summing the results for all colors. |
| 147 | + |
| 148 | +Thanks to the construction of the virtual tree, although the original tree contains $N$ vertices, each virtual tree for a specific color has size $O(|X|)$, which allows DP to be performed in an acceptable time. |
| 149 | + |
| 150 | +Since every vertex has exactly one color, $\sum_{c \in C} |X_c| = N$, so the total complexity of preprocessing and building all virtual trees is $O\left(N \log N + \sum_{c \in C} |X_c| (\log |X_c| + \log N)\right) = O(N \log N)$, where $C$ is the set of distinct vertex colors and $X_c$ is the set of vertices with color $c$. |
| 151 | + |
| 152 | +### Solution Implementation |
| 153 | + |
| 154 | +Below is the implementation, with comments describing the main components of the algorithm: |
| 155 | + |
| 156 | +```{.cpp file=virtual_trees} |
| 157 | +const int MOD = 998244353; |
| 158 | + |
| 159 | +vector<int> g[MAXN]; // Adjacency list of the given graph |
| 160 | +vector<int> vertex_sets[MAXN]; // For each color c, vertex_sets[c] stores vertices with color c |
| 161 | +int tmr, n; // Global time counter and number of vertices |
| 162 | +int up[LOGN][MAXN], dep[MAXN], tin[MAXN]; // For computing LCA and entry time |
| 163 | +int col[MAXN]; // Vertex colors array |
| 164 | +vector<int> virtual_g[MAXN]; // Adjacency list for the virtual tree |
| 165 | + |
| 166 | +// DP arrays for dynamic programming on the virtual tree |
| 167 | +int dp[MAXN][2], sum[MAXN]; |
| 168 | + |
| 169 | +// Preprocessing function: DFS to compute tin, up array, and dep |
| 170 | +void dfs_precalc(int v, int p) { |
| 171 | + tin[v] = ++tmr; |
| 172 | + up[0][v] = p; |
| 173 | + dep[v] = dep[p] + 1; |
| 174 | + for (int i = 1; i < LOGN; ++i) |
| 175 | + up[i][v] = up[i - 1][up[i - 1][v]]; |
| 176 | + for (auto to : g[v]) |
| 177 | + if (to != p) |
| 178 | + dfs_precalc(to, v); |
| 179 | +} |
| 180 | + |
| 181 | +// Function to compute LCA using binary lifting |
| 182 | +int getlca(int x, int y) { |
| 183 | + if (dep[x] < dep[y]) swap(x, y); |
| 184 | + for (int i = LOGN - 1; i >= 0; --i) |
| 185 | + if (dep[up[i][x]] >= dep[y]) |
| 186 | + x = up[i][x]; |
| 187 | + if (x == y) return x; |
| 188 | + for (int i = LOGN - 1; i >= 0; --i) |
| 189 | + if (up[i][x] != up[i][y]) { |
| 190 | + x = up[i][x]; |
| 191 | + y = up[i][y]; |
| 192 | + } |
| 193 | + return up[0][x]; |
| 194 | +} |
| 195 | + |
| 196 | +// DFS on the virtual tree to perform DP. |
| 197 | +// Parameter c: the target color for which counting is performed. |
| 198 | + |
| 199 | +void dfs_calc(int v, int p, int c, int &ans) { |
| 200 | + dp[v][0] = dp[v][1] = 0; |
| 201 | + sum[v] = 0; |
| 202 | + for(auto to : virtual_g[v]) { |
| 203 | + if(to == p) continue; |
| 204 | + dfs_calc(to, v, c, ans); |
| 205 | + // DP transitions: combining current state with result from subtree. |
| 206 | + int nxt0 = (dp[v][0] + sum[to]) % MOD; |
| 207 | + int nxt1 = ((dp[v][0] + dp[v][1]) * 1ll * sum[to] % MOD + dp[v][1]) % MOD; |
| 208 | + dp[v][0] = nxt0; |
| 209 | + dp[v][1] = nxt1; |
| 210 | + } |
| 211 | + sum[v] = (dp[v][0] + dp[v][1]) % MOD; |
| 212 | + if(col[v] == c) { |
| 213 | + // If the vertex has the target color, it can participate in a valid subtree. |
| 214 | + sum[v] = (sum[v] + 1) % MOD; |
| 215 | + ans = (ans + sum[v]) % MOD; |
| 216 | + } else { |
| 217 | + ans = (ans + dp[v][1]) % MOD; |
| 218 | + } |
| 219 | +} |
| 220 | + |
| 221 | +// Function to build a virtual tree for color c and perform DP. |
| 222 | +void calc_virtual(int c, int &ans) { |
| 223 | + auto p = vertex_sets[c]; |
| 224 | + if (p.empty()) return; |
| 225 | + // Sort vertices by entry time (tin) — inorder traversal order. |
| 226 | + sort(p.begin(), p.end(), [&](const int a, const int b) { return tin[a] < tin[b]; }); |
| 227 | + vector<int> stack = {1}; // Initialize stack with the root of tree T (vertex 1). |
| 228 | + virtual_g[1].clear(); |
| 229 | + auto add = [&](int u, int v) { |
| 230 | + virtual_g[u].push_back(v); |
| 231 | + virtual_g[v].push_back(u); |
| 232 | + }; |
| 233 | + // Process each vertex from set p, maintaining a stack to build the virtual tree. |
| 234 | + for (auto u : p) { |
| 235 | + if (u == 1) continue; |
| 236 | + int lca = getlca(u, stack.back()); |
| 237 | + if (lca != stack.back()) { |
| 238 | + while (stack.size() >= 2 && tin[lca] < tin[stack[stack.size() - 2]]) { |
| 239 | + add(stack.back(), stack[stack.size() - 2]); |
| 240 | + stack.pop_back(); |
| 241 | + } |
| 242 | + if (stack.size() >= 2 && tin[lca] != tin[stack[stack.size() - 2]]) { |
| 243 | + virtual_g[lca].clear(); |
| 244 | + add(stack.back(), lca); |
| 245 | + stack.back() = lca; |
| 246 | + } else { |
| 247 | + add(stack.back(), lca); |
| 248 | + stack.pop_back(); |
| 249 | + } |
| 250 | + } |
| 251 | + virtual_g[u].clear(); |
| 252 | + stack.push_back(u); |
| 253 | + } |
| 254 | + while (stack.size() > 1) { |
| 255 | + add(stack.back(), stack[stack.size() - 2]); |
| 256 | + stack.pop_back(); |
| 257 | + } |
| 258 | + // Perform DP on the virtual tree, starting from root 1. |
| 259 | + dfs_calc(1, 0, c, ans); |
| 260 | +} |
| 261 | + |
| 262 | +// Entry point: takes the tree and the vertex colors and returns the total answer over all colors. |
| 263 | + |
| 264 | +int solve(int N, const vector<pair<int, int>> &edges, const vector<int> &colors) { |
| 265 | + n = N; |
| 266 | + for(auto [x, y] : edges) g[x].push_back(y), g[y].push_back(x); |
| 267 | + copy(colors.begin(), colors.end(), col + 1); |
| 268 | + // Group vertices by color. |
| 269 | + for(int i = 1; i <= n; ++i) vertex_sets[col[i]].push_back(i); |
| 270 | + dfs_precalc(1, 0); |
| 271 | + int ans = 0; |
| 272 | + // Process the corresponding virtual tree for each possible color. |
| 273 | + for (int i = 1; i <= n; ++i) calc_virtual(i, ans); |
| 274 | + return ans; |
| 275 | +} |
| 276 | + |
| 277 | +``` |
| 278 | + |
| 279 | +### Implementation Explanation |
| 280 | + |
| 281 | +- **Preprocessing.** |
| 282 | + The `dfs_precalc` function performs a depth-first traversal of tree $T$, computing entry time (`tin`), depth, and filling the binary lifting table `up` for fast LCA queries. |
| 283 | + |
| 284 | +- **Computing LCA.** |
| 285 | + The `getlca` function implements binary lifting for quickly finding the lowest common ancestor of two vertices. |
| 286 | + |
| 287 | +- **Building the Virtual Tree.** |
| 288 | + The `calc_virtual` function takes a color $c$, extracts the set of vertices `vertex_sets[c]`, sorts it by `tin`, and uses a stack to build the virtual tree. For each pair of consecutive vertices, the LCA is computed, corresponding to Property 2. |
| 289 | + |
| 290 | +- **Dynamic Programming.** |
| 291 | + The `dfs_calc` function traverses the virtual tree and combines the results from the child subtrees. For a vertex $v$, `dp[v][0]` counts the selections in which $v$ is connected to exactly one child branch, `dp[v][1]` counts those with at least two child branches, and `sum[v]` is the number of ways $v$ can be attached to its parent. A vertex of color $c$ may also be taken with no child branch at all; a vertex of another color can be the top of a counted subtree only if it has at least two child branches, since otherwise it would be a degree-1 vertex of the wrong color. |
| 292 | + |
| 293 | +- **Collecting the Result.** |
| 294 | + The `solve` function receives the tree and the vertex colors, groups the vertices by color, performs preprocessing, and then sums the results over all colors, returning the final answer modulo 998244353. |
| 295 | + |
| 296 | +## Practice Problems |
| 297 | +1. [Leaf Color](https://atcoder.jp/contests/abc340/tasks/abc340_g) (problem from the article) |
| 298 | +2. [Unique Occurrences](https://codeforces.com/contest/1681/problem/F) |
| 299 | +3. [Yet Another Tree Problem](https://www.codechef.com/DEC21A/problems/YATP) |