Commit 4da1ca1

authored
Update nearest_points.md - add randomized algorithm
Explanation of randomized algorithms for closest pair of points.
1 parent b4aba0b commit 4da1ca1

File tree

1 file changed

+139
-0
lines changed

src/geometry/nearest_points.md

Lines changed: 139 additions & 0 deletions
@@ -162,6 +162,145 @@ mindist = 1E20;
rec(0, n);
```

## Linear time randomized algorithms

### A linear time (with high probability) algorithm

An alternative method arises from a very simple idea to heuristically improve the runtime: divide the plane into a grid of $d \times d$ squares. Then it is only necessary to test distances between points in the same square or in adjacent squares, since any two points in non-adjacent squares are at distance greater than $d$. This fails only if no two points share a square or lie in adjacent squares, which we will avoid by design by choosing $d$ no smaller than the minimum distance.

<div style="text-align: center;">
<img src="nearest_points_blocks_example.png" alt="Example of the squares strategy" height="300px">
</div>

We will consider only the squares containing at least one point. Denote by $n_1, n_2, \dots, n_k$ the numbers of points in each of the $k$ remaining squares. Assuming at least two points are in the same or in adjacent squares, the time complexity is $\Theta(\sum_{i=1}^{k} n_i^2)$.

**Proof.** For the $i$-th square containing $n_i$ points, the number of pairs inside is $\Theta(n_i^2)$. If the $i$-th square is adjacent to the $j$-th square, then we also perform $n_i n_j \le \max(n_i, n_j)^2 \le n_i^2 + n_j^2$ distance comparisons. Notice that each square has at most $8$ adjacent squares, so each $n_i^2$ term is counted a constant number of times and we can bound the sum of all comparisons by $\Theta(\sum_{i=1}^{k} n_i^2)$. $\quad \blacksquare$
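
To make the quantity $\sum_{i=1}^{k} n_i^2$ concrete, here is a minimal sketch that buckets points into $d \times d$ squares and evaluates the sum. The function name `grid_cost` and the use of `std::map` are illustrative assumptions and are not part of the implementation given later.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper (not from the article): returns sum of n_i^2 over the
// non-empty d x d squares, i.e. the cost bound discussed above.
long long grid_cost(const std::vector<std::pair<double, double>>& pts, double d) {
    assert(d > 0);
    std::map<std::pair<long long, long long>, long long> count;
    for (const auto& p : pts) {
        // floor() keeps all squares the same size, also for negative coordinates.
        long long cx = (long long)std::floor(p.first / d);
        long long cy = (long long)std::floor(p.second / d);
        ++count[{cx, cy}];
    }
    long long cost = 0;
    for (const auto& kv : count)
        cost += kv.second * kv.second;
    return cost;
}
```

For the three points $(0,0)$, $(0.5,0.5)$, $(3,3)$, a grid with $d = 1$ gives two occupied squares with $2$ and $1$ points, so the sum is $5$; with $d = 4$ all three points share one square and the sum grows to $9$. A larger $d$ never splits points that were already together, which is why $d$ should be kept close to the minimum distance.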

Now we need to decide how to choose $d$ so that $\sum_{i=1}^{k} n_i^2$ is as small as possible.

#### Choosing d

We need $d$ to be an approximation of the minimum distance, and the trick is to just sample $n$ pairs of points randomly and choose $d$ to be the smallest of the sampled distances. We now prove that with high probability this has linear cost.

**Proof.** Assume that with a particular choice of $d$, the resulting squares have $C \coloneqq \sum_{i=1}^{k} n_i^2 = \lambda n$. What is the probability that such a $d$ survives the sampling of $n$ independent distances? If a single pair among the sampled ones has distance smaller than $d$, this arrangement is not possible. Inside a square, at least half of the pairs would yield a smaller distance, so we have $\sum_{i=1}^{k} \frac{1}{2} {n_i \choose 2}$ pairs which yield a smaller final $d$. This is, approximately, $\frac{1}{4} \sum_{i=1}^{k} n_i^2 = \frac{\lambda}{4} n$. On the other hand, there are about $\frac{1}{2} n^2$ pairs that can be sampled. Hence the probability of sampling a pair with distance smaller than $d$ is at least (approximately) $\frac{\lambda n / 4}{n^2 / 2} = \frac{\lambda/2}{n}$, so the probability of at least one such pair being chosen during the $n$ rounds (and therefore avoiding this situation) is $1 - (1 - \frac{\lambda/2}{n})^n \approx 1 - e^{-\lambda/2}$. This goes to $1$ as $\lambda$ increases. $\quad \blacksquare$
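
The approximation $(1 - \frac{\lambda/2}{n})^n \approx e^{-\lambda/2}$ used at the end of the proof can be checked numerically. The sketch below is only an illustration; `miss_probability` is a hypothetical name, and it evaluates the probability that all $n$ samples miss when each sample hits with probability exactly $\frac{\lambda/2}{n}$.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the article): probability that none of n
// independent samples hits, when each one hits with probability (lambda/2)/n.
double miss_probability(double lambda, long long n) {
    assert(n > 0 && lambda >= 0 && lambda <= 2.0 * n);
    return std::pow(1.0 - (lambda / 2.0) / n, (double)n);
}
```

For example, with $\lambda = 10$ the limit value is $e^{-5} \approx 0.0067$, so under the assumptions of the proof an arrangement costing $10n$ survives the sampling less than $1\%$ of the time, and the bound decays exponentially in $\lambda$.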

#### Implementation of the algorithm

The advantage of this algorithm is that it is straightforward to implement while still performing well in practice.

```{.cpp file=nearest_pair_randomized}
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

using ll = long long;
using ld = long double;

struct RealPoint {
    ld x, y;
    RealPoint() : x(0), y(0) {}
    RealPoint(ld x_, ld y_) : x(x_), y(y_) {}
};
using pt = RealPoint;

// Hash for grid cell coordinates, salted with a run-time constant.
struct CustomHash {
    size_t operator()(const pair<ll,ll>& p) const {
        static const uint64_t C = chrono::steady_clock::now().time_since_epoch().count();
        // Cast to unsigned first: left-shifting a negative value is not portable.
        return C ^ ((uint64_t(p.first) << 32) ^ uint64_t(p.second));
    }
};

ld dist(pt a, pt b) {
    ld dx = a.x - b.x;
    ld dy = a.y - b.y;
    return sqrt(dx*dx + dy*dy);
}

pair<pt,pt> closest_pair_of_points_rand_reals(vector<pt> P) {
    const ld eps = 1e-9;

    int n = int(P.size());
    assert(n >= 2);
    unordered_map<pair<ll,ll>,vector<pt>,CustomHash> grid;
    grid.reserve(n);

    mt19937 rd(chrono::system_clock::now().time_since_epoch().count());
    uniform_int_distribution<int> dis(0, n-1);

    ld d = dist(P[0], P[1]);
    pair<pt,pt> closest = {P[0], P[1]};

    auto consider_pair = [&](const pt& a, const pt& b) -> void {
        ld ab = dist(a, b);
        if (ab + eps < d) {
            d = ab;
            closest = {a, b};
        }
    };

    // Sample n random pairs; d becomes the smallest sampled distance.
    for (int i = 0; i < n; ++i) {
        int j = dis(rd);
        int k = dis(rd);
        while (j == k)
            k = dis(rd);
        consider_pair(P[j], P[k]);
    }

    // No pair can improve on d by more than eps, and dividing by d
    // below would be unsafe, so stop here.
    if (d <= eps)
        return closest;

    // floor() keeps all squares the same size for negative coordinates too.
    for (const pt& p : P)
        grid[{ll(floor(p.x/d)), ll(floor(p.y/d))}].push_back(p);

    for (const auto& it : grid) { // same block
        int k = int(it.second.size());
        for (int i = 0; i < k; ++i) {
            for (int j = i+1; j < k; ++j)
                consider_pair(it.second[i], it.second[j]);
        }
    }

    for (const auto& it : grid) { // adjacent blocks
        auto coord = it.first;
        // dx in {0, 1} is enough: blocks to the left are handled when
        // they are the current block themselves.
        for (int dx = 0; dx <= 1; ++dx) {
            for (int dy = -1; dy <= 1; ++dy) {
                if (dx == 0 and dy == 0) continue;
                pair<ll,ll> neighbour = {
                    coord.first + dx,
                    coord.second + dy
                };
                auto nb = grid.find(neighbour);
                if (nb == grid.end()) continue;
                for (const pt& p : it.second) {
                    for (const pt& q : nb->second)
                        consider_pair(p, q);
                }
            }
        }
    }

    return closest;
}
```

### A randomized algorithm with expected linear time

Now we introduce a different randomized algorithm, which is less practical but for which it is very easy to show that it runs in expected linear time.

- Permute the $n$ points randomly
- Take $\delta \coloneqq \operatorname{dist}(p_1, p_2)$
- Partition the plane in squares of side $\delta/2$
- For $i = 1,2,\dots,n$:
    - Take the square corresponding to $p_i$
    - Iterate over the $25$ squares within two steps of our square in the grid of squares partitioning the plane
    - If some $p_j$ in those squares has $\operatorname{dist}(p_j, p_i) < \delta$, then
        - Recompute the partition and squares with $\delta \coloneqq \operatorname{dist}(p_j, p_i)$, taking the $p_j$ closest to $p_i$
        - Store points $p_1, \dots, p_i$ in the corresponding squares
    - else, store $p_i$ in the corresponding square
- output $\delta$
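
The steps above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `Pt`, `CellHash` and `closest_distance_incremental` are hypothetical, the seed is fixed for reproducibility, coordinates are assumed small enough that cell indices fit in 64 bits, and floating-point rounding at cell boundaries is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <utility>
#include <vector>

struct Pt { double x, y; };  // hypothetical point type for this sketch

using Cell = std::pair<std::int64_t, std::int64_t>;

struct CellHash {  // simple mixing hash for cell coordinates
    std::size_t operator()(const Cell& c) const {
        std::uint64_t a = (std::uint64_t)c.first, b = (std::uint64_t)c.second;
        return (std::size_t)((a * 0x9E3779B97F4A7C15ULL) ^ (b + 0x7F4A7C15ULL));
    }
};

static double pt_dist(const Pt& a, const Pt& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Returns the minimum pairwise distance among at least two points.
double closest_distance_incremental(std::vector<Pt> p, std::uint64_t seed = 12345) {
    int n = (int)p.size();
    assert(n >= 2);
    std::mt19937_64 rng(seed);
    std::shuffle(p.begin(), p.end(), rng);      // random permutation

    double delta = pt_dist(p[0], p[1]);
    if (delta == 0) return 0;                   // coincident points

    std::unordered_map<Cell, std::vector<int>, CellHash> grid;
    auto cell_of = [&](const Pt& q) {           // squares of side delta / 2
        double side = delta / 2;
        return Cell{(std::int64_t)std::floor(q.x / side),
                    (std::int64_t)std::floor(q.y / side)};
    };

    for (int i = 0; i < n; ++i) {
        Cell c = cell_of(p[i]);
        double best = delta;
        // Scan the 25 squares within two steps of the square of p[i].
        for (std::int64_t dx = -2; dx <= 2; ++dx) {
            for (std::int64_t dy = -2; dy <= 2; ++dy) {
                Cell nb{c.first + dx, c.second + dy};
                auto it = grid.find(nb);
                if (it == grid.end()) continue;
                for (int j : it->second)
                    best = std::min(best, pt_dist(p[i], p[j]));
            }
        }
        if (best < delta) {
            // delta shrinks: rebuild the grid for the first i + 1 points.
            delta = best;
            if (delta == 0) return 0;
            grid.clear();
            for (int j = 0; j <= i; ++j)
                grid[cell_of(p[j])].push_back(j);
        } else {
            grid[c].push_back(i);
        }
    }
    return delta;
}
```

The grid stores indices rather than copies of the points, and each square holds a small vector so that rounding effects or equal distances cannot break the lookup. The sketch returns only the distance; tracking the pair itself would require remembering the two indices whenever `delta` is updated.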

While this algorithm may look slow because it recomputes everything multiple times, we can show that the total expected cost is linear.

**Proof.** Let $X_i$ be the random variable that is $1$ when point $p_i$ causes a change of $\delta$ and a recomputation of the data structures, and $0$ otherwise. The cost is $O(n + \sum_{i=1}^{n} i X_i)$, since on the $i$-th step we only consider the first $i$ points. It turns out that $\Pr(X_i = 1) \le \frac{2}{i}$. After the $i$-th step, $\delta$ is the distance of the closest pair in $\{p_1,\dots,p_i\}$, so $X_i = 1$ only if $p_i$ belongs to that closest pair. Since we previously shuffled the points uniformly, each of the $i(i-1)$ ordered pairs is equally likely to be the closest one (assuming all distances are different), and $p_i$ appears in $2(i-1)$ of them, so the probability is at most $\frac{2(i-1)}{i(i-1)} = \frac{2}{i}$.

By linearity of expectation, the expected cost is therefore
$$O\left(n + \sum_{i=1}^{n} i \Pr(X_i = 1)\right) \le O\left(n + \sum_{i=1}^{n} i \cdot \frac{2}{i}\right) = O(3n) = O(n) \quad \quad \blacksquare$$

## Generalization: finding a triangle with minimal perimeter

The algorithm described above is interestingly generalized to this problem: among a given set of points, choose three different points so that the sum of pairwise distances between them is the smallest.
