
Notes On GANs, Energy-Based Models, and Saddle Points

John Schulman joschu@openai.com

1 Introduction
These notes explore a family of methods related to generative adversarial networks (GANs)
[Goo+14]; these methods try to estimate a probability distribution from samples, using a min-
imax objective that involves a generator/sampler and discriminator/cost. First we’ll derive a
minimax expression for negative log-likelihood of energy-based models. Reviewing an idea from
[HE16], we’ll show that after adding a particular regularizer to the cost function, we can recover
the original GAN objective, drawing a close connection between energy-based models and GANs.
The objectives we consider are convex-concave functions of the cost and sampling density (but not
necessarily with respect to their parameters). Convex-concave problems are tractable (like con-
vex optimization problems) and we discuss some results from the theory that may provide useful
intuitions.

Q&A
• The connections between energy-based models and GANs have already been pointed out in
[KB16] and [Chr16]. How is this writeup different? Those papers show that the gradient
expressions for GANs are almost the same as the gradient expressions for energy-based
models. That’s a useful observation, but it provides an incomplete picture of what the
algorithms converge to after many updates. This writeup focuses on how the objective
functions are similar and different.

• We already know from [Goo+14] that GANs converge “in function space”—i.e., when we
assume that the discriminator is optimized over the space of all functions each iteration.
Given that your analysis relies on the convex-concave property in function space, what does
it add? The limitation of the convergence analysis in [Goo+14] is that the actual opti-
mization procedure for GANs doesn’t fully optimize the discriminator each iteration; it
simultaneously performs small updates to both the generator and discriminator. This pro-
cedure does not converge in general for minimax problems minx maxy f (x, y). However, it
turns out that for convex-concave problems (i.e., where f is convex in x and concave in y),
gradient-based updates do converge to an optimal point (called an equilibrium or saddle
point) where minx maxy f (x, y) = maxy minx f (x, y).

2 Energy-Based Models
Energy-based models define a probability distribution in terms of a cost (energy) function c:

p(x) = e−c(x) /Zc , (1)


where Zc = ∫ dx e−c(x)  (2)

Given a set of datapoints x1 , x2 , . . . , xN , the total negative log-likelihood is
total nll = −Σ_{n=1}^{N} log p(xn) = Σ_{n=1}^{N} (c(xn) + log(Zc))  (3)

It is more meaningful to consider the loss per datapoint, which is


nll per datapoint = (1/N)(total nll) = Edata[c(x)] + log(Zc).  (4)
where Edata [. . . ] indicates that we sample x uniformly from x1 , x2 , . . . , xN .
We can write down an importance sampling estimator for Zc , as an expectation under a distri-
bution q(x).

Zc = ∫ dx e−c(x) = ∫ dx q(x) e−c(x)/q(x) = Eq[e−c(x)/q(x)]  (5)

Thus,

nll per datapoint = Edata[c(x)] + log Eq[e−c(x)/q(x)]  (6)

Jensen’s inequality implies that log(E[y]) ≥ E[log(y)], with equality when y is constant. Thus,

nll per datapoint ≥ Edata[c(x)] + Eq[log(e−c(x)/q(x))]  (7)
                  = Edata[c(x)] − Eq[c(x)] − Eq[log q(x)]  (8)
                  = Edata[c(x)] − Eq[c(x)] + H(q)  (9)

Hence, the right-hand side expression is a lower bound on the negative log-likelihood. Equality
holds when q(x) = e−c(x)/Zc, hence we can write

nll per datapoint = max_q [ Edata[c(x)] − Eq[c(x)] + H(q) ]  (10)

We are minimizing negative log-likelihood, so our full optimization problem looks like
   
min_c [ nll per datapoint ] = min_c max_q [ Edata[c(x)] − Eq[c(x)] + H(q) ]  (11)

In summary, we wrote the likelihood maximization problem (wrt c) as a minimax problem,
involving a sampling distribution q. This could be turned into an algorithm which jointly learns
a sampling distribution q and the cost (energy) function c. In practice, when c and q are represented
by nonlinear function approximators, we will need to jointly optimize them by SGD, so q will
not have a chance to fully catch up with c. With this formulation, c can grow without bound, so
the minimax objective above may behave unstably.
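To make the minimax expression concrete, here is a small numerical sketch (my addition, not from the original notes; the space size and random seed are arbitrary) on a finite sample space, where Zc is an exact sum. It checks that maximizing the bracketed expression in Equation (10) over q recovers the exact negative log-likelihood of Equation (4), while other choices of q give strictly smaller values.

    import numpy as np

    # Minimal sketch: an energy-based model on a finite space {0, ..., K-1},
    # so the partition function is an exact sum rather than an integral.
    rng = np.random.default_rng(0)
    K = 6
    c = rng.normal(size=K)                      # cost (energy) values c(x)
    p_data = rng.dirichlet(np.ones(K))          # empirical data distribution

    Zc = np.exp(-c).sum()
    nll_exact = np.dot(p_data, c) + np.log(Zc)  # Equation (4)

    def lower_bound(q):
        """Edata[c] - Eq[c] + H(q), the bracketed expression in Equation (10)."""
        H = -np.dot(q, np.log(q))
        return np.dot(p_data, c) - np.dot(q, c) + H

    q_star = np.exp(-c) / Zc                    # the maximizer q(x) = e^{-c(x)}/Zc
    q_rand = rng.dirichlet(np.ones(K))          # any other distribution

    print("exact nll per datapoint:", nll_exact)
    print("bound at q = e^{-c}/Zc :", lower_bound(q_star))   # equals the exact nll
    print("bound at a random q    :", lower_bound(q_rand))   # strictly smaller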

3 Cost Regularization
Let’s introduce a regularization term ψ(c) which encourages c to be small. This change can be
interpreted as introducing a prior on c. Since c encodes a probability distribution, we’d prefer for
it to be smooth and simple, rather than putting a delta function at the data points. The objective
is redefined as follows, to be the nll plus a regularization term:

L(c) = Edata [c(x)] + log(Zc ) + ψ(c) (12)

Repeating the derivation of the previous section (but with ψ(c) added), we get
 
L(c) = max_q [ Edata[c(x)] − Eq[c(x)] + H(q) + ψ(c) ]  (13)
     = max_q L(c, q)  (14)

where the last line is the definition of L(c, q).

3.1 Deriving the GAN Objective


Now we’ll show that by reparameterizing c and choosing a particular regularizer ψ(c), we can
derive the original GAN objective, plus the entropy term H(q). Let c(x) = log σ(−f (x)), where σ
is the sigmoid function σ(z) = 1/(1 + e−z ), and f is a function approximator with a scalar output,
e.g., the output of a neural network, whose last layer has output size 1 and linear activation. (Note
that with this definition, large f ↔ low cost ↔ the sample looks like it came from pdata .) This
function is plotted below.

[Figure: plot of log σ(−z) for z ∈ [−3, 3].]

Previously we referred to ψ(c), but now c is defined in terms of f , so we’ll write ψ(f ) for the
regularization term. The objective becomes

L(f, q) = Edata [log σ(−f (x))] − Eq [log σ(−f (x))] + H(q) + ψ(f ) (15)

We’re going to define ψ(f ) as the following expectation over the data:

ψ(f ) = Edata [− log σ(f (x)) − log σ(−f (x))] (16)

We designed this term so it would cancel out the log σ(−f (x)) term and replace it with − log σ(f (x)).
The regularizer − log σ(z) − log σ(−z) is plotted below. It’s ≈ z²/4 + 2 log 2 around the origin but
becomes ≈ |z| as |z| → ∞.

[Figure: plot of the regularizer − log σ(z) − log σ(−z) for z ∈ [−3, 3].]
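A short snippet (my addition, assuming numpy and matplotlib are available) that reproduces the two curves shown in the figures above:

    import numpy as np
    import matplotlib.pyplot as plt

    z = np.linspace(-3, 3, 200)
    log_sigma = lambda t: -np.log1p(np.exp(-t))     # log σ(t), computed stably

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.plot(z, log_sigma(-z))
    ax1.set_title("cost  c = log σ(−z)")
    ax2.plot(z, -log_sigma(z) - log_sigma(-z))
    ax2.set_title("regularizer  −log σ(z) − log σ(−z)")
    plt.tight_layout()
    plt.show()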

L(f, q) = Edata[log σ(−f(x))] − Eq[log σ(−f(x))] + H(q) + Edata[− log σ(f(x)) − log σ(−f(x))]
        = −Edata[log σ(f(x))] − Eq[log σ(−f(x))] + H(q)  (17)

The sigmoid function has the nice property that


σ(−z) = 1/(1 + e^z) = 1 − e^z/(1 + e^z) = 1 − σ(z).  (18)
Thus we get

L(f, q) = −Edata [log σ(f (x))] − Eq [log(1 − σ(f (x)))] + H(q) (19)
= −Edata [log(D(x))] − Eq [log(1 − D(x))] + H(q) (20)
defining D(x) = σ(f (x))

and our optimization problem is

min_f max_q L(f, q)  (21)
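As a quick numerical sanity check (my addition, on arbitrary synthetic values of f), the following verifies the algebra that took Equation (15) to Equations (19)–(20): the data term of (15) plus the regularizer (16) equals −Edata[log σ(f(x))], and the sigmoid identity (18) gives log σ(−f) = log(1 − σ(f)).

    import numpy as np

    rng = np.random.default_rng(1)
    f = rng.normal(size=1000)                      # discriminator outputs f(x) on "data" samples
    sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

    lhs = np.mean(np.log(sigma(-f))) + np.mean(-np.log(sigma(f)) - np.log(sigma(-f)))
    rhs = -np.mean(np.log(sigma(f)))
    print(np.allclose(lhs, rhs))                   # True: the regularizer swaps the data term

    # the sigmoid identity (18) used to go from (19) to (20)
    print(np.allclose(np.log(sigma(-f)), np.log(1.0 - sigma(f))))   # True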

There are two differences between the optimization problem we’ve derived in this section and
the original GAN formulation of [Goo+14].

1. The entropy regularization, H(q)

2. The min and the max are switched—in the original GAN formulation, the generator is on
the outside, but here, the generator is on the inside.

Why does it make sense to switch the min and max? When we add the entropy regular-
ization term, we can freely switch the min and the max: both orderings yield the same solution
(f ∗ , q ∗ ). That follows from the properties of convex-concave functions, which are discussed in
Section 4, and specialized to the GAN case in Section 5.

One ugly detail—normalization. There is one problem with the entropy-regularized formula-
tion. Since cost is parameterized as c(x) = log σ(−f (x)), we have that c(x) ≤ 0. If the domain of
x is infinite, then e−c(x) will have an infinite integral, i.e., we won’t have a finite partition function.
The underlying issue is that on an infinite space, the entropy regularization is too strong—the
“pressure” to spread out q is stronger than the pressure to stay near the low-cost regions of c(x).
This problem can be fixed by using a KL divergence penalty −KL(q ‖ q0) instead of the entropy
bonus, where q0 could be a Gaussian distribution covering the range of reasonable values for x.
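To spell out the suggested fix (my elaboration, under the assumption that the model is redefined as p(x) ∝ q0(x) e−c(x)): repeating the importance-sampling and Jensen steps of Section 2 with this model gives

log Z = log Eq[q0(x) e−c(x)/q(x)] ≥ Eq[log(q0(x) e−c(x)/q(x))] = −Eq[c(x)] − KL(q ‖ q0),

so the analogue of Equation (10) is nll per datapoint = −Edata[log q0(x)] + max_q [ Edata[c(x)] − Eq[c(x)] − KL(q ‖ q0) ], where the first term does not depend on c or q. Equality in the Jensen bound holds when q(x) ∝ q0(x) e−c(x), and since q0 is normalizable the partition function stays finite even though c(x) ≤ 0.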

4 Saddle Points
This section provides a general discussion of the optimization problems we’ve encountered above,
which involve a minimization over one set of variables and a maximization over the others. This
section will make it possible to answer questions such as “what happens if we switch the min
and max?” and “does gradient descent converge to the solution of a minimax problem?” We’ll
start out by defining three key concepts: minimax problems, saddle points, and convex-concave
problems.

1. Minimax problems are optimization problems that take the form minx maxy f (x, y). In
general, the min and the max do not commute: it’s not true in general that the value
minx maxy f (x, y) = maxy minx f (x, y), and it’s not true in general that a solution (x∗ , y ∗ ) to
one ordering will be a solution to the other. (When we say (x∗ , y ∗ ) is a solution for the order-
ing minx maxy f (x, y), we mean that x∗ minimizes maxy f (x, y), and y ∗ ∈ argmaxy f (x∗ , y).)

2. A saddle point is defined as a pair (x∗ , y ∗ ) satisfying x∗ ∈ argminx f (x, y ∗ ) and y ∗ ∈
argmaxy f (x∗ , y). That is, we can exchange the min and max.

3. A convex-concave function f (x, y) is convex in its first argument, and concave in its second
argument.

A basic theorem states that if f is convex-concave, then we can always exchange the min and
max and the value is unchanged: minx maxy f (x, y) = maxy minx f (x, y). Furthermore, for a convex-
concave function, any point where the gradient vanishes, ∇x f (x, y) = ∇y f (x, y) = 0, is a saddle
point.
Minimax problems have an interpretation as a two player game between Xander and Yasaman.
x is Xander’s move, and y is Yasaman’s move. Xander is trying to minimize f (x, y), and Yasaman
is trying to maximize it. The ordering minx maxy f (x, y) means that Xander goes first, whereas
maxy minx f (x, y) means that Yasaman goes first. The second player has an advantage, because
he or she can see the first player’s move and respond accordingly—that is, maxy minx f (x, y) ≤
minx maxy f (x, y).
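A tiny illustration of the second-mover advantage (my addition, using an arbitrary 2×2 payoff table over discrete moves rather than a continuous problem):

    import numpy as np

    # f(x, y) for two discrete moves each: rows are Xander's choices of x,
    # columns are Yasaman's choices of y.
    f = np.array([[0.0, 3.0],
                  [2.0, 1.0]])

    min_max = f.max(axis=1).min()   # Xander commits first; Yasaman responds: min_x max_y f
    max_min = f.min(axis=0).max()   # Yasaman commits first; Xander responds: max_y min_x f
    print(max_min, "<=", min_max)   # 1.0 <= 2.0, as the inequality predicts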
Finding saddle points of convex-concave functions is tractable, unlike solving general minimax
problems. In a way, finding these saddle points is on the same level of hardness as finding the
minimizers of convex functions. In fact, much of the theory for analyzing (stochastic) gradient
descent carries over from convex minimization to the problem of finding saddle points.
Saddle points play a key role in constrained convex optimization problems, where the solu-
tion corresponds to finding the saddle point of the Lagrangian L(x, λ, ν), which is convex in the
argument x and concave in the Lagrange multipliers (λ, ν). We can find the saddle point using
Newton’s method. (See [BV04], 10.3.)
One issue that makes saddle point problems harder to understand than minimization problems
is that it’s less straightforward to measure optimization progress. For minimization problems
minx f (x), we can trivially measure progress through the objective f , which should decrease. For
saddle point problems minx maxy f (x, y), we have two ways of measuring progress:

1. The Gap. Given a point (x, y), define

gap(x, y) = max_{y′} f(x, y′) − min_{x′} f(x′, y)  (22)

Recall that the saddle point (x∗ , y ∗ ) satisfies
f(x∗, y∗) = min_x max_y f(x, y) = max_y min_x f(x, y)  (23)

so gap(x∗ , y ∗ ) = 0. For arbitrary (x, y), gap(x, y) ≥ 0. Most of the convergence theory
of gradient descent methods for convex-concave problems relies on showing that the gap is
small after optimization.
2. Gradient Norm. Another measure of convergence is given by the norm of the gradient with
respect to x and y: k∇x f (x, y)k2 + k∇y f (x, y)k2 . The most effective algorithms for solving
constrained convex optimization problems are primal-dual methods, which perform Newton
steps on the Lagrangian. These methods typically perform line searches on this gradient
norm, called the primal-dual residual [BV04].
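A small illustration of the two progress measures (my addition, on a strongly convex-concave quadratic of my own choosing, f(x, y) = x² + xy − y², whose saddle point is (0, 0)); for this f both quantities have closed forms and vanish exactly at the saddle point:

    import numpy as np

    # f(x, y) = x^2 + x*y - y^2: strongly convex in x, strongly concave in y,
    # with its saddle point at (0, 0).
    def f(x, y):
        return x**2 + x*y - y**2

    def gap(x, y):
        # max_{y'} f(x, y') is attained at y' = x/2; min_{x'} f(x', y) at x' = -y/2.
        return f(x, x/2.0) - f(-y/2.0, y)

    def grad_norm_sq(x, y):
        gx, gy = 2*x + y, x - 2*y
        return gx**2 + gy**2

    for (x, y) in [(0.0, 0.0), (1.0, -1.0), (0.1, 0.2)]:
        print(f"(x, y) = {(x, y)}: gap = {gap(x, y):.4f}, |grad|^2 = {grad_norm_sq(x, y):.4f}")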
There are several different techniques used to prove and analyze the convergence of gradient
descent in convex-concave problems. Convergence for saddle point problems is less intuitively
clear than for convex minimization problems, so these proof techniques might provide some helpful
intuitions.
 
1. Show that the gradient norm is reduced along the step direction. Let z = (x, y), and we’ll write
   f(z) to mean f(x, y). Consider taking a small step z → z + a. Taking a first-order Taylor
   expansion of the gradient,

   f′(z + a) = f′(z) + f″(z)a + O(‖a‖²)  (24)

   (a) Second-order methods compute the step that solves f″(z)a = −f′(z), e.g., see the
       discussion of the infeasible-start Newton method in [BV04]. They perform a line search
       in this direction, which is guaranteed to reduce the gradient norm.

   (b) For large-scale applications, we’re more interested in first-order methods, which take a
       step in the gradient direction. Let a = −α (∂f/∂x, −∂f/∂y)ᵀ = −αSf′(z), where we define
       S = diag(I, −I). Substituting back into Equation (24),

       f′(z + a) ≈ f′(z) − αf″(z)Sf′(z) = (I − αHS)f′(z)  (25)

       where H = f″(z) is the Hessian of f. If we require that f is strongly convex wrt x and
       strongly concave wrt y, then HS is positive definite, and for small α, ‖(I − αHS)f′(z)‖ <
       ‖f′(z)‖, so the norm of the gradient strictly decreases. (A numerical check of this
       contraction appears after this list.)
2. Online gradient descent / online mirror descent. The standard analysis from online learning
mostly carries through. See [Bub].
3. Standard subgradient descent convergence analysis. [NO09] provide a convergence analysis
that looks like the standard convergence proofs for subgradient descent, which is also quite
similar to the online learning results.
4. Theory of monotone operators. One can show that the subgradient update is a contraction
using a general and elegant theory of monotone operators [RB16].
Unfortunately, it’s not the case that the gap monotonically decreases during gradient descent;
rather, the gradient norm decreases and the gap shrinks as a result.
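A quick numerical check of the claim in item 1(b) above (my addition, using a randomly generated quadratic that is strongly convex in x and strongly concave in y): the symmetric part of HS is positive definite, and for a small step size the map (I − αHS) shrinks the gradient.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3  # dimension of x and of y

    # Hessian of a quadratic f(x, y) = 0.5 x'Ax + x'By - 0.5 y'Cy,
    # with A, C positive definite (strong convexity in x, strong concavity in y).
    A = rng.normal(size=(n, n)); A = A @ A.T + n * np.eye(n)
    C = rng.normal(size=(n, n)); C = C @ C.T + n * np.eye(n)
    B = rng.normal(size=(n, n))
    H = np.block([[A, B], [B.T, -C]])
    S = np.block([[np.eye(n), np.zeros((n, n))], [np.zeros((n, n)), -np.eye(n)]])

    HS = H @ S
    sym = 0.5 * (HS + HS.T)
    print("symmetric part of HS is positive definite:",
          np.all(np.linalg.eigvalsh(sym) > 0))

    alpha = 0.005                              # a small step size
    g = rng.normal(size=2 * n)                 # a generic gradient vector f'(z)
    g_new = (np.eye(2 * n) - alpha * HS) @ g   # Equation (25)
    print("gradient norm before:", np.linalg.norm(g))
    print("gradient norm after :", np.linalg.norm(g_new))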

Simple Examples
The following two-dimensional problem illustrates some of the properties of the GAN problem.

min_x max_y  x² − y(x − 1)  (26)

Here, y corresponds to the generator, and x to the discriminator. The problem is convex-concave
and has a saddle point at (1, 2), which one can see by solving for ∇x f (x, y) = ∇y f (x, y) = 0.
For each fixed value of y, there is a unique solution for x. But for each fixed value of x 6= 1, the
objective is unbounded in y, and for x = 1, all values y achieve the optimum. The objective above
is the Lagrangian of the problem

min_x  x²,  subject to x = 1  (27)

Now consider adding regularization to y:

min_x max_y  x² − y(x − 1) − εy²  (28)

This problem has a different saddle point: (1/(1 − 4ε), 2/(1 − 4ε)). With this objective, if we fix x
and optimize over y, there is always a unique solution—we can always recover y from x. However,
if ε is too large, the saddle point goes to infinity.
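The following sketch (my addition; the step size and iteration count are arbitrary) runs simultaneous gradient descent on x and ascent on y for the unregularized problem (26), and the iterates approach the saddle point (1, 2):

    import numpy as np

    # f(x, y) = x^2 - y*(x - 1): Equation (26).
    x, y = 0.0, 0.0
    alpha = 0.05
    for t in range(2000):
        grad_x = 2*x - y          # descend on x
        grad_y = -(x - 1.0)       # ascend on y
        x, y = x - alpha * grad_x, y + alpha * grad_y

    print((round(x, 3), round(y, 3)))                      # close to (1.0, 2.0)
    print("gradient norm:", np.hypot(2*x - y, -(x - 1.0))) # close to 0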

5 GANs and Saddles


Recall the GAN optimization problem, and the entropy-regularized version, which we derived as
a likelihood maximization problem.
 
max_q min_f  −Edata[log(σ(f(x)))] − Eq[log(1 − σ(f(x)))]  [+ H(q)]  (29)

The training procedure for GANs is to perform gradient descent on f and q simultaneously. Thus,
the training procedure is agnostic to the ordering of the min and max. The key questions are (1)
what should this training procedure converge to, and (2) under what conditions does it actually
converge? These questions are nontrivial even when optimizing in function space, e.g., with tabular
representations of f and q.
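To illustrate what “optimizing in function space” looks like (my addition, with arbitrary constants: a 6-point sample space, step size 0.05, and entropy weight 0.3), here is a tabular sketch that performs simultaneous updates, doing gradient descent on f and exponentiated-gradient (mirror) ascent on q:

    import numpy as np

    rng = np.random.default_rng(3)
    K, alpha, tau = 6, 0.05, 0.3            # tau = entropy weight (tau = 0: unregularized)
    p_data = rng.dirichlet(np.ones(K))
    f = np.zeros(K)                         # tabular discriminator, D = sigmoid(f)
    q = np.ones(K) / K                      # tabular generator distribution

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for t in range(50000):
        D = sigmoid(f)
        # L(f, q) = -E_data[log D] - E_q[log(1 - D)] + tau * H(q)
        grad_f = -p_data * (1.0 - D) + q * D                 # dL/df, descended
        dL_dq = -np.log(1.0 - D) - tau * (np.log(q) + 1.0)   # dL/dq, ascended
        f = f - alpha * grad_f
        q = q * np.exp(alpha * dL_dq)       # mirror ascent keeps q a distribution
        q = q / q.sum()

    print("p_data:", np.round(p_data, 3))
    print("q     :", np.round(q, 3))           # roughly p_data, pulled toward uniform by the entropy term
    print("D     :", np.round(sigmoid(f), 3))  # close to p_data/(p_data + q), the optimal discriminator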
Let’s consider the cases with and without entropy regularization.

Without Entropy Regularization


• Saddle. (f ∗ , q ∗ ) = (0, pdata ) is a saddle point, since it satisfies q ∗ ∈ argmaxq L(q, f ∗ ) and
f ∗ ∈ argminf L(q ∗ , f ).

• Discriminator on inside. If the discriminator is the inner optimization problem, i.e., if we solve
for maxq minf L(q, f ), then for every fixed generator, there’s a unique optimal solution for the
discriminator. If the discriminator ranges over all functions, the optimum is D(x) = σ(f(x)) =
pdata(x)/(pdata(x) + q(x)), i.e., f(x) is the log-odds log(pdata(x)/q(x)).

• Generator on inside. If we take the generator to be the inner optimization problem, then
for a fixed discriminator, the problem maxq L(f, q) may have multiple solutions. If f has a unique
global maximum, then argmaxq L(f, q) is a delta function at the global maximum of f . If f
is constant (as at the optimal solution), then all distributions q are maximizers.

Hence, while the ordering minf maxq does yield the same optimal value as maxq minf , this
ordering has some unwholesome properties. In particular, it doesn’t satisfy the recoverability
condition—given f , we can’t recover q. (But given q, we can recover f .)

With Entropy Regularization


• Saddle. The saddle exists (given that x is restricted to a finite space, due to the issue we
discussed under “One ugly detail”), but it is different from the saddle point of the unregularized
problem and can’t be computed in closed form.

• Discriminator on inside. The optimal discriminator is the same as in the unregularized case.

• Generator on inside. maxq L(f, q) now has a unique nontrivial solution: q(x) = e−c(x)/Zc =
e−log σ(−f(x))/Zc = 1/(σ(−f(x)) Zc) = 1/((1 − D(x)) Zc). Thus the recoverability property holds—given f,
we can recover q, and vice versa. (A numerical check of this formula appears below.)
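A quick check of the recoverability formula (my addition, on a finite space with arbitrary values): starting from an arbitrary tabular discriminator f, form q ∝ 1/(1 − D) and verify that it attains the largest value of the entropy-regularized objective among several candidate distributions.

    import numpy as np

    rng = np.random.default_rng(4)
    K = 6
    p_data = rng.dirichlet(np.ones(K))
    f = rng.normal(size=K)                      # an arbitrary tabular discriminator
    D = 1.0 / (1.0 + np.exp(-f))                # D = sigmoid(f)

    def L(q):
        # entropy-regularized objective of Equation (29), on a finite space
        return (-np.dot(p_data, np.log(D))
                - np.dot(q, np.log(1.0 - D))
                - np.dot(q, np.log(q)))         # last term is + H(q)

    q_rec = 1.0 / (1.0 - D); q_rec /= q_rec.sum()       # q(x) = 1 / ((1 - D(x)) Zc)
    others = [rng.dirichlet(np.ones(K)) for _ in range(5)] + [p_data]
    print(all(L(q_rec) >= L(q) for q in others))        # True: q_rec is the maximizer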

Does the Saddle Point Theory Have Practical Implications for GAN-like
Problems?
• It might be possible to use an approximation of the gap as a convergence diagnostic for GANs:
perform a small number of generator-only updates and a small number of discriminator-only
updates and measure the gap.

• Since we don’t have access to the density of the generator, it’s not straightforward to ap-
proximate its entropy. However, we may be able to devise more tractable regularizers that
make the problem convex-concave, and thus make the generator recoverable in terms of the
discriminator/cost.

• Convergence guarantees only hold under restrictive assumptions, for example, that the func-
tion is convex-concave. The GAN objective is convex-concave in the functions c and q, but
not in the parameters. However, it may still be possible to obtain a convergent algorithm
when optimizing in terms of parameters. Let’s suppose we have an algorithm that is guaran-
teed to converge to the saddle point of a convex-concave problem, and it works by solving a
series of subproblems, as with proximal methods and trust region methods. Then we can try
to mimic the behavior of this algorithm by solving these subproblems in terms of parameters.
Natural gradient algorithms can be derived this way.

6 Generalizations: φ-risks and f -divergences


There are a couple of interesting generalizations of GANs that have appeared recently. [NCT16]
show how the objective can be altered to optimize various f-divergences between the generator’s
distribution and the data distribution. [HE16] uncover a related generalization—that there is a
connection between classification risks (φ-risks) and f-divergences—GANs naturally emerge from
using a log-loss, but other divergences arise from other losses. Both [NCT16; HE16] build on
[NWJ09], where some key mathematical ideas originated. There is a close correspondence between
the regularizer ψ(c) (in Equation (13), for example) and the resulting f-divergence / φ-risk being
minimized.

f -divergences ⇔ φ-risks ⇔ cost regularizers ψ(c)

6.1 φ-risks
Section 3 showed how the difference-of-costs objective in Equation (13) can be converted into the
GAN-like objective in Equation (19), after choosing the appropriate regularizer ψ(c). Moreover,
minimization wrt the discriminator results in the Jensen-Shannon divergence between q and the
data distribution, minc L(c, q) = DJS(pdata, q). As shown in [HE16], we can generalize this con-
struction by using different regularizers ψ(c), and we end up with different divergence measures.
Specifically, let c(x) = φ(−h(x)), where h is some function approximator with real-valued out-
put. (The analysis above used φ(z) = log σ(z) to arrive at the GAN objective). Let’s define
ψ(c) = Edata [−φ(h(x)) − φ(−h(x))].

L(c, q) = Edata[c(x)] − Eq[c(x)] + H(q) + ψ(c)  (30)
        = Edata[φ(−h(x))] + Eq[−φ(−h(x))] + H(q) + Edata[−φ(h(x)) − φ(−h(x))]  (31)
        = −Edata[φ(h(x))] − Eq[φ(−h(x))] + H(q)  (32)

[NWJ09] show that when h is allowed to range over the space of all functions, then the sum of
expectations turns into an f -divergence:
 
max_h [ Edata[φ(h(x))] − Eq[φ(−h(x))] ] = Df(pdata, q)  (33)

where f(u) = max_h (−φ(−h) − φ(h)u)  (34)

Choosing φ(z) = log σ(z) results in the Jensen-Shannon divergence, whereas other choices give
different f -divergences; some possibilities are catalogued in [NWJ09].

6.2 f -GAN
Another approach for generalizing GANs and approximating f -divergences is in [NCT16]. That
approach is more general than the one above using φ-risks, as it allows one to approximate asym-
metric f-divergences; however, the derivation of the Jensen-Shannon divergence involves a less
natural set of choices.

7 Applications of GAN-like Methods


• Inverse reinforcement learning, as shown in [HE16]. [FLA16] also frame their IRL
approach using an energy-based model, and [Chr16] shows the close connections to GANs,
including some interesting points about estimating the partition function using a mixture of
pdata and q, rather than q alone.

• Semi-supervised learning, as shown in [Sal+16] and [Che+16].

• Better unsupervised learning via lossy compression. A natural method of formulating lossy
compression in a universal way yields a minimax problem, as I described in my “Noise Should be
Free” presentation. It may be possible to develop methods for density modeling that are better
able to identify the interesting aspects of data using these ideas.

• Model-based reinforcement learning. Ask Jonathan Ho for details.

• Sample-efficient reinforcement learning. Ask Peter Chen for details.

References
[Bub]     Sebastian Bubeck. ORF523 Course Notes. https://blogs.princeton.edu/imabandit/2013/04/18/orf523-mirror-descent-part-iiii/.
[BV04]    Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[Che+16]  Xi Chen et al. “InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets”. In: arXiv preprint arXiv:1606.03657 (2016).
[Chr16]   Paul Christiano. “Guided cost learning is generative adversarial modeling”. In: unpublished tech report (2016).
[FLA16]   Chelsea Finn, Sergey Levine, and Pieter Abbeel. “Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization”. In: arXiv preprint arXiv:1603.00448 (2016).
[Goo+14]  Ian Goodfellow et al. “Generative adversarial nets”. In: Advances in Neural Information Processing Systems. 2014, pp. 2672–2680.
[HE16]    Jonathan Ho and Stefano Ermon. “Generative Adversarial Imitation Learning”. In: arXiv preprint arXiv:1606.03476 (2016).
[KB16]    Taesup Kim and Yoshua Bengio. “Deep Directed Generative Models with Energy-Based Probability Estimation”. In: arXiv preprint arXiv:1606.03439 (2016).
[NCT16]   Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. “f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization”. In: arXiv preprint arXiv:1606.00709 (2016).
[NO09]    Angelia Nedić and Asuman Ozdaglar. “Subgradient methods for saddle-point problems”. In: Journal of Optimization Theory and Applications 142.1 (2009), pp. 205–228.
[NWJ09]   XuanLong Nguyen, Martin J. Wainwright, and Michael I. Jordan. “On surrogate loss functions and f-divergences”. In: The Annals of Statistics (2009), pp. 876–904.
[RB16]    Ernest K. Ryu and Stephen Boyd. “Primer on monotone operator methods”. In: Appl. Comput. Math 15.1 (2016), pp. 3–43.
[Sal+16]  Tim Salimans et al. “Improved Techniques for Training GANs”. In: arXiv preprint arXiv:1606.03498 (2016).

