L4 Sampling
• Otherwise, the training data will be biased and not represent the underlying process
distribution
Sampling Distributions
Sampling of a population is done from an unknown population distribution.
The law of large numbers is a theorem that states that statistics of independent
random samples converge to the population values as more samples are used.
• Example: for a population distribution with parameters (μ, σ), the sample mean is:
  $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i \to \mu$ as $n \to \infty$
This result is reassuring: the larger the sample, the more closely the statistic
converges to the population parameter.
The law of large numbers is foundational to statistics
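The convergence described above can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes a Normal population with illustrative parameters μ = 5 and σ = 2 (not taken from the slides), and the helper name `sample_mean` is hypothetical.

```python
import random
import statistics

def sample_mean(n, mu, sigma, rng):
    """Mean of n independent draws from a Normal(mu, sigma) population."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))

rng = random.Random(42)   # fixed seed so the run is repeatable
mu, sigma = 5.0, 2.0      # assumed population parameters, for illustration only

errors = {}
for n in (10, 1_000, 100_000):
    # Absolute gap between the sample statistic and the population value
    errors[n] = abs(sample_mean(n, mu, sigma, rng) - mu)
    print(f"n={n:>7}  |sample mean - mu| = {errors[n]:.4f}")
```

The gap typically shrinks at a rate of about σ/√n, so individual small samples can still land close to or far from μ by chance.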
• Within each zip code, people are then sampled by income bracket strata
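Sampling within zip codes and then by income bracket can be sketched as follows. This is an illustration under stated assumptions: the population is synthetic, and the field names `zip_code` and `bracket`, the zip code values, and the function `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(people, per_stratum, rng):
    """Draw up to per_stratum people from every (zip_code, bracket) stratum."""
    strata = defaultdict(list)
    for person in people:
        strata[(person["zip_code"], person["bracket"])].append(person)
    sample = []
    for key in sorted(strata):            # fixed order keeps runs repeatable
        members = strata[key]
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))   # sample without replacement
    return sample

# Synthetic population: 2 zip codes x 3 income brackets, 100 people per stratum
zips = ["98101", "98102"]
brackets = ["low", "mid", "high"]
people = [
    {"id": i, "zip_code": zips[i % 2], "bracket": brackets[i % 3]}
    for i in range(600)
]

rng = random.Random(0)
sample = stratified_sample(people, per_stratum=5, rng=rng)
print(len(sample))
```

Drawing the same number from every stratum guarantees each group is represented; proportional allocation is a common alternative when strata differ greatly in size.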
Simulation enables data scientists to study the behavior of stochastic processes with
complex probability distributions
• Understand processes with complex probability distributions: In these cases, simulation provides a
powerful and flexible computational technique to understand behavior
• As cheap computational power has become ubiquitous, simulation has become
a widely used technique
• Simulations compute a large number of cases, or realizations
• The computing cost of each realization must be low in any practical simulation
• Realizations are drawn from complex probability distributions of the process model
• In many cases, realizations are computed using conditional probability
distributions
• The final or posterior distribution of the process is composed of these realizations
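The points above can be sketched with a toy process. This is a hedged example, not a model from the slides: the variables (weather, demand) and every numeric parameter are assumptions chosen only to show a realization being drawn from a conditional distribution.

```python
import random
import statistics

def realization(rng):
    """One cheap realization: draw weather, then demand conditional on weather."""
    rainy = rng.random() < 0.3                  # assumed P(rain) = 0.3
    mean_demand = 60.0 if rainy else 100.0      # assumed conditional means
    # Demand given weather: Normal with assumed sd 15, floored at zero
    return max(0.0, rng.gauss(mean_demand, 15.0))

rng = random.Random(1)
realizations = [realization(rng) for _ in range(20_000)]

# The collection of realizations approximates the distribution of demand
estimated_mean = statistics.fmean(realizations)
estimated_sd = statistics.stdev(realizations)
print(f"mean demand ~ {estimated_mean:.1f}, sd ~ {estimated_sd:.1f}")
```

Under these assumptions the exact mean is 0.3 × 60 + 0.7 × 100 = 88, which gives a simple check that the simulation behaves as intended.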
Representation as a Directed Acyclic Graphical Model
When creating a simulation with multiple conditionally dependent variables, it is useful to draw a directed graph:
a directed acyclic graphical model, or DAG
• The graph is a communication device: it shows which variables are independent and which are conditionally
dependent on others, and the node shapes indicate the type of each node
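A DAG can also be written down in code to decide the order in which variables are simulated. The sketch below is one possible encoding, assuming Python 3.9+ for the standard-library `graphlib` module; the node names (`weather`, `price`, `demand`, `profit`) are hypothetical.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of its parents (the variables it is conditioned on).
# Nodes with no parents are independent root variables.
parents = {
    "weather": set(),
    "price": set(),
    "demand": {"weather", "price"},
    "profit": {"demand", "price"},
}

# A topological order guarantees every parent is sampled before its children.
# TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(parents).static_order())
roots = sorted(node for node, deps in parents.items() if not deps)

print("sampling order:", order)
print("independent variables:", roots)
```

Sampling in this order means each conditional distribution always has the values of its parents available when it is evaluated.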