Package ‘bestNormalize’

September 25, 2018

Type Package
Title Normalizing Transformation Functions
Version 1.3.0
Date 2018-09-25
Description Estimate a suite of normalizing transformations, including
a new adaptation of a technique based on ranks which can guarantee
normally distributed transformed data if there are no ties: ordered
quantile normalization (ORQ). ORQ normalization combines a rank-mapping
approach with a shifted logit approximation that allows
the transformation to work on data outside the original domain. It is
also able to handle new data within the original domain via linear
interpolation. The package is built to estimate the best normalizing
transformation for a vector consistently and accurately. It implements
the Box-Cox transformation, the Yeo-Johnson transformation, three types
of Lambert WxF transformations, and the ordered quantile normalization
transformation. It also estimates the normalization efficacy of other
commonly used transformations.
URL https://github.com/petersonR/bestNormalize
License GPL-3
Depends R (>= 3.1.0)
Imports LambertW, nortest, dplyr, doParallel, foreach, doRNG
Suggests knitr, rmarkdown, MASS, testthat, mgcv, parallel
VignetteBuilder knitr
LazyData true
RoxygenNote 6.1.0
Encoding UTF-8
NeedsCompilation no
Author Ryan Andrew Peterson [aut, cre]
Maintainer Ryan Andrew Peterson <ryan-peterson@uiowa.edu>
Repository CRAN
Date/Publication 2018-09-25 17:40:02 UTC


R topics documented:
bestNormalize-package
arcsinh_x
autotrader
bestNormalize
binarize
boxcox
exp_x
lambert
log_x
no_transform
orderNorm
plot.bestNormalize
sqrt_x
yeojohnson

bestNormalize-package bestNormalize: Flexibly calculate the best normalizing transformation for a vector

Description

The bestNormalize package provides several normalizing transformations and introduces a new
transformation based on order statistics, orderNorm. Perhaps the most useful function is
bestNormalize, which attempts all of these transformations and picks the best one based on a
goodness of fit statistic.

Author(s)

Maintainer: Ryan Andrew Peterson <ryan-peterson@uiowa.edu>

arcsinh_x arcsinh(x) Transformation

Description
Perform an arcsinh(x) transformation

Usage
arcsinh_x(x, standardize = TRUE)

## S3 method for class 'arcsinh_x'

predict(object, newdata = NULL, inverse = FALSE,
...)

## S3 method for class 'arcsinh_x'

print(x, ...)

Arguments
x A vector to normalize
standardize If TRUE, the transformed values are also centered and scaled, such that the
transformation attempts a standard normal
object an object of class ’arcsinh_x’
newdata a vector of data to be (potentially reverse) transformed
inverse if TRUE, performs reverse transformation
... additional arguments

Details
arcsinh_x performs an arcsinh transformation in the context of bestNormalize, such that it creates
a transformation that can be estimated and applied to new data via the predict function.
The function is explicitly: log(x + sqrt(x^2 + 1))
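
As a quick check of the formula above, the following base R sketch compares it with R's built-in asinh() and inverts it with sinh(). The helper name arcsinh_manual is illustrative only and is not part of the package; standardization is left out.

```r
# Hypothetical helper (not part of bestNormalize): the formula from Details
arcsinh_manual <- function(x) log(x + sqrt(x^2 + 1))

set.seed(1)
x <- rgamma(100, 1, 1)

# Agrees with base R's asinh() up to floating-point error
stopifnot(isTRUE(all.equal(arcsinh_manual(x), asinh(x))))

# sinh() is the inverse, so the original values are recovered
stopifnot(isTRUE(all.equal(sinh(arcsinh_manual(x)), x)))
```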

Value
A list of class arcsinh_x with elements

x.t transformed original data

x original data
mean mean after transformation but prior to standardization
sd sd after transformation but prior to standardization
n number of nonmissing observations
norm_stat Pearson’s P / degrees of freedom
standardize was the transformation standardized

The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

Examples
x <- rgamma(100, 1, 1)

arcsinh_x_obj <- arcsinh_x(x)

arcsinh_x_obj
p <- predict(arcsinh_x_obj)
x2 <- predict(arcsinh_x_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

autotrader Prices of 6,283 cars listed on Autotrader

Description
A dataset containing the prices and other attributes of over 6000 cars in the Minneapolis area.

Usage
autotrader

Format
A data frame with 6283 rows and 10 variables:
price price, in US dollars
Car_Info Raw description from website
Link hyperlink to listing (must be appended to https://www.autotrader.com/)
Make Car manufacturer
Year Year car manufactured
Location Location of listing
Radius Radius chosen for search
mileage mileage on vehicle
status used/new/certified
model make and model, separated by space

Source
https://www.autotrader.com/

bestNormalize Calculate and perform best normalizing transformation

Description
Performs a suite of normalizing transformations, and selects the best one on the basis of the Pearson
P test statistic for normality. The transformation that has the lowest P (calculated on the transformed
data) is selected. See details for more information.
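
To give a feel for the selection criterion, here is a rough base R sketch of a Pearson chi-square normality statistic divided by its degrees of freedom. It is an illustration only: the function name pearson_p_df and the rule used for the number of classes are assumptions, and the package itself relies on the nortest package for this calculation.

```r
# Illustrative sketch (not the package code) of Pearson's P / df
pearson_p_df <- function(x, n_classes = ceiling(2 * length(x)^(2 / 5))) {
  x <- x[!is.na(x)]
  n <- length(x)
  # Classes are equiprobable under a normal fitted by the sample mean and sd
  u <- pnorm(x, mean(x), sd(x))
  bin <- pmin(floor(u * n_classes) + 1, n_classes)
  observed <- tabulate(bin, nbins = n_classes)
  expected <- n / n_classes
  P <- sum((observed - expected)^2 / expected)
  df <- n_classes - 3  # classes - 1, minus 2 estimated parameters
  P / df
}

set.seed(1)
skewed <- rgamma(500, 1, 1)
normal <- rnorm(500)

# Lower values indicate a better fit to normality
pearson_p_df(skewed)
pearson_p_df(normal)
```

Values near 1 are what one would expect for normal data, so a transformation that brings the statistic closer to 1 from above is doing its job.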

Usage
bestNormalize(x, standardize = TRUE, allow_orderNorm = TRUE,
allow_lambert_s = FALSE, allow_lambert_h = FALSE,
out_of_sample = TRUE, cluster = NULL, k = 10, r = 5,
loo = FALSE, warn = TRUE, quiet = FALSE)

## S3 method for class 'bestNormalize'

predict(object, newdata = NULL,
inverse = FALSE, ...)

## S3 method for class 'bestNormalize'

print(x, ...)

Arguments
x A vector to normalize
standardize If TRUE, the transformed values are also centered and scaled, such that the
transformation attempts a standard normal. This will not change the normality
statistic.
allow_orderNorm
set to FALSE if orderNorm should not be applied
allow_lambert_s
Set to TRUE if the lambertW of type "s" should be applied (see details)
allow_lambert_h
Set to TRUE if the lambertW of type "h" should be applied (see details)
out_of_sample if FALSE, quickly estimates in-sample performance
cluster name of cluster set using makeCluster
k number of folds
r number of repeats
loo should leave-one-out CV be used instead of repeated CV? (see details)
warn Should bestNormalize warn when a method doesn’t work?
quiet Should a progress-bar not be displayed for cross-validation progress?
object an object of class ’bestNormalize’
newdata a vector of data to be (reverse) transformed

inverse if TRUE, performs reverse transformation
... additional arguments

Details
bestNormalize estimates the optimal normalizing transformation. This transformation can be
performed on new data, and inverted, via the predict function.
This function currently estimates the Yeo-Johnson transformation, the Box Cox transformation (if
the data is positive), the log_10(x+a) transformation, the square-root (x+a) transformation, and the
arcsinh transformation. a is set to max(0, -min(x) + eps) by default. If allow_orderNorm == TRUE
and if out_of_sample == FALSE then the ordered quantile normalization technique will likely be
chosen since it essentially forces the data to follow a normal distribution. More information on the
orderNorm technique can be found in the package vignette, or using ?orderNorm.
Repeated cross-validation is used by default to estimate the out-of-sample performance of each
transformation if out_of_sample = TRUE. While this can take some time, users can speed it up by
creating a cluster via the parallel package’s makeCluster function, and passing the name of this
cluster to bestNormalize via the cluster argument. For best performance, we recommend setting the
number of clusters to the number of repeats r. Care should be taken to account for the number of
observations per fold; with too small a number, the estimated normality statistic could be inaccurate,
or at least suffer from high variability.
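
The idea of repeated cross-validation for a single transformation can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the helper names (p_df, repeated_cv_stat, fit_fn, apply_fn) are hypothetical, the normality statistic is a compact stand-in for Pearson's P / df, and everything runs serially rather than on a cluster.

```r
# Compact stand-in for Pearson's P / df (illustrative only)
p_df <- function(z) {
  k <- ceiling(2 * length(z)^(2 / 5))
  bin <- pmin(floor(pnorm(z, mean(z), sd(z)) * k) + 1, k)
  obs <- tabulate(bin, nbins = k)
  sum((obs - length(z) / k)^2 / (length(z) / k)) / (k - 3)
}

# fit_fn() learns parameters on the training folds, apply_fn() uses them on
# the held-out fold, and the statistic is computed on held-out values only
repeated_cv_stat <- function(x, fit_fn, apply_fn, k = 10, r = 5) {
  stats <- numeric(0)
  for (rep_i in seq_len(r)) {
    fold <- sample(rep(seq_len(k), length.out = length(x)))
    for (f in seq_len(k)) {
      params <- fit_fn(x[fold != f])
      stats <- c(stats, p_df(apply_fn(x[fold == f], params)))
    }
  }
  mean(stats)
}

set.seed(1)
x <- rgamma(200, 1, 1)

# A standardized log transform versus leaving the data untouched
log_fit <- function(train) list(mean = mean(log(train)), sd = sd(log(train)))
log_apply <- function(new, p) (log(new) - p$mean) / p$sd
id_fit <- function(train) list()
id_apply <- function(new, p) new

cv_log <- repeated_cv_stat(x, log_fit, log_apply)
cv_id <- repeated_cv_stat(x, id_fit, id_apply)
c(log = cv_log, identity = cv_id)
```

With 200 observations and k = 10, each held-out fold has only 20 values, which illustrates why the per-fold statistic can be noisy when folds are small.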
As of version 1.3, users can also use leave-one-out cross-validation for each method by setting
loo to TRUE. This will take a lot of time for bigger vectors, but it will give the most accurate estimate
of normalization efficacy. Note that if this method is selected, arguments k and r are ignored. This
method will still work in parallel with the cluster argument.
NOTE: Only the Lambert technique of type = "s" (skew) ensures that the transformation is consistently
1-1, so it is the only method currently used in bestNormalize(). Use type = "h" or type = "hh" at the
risk of the estimated transform not being 1-1. These alternative types are effective when
the data has exceptionally heavy tails, e.g. the Cauchy distribution. Additionally, as of v. 1.2.0,
Lambert of type "s" is not used by default in bestNormalize() since it uses multiple threads on
some Linux systems, which is not allowed on CRAN checks. Set allow_lambert_s = TRUE in order
to test this transformation as well. Note that the Lambert of type "h" can also be tested by setting
allow_lambert_h = TRUE; however, this can take significantly longer to run.

Value
A list of class bestNormalize with elements

x.t transformed original data

x original data
norm_stats Pearson’s P / degrees of freedom
method out-of-sample or in-sample, number of folds + repeats
chosen_transform
the chosen transformation (of appropriate class)
other_transforms
the other transformations (of appropriate class)
oos_preds Out-of-sample predictions (if loo == TRUE) or normalization stats

The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

See Also
boxcox, orderNorm, yeojohnson

Examples

x <- rgamma(100, 1, 1)

## Not run:
# With Repeated CV
BN_obj <- bestNormalize(x)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

## End(Not run)

## Not run:
# With leave-one-out CV
BN_obj <- bestNormalize(x, loo = TRUE)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

## End(Not run)

# Without CV
BN_obj <- bestNormalize(x, allow_orderNorm = FALSE, out_of_sample = FALSE)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

binarize Binarize

Description
This function will perform a binarizing transformation, which could be used as a last resort if the
data cannot be adequately normalized. This may be useful when accidentally attempting normalization
of a binary vector (which could occur if implementing bestNormalize in an automated fashion).
Note that the transformation is not one-to-one, in contrast to the other functions in this package.

Usage
binarize(x, location_measure = "median")

## S3 method for class 'binarize'

predict(object, newdata = NULL, inverse = FALSE,
...)

## S3 method for class 'binarize'

print(x, ...)

Arguments
x A vector to binarize
location_measure
which location measure should be used? can either be "median", "mean", "mode",
a number, or a function.
object an object of class ’binarize’
newdata a vector of data to be (reverse) transformed
inverse if TRUE, performs reverse transformation
... additional arguments

Value
A list of class binarize with elements
x.t transformed original data
x original data
method location_measure used for original fitting
location estimated location_measure
n number of nonmissing observations
norm_stat Pearson’s P / degrees of freedom
The predict function with inverse = FALSE returns the numeric value (0 or 1) of the transformation
on newdata (which defaults to the original data).
If inverse = TRUE, since the transform is not 1-1, it will create and return a factor that indicates
where the original data was cut.
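
The behavior described above can be mimicked with a median split in base R. This is a hypothetical sketch, not the package code; in particular, whether a value exactly at the location maps to 0 or 1 is an assumption made here.

```r
# Hypothetical base R sketch of a median split
set.seed(1)
x <- rgamma(100, 1, 1)
loc <- median(x)

# Forward direction: 1 if above the location, 0 otherwise
x_bin <- as.numeric(x > loc)

# "Inverse" direction: the split is not 1-1, so only the interval is known
interval <- factor(x_bin, levels = c(0, 1),
                   labels = c(paste0("<= ", round(loc, 3)),
                              paste0("> ", round(loc, 3))))
table(interval)
```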

Examples
x <- rgamma(100, 1, 1)
binarize_obj <- binarize(x)
(p <- predict(binarize_obj))

predict(binarize_obj, newdata = p, inverse = TRUE)

boxcox Box-Cox Normalization

Description

Perform a Box-Cox transformation and center/scale a vector to attempt normalization

Usage

boxcox(x, standardize = TRUE, ...)

## S3 method for class 'boxcox'

predict(object, newdata = NULL, inverse = FALSE, ...)

## S3 method for class 'boxcox'

print(x, ...)

Arguments

x A vector to normalize with Box-Cox

standardize If TRUE, the transformed values are also centered and scaled, such that the
transformation attempts a standard normal
... Additional arguments that can be passed to the estimation of the lambda parameter
(lower, upper, epsilon)
object an object of class ’boxcox’
newdata a vector of data to be (reverse) transformed
inverse if TRUE, performs reverse transformation

Details

boxcox estimates the optimal value of lambda for the Box-Cox transformation. This transformation
can be performed on new data, and inverted, via the predict function.
The function will return an error if a user attempts to transform nonpositive data.
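
For readers who want to see the mechanics, the textbook Box-Cox transform, its inverse, and a numerical estimate of lambda can be sketched in base R. The helper names and the search interval are assumptions for illustration; the package's own estimation routine may differ in its details.

```r
# Box-Cox transform and its inverse for strictly positive x (textbook form)
bc_transform <- function(x, lambda, eps = 0.001) {
  if (abs(lambda) < eps) log(x) else (x^lambda - 1) / lambda
}
bc_inverse <- function(y, lambda, eps = 0.001) {
  if (abs(lambda) < eps) exp(y) else (lambda * y + 1)^(1 / lambda)
}

# Profile log-likelihood of lambda under normality, maximized numerically.
# The search interval (-1, 2) is an assumption made for this sketch.
bc_loglik <- function(lambda, x) {
  y <- bc_transform(x, lambda)
  n <- length(x)
  -n / 2 * log(mean((y - mean(y))^2)) + (lambda - 1) * sum(log(x))
}

set.seed(1)
x <- rgamma(100, 1, 1)
lambda_hat <- optimize(bc_loglik, interval = c(-1, 2), x = x,
                       maximum = TRUE)$maximum

# Round trip: transforming and then inverting recovers the data
y <- bc_transform(x, lambda_hat)
stopifnot(isTRUE(all.equal(bc_inverse(y, lambda_hat), x)))
```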

Value

A list of class boxcox with elements

x.t transformed original data

x original data
mean mean after transformation but prior to standardization
sd sd after transformation but prior to standardization
lambda estimated lambda value for skew transformation
n number of nonmissing observations
norm_stat Pearson’s P / degrees of freedom
standardize was the transformation standardized

The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

References

Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations. Journal of the Royal Statistical
Society B, 26, 211-252.

Examples
x <- rgamma(100, 1, 1)

bc_obj <- boxcox(x)

bc_obj
p <- predict(bc_obj)
x2 <- predict(bc_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

exp_x exp(x) Transformation

Description

Perform an exp(x) transformation

Usage

exp_x(x, standardize = TRUE, warn = TRUE)

## S3 method for class 'exp_x'

predict(object, newdata = NULL, inverse = FALSE, ...)

## S3 method for class 'exp_x'

print(x, ...)

Arguments

x A vector to normalize

standardize If TRUE, the transformed values are also centered and scaled, such that the
transformation attempts a standard normal
warn Should a warning result from infinite values?
object an object of class ’exp_x’
newdata a vector of data to be (potentially reverse) transformed
inverse if TRUE, performs reverse transformation
... additional arguments

Details

exp_x performs a simple exponential transformation in the context of bestNormalize, such that it
creates a transformation that can be estimated and applied to new data via the predict function.

Value

A list of class exp_x with elements

x.t transformed original data

The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

Examples
x <- rgamma(100, 1, 1)

exp_x_obj <- exp_x(x)

exp_x_obj
p <- predict(exp_x_obj)
x2 <- predict(exp_x_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

lambert Lambert W x F Normalization

Description

Perform Lambert’s W x F transformation and center/scale a vector to attempt normalization via the
LambertW package.

Usage

lambert(x, type = "s", standardize = TRUE, ...)

## S3 method for class 'lambert'

predict(object, newdata = NULL, inverse = FALSE, ...)

## S3 method for class 'lambert'

print(x, ...)

Arguments

x A vector to normalize with the Lambert W x F transformation

type a character indicating which transformation to perform (options are "s", "h", and
"hh", see details)
standardize If TRUE, the transformed values are also centered and scaled, such that the
transformation attempts a standard normal
... Additional arguments that can be passed to the LambertW::Gaussianize function
object an object of class ’lambert’
newdata a vector of data to be (reverse) transformed
inverse if TRUE, performs reverse transformation

Details
lambert uses the LambertW package to estimate a normalizing (or "Gaussianizing") transformation.
This transformation can be performed on new data, and inverted, via the predict function.
NOTE: The type = "s" argument is the only one that does the 1-1 transform consistently, and so
it is the only method currently used in bestNormalize(). Use type = "h" or type = "hh" at the risk
of the estimated transform not being 1-1. These alternative types are effective when the data has
exceptionally heavy tails, e.g. the Cauchy distribution.
Additionally, sometimes (depending on the distribution) this method will be unable to extrapolate
beyond the observed bounds. In these cases, NaN is returned.

Value
A list of class lambert with elements
x.t transformed original data
x original data
mean mean after transformation but prior to standardization
sd sd after transformation but prior to standardization
tau.mat estimated parameters of LambertW::Gaussianize
n number of nonmissing observations
norm_stat Pearson’s P / degrees of freedom
standardize was the transformation standardized
The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

References
Georg M. Goerg (2016). LambertW: An R package for Lambert W x F Random Variables. R
package version 0.6.4.
Georg M. Goerg (2011): Lambert W random variables - a new family of generalized skewed distributions
with applications to risk estimation. Annals of Applied Statistics 5(3). 2197-2230.
Georg M. Goerg (2014): The Lambert Way to Gaussianize heavy-tailed data with the inverse of
Tukey’s h transformation as a special case. The Scientific World Journal.

Examples
## Not run:
x <- rgamma(100, 1, 1)

lambert_obj <- lambert(x)

lambert_obj
p <- predict(lambert_obj)
x2 <- predict(lambert_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

## End(Not run)

log_x Log(x + a) Transformation

Description
Perform a log_b (x+a) normalization transformation

Usage
log_x(x, a = NULL, b = 10, standardize = TRUE, eps = 0.001,
warn = TRUE)

## S3 method for class 'log_x'

predict(object, newdata = NULL, inverse = FALSE, ...)

## S3 method for class 'log_x'

print(x, ...)

Arguments
x A vector to normalize
a The constant to add to x (defaults to max(0, -min(x) + eps))
b The base of the log (defaults to 10)
standardize If TRUE, the transformed values are also centered and scaled, such that the
transformation attempts a standard normal
eps The allowed error in the expression for the selected a
warn Should a warning result from infinite values?
object an object of class ’log_x’
newdata a vector of data to be (potentially reverse) transformed
inverse if TRUE, performs reverse transformation
... additional arguments

Details
log_x performs a simple log transformation in the context of bestNormalize, such that it creates a
transformation that can be estimated and applied to new data via the predict function. By default, the
parameter a is estimated from the training set as the smallest shift that makes x + a positive, up to
the tolerance eps, while the base b must be specified beforehand.
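
The default shift can be illustrated in base R using the expression given under Arguments, max(0, -min(x) + eps). This is a sketch of the arithmetic only, without standardization, and is not the package code.

```r
set.seed(1)
x <- rnorm(100)  # includes negative values, so a shift is needed
eps <- 0.001
b <- 10

# Default shift as documented: just enough to make every x + a positive
a <- max(0, -min(x) + eps)

# Forward transformation and its inverse
y <- log(x + a, base = b)
x_back <- b^y - a

stopifnot(min(x + a) > 0)
stopifnot(isTRUE(all.equal(x_back, x)))
```

For data that are already positive, the same expression gives a = 0 and the transformation reduces to a plain log in base b.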

Value
A list of class log_x with elements
x.t transformed original data
x original data
mean mean after transformation but prior to standardization
sd sd after transformation but prior to standardization
a estimated a value
b base b value used for the log
n number of nonmissing observations
norm_stat Pearson’s P / degrees of freedom
standardize was the transformation standardized
The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

Examples
x <- rgamma(100, 1, 1)

log_x_obj <- log_x(x)

log_x_obj
p <- predict(log_x_obj)
x2 <- predict(log_x_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

no_transform Identity transformation

Description
Perform an identity transformation. Admittedly it seems odd to have a dedicated function to essentially
do I(x), but it makes sense to keep the same syntax as the other transformations so it plays
nicely with them. As a benefit, the bestNormalize function will also show a comparable normalization
statistic for the untransformed data.

Usage
no_transform(x, standardize = FALSE, warn = TRUE)

## S3 method for class 'no_transform'

predict(object, newdata = NULL, inverse = FALSE,
...)

## S3 method for class 'no_transform'

print(x, ...)

Arguments

x A vector
standardize If TRUE, the transformed values are centered and scaled
warn Should a warning result from infinite values?
object an object of class ’no_transform’
newdata a vector of data to be (potentially reverse) transformed
inverse if TRUE, performs reverse transformation
... additional arguments

Details

no_transform creates an identity transformation object that can be applied to new data via the
predict function.

Value

A list of class no_transform with elements

x.t transformed original data

The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

Examples
x <- rgamma(100, 1, 1)

no_transform_obj <- no_transform(x)

no_transform_obj
p <- predict(no_transform_obj)
x2 <- predict(no_transform_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

orderNorm Calculate and perform Ordered Quantile normalizing transformation

Description
The Ordered Quantile (ORQ) normalization transformation, orderNorm(), is a rank-based procedure
by which the values of a vector are mapped to their percentile, which is then mapped to the
same percentile of the normal distribution. Without the presence of ties, this essentially guarantees
that the transformation leads to a normal distribution.
The transformation is:

g(x) = Φ^{-1}((rank(x) + .5)/(length(x) + 1))

Where Φ refers to the standard normal cdf, rank(x) refers to each observation’s rank, and length(x)
refers to the number of observations.
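
Written out in base R, the formula above amounts to a single line. The function name orq_in_sample is hypothetical, the sketch follows the formula exactly as printed here, and it ignores ties and missing values.

```r
# In-sample ORQ mapping written exactly as the formula above
orq_in_sample <- function(x) {
  qnorm((rank(x) + 0.5) / (length(x) + 1))
}

set.seed(1)
x <- rgamma(100, 1, 1)
z <- orq_in_sample(x)

# The mapping is monotone: ordering by x and by z gives the same permutation
stopifnot(identical(order(x), order(z)))
```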
By itself, this method is certainly not new; the earliest mention of it that I could find is in a 1947
paper by Bartlett (see references). This formula was outlined explicitly in Van der Waerden, and
expounded upon in Beasley (2009). However there is a key difference to this version of it, as
explained below.
Using linear interpolation between these percentiles, the ORQ normalization becomes a 1-1 transformation
that can be applied to new data. However, outside of the observed domain of x, it is
unclear how to extrapolate the transformation. In the ORQ normalization procedure, a binomial
glm with a logit link is used on the ranks in order to extrapolate beyond the bounds of the original
domain of x. The inverse normal CDF is then applied to these extrapolated predictions in order
to extrapolate the transformation. This mitigates the influence of heavy-tailed distributions while
preserving the 1-1 nature of the transformation. (The extrapolation will provide a warning unless
warn = FALSE.) However, we found that the extrapolation was able to perform very well even on
data as heavy-tailed as a Cauchy distribution (paper to be published).
This transformation can be performed on new data and inverted via the predict function.
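
The interpolation step for new data inside the observed range can be sketched with base R's approx(). This is an illustration under assumptions, not the package's predict method: the helper names are hypothetical, the glm-based extrapolation is not reproduced, and values outside the observed range simply come back as NA.

```r
set.seed(1)
x <- rgamma(100, 1, 1)
z <- qnorm((rank(x) + 0.5) / (length(x) + 1))  # fitted in-sample values

# Linear interpolation between the fitted (x, z) pairs; approx() returns NA
# outside the observed range, where the package extrapolates instead
orq_predict <- function(new) approx(x, z, xout = new)$y
orq_inverse <- function(znew) approx(z, x, xout = znew)$y

# New values inside the observed domain, transformed and then inverted
new <- c(median(x), mean(range(x)))
z_new <- orq_predict(new)
stopifnot(isTRUE(all.equal(orq_inverse(z_new), new)))
```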

Usage
orderNorm(x, ..., warn = TRUE)

## S3 method for class 'orderNorm'

predict(object, newdata = NULL, inverse = FALSE,
warn = TRUE, ...)

## S3 method for class 'orderNorm'

print(x, ...)

Arguments
x A vector to normalize
... additional arguments
warn transforms outside observed range or ties will yield warning

object an object of class ’orderNorm’
newdata a vector of data to be (reverse) transformed
inverse if TRUE, performs reverse transformation

Value
A list of class orderNorm with elements

x.t transformed original data

x original data
n number of nonmissing observations
ties_status indicator if ties are present
fit fit to be used for extrapolation, if needed
norm_stat Pearson’s P / degrees of freedom

The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

References
Bartlett, M. S. "The Use of Transformations." Biometrics, vol. 3, no. 1, 1947, pp. 39-52. JSTOR
www.jstor.org/stable/3001536.
Van der Waerden BL. Order tests for the two-sample problem and their power. 1952;55:453-458.
Ser A.
Beasley TM, Erickson S, Allison DB. Rank-based inverse normal transformations are increasingly
used, but are they merited? Behav. Genet. 2009;39(5): 580-595. pmid:19526352

See Also
boxcox, lambert, bestNormalize, yeojohnson

Examples

x <- rgamma(100, 1, 1)

orderNorm_obj <- orderNorm(x)

orderNorm_obj
p <- predict(orderNorm_obj)
x2 <- predict(orderNorm_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

plot.bestNormalize Transformation plotting

Description
Plots transformation functions for objects produced by the bestNormalize package

Usage
## S3 method for class 'bestNormalize'
plot(x, inverse = FALSE, bounds = NULL,
cols = c("green3", 1, 2, 4:6), methods = c("boxcox", "yeojohnson",
"orderNorm", "lambert_s", "lambert_h"), leg_loc = "top", ...)

## S3 method for class 'orderNorm'

plot(x, inverse = FALSE, bounds = NULL, ...)

## S3 method for class 'boxcox'

plot(x, inverse = FALSE, bounds = NULL, ...)

## S3 method for class 'yeojohnson'

plot(x, inverse = FALSE, bounds = NULL, ...)

## S3 method for class 'lambert'

plot(x, inverse = FALSE, bounds = NULL, ...)

Arguments
x a fitted transformation
inverse if TRUE, plots the inverse transformation
bounds a vector of bounds to plot for the transformation
cols a vector of colors to use for the transforms (see details)
methods a vector of transformations to plot
leg_loc the location of the legend on the plot
... further parameters to be passed to plot and lines

Details
The plots produced by the individual transformations are simply plots of the original values by the
newly transformed values, with a line denoting where transformations would take place for new
data.
For the bestNormalize object, this plots each of the possible transformations run by the original
call to bestNormalize. The first argument in the "cols" parameter refers to the color of the chosen
transformation.

sqrt_x sqrt(x + a) Normalization

Description
Perform a sqrt (x+a) normalization transformation

Usage
sqrt_x(x, a = NULL, standardize = TRUE, eps = 0.001)

## S3 method for class 'sqrt_x'

predict(object, newdata = NULL, inverse = FALSE, ...)

## S3 method for class 'sqrt_x'

print(x, ...)

Arguments
x A vector to normalize
a The constant to add to x (defaults to max(0, -min(x) + eps))
standardize If TRUE, the transformed values are also centered and scaled, such that the
transformation attempts a standard normal
eps The allowed error in the expression for the selected a
object an object of class ’sqrt_x’
newdata a vector of data to be (potentially reverse) transformed
inverse if TRUE, performs reverse transformation
... additional arguments

Details
sqrt_x performs a simple square-root transformation in the context of bestNormalize, such that it
creates a transformation that can be estimated and applied to new data via the predict function.
By default, the parameter a is estimated from the training set as the smallest shift that keeps
x + a positive, up to the tolerance eps.

Value
A list of class sqrt_x with elements

x.t transformed original data

x original data
mean mean after transformation but prior to standardization
sd sd after transformation but prior to standardization
n number of nonmissing observations

norm_stat Pearson’s P / degrees of freedom
standardize was the transformation standardized
The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

Examples
x <- rgamma(100, 1, 1)

sqrt_x_obj <- sqrt_x(x)

sqrt_x_obj
p <- predict(sqrt_x_obj)
x2 <- predict(sqrt_x_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)

yeojohnson Yeo-Johnson Normalization

Description
Perform a Yeo-Johnson Transformation and center/scale a vector to attempt normalization

Usage
yeojohnson(x, eps = 0.001, standardize = TRUE, ...)

## S3 method for class 'yeojohnson'

predict(object, newdata = NULL, inverse = FALSE,
...)

## S3 method for class 'yeojohnson'

print(x, ...)

Arguments
x A vector to normalize with Yeo-Johnson
eps A value to compare lambda against to see if it is equal to zero
standardize If TRUE, the transformed values are also centered and scaled, such that the
transformation attempts a standard normal
... Additional arguments that can be passed to the estimation of the lambda parameter
(lower, upper)
object an object of class ’yeojohnson’
newdata a vector of data to be (reverse) transformed
inverse if TRUE, performs reverse transformation

Details
yeojohnson estimates the optimal value of lambda for the Yeo-Johnson transformation. This
transformation can be performed on new data, and inverted, via the predict function.
The Yeo-Johnson is similar to the Box-Cox method; however, it allows for the transformation of
nonpositive data as well. The step_YeoJohnson function in the recipes package is another useful
resource (see references).
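
The piecewise form of the Yeo-Johnson transformation for a fixed lambda can be written in base R as below. This follows the textbook definition from Yeo and Johnson (2000); the helper name yj_transform is hypothetical, lambda is supplied by hand rather than estimated, and eps plays the role described under Arguments as a tolerance for treating lambda as 0 (or 2).

```r
# Textbook Yeo-Johnson transform for a fixed lambda (hypothetical helper)
yj_transform <- function(x, lambda, eps = 0.001) {
  pos <- x >= 0
  y <- numeric(length(x))
  # Nonnegative values: a Box-Cox style transform of x + 1
  if (abs(lambda) < eps) {
    y[pos] <- log(x[pos] + 1)
  } else {
    y[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  }
  # Negative values: the mirrored transform with power 2 - lambda
  if (abs(lambda - 2) < eps) {
    y[!pos] <- -log(-x[!pos] + 1)
  } else {
    y[!pos] <- -((-x[!pos] + 1)^(2 - lambda) - 1) / (2 - lambda)
  }
  y
}

x <- c(-2, -0.5, 0, 0.5, 2)
yj_transform(x, lambda = 1)    # lambda = 1 leaves the data unchanged
yj_transform(x, lambda = 0.5)  # negative inputs are handled without a shift
```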

Value
A list of class yeojohnson with elements

x.t transformed original data

The predict function returns the numeric value of the transformation performed on new data, and
allows for the inverse transformation as well.

References
Yeo, I. K., & Johnson, R. A. (2000). A new family of power transformations to improve normality
or symmetry. Biometrika.
Max Kuhn and Hadley Wickham (2017). recipes: Preprocessing Tools to Create Design Matrices.
R package version 0.1.0.9000. https://github.com/topepo/recipes

Examples

x <- rgamma(100, 1, 1)

yeojohnson_obj <- yeojohnson(x)

yeojohnson_obj
p <- predict(yeojohnson_obj)
x2 <- predict(yeojohnson_obj, newdata = p, inverse = TRUE)

all.equal(x2, x)
