Title: | Probabilistic Models to Analyze and Gaussianize Heavy-Tailed, Skewed Data |
---|---|
Description: | Lambert W x F distributions are a generalized framework to analyze skewed, heavy-tailed data. It is based on an input/output system, where the output random variable (RV) Y is a non-linearly transformed version of an input RV X ~ F with similar properties as X, but slightly skewed (heavy-tailed). The transformed RV Y has a Lambert W x F distribution. This package contains functions to model and analyze skewed, heavy-tailed data the Lambert Way: simulate random samples, estimate parameters, compute quantiles, and plot/ print results nicely. The most useful function is 'Gaussianize', which works similarly to 'scale', but actually makes the data Gaussian. A do-it-yourself toolkit allows users to define their own Lambert W x 'MyFavoriteDistribution' and use it in their analysis right away. |
Authors: | Georg M. Goerg [aut, cre] |
Maintainer: | Georg M. Goerg <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.6.9-1 |
Built: | 2024-12-26 04:28:16 UTC |
Source: | https://github.com/gmgeorg/lambertw |
F distributionsThis package is based on notation, definitions, and results of Goerg (2011, 2015, 2016). I will not include these references in the description of each single function.
Lambert W F distributions are a general framework to model and
transform skewed, heavy-tailed data. Lambert W
F random
variables (RV) are based on an input/ouput system with input RV
and output
, which is a
non-linearly transformed version of X – with similar properties to X,
but slightly skewed and/or heavy-tailed. Then Y has a 'Lambert W
' distribution - see References.
get_distnames
lists all implemented Lambert W F
distributions in this package. If you want to generate a
skewed/heavy-tailed version of a distribution that is not implemented,
you can use the do-it-yourself modular toolkit
(
create_LambertW_input
and
create_LambertW_output
). It allows users to quickly
implement their own Lambert W x 'MyFavoriteDistribution' and use it in
their analysis right away.
This package contains several functions to analyze skewed and heavy-tailed
data: simulate random samples (rLambertW
), evaluate pdf and
cdf (dLambertW
and pLambertW
), estimate
parameters (IGMM
and MLE_LambertW
), compute
quantiles (qLambertW
), and plot/print results nicely
(plot.LambertW_fit
, print.LambertW_fit
,
summary.LambertW_fit
).
Probably the most useful function is Gaussianize
, which works
similarly to scale
, but makes your data Gaussian (not
just centers and scales it, but also makes it symmetric and removes
excess kurtosis).
If you use this package in your work please cite it
(citation("LambertW")
). You can also send me an implementation of
your 'Lambert W YourFavoriteDistribution' to add to the
LambertW package (and I will reference your work introducing your
'Lambert W
YourFavoriteDistribution' here.)
Feel free to contact me for comments, suggestions, code improvements, implementation of new input distributions, bug reports, etc.
Author and maintainer: Georg M. Goerg (im (at) gmge.org)
Goerg, G.M. (2011). “Lambert W Random Variables - A New Family of Generalized Skewed Distributions with Applications to Risk Estimation”. Annals of Applied Statistics, 5 (3), 2197-2230. (https://arxiv.org/abs/0912.4554).
Goerg, G.M. (2015). “The Lambert Way to Gaussianize heavy-tailed data with the inverse of Tukey's h transformation as a special case”. The Scientific World Journal: Probability and Statistics with Applications in Finance and Economics. Available at https://www.hindawi.com/journals/tswj/2015/909231/.
Goerg, G.M. (2016). “Rebuttal of the “Letter to the Editor of Annals of Applied Statistics” on Lambert W x F distributions and the IGMM algorithm”. Available on arxiv.
## Not run: # Replicate parts of the analysis in Goerg (2011) data(AA) y <- AA[AA$sex=="f", "bmi"] test_normality(y) fit.gmm <- IGMM(y, type = "s") summary(fit.gmm) # gamma is significant and positive plot(fit.gmm) # Compare empirical to theoretical moments (given parameter estimates) moments.theory <- mLambertW(theta = list(beta = fit.gmm$tau[c("mu_x", "sigma_x")], gamma = fit.gmm$tau["gamma"]), distname = "normal") TAB <- rbind(unlist(moments.theory), c(mean(y), sd(y), skewness(y), kurtosis(y))) rownames(TAB) <- c("Theoretical (IGMM)", "Empirical") TAB x <- get_input(y, fit.gmm$tau) test_normality(x) # input is normal -> fit a Lambert W x Gaussian by MLE fit.ml <- MLE_LambertW(y, type = "s", distname = "normal", hessian = TRUE) summary(fit.ml) plot(fit.ml) ## End(Not run)
## Not run: # Replicate parts of the analysis in Goerg (2011) data(AA) y <- AA[AA$sex=="f", "bmi"] test_normality(y) fit.gmm <- IGMM(y, type = "s") summary(fit.gmm) # gamma is significant and positive plot(fit.gmm) # Compare empirical to theoretical moments (given parameter estimates) moments.theory <- mLambertW(theta = list(beta = fit.gmm$tau[c("mu_x", "sigma_x")], gamma = fit.gmm$tau["gamma"]), distname = "normal") TAB <- rbind(unlist(moments.theory), c(mean(y), sd(y), skewness(y), kurtosis(y))) rownames(TAB) <- c("Theoretical (IGMM)", "Empirical") TAB x <- get_input(y, fit.gmm$tau) test_normality(x) # input is normal -> fit a Lambert W x Gaussian by MLE fit.ml <- MLE_LambertW(y, type = "s", distname = "normal", hessian = TRUE) summary(fit.ml) plot(fit.ml) ## End(Not run)
Analyzes the feasibility of a Lambert W x F distribution for a given dataset based on bootstrapping. In particular it checks whether parameter estimates support the hypothesis that the data indeed follows a Lambert W x F distribution with finite mean and variance of the input distribution, which is an implicit assumption of Lambert W x F random variables in Goerg (2011).
See Goerg (2016) for an alternative definition that does not rely on fnite
second order moments (set use.mean.variance = FALSE
to use that type
of Lambert W F distributions).
analyze_convergence( LambertW_fit, sample.sizes = round(seq(0.2, 1, length = 5) * length(LambertW_fit$data)), ... ) ## S3 method for class 'convergence_LambertW_fit' summary(object, type = c("basic", "norm", "perc", "bca"), ...) ## S3 method for class 'convergence_LambertW_fit' plot(x, ...)
analyze_convergence( LambertW_fit, sample.sizes = round(seq(0.2, 1, length = 5) * length(LambertW_fit$data)), ... ) ## S3 method for class 'convergence_LambertW_fit' summary(object, type = c("basic", "norm", "perc", "bca"), ...) ## S3 method for class 'convergence_LambertW_fit' plot(x, ...)
LambertW_fit , object , x
|
an object of class |
sample.sizes |
sample sizes for several steps of the convergence
analysis. By default, one of them equals the length of the original
data, which leads to improved plots (see
|
... |
additional arguments passed to |
type |
type of confidence interval from bootstrap estimates. Passes this
argument along to |
Stehlik and Hermann (2015) show that when researchers use the IGMM algorithm outlined in Goerg (2011) erroneously on data that does not have finite input variance (and hence mean), the algorithm estimates do not converge.
In practice, researchers should of course first check if a given model is appropriate for their data-generating process. Since original Lambert W x F distributions assume that mean and variance are finite, it is not a given that for a specific dataset the Lambert W x F setting makes sense.
The bootstrap analysis reverses Stehlik and Hermann's argument and checks
whether the IGMM estimates
converge for increasing (bootstrapped) sample size
: if they do,
then modeling the data with a Lambert W x F distribution is appropriate;
if estimates do not converge, then this indicates that the input data is
too heavy tailed for a classic skewed location-scale Lambert W x F
framework. In this case, take a look at (double-)heavy tailed Lambert W x
F distributions (
type = 'hh'
) or unrestricted location-scale
Lambert W x F distributions (use.mean.variance = FALSE
). For
details see Goerg (2016).
Stehlik and Hermann (2015). “Letter to the Editor”. Ann. Appl. Stat. 9 2051. doi:10.1214/15-AOAS864 – https://projecteuclid.org/euclid.aoas/1453994190
## Not run: sim.data <- list("Lambert W x Gaussian" = rLambertW(n = 100, distname = "normal", theta = list(gamma = 0.1, beta = c(1, 2))), "Cauchy" = rcauchy(n = 100)) # do not use lapply() as it does not work well with match.call() in # bootstrap() igmm.ests <- list() conv.analyses <- list() for (nn in names(sim.data)) { igmm.ests[[nn]] <- IGMM(sim.data[[nn]], type = "s") conv.analyses[[nn]] <- analyze_convergence(igmm.ests[[nn]]) } plot.lists <- lapply(conv.analyses, plot) for (nn in names(plot.lists)) { plot.lists[[nn]] <- lapply(plot.lists[[nn]], "+", ggtitle(nn)) } require(gridExtra) for (jj in seq_along(plot.lists[[1]])) { grid.arrange(plot.lists[[1]][[jj]], plot.lists[[2]][[jj]], ncol = 2) } ## End(Not run)
## Not run: sim.data <- list("Lambert W x Gaussian" = rLambertW(n = 100, distname = "normal", theta = list(gamma = 0.1, beta = c(1, 2))), "Cauchy" = rcauchy(n = 100)) # do not use lapply() as it does not work well with match.call() in # bootstrap() igmm.ests <- list() conv.analyses <- list() for (nn in names(sim.data)) { igmm.ests[[nn]] <- IGMM(sim.data[[nn]], type = "s") conv.analyses[[nn]] <- analyze_convergence(igmm.ests[[nn]]) } plot.lists <- lapply(conv.analyses, plot) for (nn in names(plot.lists)) { plot.lists[[nn]] <- lapply(plot.lists[[nn]], "+", ggtitle(nn)) } require(gridExtra) for (jj in seq_along(plot.lists[[1]])) { grid.arrange(plot.lists[[1]][[jj]], plot.lists[[2]][[jj]], ncol = 2) } ## End(Not run)
The parameter specifies the input distribution
.
beta2tau
converts to the transformation vector
, which
defines the Lambert W
F random variable mapping from
to
(see
tau-utils
). Parameters and
of
in general depend on
(and may not even exist for
use.mean.variance = TRUE
; in this case
beta2tau
will throw an error).
check_beta
checks if defines a
valid distribution, e.g., for normal distribution
'sigma'
must be
positive.
estimate_beta
estimates for a given
using MLE or methods of moments. Closed form solutions
are used if they exist; otherwise the MLE is obtained numerically using
fitdistr
.
get_beta_names
returns (typical) names for each component of
.
Depending on the distribution
has different length and names: e.g.,
for a
"normal"
distribution beta
is of length
(
"mu"
, "sigma"
); for an "exp"
onential
distribution beta
is a scalar (rate "lambda"
).
beta2tau(beta, distname, use.mean.variance = TRUE) check_beta(beta, distname) estimate_beta(x, distname) get_beta_names(distname)
beta2tau(beta, distname, use.mean.variance = TRUE) check_beta(beta, distname) estimate_beta(x, distname) get_beta_names(distname)
beta |
numeric; vector |
distname |
character; name of input distribution; see
|
use.mean.variance |
logical; if |
x |
a numeric vector of real values (the input data). |
estimate_beta
does not do any data transformation as part of the
Lambert W F input/output framework. For an initial estimate
of
for Lambert W
F distributions see
get_initial_theta
and get_initial_tau
.
A quick initial estimate of is obtained by first finding the
(approximate) input
by
IGMM
, and then getting the MLE of
for this input data
(usually using
fitdistr
).
beta2tau
returns a numeric vector, which is implied by
beta
and distname
.
check_beta
throws an error if is not
appropriate for the given distribution; e.g., if it has too many values
or if they are not within proper bounds (e.g.,
beta['sigma']
of a
"normal"
distribution must be positive).
estimate_beta
returns a named vector with estimates for
given
x
.
get_beta_names
returns a vector of characters.
# By default: delta = gamma = 0 and alpha = 1 beta2tau(c(1, 1), distname = "normal") ## Not run: beta2tau(c(1, 4, 1), distname = "t") ## End(Not run) beta2tau(c(1, 4, 1), distname = "t", use.mean.variance = FALSE) beta2tau(c(1, 4, 3), distname = "t") # no problem ## Not run: check_beta(beta = c(1, 1, -1), distname = "normal") ## End(Not run) set.seed(124) xx <- rnorm(100)^2 estimate_beta(xx, "exp") estimate_beta(xx, "chisq")
# By default: delta = gamma = 0 and alpha = 1 beta2tau(c(1, 1), distname = "normal") ## Not run: beta2tau(c(1, 4, 1), distname = "t") ## End(Not run) beta2tau(c(1, 4, 1), distname = "t", use.mean.variance = FALSE) beta2tau(c(1, 4, 3), distname = "t") # no problem ## Not run: check_beta(beta = c(1, 1, -1), distname = "normal") ## End(Not run) set.seed(124) xx <- rnorm(100)^2 estimate_beta(xx, "exp") estimate_beta(xx, "chisq")
Analyzes the Lambert W x F for a given dataset based on bootstrapping. Depends
on the boot package and returns a "boot"
object.
bootstrap(object, ...) ## S3 method for class 'LambertW_fit' bootstrap(object, sample.size = length(object$data), R = 100, ...)
bootstrap(object, ...) ## S3 method for class 'LambertW_fit' bootstrap(object, sample.size = length(object$data), R = 100, ...)
object |
an object of class |
... |
additional arguments passed to |
sample.size |
sample size of the bootstrap. By default, equal to the original data length. |
R |
number of replicates for the bootstrap. See
|
An object of class "boot"
representing the bootstrap
analysis of (or
) of
an Lambert W x F estimator (
LambertW_fit
).
## Not run: yy <- rLambertW(n = 1000, theta = list(delta = c(0.1), beta = c(2, 1)), distname = "normal") mod.igmm <- IGMM(yy, type = "h") boot.est <- bootstrap(mod.igmm, R = 100) # use summary and plot from 'boot' pkg plot(boot.est, 3) summary(boot.est) ## End(Not run)
## Not run: yy <- rLambertW(n = 1000, theta = list(delta = c(0.1), beta = c(2, 1)), distname = "normal") mod.igmm <- IGMM(yy, type = "h") boot.est <- bootstrap(mod.igmm, R = 100) # use summary and plot from 'boot' pkg plot(boot.est, 3) summary(boot.est) ## End(Not run)
Reference list of most common function arguments in this package.
y |
a numeric vector of real values (the observed data). |
distname |
character; name of input distribution; see
|
type |
type of Lambert W |
theta |
list; a (possibly incomplete) list of parameters |
beta |
numeric vector (deprecated); parameter |
gamma |
scalar (deprecated); skewness parameter; default: |
delta |
scalar or vector (length 2) (deprecated); heavy-tail
parameter(s); default: |
alpha |
scalar or vector (length 2) (deprecated); heavy tail
exponent(s); default: |
tau |
named vector |
return.u |
logical; if |
use.mean.variance |
logical; if |
Collection of datasets in this package.
The Australian athletes dataset (AA
) were collected in a study of how
data on various characteristics of the blood varied with sport body size
and sex of the athlete.
The SolarFlares
data are observations of peak gamma-ray
intensity of solar flares recorded from Feb, 1980 - Dec, 1989. It was
analyzed for power-law properties in Clauset et al. (2009) and comes
originally from Dennis et al. (1991). Thanks to the authors for giving
permission to include the dataset in this package.
AA
is a data.frame
with 13 columns and 202 rows.
See ais
dataset in the DAAG package for details.
AA
was the basis for the analyses that are reported in
Telford and Cunningham (1991).
Resources on the SolarFlares
dataset can be found at:
https://sites.santafe.edu/~aaronc/powerlaws/data.htm
https://ui.adsabs.harvard.edu/abs/1991chxb.book.....D/abstract
See also References.
Telford, R.D. and Cunningham, R.B. 1991. Sex, sport and body-size dependency of hematology in highly trained athletes. Medicine and Science in Sports and Exercise 23: 788-794.
Dennis, B. R.; Orwig, L. E.; Kennard, G. S.; Labow, G. J.; Schwartz, R. A.; Shaver, A. R.; Tolbert, A. K. (1991). “The Complete Hard X Ray Burst Spectrometer Event List, 1980-1989.” NASA Technical Memorandum 4332.
Clauset, A., C. R. Shalizi, and M. E. J. Newman (2009). “Power-law distributions in empirical data”. SIAM Review 51, 661-703 (2009). See also https://sites.santafe.edu/~aaronc/powerlaws/data.htm.
Computes the input mean and standard deviation
for input
such that the resulting heavy-tail Lambert W x F RV
with
has zero-mean and unit-variance. So far works only for
Gaussian input and scalar
.
The function works for any output mean and standard deviation, but default
values are and
since they are the most
useful, e.g., to generate a standardized Lambert W white noise sequence.
delta_01(delta, mu.y = 0, sigma.y = 1, distname = "normal")
delta_01(delta, mu.y = 0, sigma.y = 1, distname = "normal")
delta |
scalar; heavy-tail parameter. |
mu.y |
output mean; default: |
sigma.y |
output standard deviation; default: |
distname |
string; distribution name. Currently this function only supports
|
5-dimensional vector (,
, 0,
, 1),
where
and
are set for the sake of compatiblity with other functions.
delta_01(0) # for delta = 0, input == output, therefore (0,1,0,0,1) # delta > 0 (heavy-tails): # Y is symmetric for all delta: # mean = 0; however, sd must be smaller delta_01(0.1) delta_01(1/3) # only moments up to order 2 exist delta_01(1) # neither mean nor variance exist, thus NA
delta_01(0) # for delta = 0, input == output, therefore (0,1,0,0,1) # delta > 0 (heavy-tails): # Y is symmetric for all delta: # mean = 0; however, sd must be smaller delta_01(0.1) delta_01(1/3) # only moments up to order 2 exist delta_01(1) # neither mean nor variance exist, thus NA
This function minimizes the Euclidean distance between the sample kurtosis of
the back-transformed data and a
user-specified target kurtosis as a function of
(see
References). Only an iterative application of this function will give a
good estimate of
(see
IGMM
).
delta_GMM( z, type = c("h", "hh"), kurtosis.x = 3, skewness.x = 0, delta.init = delta_Taylor(z), tol = .Machine$double.eps^0.25, not.negative = FALSE, optim.fct = c("nlm", "optimize"), lower = -1, upper = 3 )
delta_GMM( z, type = c("h", "hh"), kurtosis.x = 3, skewness.x = 0, delta.init = delta_Taylor(z), tol = .Machine$double.eps^0.25, not.negative = FALSE, optim.fct = c("nlm", "optimize"), lower = -1, upper = 3 )
z |
a numeric vector of data values. |
type |
type of Lambert W |
kurtosis.x |
theoretical kurtosis of the input X; default: |
skewness.x |
theoretical skewness of the input X. Only used if |
delta.init |
starting value for optimization; default: |
tol |
a positive scalar; tolerance level for terminating
the iterative algorithm; default: |
not.negative |
logical; if |
optim.fct |
which R optimization function should be used. Either |
lower , upper
|
lower and upper bound for optimization. Default: |
A list with two elements:
delta |
optimal |
iterations |
number of iterations ( |
gamma_GMM
for the skewed version of this function;
IGMM
to estimate all parameters jointly.
# very heavy-tailed (like a Cauchy) y <- rLambertW(n = 1000, theta = list(beta = c(1, 2), delta = 1), distname = "normal") delta_GMM(y) # after the first iteration
# very heavy-tailed (like a Cauchy) y <- rLambertW(n = 1000, theta = list(beta = c(1, 2), delta = 1), distname = "normal") delta_GMM(y) # after the first iteration
Computes an initial estimate of based on the Taylor
approximation of the kurtosis of Lambert W
Gaussian RVs. See
Details for the formula.
This is the initial estimate for IGMM
and delta_GMM
.
delta_Taylor(y, kurtosis.y = kurtosis(y), distname = "normal")
delta_Taylor(y, kurtosis.y = kurtosis(y), distname = "normal")
y |
a numeric vector of data values. |
kurtosis.y |
kurtosis of |
distname |
string; name of the distribution. Currently only supports |
The second order Taylor approximation of the theoretical kurtosis of a
heavy tail Lambert W x Gaussian RV around
equals
Ignoring higher order terms, using the empirical estimate on the left hand side, and
solving for yields (positive root)
where is the empirical kurtosis of
.
Since the kurtosis is finite only for ,
delta_Taylor
upper-bounds the returned estimate by .
scalar; estimated .
IGMM
to estimate all parameters jointly.
set.seed(2) # a little heavy-tailed (kurtosis does exist) y <- rLambertW(n = 1000, theta = list(beta = c(0, 1), delta = 0.2), distname = "normal") # good initial estimate since true delta=0.2 close to 0, and # empirical kurtosis well-defined. delta_Taylor(y) delta_GMM(y) # iterative estimate y <- rLambertW(n = 1000, theta = list(beta = c(0, 1), delta = 1), distname = "normal") # very heavy-tailed (like a Cauchy) delta_Taylor(y) # bounded by 1/4 (as otherwise kurtosis does not exist) delta_GMM(y) # iterative estimate
set.seed(2) # a little heavy-tailed (kurtosis does exist) y <- rLambertW(n = 1000, theta = list(beta = c(0, 1), delta = 0.2), distname = "normal") # good initial estimate since true delta=0.2 close to 0, and # empirical kurtosis well-defined. delta_Taylor(y) delta_GMM(y) # iterative estimate y <- rLambertW(n = 1000, theta = list(beta = c(0, 1), delta = 1), distname = "normal") # very heavy-tailed (like a Cauchy) delta_Taylor(y) # bounded by 1/4 (as otherwise kurtosis does not exist) delta_GMM(y) # iterative estimate
These functions have been deprecated in v0.5 of LambertW mostly for
sake of following R style guides with respect to naming of
functions. This means that all deprecated functions here have an
analogous function with a similar – more style consistent – name. See
also the NEWS
file.
As of v0.6.8-1 deprecated functions will throw errors ('stop()') and print out the suggested new function (name).
beta_names(...) bounds_theta(...) d1W_1(z, W.z = W(z, branch = -1)) p_1(...) params2theta(...) skewness_test(...) starting_theta(...) support(...) normfit(...) theta2params(...) vec.norm(...) W_1(z) W_gamma_1(z, gamma) H(...)
beta_names(...) bounds_theta(...) d1W_1(z, W.z = W(z, branch = -1)) p_1(...) params2theta(...) skewness_test(...) starting_theta(...) support(...) normfit(...) theta2params(...) vec.norm(...) W_1(z) W_gamma_1(z, gamma) H(...)
... |
arguments passed to deprecated functions. |
z , W.z
|
see |
gamma |
see |
The Lambert W F framework can take any (continuous) random variable with distribution
F and make it skewed (
type = "s"
), heavy tailed (type = "h"
),
or both (type = "hh"
).
In principle, this works for any F. Of course, this package implements only a finite
number of distributions, which can be specified with the distname
argument.
Most functions in this package, however, also allow you to pass your own distribution and parameters
and create a Lambert W F version of it.
check_distname
checks if the distribution specified by
the distname
argument is implemented in this package.
get_distname_family
determines whether a distribution is a
location, scale, or location-scale family.
It also returns whether the distribution is supported on non-negative
values only.
get_distnames
lists all currently implemented distributions .
check_distname(distname) get_distname_family(distname) get_distnames()
check_distname(distname) get_distname_family(distname) get_distnames()
distname |
character; name of input distribution; see
|
check_distname
returns (invisible) that the distribution is implemented,
or throws an error otherwise.
get_distname_family
returns a list with
location |
logical; if |
scale |
logical; if |
is.non.negative |
logical; if |
get_distnames
returns a vector of strings in alphabetical order.
It lists all supported distributions.
Each string can be passed as the distname
argument to several functions in this package.
create_LambertW_input
, create_LambertW_output
.
check_distname("normal") ## Not run: check_distname("my_great_distribution") ## End(Not run) get_distname_family("normal")
check_distname("normal") ## Not run: check_distname("my_great_distribution") ## End(Not run) get_distname_family("normal")
Heavy-tail Lambert W RV transformation: . Reduces to Tukey's h distribution
for
(
G_delta
) and Gaussian input.
G_delta_alpha(u, delta = 0, alpha = 1) G_delta(u, delta = 0) G_2delta_2alpha(u, delta = c(0, 0), alpha = c(1, 1))
G_delta_alpha(u, delta = 0, alpha = 1) G_delta(u, delta = 0) G_2delta_2alpha(u, delta = c(0, 0), alpha = c(1, 1))
u |
a numeric vector of real values. |
delta |
heavy tail parameter; default |
alpha |
exponent in |
numeric; same dimension/size as u
.
Computes the input mean and standard deviation
for input
such that the resulting skewed Lambert W x F RV
with
has zero-mean and unit-variance. So far works only for Gaussian
input and scalar
.
The function works for any output mean and standard deviation, but
and
are set as default as they
are the most useful, e.g., to generate a standardized Lambert W white noise
sequence.
gamma_01(gamma, mu.y = 0, sigma.y = 1, distname = "normal")
gamma_01(gamma, mu.y = 0, sigma.y = 1, distname = "normal")
gamma |
skewness parameter |
mu.y |
output mean; default: |
sigma.y |
output standard deviation; default: |
distname |
string; name of distribution. Currently only supports |
A 5-dimensional vector (,
,
, 0, 1),
where
and
are set for the sake of compatiblity with
other functions.
gamma_01(0) # for gamma = 0, input == output, therefore (0,1,0,0,1) # input mean must be slightly negative to get a zero-mean output gamma_01(0.1) # gamma = 0.1 means it is positively skewed gamma_01(1)
gamma_01(0) # for gamma = 0, input == output, therefore (0,1,0,0,1) # input mean must be slightly negative to get a zero-mean output gamma_01(0.1) # gamma = 0.1 means it is positively skewed gamma_01(1)
This function minimizes the Euclidean distance between the theoretical
skewness of a skewed Lambert W x Gaussian random variable and the sample
skewness of the back-transformed data as
a function of
(see References). Only an interative
application of this function will give a good estimate of
(see
IGMM
).
gamma_GMM( z, skewness.x = 0, gamma.init = gamma_Taylor(z), robust = FALSE, tol = .Machine$double.eps^0.25, not.negative = FALSE, optim.fct = c("optimize", "nlminb") )
gamma_GMM( z, skewness.x = 0, gamma.init = gamma_Taylor(z), robust = FALSE, tol = .Machine$double.eps^0.25, not.negative = FALSE, optim.fct = c("optimize", "nlminb") )
z |
a numeric vector of data values. |
skewness.x |
theoretical skewness of the input |
gamma.init |
starting value for |
robust |
logical; if |
tol |
a positive scalar; tolerance level for terminating the iterative
algorithm; default: |
not.negative |
logical; if |
optim.fct |
string; which R optimization function should be used. By
default it uses |
A list with two elements:
gamma |
scalar; optimal |
iterations |
number of iterations ( |
delta_GMM
for the heavy-tail version of this
function; medcouple_estimator
for a robust measure of asymmetry;
IGMM
for an iterative method to estimate all parameters
jointly.
# highly skewed y <- rLambertW(n = 1000, theta = list(beta = c(1, 2), gamma = 0.5), distname = "normal") gamma_GMM(y, optim.fct = "nlminb") gamma_GMM(y)
# highly skewed y <- rLambertW(n = 1000, theta = list(beta = c(1, 2), gamma = 0.5), distname = "normal") gamma_GMM(y, optim.fct = "nlminb") gamma_GMM(y)
Computes an initial estimate of based on the Taylor
approximation of the skewness of Lambert W
Gaussian RVs around
. See Details for the formula.
This is the initial estimate for IGMM
and
gamma_GMM
.
gamma_Taylor(y, skewness.y = skewness(y), skewness.x = 0, degree = 3)
gamma_Taylor(y, skewness.y = skewness(y), skewness.x = 0, degree = 3)
y |
a numeric vector of data values. |
skewness.y |
skewness of |
skewness.x |
skewness for input X; default: 0 (symmetric input). |
degree |
degree of the Taylor approximation; in Goerg (2011) it just
uses the first order approximation ( |
The first order Taylor approximation of the theoretical skewness
(not to be confused with the skewness parameter
)
of a Lambert W x Gaussian random variable around
equals
Ignoring higher order terms, using the empirical estimate on the left hand
side, and solving yields a first order Taylor approximation
estimate of
as
where is the empirical skewness of the
data
.
As the Taylor approximation is only good in a neighborhood of , the output of
gamma_Taylor
is restricted to the interval
.
The solution of the third order Taylor approximation
is also supported. See code for the solution to this third order polynomial.
Scalar; estimate of .
IGMM
to estimate all parameters jointly.
set.seed(2) # a little skewness yy <- rLambertW(n = 1000, theta = list(beta = c(0, 1), gamma = 0.1), distname = "normal") # Taylor estimate is good because true gamma = 0.1 close to 0 gamma_Taylor(yy) # very highly negatively skewed yy <- rLambertW(n = 1000, theta = list(beta = c(0, 1), gamma = -0.75), distname = "normal") # Taylor estimate is bad since gamma = -0.75 is far from 0; # and gamma = -0.5 is the lower bound by default. gamma_Taylor(yy)
set.seed(2) # a little skewness yy <- rLambertW(n = 1000, theta = list(beta = c(0, 1), gamma = 0.1), distname = "normal") # Taylor estimate is good because true gamma = 0.1 close to 0 gamma_Taylor(yy) # very highly negatively skewed yy <- rLambertW(n = 1000, theta = list(beta = c(0, 1), gamma = -0.75), distname = "normal") # Taylor estimate is bad since gamma = -0.75 is far from 0; # and gamma = -0.5 is the lower bound by default. gamma_Taylor(yy)
Gaussianize
is probably the most useful function in this package. It
works the same way as scale
, but instead of just
centering and scaling the data, it actually Gaussianizes the data
(works well for unimodal data). See Goerg (2011, 2016) and Examples.
Important: For multivariate input X
it performs a column-wise
Gaussianization (by simply calling apply(X, 2, Gaussianize)
),
which is only a marginal Gaussianization. This does not mean (and
is in general definitely not the case) that the transformed data is then
jointly Gaussian.
By default Gaussianize
returns the
input, not the zero-mean, unit-variance
input. Use
return.u = TRUE
to obtain .
Gaussianize( data = NULL, type = c("h", "hh", "s"), method = c("IGMM", "MLE"), return.tau.mat = FALSE, inverse = FALSE, tau.mat = NULL, verbose = FALSE, return.u = FALSE, input.u = NULL )
Gaussianize( data = NULL, type = c("h", "hh", "s"), method = c("IGMM", "MLE"), return.tau.mat = FALSE, inverse = FALSE, tau.mat = NULL, verbose = FALSE, return.u = FALSE, input.u = NULL )
data |
a numeric matrix-like object; either the data that should be
Gaussianized; or the data that should ”DeGaussianized” ( |
type |
what type of non-normality: symmetric heavy-tails |
method |
what estimator should be used: |
return.tau.mat |
logical; if |
inverse |
logical; if |
tau.mat |
instead of estimating |
verbose |
logical; if |
return.u |
logical; if |
input.u |
optional; if you used |
numeric matrix-like object with same dimension/size as input data
.
If inverse = FALSE
it is the Gaussianize matrix / vector;
if TRUE
it is the “DeGaussianized” matrix / vector.
The numeric parameters of mean, scale, and skewness/heavy-tail parameters
that were used in the Gaussianizing transformation are returned as
attributes of the output matrix: 'Gaussianized:mu'
,
'Gaussianized:sigma'
, and for
type = "h": |
|
type = "hh": |
|
type = "s": |
|
They can also be returned as a separate matrix using return.tau.mat =
TRUE
. In this case Gaussianize
returns a list with elements:
input |
Gaussianized input data |
tau.mat |
matrix
with |
# Univariate example set.seed(20) y1 <- rcauchy(n = 100) out <- Gaussianize(y1, return.tau.mat = TRUE) x1 <- get_input(y1, c(out$tau.mat[, 1])) # same as out$input test_normality(out$input) # Gaussianized a Cauchy! kStartFrom <- 20 y.cum.avg <- (cumsum(y1)/seq_along(y1))[-seq_len(kStartFrom)] x.cum.avg <- (cumsum(x1)/seq_along(x1))[-seq_len(kStartFrom)] plot(c((kStartFrom + 1): length(y1)), y.cum.avg, type="l" , lwd = 2, main="CLT in practice", xlab = "n", ylab="Cumulative sample average", ylim = range(y.cum.avg, x.cum.avg)) lines(c((kStartFrom+1): length(y1)), x.cum.avg, col=2, lwd=2) abline(h = 0) grid() legend("bottomright", c("Cauchy", "Gaussianize"), col = c(1, 2), box.lty = 0, lwd = 2, lty = 1) plot(x1, y1, xlab="Gaussian-like input", ylab = "Cauchy - output") grid() ## Not run: # multivariate example y2 <- 0.5 * y1 + rnorm(length(y1)) YY <- cbind(y1, y2) plot(YY) XX <- Gaussianize(YY, type = "hh") plot(XX) out <- Gaussianize(YY, type = "h", return.tau.mat = TRUE, verbose = TRUE, method = "IGMM") plot(out$input) out$tau.mat YY.hat <- Gaussianize(data = out$input, tau.mat = out$tau.mat, inverse = TRUE) plot(YY.hat[, 1], YY[, 1]) ## End(Not run)
# Univariate example set.seed(20) y1 <- rcauchy(n = 100) out <- Gaussianize(y1, return.tau.mat = TRUE) x1 <- get_input(y1, c(out$tau.mat[, 1])) # same as out$input test_normality(out$input) # Gaussianized a Cauchy! kStartFrom <- 20 y.cum.avg <- (cumsum(y1)/seq_along(y1))[-seq_len(kStartFrom)] x.cum.avg <- (cumsum(x1)/seq_along(x1))[-seq_len(kStartFrom)] plot(c((kStartFrom + 1): length(y1)), y.cum.avg, type="l" , lwd = 2, main="CLT in practice", xlab = "n", ylab="Cumulative sample average", ylim = range(y.cum.avg, x.cum.avg)) lines(c((kStartFrom+1): length(y1)), x.cum.avg, col=2, lwd=2) abline(h = 0) grid() legend("bottomright", c("Cauchy", "Gaussianize"), col = c(1, 2), box.lty = 0, lwd = 2, lty = 1) plot(x1, y1, xlab="Gaussian-like input", ylab = "Cauchy - output") grid() ## Not run: # multivariate example y2 <- 0.5 * y1 + rnorm(length(y1)) YY <- cbind(y1, y2) plot(YY) XX <- Gaussianize(YY, type = "hh") plot(XX) out <- Gaussianize(YY, type = "h", return.tau.mat = TRUE, verbose = TRUE, method = "IGMM") plot(out$input) out$tau.mat YY.hat <- Gaussianize(data = out$input, tau.mat = out$tau.mat, inverse = TRUE) plot(YY.hat[, 1], YY[, 1]) ## End(Not run)
get_gamma_bounds
returns lower and upper bounds for , so
that the observed data range falls within the theoretical bounds of the
support of the distribution. This is only important for location family
input.
get_gamma_bounds(y, tau)
get_gamma_bounds(y, tau)
y |
a numeric vector of real values (the observed data). |
tau |
named vector |
Skewed Lambert W F distributions have
parameter-dependent support for location family input. Thus the
parameter
must be bounded such that the observed data is
within the theoretical support of the distribution. This theoretical
bounds are determined by the Lambert W function (
W
), which
has only real-valued solutions for . Thus,
W_gamma
has real-valued solutions only for These lower and upper bounds are determined by minimum
and maxiumum of the normalized data
.
get_gamma_bounds
returns a vector of length 2 with
"lower"
and "upper"
bounds of given the range
of
y
.
get_input
back-transforms the observed data to the
(approximate) input data
using the
transformation vector
.
Note that get.input
should be deprecated; however, since it was
explicitly referenced in Goerg (2011) I keep it here for future
reference. New code should use get_input
exclusively.
get_input(y, tau, return.u = FALSE) get.input(...)
get_input(y, tau, return.u = FALSE) get.input(...)
y |
a numeric vector of data values or an object of class
|
tau |
named vector |
return.u |
should the normalized input be returned; default:
|
... |
arguments passed to |
The (approximated) input data vector .
For gamma != 0
it uses the principal branch solution
W_gamma(z, branch = 0)
to get a unique input.
For gamma = 0
the back-transformation is bijective
(for any ).
If return.u = TRUE
, then it returns a list with 2 vectors
u |
centered and normalized input |
x |
input data |
set.seed(12) # unskew very skewed data y <- rLambertW(n = 1000, theta = list(beta = c(0, 1), gamma = 0.3), distname = "normal") test_normality(y) fit.gmm <- IGMM(y, type="s") x <- get_input(y, fit.gmm$tau) # the same as x <- get_input(fit.gmm) test_normality(x) # symmetric Gaussian
set.seed(12) # unskew very skewed data y <- rLambertW(n = 1000, theta = list(beta = c(0, 1), gamma = 0.3), distname = "normal") test_normality(y) fit.gmm <- IGMM(y, type="s") x <- get_input(y, fit.gmm$tau) # the same as x <- get_input(fit.gmm) test_normality(x) # symmetric Gaussian
get_output
transforms the input to the observed
data
given the transformation vector
.
This is the inverse of get_input
.
get_output(x, tau, return.z = FALSE)
get_output(x, tau, return.z = FALSE)
x |
a numeric vector of data values. |
tau |
named vector |
return.z |
should the shifted and scaled output also be returned?
Default: |
A numeric object of same size/dimension as input x
.
If return.z = TRUE
, then it returns a list with 2 vectors
z |
shifted and scaled input |
y |
transformed output data |
get_input
; Gaussianize
with argument inverse = TRUE
.
If the input has support on the entire real line
, then the skewed Lambert W
F
distribution has truncated support
,
depending on
and (the sign of)
.
For scale-families no truncation occurs.
get_support(tau, is.non.negative = FALSE, input.bounds = c(-Inf, Inf))
get_support(tau, is.non.negative = FALSE, input.bounds = c(-Inf, Inf))
tau |
named vector |
is.non.negative |
logical; by default it is set to |
input.bounds |
interval; the bounds of the input distribution. If
|
Half-open interval on the real line (if ) for
input with support on the entire real line. For
the
support of Y is the same as for X. Heavy-tail Lambert W RVs are not
affected by truncated support (for
); thus support is
c(lower = -Inf, upper = Inf)
.
A vector of length 2 with names 'lower'
and 'upper'
.
get_support(c(mu_x = 0, sigma_x = 1, gamma = 0)) # as gamma = 0 # truncated on the left since gamma > 0 get_support(c(mu_x = 0, sigma_x = 1, gamma = 0.1)) # no truncation for heavy tail(s) get_support(c(mu_x = 0, sigma_x = 1, delta = 0.1))
get_support(c(mu_x = 0, sigma_x = 1, gamma = 0)) # as gamma = 0 # truncated on the left since gamma > 0 get_support(c(mu_x = 0, sigma_x = 1, gamma = 0.1)) # no truncation for heavy tail(s) get_support(c(mu_x = 0, sigma_x = 1, delta = 0.1))
Skewed Lambert W F RV transformation:
.
H_gamma(u, gamma = 0)
H_gamma(u, gamma = 0)
u |
a numeric vector of real values. |
gamma |
skewness parameter; default |
numeric; same dimension/size as u
An iterative method of moments estimator to find this for
type = 's'
( for
type = 'h'
or for
type = "hh"
) which minimizes the distance between
the sample and theoretical skewness (or kurtosis) of
and X.
This algorithm is only well-defined for data with finite mean and variance
input X. See analyze_convergence
and references therein
for details.
IGMM( y, type = c("h", "hh", "s"), skewness.x = 0, kurtosis.x = 3, tau.init = get_initial_tau(y, type), robust = FALSE, tol = .Machine$double.eps^0.25, location.family = TRUE, not.negative = NULL, max.iter = 100, delta.lower = -1, delta.upper = 3 )
IGMM( y, type = c("h", "hh", "s"), skewness.x = 0, kurtosis.x = 3, tau.init = get_initial_tau(y, type), robust = FALSE, tol = .Machine$double.eps^0.25, location.family = TRUE, not.negative = NULL, max.iter = 100, delta.lower = -1, delta.upper = 3 )
y |
a numeric vector of real values. |
type |
type of Lambert W |
skewness.x |
theoretical skewness of input X; default |
kurtosis.x |
theoretical kurtosis of input X; default |
tau.init |
starting values for IGMM algorithm; default:
|
robust |
logical; only used for |
tol |
a positive scalar specifiying the tolerance level for terminating
the iterative algorithm. Default: |
location.family |
logical; tell the algorithm whether the underlying
input should have a location family distribution (for example, Gaussian
input); default: |
not.negative |
logical; if |
max.iter |
maximum number of iterations; default: |
delta.lower , delta.upper
|
lower and upper bound for
|
For algorithm details see the References.
A list of class LambertW_fit
:
tol |
see Arguments |
data |
data |
n |
number of observations |
type |
see Arguments |
tau.init |
starting values for |
tau |
IGMM estimate for |
tau.trace |
entire iteration trace of |
sub.iterations |
number of iterations only performed in GMM algorithm to find optimal |
iterations |
number of iterations to update |
hessian |
Hessian matrix (obtained from simulations; see References) |
call |
function call |
skewness.x , kurtosis.x
|
see Arguments |
distname |
a character string describing distribution characteristics given
the target theoretical skewness/kurtosis for the input. Same information as |
location.family |
see Arguments |
message |
message from the optimization method. What kind of convergence? |
method |
estimation method; here: |
Georg M. Goerg
delta_GMM
, gamma_GMM
, analyze_convergence
# estimate tau for the skewed version of a Normal y <- rLambertW(n = 100, theta = list(beta = c(2, 1), gamma = 0.2), distname = "normal") fity <- IGMM(y, type = "s") fity summary(fity) plot(fity) ## Not run: # estimate tau for the skewed version of an exponential y <- rLambertW(n = 100, theta = list(beta = 1, gamma = 0.5), distname = "exp") fity <- IGMM(y, type = "s", skewness.x = 2, location.family = FALSE) fity summary(fity) plot(fity) # estimate theta for the heavy-tailed version of a Normal = Tukey's h y <- rLambertW(n = 100, theta = list(beta = c(2, 1), delta = 0.2), distname = "normal") system.time( fity <- IGMM(y, type = "h") ) fity summary(fity) plot(fity) ## End(Not run)
# estimate tau for the skewed version of a Normal y <- rLambertW(n = 100, theta = list(beta = c(2, 1), gamma = 0.2), distname = "normal") fity <- IGMM(y, type = "s") fity summary(fity) plot(fity) ## Not run: # estimate tau for the skewed version of an exponential y <- rLambertW(n = 100, theta = list(beta = 1, gamma = 0.5), distname = "exp") fity <- IGMM(y, type = "s", skewness.x = 2, location.family = FALSE) fity summary(fity) plot(fity) # estimate theta for the heavy-tailed version of a Normal = Tukey's h y <- rLambertW(n = 100, theta = list(beta = c(2, 1), delta = 0.2), distname = "normal") system.time( fity <- IGMM(y, type = "h") ) fity summary(fity) plot(fity) ## End(Not run)
Performs a two-sided KS test for with
,
scale
, and degrees of freedom
. If parameters are not
specified, the MLE given the data will be used (see
fitdistr
).
For estimated parameters of the t-distribution the p-values are incorrect and
should be adjusted. See ks.test
and the references
therein (Durbin (1973)). As a more practical approach consider
bootstrapping and estimating the p-value empirically.
ks_test_t(x, param = NULL)
ks_test_t(x, param = NULL)
x |
a numeric vector of data values. |
param |
3-dimensional named vector |
A list of class "htest"
containing:
statistic |
the value of the Kolomogorv-Smirnov statistic. |
p.value |
the p-value for the test. |
alternative |
a character string describing the alternative hypothesis. |
method |
the character string "One-sample Kolmogorov-Smirnov test student-t" plus rounded parameter values. |
data.name |
a character string giving the name(s) of the data. |
set.seed(1021) beta.true <- c(location = 0, scale = 1, df = 4) xx <- rt(n = 1000, df = beta.true['df']) ks_test_t(xx) ks_test_t(xx, beta.true)
set.seed(1021) beta.true <- c(location = 0, scale = 1, df = 4) xx <- rt(n = 1000, df = beta.true['df']) ks_test_t(xx) ks_test_t(xx, beta.true)
kurtosis
estimates the fourth central, normalized moment from data.
skewness
estimates the third central, normalized moment from data.
kurtosis(x) skewness(x)
kurtosis(x) skewness(x)
x |
a numeric vector. |
Corresponding functions in the moments package.
F estimatesS3 methods (print
, plot
, summary
, etc.) for
LambertW_fit
class returned by the MLE_LambertW
or
IGMM
estimators.
plot.LambertW_fit
plots a (1) histogram, (2) empirical density of the
data y
. These are compared (3) to the theoretical and (4) Lambert W
densities.
print.LambertW_fit
prints only very basic information about
(to prevent an overload of data/information in the
console when executing an estimator).
print.summary.LambertW_fit
tries to be smart about formatting the
coefficients, standard errors, etc. and also displays "significance stars"
(like in the output of summary.lm
).
summary.LambertW_fit
computes some auxiliary results from
the estimate such as standard errors, theoretical support (only for
type="s"
), skewness tests (only for type="hh"
), etc. See
print.summary.LambertW_fit
for print out in the console.
## S3 method for class 'LambertW_fit' plot(x, xlim = NULL, show.qqplot = FALSE, ...) ## S3 method for class 'LambertW_fit' print(x, ...) ## S3 method for class 'summary.LambertW_fit' print(x, ...) ## S3 method for class 'LambertW_fit' summary(object, ...)
## S3 method for class 'LambertW_fit' plot(x, xlim = NULL, show.qqplot = FALSE, ...) ## S3 method for class 'LambertW_fit' print(x, ...) ## S3 method for class 'summary.LambertW_fit' print(x, ...) ## S3 method for class 'LambertW_fit' summary(object, ...)
x , object
|
object of class |
xlim |
lower and upper limit of x-axis for cdf and pdf plots. |
show.qqplot |
should a Lambert W |
... |
further arguments passed to or from other methods. |
summary
returns a list of class summary.LambertW_fit
containing
call |
function call |
coefmat |
matrix with 4 columns: |
distname |
see Arguments |
n |
number of observations |
data |
original data ( |
input |
back-transformed input data |
support |
support of output random variable Y |
data.range |
empirical data range |
method |
estimation method |
hessian |
Hessian at the optimum. Numerically obtained for |
p_m1 , p_m1n
|
Probability that one (or n) observation were caused by input
from the non-principal branch (see |
symmetry.p.value |
p-value from Wald test of identical left and right tail parameters (see
|
# See ?LambertW-package
# See ?LambertW-package
S3 methods for Lambert W input and output objects
(created by create_LambertW_input
and create_LambertW_output
).
plot.LambertW_input
plots the theoretical (1) pdf and (2) cdf of the
input .
plot.LambertW_output
plots the theoretical (1) pdf and (2) cdf of the
output RV Lambert W
. It overlays the plot with the pdf and cdf of the input RV
(setting
).
print.LambertW_input
prints an overview of the input object.
print.LambertW_output
prints an overview of the output object.
## S3 method for class 'LambertW_input' plot(x, xlim = NULL, ...) ## S3 method for class 'LambertW_output' plot(x, xlim = NULL, ...) ## S3 method for class 'LambertW_input' print(x, ...) ## S3 method for class 'LambertW_output' print(x, ...)
## S3 method for class 'LambertW_input' plot(x, xlim = NULL, ...) ## S3 method for class 'LambertW_output' plot(x, xlim = NULL, ...) ## S3 method for class 'LambertW_input' print(x, ...) ## S3 method for class 'LambertW_output' print(x, ...)
x |
object of class |
xlim |
lower and upper limit of x-axis for cdf and pdf plots. If |
... |
further arguments passed to or from other methods. |
# create a Normal(1, 2) input Gauss.input <- create_LambertW_input("normal", beta = c(1, 2)) plot(Gauss.input) # make it a bit heavy tailed (beta in theta comes from Gauss.input) LW.Gauss <- create_LambertW_output(LambertW.input = Gauss.input, theta = list(delta = c(0.3))) LW.Gauss # print a nice overview in the console plot(LW.Gauss) # draw random sample LW.Gauss$r(n=10) Gauss.input$r(n=10) # quantiles LW.Gauss$q(p=0.6) Gauss.input$q(p=0.6)
# create a Normal(1, 2) input Gauss.input <- create_LambertW_input("normal", beta = c(1, 2)) plot(Gauss.input) # make it a bit heavy tailed (beta in theta comes from Gauss.input) LW.Gauss <- create_LambertW_output(LambertW.input = Gauss.input, theta = list(delta = c(0.3))) LW.Gauss # print a nice overview in the console plot(LW.Gauss) # draw random sample LW.Gauss$r(n=10) Gauss.input$r(n=10) # quantiles LW.Gauss$q(p=0.6) Gauss.input$q(p=0.6)
F distributionIMPORTANT: This toolkit functionality is still under active development; function names, arguments, return values, etc. may change.
This do-it-yourself Lambert W F toolkit implements the flexible
input/output framework of Lambert W
F random variables (see
References). Using a modular approach, it allows users to create their
own Lambert W
'MyFavoriteDistribution' RVs. See Details
below.
If the distribution you inted to use is not already implemented
(get_distnames
), then you can create it:
use create_LambertW_input
with your
favorite distribution,
pass it as an input argument to create_LambertW_output
,
use Rs standard functionality for distributions
such as random number generation (rY
), pdf (dY
) and cdf
(pY
), quantile function (qY
), etc. for this newly generated
Lambert W 'MyFavoriteDistribution'.
create_LambertW_output
converts the input LambertW_input
representing random variable to the Lambert W
output.
create_LambertW_input( distname = NULL, beta, input.u = list(beta2tau = NULL, d = NULL, p = NULL, r = NULL, q = NULL, distname = "MyFavoriteDistribution", is.non.negative = FALSE) ) create_LambertW_output( LambertW.input = NULL, theta = NULL, distname = LambertW.input$distname )
create_LambertW_input( distname = NULL, beta, input.u = list(beta2tau = NULL, d = NULL, p = NULL, r = NULL, q = NULL, distname = "MyFavoriteDistribution", is.non.negative = FALSE) ) create_LambertW_output( LambertW.input = NULL, theta = NULL, distname = LambertW.input$distname )
distname |
character; name of input distribution; see
|
beta |
numeric vector (deprecated); parameter |
input.u |
optional; users can make their own 'Lambert W x F' distribution by supplying the necessary functions. See Description for details. |
LambertW.input |
an object of class |
theta |
list; a (possibly incomplete) list of parameters |
create_LambertW_output
takes an object of class
LambertW_input
and creates a class LambertW_output
for
standard distributions as well as the user-defined distribution. This
LambertW_output
represents the RV Y Lambert W
'MyFavoriteDistribution' with all its properties and R
functionality, such as random number generation (
rY
), pdf
(dY
) and cdf (pY
), etc.
create_LambertW_input
allows users to define their own Lambert
W F distribution by supplying the necessary functions about
the input random variable
and
. Here
is the zero mean and/or unit variance version of
(see References).
The argument input.u
must be a list containing all of the following:
beta2tau
R function of (beta)
: converts to
for the
user defined distribution
distname
optional; users can specify the name
of their input distribution. By default it's called "MyFavoriteDistribution"
.
The distribution name will be used in plots and summaries of the Lambert W F
input (and output) object.
is.non.negative
logical; users should specify whether the distribution is for non-negative random variables or not. This will help for plotting and theoretical quantile computation.
d
R function of (u, beta)
: probability density function (pdf) of U,
p
R function of (u, beta)
: cumulative distribution function (cdf) of U,
q
R function of (p, beta)
: quantile function of U,
r
R function (n, beta)
: random number generator for U,
create_LambertW_output
returns a list of class LambertW_output
with values that are (for the most part) functions themselves (see Examples):
d |
pdf of Y |
p |
cdf of Y, |
q |
quantile function for Y, |
r |
random number generator for Y, |
distname |
character string with the name of the new distribution. Format: "Lambert W x 'MyFavoriteDistribution'", |
beta , theta
|
see Arguments, |
distname.with.beta |
name of the new distribution
including the parameter |
Georg M. Goerg
# create a Gaussian N(1, 2) input Gauss.input <- create_LambertW_input("normal", beta = c(1, 2)) # create a heavy-tailed version of a normal # gamma = 0, alpha = 1 are set by default; beta comes from input params <- list(delta = c(0.3)) LW.Gauss <- create_LambertW_output(LambertW.input = Gauss.input, theta = params) LW.Gauss op <- par(no.readonly = TRUE) par(mfrow = c(2, 1), mar = c(3, 3, 2, 1)) curve(LW.Gauss$d(x, params), -7, 10, col = "red") # parameter will get detected automatically from the input curve(LW.Gauss$d(x), -7, 10, col = "blue") # same in blue; # compare to the input case (i.e. set delta = 0) params.0 <- params params.0$delta <- 0 # to evaluate the RV at a different parameter value, # it is necessary to pass the new parameter curve(LW.Gauss$d(x, params.0), -7, 10, add = TRUE, col = 1) #' par(op) curve(LW.Gauss$p(x, params), -7, 10, col = "red") curve(LW.Gauss$p(x, params.0), -7, 10, add = TRUE, col = 1) test_normality(LW.Gauss$r(n = 100), add.legend = FALSE) ## generate a positively skewed version of a shifted, scaled t_3 t.input <- create_LambertW_input("t", beta = c(2, 1, 3)) t.input params <- list(gamma = 0.05) # skew it LW.t <- create_LambertW_output(LambertW.input = t.input, theta = params) LW.t plot(t.input$d, -7, 11, col = 1) plot(LW.t$d, -7, 11, col = 2, add = TRUE) abline(v = t.input$beta["location"], lty = 2) # draw samples from the skewed t_3 yy <- LW.t$r(n = 100) test_normality(yy) ### create a skewed exponential distribution exp.input <- create_LambertW_input("exp", beta = 1) plot(exp.input) params <- list(gamma = 0.2) LW.exp <- create_LambertW_output(exp.input, theta = params) plot(LW.exp) # create a heavy-tail exponential distribution params <- list(delta = 0.2) LW.exp <- create_LambertW_output(exp.input, theta = params) plot(LW.exp) # create a skewed chi-square distribution with 5 df chi.input <- create_LambertW_input("chisq", beta = 5) plot(chi.input) params <- list(gamma = sqrt(2)*0.2) LW.chi <- create_LambertW_output(chi.input, theta = params) plot(LW.chi) # a demo on how a user-defined U input needs to look like user.tmp <- list(d = function(u, beta) dnorm(u), r = function(n, beta) rnorm(n), p = function(u, beta) pnorm(u), q = function(p, beta) qnorm(p), beta2tau = function(beta) { c(mu_x = beta[1], sigma_x = beta[2], gamma = 0, alpha = 1, delta = 0) }, distname = "MyNormal", is.non.negative = FALSE) my.input <- create_LambertW_input(input.u = user.tmp, beta = c(0, 1)) my.input plot(my.input)
# create a Gaussian N(1, 2) input Gauss.input <- create_LambertW_input("normal", beta = c(1, 2)) # create a heavy-tailed version of a normal # gamma = 0, alpha = 1 are set by default; beta comes from input params <- list(delta = c(0.3)) LW.Gauss <- create_LambertW_output(LambertW.input = Gauss.input, theta = params) LW.Gauss op <- par(no.readonly = TRUE) par(mfrow = c(2, 1), mar = c(3, 3, 2, 1)) curve(LW.Gauss$d(x, params), -7, 10, col = "red") # parameter will get detected automatically from the input curve(LW.Gauss$d(x), -7, 10, col = "blue") # same in blue; # compare to the input case (i.e. set delta = 0) params.0 <- params params.0$delta <- 0 # to evaluate the RV at a different parameter value, # it is necessary to pass the new parameter curve(LW.Gauss$d(x, params.0), -7, 10, add = TRUE, col = 1) #' par(op) curve(LW.Gauss$p(x, params), -7, 10, col = "red") curve(LW.Gauss$p(x, params.0), -7, 10, add = TRUE, col = 1) test_normality(LW.Gauss$r(n = 100), add.legend = FALSE) ## generate a positively skewed version of a shifted, scaled t_3 t.input <- create_LambertW_input("t", beta = c(2, 1, 3)) t.input params <- list(gamma = 0.05) # skew it LW.t <- create_LambertW_output(LambertW.input = t.input, theta = params) LW.t plot(t.input$d, -7, 11, col = 1) plot(LW.t$d, -7, 11, col = 2, add = TRUE) abline(v = t.input$beta["location"], lty = 2) # draw samples from the skewed t_3 yy <- LW.t$r(n = 100) test_normality(yy) ### create a skewed exponential distribution exp.input <- create_LambertW_input("exp", beta = 1) plot(exp.input) params <- list(gamma = 0.2) LW.exp <- create_LambertW_output(exp.input, theta = params) plot(LW.exp) # create a heavy-tail exponential distribution params <- list(delta = 0.2) LW.exp <- create_LambertW_output(exp.input, theta = params) plot(LW.exp) # create a skewed chi-square distribution with 5 df chi.input <- create_LambertW_input("chisq", beta = 5) plot(chi.input) params <- list(gamma = sqrt(2)*0.2) LW.chi <- create_LambertW_output(chi.input, theta = params) plot(LW.chi) # a demo on how a user-defined U input needs to look like user.tmp <- list(d = function(u, beta) dnorm(u), r = function(n, beta) rnorm(n), p = function(u, beta) pnorm(u), q = function(p, beta) qnorm(p), beta2tau = function(beta) { c(mu_x = beta[1], sigma_x = beta[2], gamma = 0, alpha = 1, delta = 0) }, distname = "MyNormal", is.non.negative = FALSE) my.input <- create_LambertW_input(input.u = user.tmp, beta = c(0, 1)) my.input plot(my.input)
F Random VariablesDensity, distribution, quantile function and random number generation for a
Lambert W
random
variable with parameter
.
Following the usual R dqpr
family of functions (e.g., rnorm
,
dnorm
, ...) the Lambert W F utility functions work as
expected:
dLambertW
evaluates the pdf at y
,
pLambertW
evaluates the cdf at y
, qLambertW
is the
quantile function, and rLambertW
generates random samples from a
Lambert W
distribution.
mLambertW
computes the first 4 central/standardized moments of a Lambert W
F. Works only for Gaussian distribution.
qqLambertW
computes and plots the sample quantiles of the data
y
versus the theoretical Lambert W
theoretical
quantiles given
.
dLambertW( y, distname = NULL, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, input.u = NULL, tau = NULL, use.mean.variance = TRUE, log = FALSE ) mLambertW( theta = NULL, distname = c("normal"), beta, gamma = 0, delta = 0, alpha = 1 ) pLambertW( q, distname, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, input.u = NULL, tau = NULL, log = FALSE, lower.tail = FALSE, use.mean.variance = TRUE ) qLambertW( p, distname = NULL, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, input.u = NULL, tau = NULL, is.non.negative = FALSE, use.mean.variance = TRUE ) qqLambertW( y, distname, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, plot.it = TRUE, use.mean.variance = TRUE, ... ) rLambertW( n, distname, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, return.x = FALSE, input.u = NULL, tau = NULL, use.mean.variance = TRUE )
dLambertW( y, distname = NULL, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, input.u = NULL, tau = NULL, use.mean.variance = TRUE, log = FALSE ) mLambertW( theta = NULL, distname = c("normal"), beta, gamma = 0, delta = 0, alpha = 1 ) pLambertW( q, distname, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, input.u = NULL, tau = NULL, log = FALSE, lower.tail = FALSE, use.mean.variance = TRUE ) qLambertW( p, distname = NULL, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, input.u = NULL, tau = NULL, is.non.negative = FALSE, use.mean.variance = TRUE ) qqLambertW( y, distname, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, plot.it = TRUE, use.mean.variance = TRUE, ... ) rLambertW( n, distname, theta = NULL, beta = NULL, gamma = 0, delta = 0, alpha = 1, return.x = FALSE, input.u = NULL, tau = NULL, use.mean.variance = TRUE )
y , q
|
vector of quantiles. |
distname |
character; name of input distribution; see
|
theta |
list; a (possibly incomplete) list of parameters |
beta |
numeric vector (deprecated); parameter |
gamma |
scalar (deprecated); skewness parameter; default: |
delta |
scalar or vector (length 2) (deprecated); heavy-tail
parameter(s); default: |
alpha |
scalar or vector (length 2) (deprecated); heavy tail
exponent(s); default: |
input.u |
users can supply their own version of U (either a vector of
simulated values or a function defining the pdf/cdf/quanitle function of
U); default: |
tau |
optional; if |
use.mean.variance |
logical; if |
log |
logical; if |
lower.tail |
logical; if |
p |
vector of probability levels |
is.non.negative |
logical; by default it is set to |
plot.it |
logical; should the result be plotted? Default: |
... |
further arguments passed to or from other methods. |
n |
number of observations |
return.x |
logical; if |
All functions here have an optional input.u
argument where users can
supply their own version corresponding to zero-mean, unit variance input
. This function usually depends on the input parameter
; e.g., users can pass their own density function
dmydist <- function(u, beta) {...}
as dLambertW(..., input.u
= dmydist)
. dLambertW
will then use this function to evaluate
the pdf of the Lambert W x 'mydist' distribution.
Important: Make sure that all input.u
in dLambertW
,
pLambertW
, ... are supplied correctly and return correct values –
there are no unit-tests or sanity checks for user-defined functions.
See the references for the analytic expressions of the pdf and cdf. For
"h"
or "hh"
types and for scale-families of type =
"s"
quantiles can be computed analytically. For location (-scale)
families of type = "s"
quantiles need to be computed numerically.
mLambertW
returns a list with the 4 theoretical
(central/standardized) moments of implied by
and
distname
(currrently, this only works for
distname = "normal"
):
mean |
mean, |
sd |
standard deviation, |
skewness |
skewness, |
kurtosis |
kurtosis (not excess kurtosis, i.e., 3 for a Gaussian). |
rLambertW
returns a vector of length n
. If return.input =
TRUE
, then it returns a list of two vectors (each of length n
):
x |
simulated input, |
y |
Lambert W random sample (transformed from |
qqLambertW
returns a list of 2 vectors (analogous to qqnorm
):
x |
theoretical quantiles (sorted), |
y |
empirical quantiles (sorted). |
############################### ######### mLambertW ########### mLambertW(theta = list(beta = c(0, 1), gamma = 0.1)) mLambertW(theta = list(beta = c(1, 1), gamma = 0.1)) # mean shifted by 1 mLambertW(theta = list(beta = c(0, 1), gamma = 0)) # N(0, 1) ############################### ######### rLambertW ########### set.seed(1) # same as rnorm(1000) x <- rLambertW(n=100, theta = list(beta=c(0, 1)), distname = "normal") skewness(x) # very small skewness medcouple_estimator(x) # also close to zero y <- rLambertW(n=100, theta = list(beta = c(1, 3), gamma = 0.1), distname = "normal") skewness(y) # high positive skewness (in theory equal to 3.70) medcouple_estimator(y) # also the robust measure gives a high value op <- par(no.readonly=TRUE) par(mfrow = c(2, 2), mar = c(2, 4, 3, 1)) plot(x) hist(x, prob=TRUE, 15) lines(density(x)) plot(y) hist(y, prob=TRUE, 15) lines(density(y)) par(op) ############################### ######### dLambertW ########### beta.s <- c(0, 1) gamma.s <- 0.1 # x11(width=10, height=5) par(mfrow = c(1, 2), mar = c(3, 3, 3, 1)) curve(dLambertW(x, theta = list(beta = beta.s, gamma = gamma.s), distname = "normal"), -3.5, 5, ylab = "", main="Density function") plot(dnorm, -3.5, 5, add = TRUE, lty = 2) legend("topright" , c("Lambert W x Gaussian" , "Gaussian"), lty = 1:2) abline(h=0) ############################### ######### pLambertW ########### curve(pLambertW(x, theta = list(beta = beta.s, gamma = gamma.s), distname = "normal"), -3.5, 3.5, ylab = "", main = "Distribution function") plot(pnorm, -3.5,3.5, add = TRUE, lty = 2) legend("topleft" , c("Lambert W x Gaussian" , "Gaussian"), lty = 1:2) par(op) ######## Animation ## Not run: gamma.v <- seq(-0.15, 0.15, length = 31) # typical, empirical range of gamma b <- get_support(gamma_01(min(gamma.v)))[2]*1.1 a <- get_support(gamma_01(max(gamma.v)))[1]*1.1 for (ii in seq_along(gamma.v)) { curve(dLambertW(x, beta = gamma_01(gamma.v[ii])[c("mu_x", "sigma_x")], gamma = gamma.v[ii], distname="normal"), a, b, ylab="", lty = 2, col = 2, lwd = 2, main = "pdf", ylim = c(0, 0.45)) plot(dnorm, a, b, add = TRUE, lty = 1, lwd = 2) legend("topright" , c("Lambert W x Gaussian" , "Gaussian"), lty = 2:1, lwd = 2, col = 2:1) abline(h=0) legend("topleft", cex = 1.3, c(as.expression(bquote(gamma == .(round(gamma.v[ii],3)))))) Sys.sleep(0.04) } ## End(Not run) ############################### ######### qLambertW ########### p.v <- c(0.01, 0.05, 0.5, 0.9, 0.95,0.99) qnorm(p.v) # same as above except for rounding errors qLambertW(p.v, theta = list(beta = c(0, 1), gamma = 0), distname = "normal") # positively skewed data -> quantiles are higher qLambertW(p.v, theta = list(beta = c(0, 1), gamma = 0.1), distname = "normal") ############################### ######### qqLambertW ########## ## Not run: y <- rLambertW(n=500, distname="normal", theta = list(beta = c(0,1), gamma = 0.1)) layout(matrix(1:2, ncol = 2)) qqnorm(y) qqline(y) qqLambertW(y, theta = list(beta = c(0, 1), gamma = 0.1), distname = "normal") ## End(Not run)
############################### ######### mLambertW ########### mLambertW(theta = list(beta = c(0, 1), gamma = 0.1)) mLambertW(theta = list(beta = c(1, 1), gamma = 0.1)) # mean shifted by 1 mLambertW(theta = list(beta = c(0, 1), gamma = 0)) # N(0, 1) ############################### ######### rLambertW ########### set.seed(1) # same as rnorm(1000) x <- rLambertW(n=100, theta = list(beta=c(0, 1)), distname = "normal") skewness(x) # very small skewness medcouple_estimator(x) # also close to zero y <- rLambertW(n=100, theta = list(beta = c(1, 3), gamma = 0.1), distname = "normal") skewness(y) # high positive skewness (in theory equal to 3.70) medcouple_estimator(y) # also the robust measure gives a high value op <- par(no.readonly=TRUE) par(mfrow = c(2, 2), mar = c(2, 4, 3, 1)) plot(x) hist(x, prob=TRUE, 15) lines(density(x)) plot(y) hist(y, prob=TRUE, 15) lines(density(y)) par(op) ############################### ######### dLambertW ########### beta.s <- c(0, 1) gamma.s <- 0.1 # x11(width=10, height=5) par(mfrow = c(1, 2), mar = c(3, 3, 3, 1)) curve(dLambertW(x, theta = list(beta = beta.s, gamma = gamma.s), distname = "normal"), -3.5, 5, ylab = "", main="Density function") plot(dnorm, -3.5, 5, add = TRUE, lty = 2) legend("topright" , c("Lambert W x Gaussian" , "Gaussian"), lty = 1:2) abline(h=0) ############################### ######### pLambertW ########### curve(pLambertW(x, theta = list(beta = beta.s, gamma = gamma.s), distname = "normal"), -3.5, 3.5, ylab = "", main = "Distribution function") plot(pnorm, -3.5,3.5, add = TRUE, lty = 2) legend("topleft" , c("Lambert W x Gaussian" , "Gaussian"), lty = 1:2) par(op) ######## Animation ## Not run: gamma.v <- seq(-0.15, 0.15, length = 31) # typical, empirical range of gamma b <- get_support(gamma_01(min(gamma.v)))[2]*1.1 a <- get_support(gamma_01(max(gamma.v)))[1]*1.1 for (ii in seq_along(gamma.v)) { curve(dLambertW(x, beta = gamma_01(gamma.v[ii])[c("mu_x", "sigma_x")], gamma = gamma.v[ii], distname="normal"), a, b, ylab="", lty = 2, col = 2, lwd = 2, main = "pdf", ylim = c(0, 0.45)) plot(dnorm, a, b, add = TRUE, lty = 1, lwd = 2) legend("topright" , c("Lambert W x Gaussian" , "Gaussian"), lty = 2:1, lwd = 2, col = 2:1) abline(h=0) legend("topleft", cex = 1.3, c(as.expression(bquote(gamma == .(round(gamma.v[ii],3)))))) Sys.sleep(0.04) } ## End(Not run) ############################### ######### qLambertW ########### p.v <- c(0.01, 0.05, 0.5, 0.9, 0.95,0.99) qnorm(p.v) # same as above except for rounding errors qLambertW(p.v, theta = list(beta = c(0, 1), gamma = 0), distname = "normal") # positively skewed data -> quantiles are higher qLambertW(p.v, theta = list(beta = c(0, 1), gamma = 0.1), distname = "normal") ############################### ######### qqLambertW ########## ## Not run: y <- rLambertW(n=500, distname="normal", theta = list(beta = c(0,1), gamma = 0.1)) layout(matrix(1:2, ncol = 2)) qqnorm(y) qqline(y) qqLambertW(y, theta = list(beta = c(0, 1), gamma = 0.1), distname = "normal") ## End(Not run)
F RVsEvaluates the log-likelihood for given observations
y
.
loglik_LambertW
computes the log-likelihood of
for a Lambert W
F distribution given observations
y
.
loglik_input
computes the log-likelihood of various distributions for
the parameter given the data
x
. This can be
used independently of the Lambert W x F framework to compute
the log-likelihood of parameters for common distributions.
loglik_penalty
computes the penalty for transforming the
data back to the input (see Goerg 2016). This penalty is independent of
the distribution specified by distname
, but only depends on
. If
type = "s"
then the penalty term exists if the
distribution is non-negative (see get_distname_family
) and
gamma >= 0
; otherwise, it returns NA
.
loglik_LambertW( theta, y, distname, type, return.negative = FALSE, flattened.theta.names = names(theta), use.mean.variance = TRUE ) loglik_input( beta, x, distname, dX = NULL, log.dX = function(x, beta) log(dX(x, beta)) ) loglik_penalty(tau, y, type = c("h", "hh", "s"), is.non.negative = FALSE)
loglik_LambertW( theta, y, distname, type, return.negative = FALSE, flattened.theta.names = names(theta), use.mean.variance = TRUE ) loglik_input( beta, x, distname, dX = NULL, log.dX = function(x, beta) log(dX(x, beta)) ) loglik_penalty(tau, y, type = c("h", "hh", "s"), is.non.negative = FALSE)
theta |
list; a (possibly incomplete) list of parameters |
y |
a numeric vector of real values (the observed data). |
distname |
character; name of input distribution; see
|
type |
type of Lambert W |
return.negative |
logical; if |
flattened.theta.names |
vector of strings with names of flattened
|
use.mean.variance |
logical; if |
beta |
numeric vector (deprecated); parameter |
x |
a numeric vector of real values (the input data). |
dX |
optional; density function of |
log.dX |
optional; a function that returns the logarithm of the density
function of |
tau |
named vector |
is.non.negative |
logical; by default it is set to |
For heavy-tail Lambert W F distributions (
type = "h"
or
type = "hh"
) the log-likelihood decomposes into an input
log-likelihood plus a penalty term for transforming the data.
For skewed Lambert W F distributions this decomposition only
exists for non-negative input RVs (e.g.,
"exp"
onential,
"gamma"
, "f"
, ...). If negative values are possible
("normal"
, "t"
, "unif"
, "cauchy"
, ...)
then loglik_input
and loglik_penalty
return NA
, but
the value of the output log-likelihood will still be returned correctly
as loglik.LambertW
.
See Goerg (2016) for details on the decomposition of the log-likelihood into a log-likelihood on the input parameters plus a penalty term for transforming the data.
loglik_input
and loglik_penalty
return a scalar;
loglik_LambertW
returns a list with 3 values:
loglik.input |
loglikelihood of |
loglik.penalty |
penalty for transforming the data, |
loglik.LambertW |
total log-likelihood of |
set.seed(1) yy <- rLambertW(n = 1000, distname = "normal", theta = list(beta = c(0, 1), delta = 0.2)) loglik_penalty(tau = theta2tau(list(beta = c(1, 1), delta = c(0.2, 0.2)), distname = "normal"), y = yy, type = "hh") # For a type = 's' Lambert W x F distribution with location family input # such a decomposition doesn't exist; thus NA. loglik_penalty(tau = theta2tau(list(beta = c(1, 1), gamma = 0.03), distname = "normal"), is.non.negative = FALSE, y = yy, type = "s") # For scale-family input it does exist loglik_penalty(tau = theta2tau(list(beta = 1, gamma = 0.01), distname = "exp"), is.non.negative = TRUE, y = yy, type = "s") # evaluating the Gaussian log-likelihood loglik_input(beta = c(0, 1), x = yy, distname = "normal") # built-in version # or pass your own log pdf function loglik_input(beta = c(0, 1), x = yy, distname = "user", log.dX = function(xx, beta = beta) { dnorm(xx, mean = beta[1], sd = beta[2], log = TRUE) }) ## Not run: # you must specify distname = 'user'; otherwise it does not work loglik_input(beta = c(0, 1), x = yy, distname = "mydist", log.dX = function(xx, beta = beta) { dnorm(xx, mean = beta[1], sd = beta[2], log = TRUE) }) ## End(Not run) ### loglik_LambertW returns all three values loglik_LambertW(theta = list(beta = c(1, 1), delta = c(0.09, 0.07)), y = yy, type = "hh", distname ="normal") # can also take a flattend vector; must provide names though for delta loglik_LambertW(theta = flatten_theta(list(beta = c(1, 1), delta = c(delta_l = 0.09, delta_r = 0.07))), y = yy, type = "hh", distname ="normal")
set.seed(1) yy <- rLambertW(n = 1000, distname = "normal", theta = list(beta = c(0, 1), delta = 0.2)) loglik_penalty(tau = theta2tau(list(beta = c(1, 1), delta = c(0.2, 0.2)), distname = "normal"), y = yy, type = "hh") # For a type = 's' Lambert W x F distribution with location family input # such a decomposition doesn't exist; thus NA. loglik_penalty(tau = theta2tau(list(beta = c(1, 1), gamma = 0.03), distname = "normal"), is.non.negative = FALSE, y = yy, type = "s") # For scale-family input it does exist loglik_penalty(tau = theta2tau(list(beta = 1, gamma = 0.01), distname = "exp"), is.non.negative = TRUE, y = yy, type = "s") # evaluating the Gaussian log-likelihood loglik_input(beta = c(0, 1), x = yy, distname = "normal") # built-in version # or pass your own log pdf function loglik_input(beta = c(0, 1), x = yy, distname = "user", log.dX = function(xx, beta = beta) { dnorm(xx, mean = beta[1], sd = beta[2], log = TRUE) }) ## Not run: # you must specify distname = 'user'; otherwise it does not work loglik_input(beta = c(0, 1), x = yy, distname = "mydist", log.dX = function(xx, beta = beta) { dnorm(xx, mean = beta[1], sd = beta[2], log = TRUE) }) ## End(Not run) ### loglik_LambertW returns all three values loglik_LambertW(theta = list(beta = c(1, 1), delta = c(0.09, 0.07)), y = yy, type = "hh", distname ="normal") # can also take a flattend vector; must provide names though for delta loglik_LambertW(theta = flatten_theta(list(beta = c(1, 1), delta = c(delta_l = 0.09, delta_r = 0.07))), y = yy, type = "hh", distname ="normal")
Computes the norm of an n-dimensional (real/complex)
vector
where is the absolute value of
. For
this is Euclidean norm; for
it is Manhattan norm. For
it is defined as the number of non-zero elements in
; for
it is the maximum of the absolute
values of
.
The norm of equals
if and only if
.
lp_norm(x, p = 2)
lp_norm(x, p = 2)
x |
n-dimensional vector (possibly complex values) |
p |
which norm? Allowed values |
Non-negative float, the norm of .
kRealVec <- c(3, 4) # Pythagoras lp_norm(kRealVec) # did not know Manhattan, lp_norm(kRealVec, p = 1) # so he just imagined running in circles. kComplexVec <- exp(1i * runif(20, -pi, pi)) plot(kComplexVec) sapply(kComplexVec, lp_norm)
kRealVec <- c(3, 4) # Pythagoras lp_norm(kRealVec) # did not know Manhattan, lp_norm(kRealVec, p = 1) # so he just imagined running in circles. kComplexVec <- exp(1i * runif(20, -pi, pi)) plot(kComplexVec) sapply(kComplexVec, lp_norm)
A robust measure of asymmetry. See References for details.
medcouple_estimator(x, seed = sample.int(1e+06, 1))
medcouple_estimator(x, seed = sample.int(1e+06, 1))
x |
numeric vector; if length > 3,000, it uses a random subsample
(otherwise it takes too long to compute as calculations are of order
|
seed |
numeric; seed used for sampling (when |
float; measures the degree of asymmetry
Brys, G., M. Hubert, and A. Struyf (2004). “A robust measure of skewness”. Journal of Computational and Graphical Statistics 13 (4), 996 - 1017.
# a simulation kNumSim <- 100 kNumObs <- 200 ################# Gaussian (Symmetric) #### A <- t(replicate(kNumSim, {xx <- rnorm(kNumObs); c(skewness(xx), medcouple_estimator(xx))})) ########### skewed LambertW x Gaussian #### tau.s <- gamma_01(0.2) # zero mean, unit variance, but positive skewness rbind(mLambertW(theta = list(beta = tau.s[c("mu_x", "sigma_x")], gamma = tau.s["gamma"]), distname="normal")) B <- t(replicate(kNumSim, { xx <- rLambertW(n = kNumObs, theta = list(beta = tau.s[c("mu_x", "sigma_x")], gamma = tau.s["gamma"]), distname="normal") c(skewness(xx), medcouple_estimator(xx)) })) colnames(A) <- colnames(B) <- c("MedCouple", "Pearson Skewness") layout(matrix(1:4, ncol = 2)) plot(A, main = "Gaussian") boxplot(A) abline(h = 0) plot(B, main = "Skewed Lambert W x Gaussian") boxplot(B) abline(h = mLambertW(theta = list(beta = tau.s[c("mu_x", "sigma_x")], gamma = tau.s["gamma"]), distname="normal")["skewness"]) colMeans(A) apply(A, 2, sd) colMeans(B) apply(B, 2, sd)
# a simulation kNumSim <- 100 kNumObs <- 200 ################# Gaussian (Symmetric) #### A <- t(replicate(kNumSim, {xx <- rnorm(kNumObs); c(skewness(xx), medcouple_estimator(xx))})) ########### skewed LambertW x Gaussian #### tau.s <- gamma_01(0.2) # zero mean, unit variance, but positive skewness rbind(mLambertW(theta = list(beta = tau.s[c("mu_x", "sigma_x")], gamma = tau.s["gamma"]), distname="normal")) B <- t(replicate(kNumSim, { xx <- rLambertW(n = kNumObs, theta = list(beta = tau.s[c("mu_x", "sigma_x")], gamma = tau.s["gamma"]), distname="normal") c(skewness(xx), medcouple_estimator(xx)) })) colnames(A) <- colnames(B) <- c("MedCouple", "Pearson Skewness") layout(matrix(1:4, ncol = 2)) plot(A, main = "Gaussian") boxplot(A) abline(h = 0) plot(B, main = "Skewed Lambert W x Gaussian") boxplot(B) abline(h = mLambertW(theta = list(beta = tau.s[c("mu_x", "sigma_x")], gamma = tau.s["gamma"]), distname="normal")["skewness"]) colMeans(A) apply(A, 2, sd) colMeans(B) apply(B, 2, sd)
F distributionsMaximum Likelihood Estimation (MLE) for Lambert W
distributions computes
.
For type = "s"
, the skewness parameter is estimated and
is held fixed; for
type = "h"
the one-dimensional
is estimated and
is held fixed; and for
type = "hh"
the 2-dimensional is estimated and
is held fixed.
By default is fixed for any
type
. If you want to
also estimate (for
type = "h"
or "hh"
)
set theta.fixed = list()
.
MLE_LambertW( y, distname, type = c("h", "s", "hh"), theta.fixed = list(alpha = 1), use.mean.variance = TRUE, theta.init = get_initial_theta(y, distname = distname, type = type, theta.fixed = theta.fixed, use.mean.variance = use.mean.variance, method = "IGMM"), hessian = TRUE, return.estimate.only = FALSE, optim.fct = c("optim", "nlm", "solnp"), not.negative = FALSE )
MLE_LambertW( y, distname, type = c("h", "s", "hh"), theta.fixed = list(alpha = 1), use.mean.variance = TRUE, theta.init = get_initial_theta(y, distname = distname, type = type, theta.fixed = theta.fixed, use.mean.variance = use.mean.variance, method = "IGMM"), hessian = TRUE, return.estimate.only = FALSE, optim.fct = c("optim", "nlm", "solnp"), not.negative = FALSE )
y |
a numeric vector of real values. |
distname |
character; name of input distribution; see
|
type |
type of Lambert W |
theta.fixed |
a list of fixed parameters in the optimization; default
only |
use.mean.variance |
logical; if |
theta.init |
a list containing the starting values of |
hessian |
indicator for returning the (numerically obtained) Hessian at
the optimum; default: |
return.estimate.only |
logical; if |
optim.fct |
character; which R optimization function should be
used. Either |
not.negative |
logical; if |
A list of class LambertW_fit
:
data |
data |
loglik |
scalar; log-likelihood evaluated at the optimum
|
theta.init |
list; starting values for numerical optimization, |
beta |
estimated |
theta |
list; MLE for |
type |
see Arguments, |
hessian |
Hessian matrix; used to calculate standard errors (only if |
call |
function call, |
distname |
see Arguments, |
message |
message from the optimization method. What kind of convergence?, |
method |
estimation method; here |
# See ?LambertW-package
# See ?LambertW-package
Computes the probability that (at least) one (out of n)
observation(s) of the latent variable lies in the non-principal
branch region. The '
m1
' in p_m1
stands for 'minus 1', i.e,
the non-principal branch.
See Goerg (2011) and Details for mathematical derivations.
p_m1(gamma, beta, distname, n = 1, use.mean.variance = TRUE)
p_m1(gamma, beta, distname, n = 1, use.mean.variance = TRUE)
gamma |
scalar; skewness parameter. |
beta |
numeric vector (deprecated); parameter |
distname |
character; name of input distribution; see
|
n |
number of RVs/observations. |
use.mean.variance |
logical; if |
The probability that one observation of the latent RV U lies in the non-principal region equals at most
where is the zero-mean,
unit variance version of the input
– see References.
For independent RVs
, the probability that at
least one data point came from the non-principal region equals
This equals (assuming independence)
For improved numerical stability the cdf of a geometric RV
(pgeom
) is used to evaluate the last
expression. Nevertheless, numerical problems can occur for (returns
0
due to rounding errors).
Note that reduces to
for
.
non-negative float; the probability for
n
observations.
beta.01 <- c(mu = 0, sigma = 1) # for n=1 observation p_m1(0, beta = beta.01, distname = "normal") # identical to 0 # in theory != 0; but machine precision too low p_m1(0.01, beta = beta.01, distname = "normal") p_m1(0.05, beta = beta.01, distname = "normal") # extremely small p_m1(0.1, beta = beta.01, distname = "normal") # != 0, but very small # 1 out of 4 samples is a non-principal input; p_m1(1.5, beta = beta.01, distname = "normal") # however, gamma=1.5 is not common in practice # for n=100 observations p_m1(0, n=100, beta = beta.01, distname = "normal") # == 0 p_m1(0.1, n=100, beta = beta.01, distname = "normal") # still small p_m1(0.3, n=100, beta = beta.01, distname = "normal") # a bit more likely p_m1(1.5, n=100, beta = beta.01, distname = "normal") # Here we can be almost 100% sure (rounding errors) that at least one # y_i was caused by an input in the non-principal branch.
beta.01 <- c(mu = 0, sigma = 1) # for n=1 observation p_m1(0, beta = beta.01, distname = "normal") # identical to 0 # in theory != 0; but machine precision too low p_m1(0.01, beta = beta.01, distname = "normal") p_m1(0.05, beta = beta.01, distname = "normal") # extremely small p_m1(0.1, beta = beta.01, distname = "normal") # != 0, but very small # 1 out of 4 samples is a non-principal input; p_m1(1.5, beta = beta.01, distname = "normal") # however, gamma=1.5 is not common in practice # for n=100 observations p_m1(0, n=100, beta = beta.01, distname = "normal") # == 0 p_m1(0.1, n=100, beta = beta.01, distname = "normal") # still small p_m1(0.3, n=100, beta = beta.01, distname = "normal") # a bit more likely p_m1(1.5, n=100, beta = beta.01, distname = "normal") # Here we can be almost 100% sure (rounding errors) that at least one # y_i was caused by an input in the non-principal branch.
All functions here are for the transformation parameter vector
.
check_tau
checks if is correctly specified (correct names, non-negativity
constraints, etc.)
complete_tau
completes missing values so users don't have to specify
every element of explicitly.
'mu_x'
and
'sigma_x'
must be specified, but alpha = 1
, gamma =
0
, and delta = 0
will be set automatically if missing.
get_initial_tau
provides starting estimates for .
normalize_by_tau
shifts and scales data given the tau
vector as
Parameters and
are not necessarily mean and
standard deviation in the
vector; that depends on the family
type and
use.mean.variance
(for location families they usually are
mean and standard deviation if use.mean.variance = TRUE
; for scale
and non-location non-scale families they are just location/scale
parameters for the transformation).
tau2theta
converts to the parameter list
(inverse of
theta2tau
).
tau2type
guesses the type ('s'
, 'h'
, 'hh'
) from the names
of tau
vector; thus make sure tau
is named correctly.
check_tau(tau) complete_tau(tau, type = tau2type(tau)) get_initial_tau(y, type = c("h", "hh", "s"), location.family = TRUE) normalize_by_tau(data, tau, inverse = FALSE) tau2theta(tau, beta) tau2type(tau)
check_tau(tau) complete_tau(tau, type = tau2type(tau)) get_initial_tau(y, type = c("h", "hh", "s"), location.family = TRUE) normalize_by_tau(data, tau, inverse = FALSE) tau2theta(tau, beta) tau2type(tau)
tau |
named vector |
type |
type of Lambert W |
y |
a numeric vector of real values (the observed data). |
location.family |
logical; if |
data |
numeric; a numeric object in R. Usually this is either
|
inverse |
logical; if |
beta |
numeric vector (deprecated); parameter |
check_tau
throws an error if does not define a proper
transformation.
complete_tau
returns a named numeric vector.
get_initial_tau
returns a named numeric vector.
tau2theta
returns a list with entries alpha
, beta
,
gamma
, and delta
.
tau2type
returns a string: either "s"
, "h"
, or
"hh"
.
Graphical and statistical check if data is Gaussian (three common Normality tests, QQ-plots, histograms, etc).
test_normality
does not show the autocorrelation function (ACF)
estimate for lag , since it always equals
. Thus removing it
does not lose any information, but greatly improves the y-axis scale for
higher order lags (which are usually very small compared to 1).
test_norm
is a shortcut for test_normality
.
test_normality( data, show.volatility = FALSE, plot = TRUE, pch = 1, add.legend = TRUE, seed = sample(1e+06, 1) ) test_norm(...)
test_normality( data, show.volatility = FALSE, plot = TRUE, pch = 1, add.legend = TRUE, seed = sample(1e+06, 1) ) test_norm(...)
data |
a numeric vector of data values. |
show.volatility |
logical; if |
plot |
Should visual checks (histogram, densities, qqplot, ACF) be
plotted? Default |
pch |
a vector of plotting characters or symbols; default |
add.legend |
logical; if |
seed |
optional; if sample size > 5,000, then some normality tests fail to run. In this case it uses a subsample of size 5,000. For reproducibility, the seed can be specified by user. By default it uses a random seed. |
... |
arguments as in |
A list with results of 3 normality tests (each of class htest
)
and the seed used for subsampling:
anderson.darling |
Anderson Darling (if nortest package is available), |
shapiro.francia |
Shapiro-Francia (if nortest package is available), |
shapiro.wilk |
Shapiro-Wilk, |
seed |
seed for subsampling (only used if sample size > 5,000). |
Thode Jr., H.C. (2002): “Testing for Normality”. Marcel Dekker, New York.
shapiro.test
in the stats package;
ad.test
, sf.test
in the
nortest package.
y <- rLambertW(n = 1000, theta = list(beta = c(3, 4), gamma = 0.3), distname = "normal") test_normality(y) x <- rnorm(n = 1000) test_normality(x) # mixture of exponential and normal test_normality(c(rexp(100), rnorm(100, mean = -5)))
y <- rLambertW(n = 1000, theta = list(beta = c(3, 4), gamma = 0.3), distname = "normal") test_normality(y) x <- rnorm(n = 1000) test_normality(x) # mixture of exponential and normal test_normality(c(rexp(100), rnorm(100, mean = -5)))
Performs a test for the null hypothesis of symmetry, , versus the alternative of asymmetry. This can be done using a Wald
test of the linear restriction
or a
likelihood ratio test.
By default it uses "Wald"
test since this only requires the Hessian of
the "hh"
Lambert W fit. The "LR"
test requires the
log-likelihood values for both MLEs (type "h"
and "hh"
) and
thus takes longer to compute.
test_symmetry(LambertW.fit, method = c("Wald", "LR"))
test_symmetry(LambertW.fit, method = c("Wald", "LR"))
LambertW.fit |
an object of class |
method |
test methodology: |
A list of class "htest"
containing:
statistic |
value of the test statistic, |
p.value |
p-value for the test, |
method |
character string describing the test, |
data.name |
a character string giving the name(s) of the data. |
## Not run: # skewed yy <- rLambertW(n = 500, theta = list(delta = c(0.1, 0.25), beta = c(2, 1)), distname = "normal") fit.ml <- MLE_LambertW(yy, type = "hh", distname = "normal", hessian = TRUE) summary(fit.ml) test_symmetry(fit.ml, "LR") test_symmetry(fit.ml, "Wald") # symmetric yy <- rLambertW(n = 500, theta = list(delta = c(0.2, 0.2), beta = c(2, 1)), distname = "normal") fit.ml <- MLE_LambertW(yy, type = "hh", distname = "normal") summary(fit.ml) test_symmetry(fit.ml, "LR") test_symmetry(fit.ml, "Wald") ## End(Not run)
## Not run: # skewed yy <- rLambertW(n = 500, theta = list(delta = c(0.1, 0.25), beta = c(2, 1)), distname = "normal") fit.ml <- MLE_LambertW(yy, type = "hh", distname = "normal", hessian = TRUE) summary(fit.ml) test_symmetry(fit.ml, "LR") test_symmetry(fit.ml, "Wald") # symmetric yy <- rLambertW(n = 500, theta = list(delta = c(0.2, 0.2), beta = c(2, 1)), distname = "normal") fit.ml <- MLE_LambertW(yy, type = "hh", distname = "normal") summary(fit.ml) test_symmetry(fit.ml, "LR") test_symmetry(fit.ml, "Wald") ## End(Not run)
F distributionsThese functions work with ,
which fully parametrizes Lambert W
F distributions.
See Details for more background information on some functions.
check_theta
checks if
describes a well-defined Lambert W distribution.
complete_theta
completes missing values in a parameters list so users
don't have to specify everything in detail. If not supplied, then
alpha = 1
, gamma = 0
, and delta = 0
will be set by default.
flatten_theta
and unflatten_theta
convert between the list
theta
and its vector-style flattened type. The flattened version is required
for several optimization routines, since they optimize over multivariate vectors – not lists.
get_initial_theta
provides initial estimates for ,
,
, and
, which are then
used in maximum likelihood (ML) estimation (
MLE_LambertW
).
get_theta_bounds
returns lower and upper bounds for
(necessary for optimization such as
MLE_LambertW
).
theta2tau
converts to the transformation vector
.
theta2unbounded
transforms from the bounded space to an
unrestricted space (by
-transformation on
,
, and
; note that this restricts
,
, and
.).
check_theta(theta, distname) complete_theta(theta = list(), LambertW.input = NULL) flatten_theta(theta) get_initial_theta( y, distname, type = c("h", "hh", "s"), theta.fixed = list(alpha = 1), method = c("Taylor", "IGMM"), use.mean.variance = TRUE ) get_theta_bounds( distname, beta, type = c("s", "h", "hh"), not.negative = FALSE ) theta2tau(theta = list(beta = c(0, 1)), distname, use.mean.variance = TRUE) theta2unbounded(theta, distname, type = c("h", "hh", "s"), inverse = FALSE) unflatten_theta(theta.flattened, distname, type)
check_theta(theta, distname) complete_theta(theta = list(), LambertW.input = NULL) flatten_theta(theta) get_initial_theta( y, distname, type = c("h", "hh", "s"), theta.fixed = list(alpha = 1), method = c("Taylor", "IGMM"), use.mean.variance = TRUE ) get_theta_bounds( distname, beta, type = c("s", "h", "hh"), not.negative = FALSE ) theta2tau(theta = list(beta = c(0, 1)), distname, use.mean.variance = TRUE) theta2unbounded(theta, distname, type = c("h", "hh", "s"), inverse = FALSE) unflatten_theta(theta.flattened, distname, type)
theta |
list; a (possibly incomplete) list of parameters |
distname |
character; name of input distribution; see
|
LambertW.input |
optional; if |
y |
a numeric vector of real values (the observed data). |
type |
type of Lambert W |
theta.fixed |
list; fixed parameters for the optimization; default:
|
method |
character; should a fast |
use.mean.variance |
logical; if |
beta |
numeric vector (deprecated); parameter |
not.negative |
logical; if |
inverse |
logical; if |
theta.flattened |
named vector; flattened version of list |
get_initial_theta
obtains a quick initial estimate of by
first finding the (approximate) input
by
IGMM
, and then estimating
for this input data
(see
estimate_beta
).
Converting theta
to an unbounded space is especially useful
for optimization routines (like nlm
), which can be
performed over an unconstrained space. The obtained optimum can be
converted back to the original space using the inverse transformation
(set inverse = TRUE
transforms it via ) – this
guarantees that the estimate satisfies non-negativity constraints (if
required). The main advantage is that this avoids using optimization
routines with boundary constraints – since they are much slower compared
to unconstrained optimization.
check_theta
throws an error if list theta
does not
define a proper Lambert W F distribution;
does nothing otherwise.
complete_theta
returns a list containing:
alpha |
heavy tail exponent(s), |
beta |
named vector |
gamma |
skewness parameter, |
delta |
heavy-tail parameter(s). |
get_initial_theta
returns a list containing:
alpha |
heavy tail exponent; default: |
beta |
named vector |
gamma |
skewness parameter; if |
delta |
heavy-tail parameter;
estimated from |
get_theta_bounds
returns a list containing two vectors:
lower |
flattened vector of lower bounds for valid |
upper |
flattened vector of upper bounds for valid |
estimate_beta
, get_initial_tau
## Not run: check_theta(theta = list(beta = c(1, 1, -1)), distname = "t") ## End(Not run) check_theta(theta = list(beta = c(1, 1)), distname = "normal") # ok params <- list(beta = c(2, 1), delta = 0.3) # alpha and gamma are missing complete_theta(params) # added default values params <- list(beta = c(2, 1), delta = 0.3, alpha = c(1, 2)) params <- complete_theta(params) check_theta(params, distname = 'normal') ### x <- rnorm(1000) get_initial_theta(x, distname = "normal", type = "h") get_initial_theta(x, distname = "normal", type = "s") # starting values for the skewed version of an exponential y <- rLambertW(n = 1000, distname = "exp", theta=list(beta = 2, gamma = 0.1)) get_initial_theta(y, distname = "exp", type = "s") # starting values for the heavy-tailed version of a Normal = Tukey's h y <- rLambertW(n = 1000, distname="normal", theta=list(beta = c(2, 1), delta = 0.2)) get_initial_theta(y, distname = "normal", type = "h")#' ### get_theta_bounds(type = "hh", distname = "normal", beta = c(0, 1)) ### theta.restr <- theta2unbounded(list(beta = c(-1, 0.1), delta = c(0.2, 0.2)), distname = "normal") theta.restr # returns again the beta and delta from above theta2unbounded(theta.restr, inverse = TRUE, distname = "normal")
## Not run: check_theta(theta = list(beta = c(1, 1, -1)), distname = "t") ## End(Not run) check_theta(theta = list(beta = c(1, 1)), distname = "normal") # ok params <- list(beta = c(2, 1), delta = 0.3) # alpha and gamma are missing complete_theta(params) # added default values params <- list(beta = c(2, 1), delta = 0.3, alpha = c(1, 2)) params <- complete_theta(params) check_theta(params, distname = 'normal') ### x <- rnorm(1000) get_initial_theta(x, distname = "normal", type = "h") get_initial_theta(x, distname = "normal", type = "s") # starting values for the skewed version of an exponential y <- rLambertW(n = 1000, distname = "exp", theta=list(beta = 2, gamma = 0.1)) get_initial_theta(y, distname = "exp", type = "s") # starting values for the heavy-tailed version of a Normal = Tukey's h y <- rLambertW(n = 1000, distname="normal", theta=list(beta = c(2, 1), delta = 0.2)) get_initial_theta(y, distname = "normal", type = "h")#' ### get_theta_bounds(type = "hh", distname = "normal", beta = c(0, 1)) ### theta.restr <- theta2unbounded(list(beta = c(-1, 0.1), delta = c(0.2, 0.2)), distname = "normal") theta.restr # returns again the beta and delta from above theta2unbounded(theta.restr, inverse = TRUE, distname = "normal")
Density, distribution function, quantile function and random number
generation for the shifted and scaled U of the
(location-)scale family input
- see References.
Since the normalized random variable U is one of the main building blocks of
Lambert W F distributions, these functions are wrappers used
by other functions such as
dLambertW
or
rLambertW
.
dU(u, beta, distname, use.mean.variance = TRUE) pU(u, beta, distname, use.mean.variance = TRUE) qU(p, beta, distname, use.mean.variance = TRUE) rU(n, beta, distname, use.mean.variance = TRUE)
dU(u, beta, distname, use.mean.variance = TRUE) pU(u, beta, distname, use.mean.variance = TRUE) qU(p, beta, distname, use.mean.variance = TRUE) rU(n, beta, distname, use.mean.variance = TRUE)
u |
vector of quantiles. |
beta |
numeric vector (deprecated); parameter |
distname |
character; name of input distribution; see
|
use.mean.variance |
logical; if |
p |
vector of probability levels |
n |
number of samples |
dU
evaluates the pdf at y
, pU
evaluates the
cdf, qU
is the quantile function, and rU
generates random
samples from U.
# a zero-mean, unit variance version of the t_3 distribution. curve(dU(x, beta = c(1, 1, 3), distname = "t"), -4, 4, ylab = "pdf", xlab = "u", main = "student-t \n zero-mean, unit variance") # cdf of unit-variance version of an exp(3) -> just an exp(1) curve(pU(x, beta = 3, distname = "exp"), 0, 4, ylab = "cdf", xlab = "u", main = "Exponential \n unit variance", col = 2, lwd = 2) curve(pexp(x, rate = 1), 0, 4, add = TRUE, lty = 2) # all have (empirical) variance 1 var(rU(n = 1000, distname = "chisq", beta = 2)) var(rU(n = 1000, distname = "normal", beta = c(3, 3))) var(rU(n = 1000, distname = "exp", beta = 1)) var(rU(n = 1000, distname = "unif", beta = c(0, 10)))
# a zero-mean, unit variance version of the t_3 distribution. curve(dU(x, beta = c(1, 1, 3), distname = "t"), -4, 4, ylab = "pdf", xlab = "u", main = "student-t \n zero-mean, unit variance") # cdf of unit-variance version of an exp(3) -> just an exp(1) curve(pU(x, beta = 3, distname = "exp"), 0, 4, ylab = "cdf", xlab = "u", main = "Exponential \n unit variance", col = 2, lwd = 2) curve(pexp(x, rate = 1), 0, 4, add = TRUE, lty = 2) # all have (empirical) variance 1 var(rU(n = 1000, distname = "chisq", beta = 2)) var(rU(n = 1000, distname = "normal", beta = c(3, 3))) var(rU(n = 1000, distname = "exp", beta = 1)) var(rU(n = 1000, distname = "unif", beta = c(0, 10)))
The Lambert W function is defined as the inverse
of (see
xexp
)
i.e., it satisfies .
W
evaluates the Lambert W function (W
), its first derivative
(deriv_W
), and its logarithm (log_W
). All of them have a
principal (branch = 0
(default)) and non-principal branch
(branch = -1
) solution.
W
is a wrapper for lambertW0
and
lambertWm1
in the lamW package.
W(z, branch = 0) deriv_W(z, branch = 0, W.z = W(z, branch = branch)) log_deriv_W(z, branch = 0, W.z = W(z, branch = branch)) deriv_log_W(z, branch = 0, W.z = W(z, branch = branch)) log_W(z, branch = 0, W.z = W(z, branch = branch))
W(z, branch = 0) deriv_W(z, branch = 0, W.z = W(z, branch = branch)) log_deriv_W(z, branch = 0, W.z = W(z, branch = branch)) deriv_log_W(z, branch = 0, W.z = W(z, branch = branch)) log_W(z, branch = 0, W.z = W(z, branch = branch))
z |
a numeric vector of real values; note that |
branch |
either |
W.z |
Lambert W function evaluated at |
Depending on the argument of
one can distinguish 3 cases:
solution is unique W(z) = W(z, branch =
0)
;
two solutions: the principal (W(z,
branch = 0)
) and non-principal (W(z, branch = -1)
) branch;
no solution exists in the reals.
log_W
computes the natural logarithm of . This can be done
efficiently since
. Similarly, the
derivative can be expressed as a function of
:
Note that and
.
Moreover, by taking logs on both sides we can even simplify further to
which, since
, simplifies to
For this reason it is numerically faster to pass the value of as
an argument to
deriv_W
since W(z)
often has already been
evaluated in a previous step.
numeric; same dimensions/size as z
.
W
returns numeric, Inf
(for z = Inf
), or NA
if
.
Note that W
handles NaN
differently to
lambertW0
/ lambertWm1
in the lamW package; it returns
NA
.
Corless, R. M., G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey and D. E. Knuth (1996). “On the Lambert W function”. Advances in Computational Mathematics, pp. 329-359.
lambertW0
/ lambertWm1
in the lamW package;
xexp
.
W(-0.25) # "reasonable" input event W(-0.25, branch = -1) # "extreme" input event curve(W(x, branch = -1), -1, 2, type = "l", col = 2, lwd = 2) curve(W(x), -1, 2, type = "l", add = TRUE, lty = 2) abline(v = - 1 / exp(1)) # For lower values, the principal branch gives the 'wrong' solution; # the non-principal must be used. xexp(-10) W(xexp(-10), branch = 0) W(xexp(-10), branch = -1) curve(log(x), 0.1, 5, lty = 2, col = 1, ylab = "") curve(W(x), 0, 5, add = TRUE, col = "red") curve(log_W(x), 0.1, 5, add = TRUE, col = "blue") grid() legend("bottomright", c("log(x)", "W(x)", "log(W(x))"), col = c("black", "red", "blue"), lty = c(2, 1, 1))
W(-0.25) # "reasonable" input event W(-0.25, branch = -1) # "extreme" input event curve(W(x, branch = -1), -1, 2, type = "l", col = 2, lwd = 2) curve(W(x), -1, 2, type = "l", add = TRUE, lty = 2) abline(v = - 1 / exp(1)) # For lower values, the principal branch gives the 'wrong' solution; # the non-principal must be used. xexp(-10) W(xexp(-10), branch = 0) W(xexp(-10), branch = -1) curve(log(x), 0.1, 5, lty = 2, col = 1, ylab = "") curve(W(x), 0, 5, add = TRUE, col = "red") curve(log_W(x), 0.1, 5, add = TRUE, col = "blue") grid() legend("bottomright", c("log(x)", "W(x)", "log(W(x))"), col = c("black", "red", "blue"), lty = c(2, 1, 1))
Inverse transformation W_delta_alpha
for heavy-tail Lambert W RVs and its derivative.
This is the inverse of Tukey's h transformation as a special case of alpha = 1
.
W_delta(z, delta = 0) W_delta_alpha(z, delta = 0, alpha = 1) W_2delta(z, delta = c(0, 1/5)) W_2delta_2alpha(z, delta = c(0, 0), alpha = c(1, 1)) deriv_W_delta(z, delta = 0) deriv_W_delta_alpha(z, delta = 1, alpha = 1)
W_delta(z, delta = 0) W_delta_alpha(z, delta = 0, alpha = 1) W_2delta(z, delta = c(0, 1/5)) W_2delta_2alpha(z, delta = c(0, 0), alpha = c(1, 1)) deriv_W_delta(z, delta = 0) deriv_W_delta_alpha(z, delta = 1, alpha = 1)
z |
a numeric vector of real values. |
delta |
heavy-tail parameter(s); by default |
alpha |
heavy-tail exponent(s) in |
Computes sgn. If
is a vector, so is the output.
G_delta(0) W_delta(0) # W_delta is the inverse of G_delta u.v <- -2:2 W_delta(G_delta(u.v, delta = 0.3), delta = 0.3) # with alpha too G_delta_alpha(u.v, delta = 1, alpha = 0.33) W_delta_alpha(G_delta_alpha(u.v, delta = 1, alpha = 0.33), delta = 1, alpha = 0.33) # the inverse
G_delta(0) W_delta(0) # W_delta is the inverse of G_delta u.v <- -2:2 W_delta(G_delta(u.v, delta = 0.3), delta = 0.3) # with alpha too G_delta_alpha(u.v, delta = 1, alpha = 0.33) W_delta_alpha(G_delta_alpha(u.v, delta = 1, alpha = 0.33), delta = 1, alpha = 0.33) # the inverse
Inverse transformation for skewed Lambert W RVs and its derivative.
W_gamma(z, gamma = 0, branch = 0) deriv_W_gamma(z, gamma = 0, branch = 0)
W_gamma(z, gamma = 0, branch = 0) deriv_W_gamma(z, gamma = 0, branch = 0)
z |
a numeric vector of real values; note that |
gamma |
skewness parameter; by default |
branch |
either |
A skewed Lambert W F RV
(for simplicity assume zero mean, unit variance input)
is defined by the transformation (see
H_gamma
)
where is a zero-mean and/or unit-variance version of the distribution
.
The inverse transformation is , where
is the Lambert W function.
W_gamma(z, gamma, branch = 0)
(and W_gamma(z, gamma, branch = -1)
)
implement this inverse.
If , then
and the inverse also equals the identity.
If , the inverse transformation can be computed by
Same holds for W_gamma(z, gamma, branch = -1)
.
The derivative of with respect to
simplifies to
deriv_W_gamma
implements this derivative (for both branches).
numeric; if is a vector, so is the output.
The Lambert W function is the inverse of
.
In versions < 0.6.0 of the package this function was denoted as H
.
It is now replaced with the more descriptive xexp
(and H
is deprecated).
xexp(x) deriv_xexp(x, degree = 1)
xexp(x) deriv_xexp(x, degree = 1)
x |
a numeric vector of real/complex values. |
degree |
non-negative integer; degree of the derivative |
The n-th derviative of is available in closed for as
Returns for
. If
is a
vector/matrix, so is
.
plot(xexp, -5, 0.5, type="l", xlab="u", ylab="z") grid() abline(h=0, lty = 2) abline(v=0, lty = 2)
plot(xexp, -5, 0.5, type="l", xlab="u", ylab="z") grid() abline(h=0, lty = 2) abline(v=0, lty = 2)