Package 'logNormReg' reference manual

Title:	log Normal Linear Regression
Description:	Functions to fits simple linear regression models with log normal errors and identity link, i.e. taking the responses on the original scale. See Muggeo (2018) <doi:10.13140/RG.2.2.18118.16965>.
Authors:	Vito M. R. Muggeo [aut, cre]
Maintainer:	Vito M. R. Muggeo <[email protected]>
License:	GPL
Version:	0.5-0
Built:	2025-02-14 02:39:04 UTC
Source:	https://github.com/cran/logNormReg

log Normal Linear Regression

Description

Functions to perform and to display results from simple multiple linear regression with log Normal errors and identity link. Standard errors of estimates are based on the sandwich formula.

Details

Package:	logNormReg
Type:	Package
Version:	0.5-0
Date:	2021-11-08
License:	GPL

Acknowledgement

This package was inspired by a fruitful discussion with Andrew Beet (Marine Policy Center, Woods Hole Oceanographic Institution, U.S.).

Author(s)

Vito M.R. Muggeo <[email protected]>

References

Gustavsson, S., Fagerberg, B., Sallsten, G., Andersson, E. M. (2014). Regression Models for Log-Normal Data: Comparing Different Methods for Quantifying the Association between Abdominal Adiposity and Biomarkers of Inflammation and Insulin Resistance . International Journal of Environmental Research and Public Health, 11, 3521–3539.

Muggeo, V.M.R. (2018) A note on regression with log Normal errors: linear and piecewise linear modelling in R, doi: 10.13140/RG.2.2.18118.16965.

Confidence intervals for the parameters in log normal regression

Description

Computes confidence intervals (based on the Gradient, Wald or Likelihood Ratio sattistic) for the linear parameters in a fitted ‘lognreg’ model.

Usage

## S3 method for class 'lognlm'
confint(object, parm, level=0.95, type = c("wald", "gradient", "lrt"), ...)
## S3 method for class 'lognlm'
confint(object, parm, level=0.95, type = c("wald", "gradient", "lrt"), ...)

Arguments

`object`	a fitted `lognlm` object.
`parm`	the parameter of interest. Numeric (covariate number) or character (covariate name). If missing `parm=2` is taken, i.e. the coefficient of the first covariate, provided the intercept is in the model.
`level`	the required confidence level (default to 0.95).
`type`	Which statistics should be used? Currently `"wald"`, `"gradient"`, or `"lrt"`. Names can be abbreviated. If `object` has been obtained with `lik=FALSE`, only `type="wald"` or `"gradient"` is permitted.
`...`	When `type` is not `"wald"`, other optional arguments to be passed on the internal functions: - `lim` to specify the range of the evaluation points (default to (-3,3) resulting in the interval $\hat\beta \pm 3\times \mathrm{SE}(\hat\beta)$ ); - `values` to set explicitly the evaluation point(s); - `return.val` to return (if `TRUE`) the evaluation points and the corresponding statistic values (useful to plot the profiled statistic). If the supplied `values` includes just one scalar, `return.val` is set to `TRUE`.

Details

Confidence intervals are computed and returned. Currently the Wald, Gradient or Likelihood ratio statistic can be used. Based on some simulation experiments the simple Wald based CIs appears adeguate to guarantee the nominal coverage levels.

Value

The end-points of confidence intervals.

Author(s)

Vito Muggeo

References

For a gentle and general introduction about the likelihood-based statistics (including the gradient) see

Muggeo V.M.R., Lovison G. (2014), The 'three plus one' likelihood-based test statistics: unified geometrical and graphical interpretations. The American Statistician, 68, 302-306.

Examples



n=50
s=.4
set.seed(1515)      #just to get reproducible results..

#covariates
x<-seq(.1,10,l=n) 
z<-rnorm(n)

#response
mu<- 10+.5*x- z  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o<- lognlm(y~x+z, lik=TRUE) #ML estimation

confint(o, "x", type="g")
confint(o, "z", type="w") #same than confint.default(o) 

n=50
s=.4
set.seed(1515)      #just to get reproducible results..

#covariates
x<-seq(.1,10,l=n) 
z<-rnorm(n)

#response
mu<- 10+.5*x- z  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o<- lognlm(y~x+z, lik=TRUE) #ML estimation

confint(o, "x", type="g")
confint(o, "z", type="w") #same than confint.default(o)

Log Likelihood for log Normal linear regression

Description

The function returns the log-likelihood value of the log Normal linear regression model evaluated at the estimated coefficients

Usage

## S3 method for class 'lognlm'
logLik(object, full=FALSE, ...)
## S3 method for class 'lognlm'
extractAIC(fit, scale=0, k=2, ...)

## S3 method for class 'lognlm'
logLik(object, full=FALSE, ...)
## S3 method for class 'lognlm'
extractAIC(fit, scale=0, k=2, ...)

Arguments

`object`, `fit`	A `lognlm` fit returned by `lognlm()`
`full`	If `FALSE`, only the kernel of the log likelihood is returned, otherwise the complete log likelihood (including terms depending on data only)
`scale`	Optional numeric specifying the scale parameter of the model. Currenty not used.
`k`	Optional numeric specifying the penalty of the edf in the AIC formula. If `k<=0`, the BIC is returned.
`...`	optional arguments (nothing in this method).

Details

If object has been obtained via lognlm(.., lik=TRUE), logLik.lognlm returns the log likelihood (kernel or complete, depending on argument full), otherwise the sum of log residuals, $\sum(\log(y_i)-\log(\hat\mu_i))^2)$ . The value returned by AIC is based on the kernel log likelihood or the the sum of log residuals, while extractAIC can return the AIC (or BIC) using the full log likelihood (via extractAIC(.., full=TRUE))

Value

The log likelihood (or the sum of log residuals squared) of the model fit object

Author(s)

Vito Muggeo

Examples

   
# o is the fit object, see ?lognlm
n=50
s=.4

#covariates
x<-seq(.1,10,l=n) 

#response
set.seed(1234)      #just to get reproducible results..
mu<- 10+.5*x  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o<- lognlm(y~x, lik=TRUE) #the model

logLik(o) #the kernel log likelihood value
logLik(o, full=TRUE)
   
# o is the fit object, see ?lognlm
n=50
s=.4

#covariates
x<-seq(.1,10,l=n) 

#response
set.seed(1234)      #just to get reproducible results..
mu<- 10+.5*x  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o<- lognlm(y~x, lik=TRUE) #the model

logLik(o) #the kernel log likelihood value
logLik(o, full=TRUE)

Multiple linear regression with log Normal errors

Description

The function fits simple multiple linear regression models with log Normals erros. Two objectives as well as two optimizing functions can be used.

Usage

lognlm(formula, data, subset, weights, na.action, y = TRUE, start, model = TRUE, 
     lik = FALSE, opt = c("nlminb", "optim"), contrasts=NULL, ...)
lognlm(formula, data, subset, weights, na.action, y = TRUE, start, model = TRUE, 
     lik = FALSE, opt = c("nlminb", "optim"), contrasts=NULL, ...)

Arguments

`formula`	a standard R formula with response and explanatory variables (and possible offset) specifying the regression model being fitted.
`data`	an optional data frame, list or environment containing some or all the variables in the model.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`weights`	an optional vector of (positive) weights to be used in the fitting process. Currently implemented only if `lik=FALSE`.
`na.action`	a function indicating what should happen when the data contain NAs. The default is set by the `na.action` setting of `options`.
`y`	logical. If `TRUE` the response vector is returned as `y` in the fit object.
`start`	(optional) starting values of the parameter to be estimated. If `start` is missing they are computed via ordinary least squares with the intercept $\hat\beta_0$ replaced by $max(\hat\beta_0,median\{y_i\})$ . If `lik=TRUE` (i.e. a log Normal model is fitted), `start` refers to the regression parameters and the error standard deviation; if `lik=FALSE`, `start` does not include the starting guess for the standard deviation.
`model`	logical. If `TRUE` the model frame is returned as `model` in the fit object.
`lik`	If `TRUE` the log Normal log likelihood is optimized, otherwise the sum of squared residuals based on the logs (see Details).
`opt`	the optimization function to be used. `nlminb` appears to be more efficient, probably because it uses (unlike `optim`) also the hessian matrix (supplied in the code). However results are often indistinguishable.
`contrasts`	an optional list. See the contrasts.arg of model.matrix.default.
`...`	optional arguments passed on to the optimizing functions (`nlminb` or `optim`), (and therefore should be consistent with that).

Details

lognlm fits simple linear regression models with log Normal errors and identity link. Actually two objectives could be used.

If lik=TRUE the usual log Normal likelihood is optimized, otherwise estimation is based on minimization of the following loss function

$\sum_i(\log y_i-\log \mu_i )^2$

where $\mu_i=x_i^T\beta$ is the mean function equal to the linear predictor (as an identity link is exploited).

Value

A list with components

`coefficients`	the regression parameters estimate.
`loglik`	The objective function value, namely the log Normal log likelihood or the sum of the squared ‘log residuals’ (depending on `lik` option).
`s2`	the error variance estimate.
`fitted.values`	the fitted values.
`residuals`	the raw residuals on the original scale, i.e. 'observed - fitted'.
`grad`	the gradient at solution.
`hessian`	the hessian matrix at solution.
`Ehessian`	the expected hessian matrix at solution (only if `lik=FALSE`).
`convergence`	the convergence code coming from the fitter function. `0` means succefull convergence.
`call`	the matched call.
`y`	the response vector (provided that `y=TRUE` has been set).
`opt`	the employed optimizer.
`lik`	logical, indicating if the fit comes from a log Normal likelihood approach.
`xlevels`	(only where relevant) a record of the levels of the factors used in fitting.
`terms`	the terms object used.
`contrasts`	(only where relevant) the contrasts used.
`model`	if requested, i.e. `model=TRUE` has been set (the default), the model frame used.
`offset`	the (possible) offset used.

Author(s)

Vito M.R. Muggeo

Examples


n=300
s=.4
set.seed(123)      #just to get reproducible results..

x<-seq(.1,10,l=n) #covariate
mu<- 10+2*x  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o0<-lm(log(y)~x) #the usual but WRONG model
o<- lognlm(y~x, lik=TRUE) #fit the 'right' model by ML

plot(x,y)
lines(x, mu, lwd=2)
points(x, exp(fitted(o0)), col=2, type="l", lwd=2)
points(x, fitted(o), col=3, type="l", lwd=2)
legend("topleft", legend=c("true", "lm(log(y)~x)", "lognlm(y~x)"), 
    col=c(1,2,3), lwd=2)

#Sometimes people would estimate parameters by minimizing a least square objective 
# (i.e. by setting 'lik=FALSE', see Details), wherein data would come from 
# Y = mu * exp(eps) where eps~N(0,s)..
y1<-mu*exp(rnorm(n,0,1)) #data..
o1<-lognlm(y1~x, lik=FALSE) #set 'lik=FALSE', see Details


n=300
s=.4
set.seed(123)      #just to get reproducible results..

x<-seq(.1,10,l=n) #covariate
mu<- 10+2*x  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o0<-lm(log(y)~x) #the usual but WRONG model
o<- lognlm(y~x, lik=TRUE) #fit the 'right' model by ML

plot(x,y)
lines(x, mu, lwd=2)
points(x, exp(fitted(o0)), col=2, type="l", lwd=2)
points(x, fitted(o), col=3, type="l", lwd=2)
legend("topleft", legend=c("true", "lm(log(y)~x)", "lognlm(y~x)"), 
    col=c(1,2,3), lwd=2)

#Sometimes people would estimate parameters by minimizing a least square objective 
# (i.e. by setting 'lik=FALSE', see Details), wherein data would come from 
# Y = mu * exp(eps) where eps~N(0,s)..
y1<-mu*exp(rnorm(n,0,1)) #data..
o1<-lognlm(y1~x, lik=FALSE) #set 'lik=FALSE', see Details

The fitter function for log Normal Linear Models

Description

lognlm.fit is called by lognlm to fit log Normal linear regression models. Two optimizing functions can be used, nlminb and optim. This function is not meant to be called by the user directly.

Usage

lognlm.fit(X, y, par, lik = TRUE, opt = c("nlminb", "optim"), 
    offset=NULL, weights=NULL,...)
lognlm.fit(X, y, par, lik = TRUE, opt = c("nlminb", "optim"), 
    offset=NULL, weights=NULL,...)

Arguments

`X`	design matrix for standard linear terms.
`y`	vector of observations of length `n`.
`par`	starting values of parameters to be estimated.
`lik`	logical. See `lik` in `lognlm`
`opt`	the optimizing algorithm. Default to `nlminb`.
`offset`	a possible offset term.
`weights`	apossible weights to be used if `lik=FALSE`.
`...`	other arguments to be passed to the optimizer specified in `opt`.

Details

See lognlm for more details on the arguments and returned objects.

Value

A list of fit information

Note

This function should usually not be used directly by the user.

Author(s)

Vito M.R. Muggeo

Examples

## See ?lognlm
## See ?lognlm

Air quality in Palermo (Italy), 1997-2001

Description

Daily time series of some pollutants and meteorological variables in Palermo, 1997-2001

Usage

data("palermo")data("palermo")

Format

A data frame with 1826 observations on the following 8 variables.

day: day of month
month: month of year
year: year
so2: Sulfur dioxide
no2: Nitrogen dioxide
pm10: particular matter
temp: tempearture (degrees Celsius)
hum: air humidity (%)

Details

Data refer to air pollution, temperature and humidity registered in Palermo, (Sicily, Italy) in 1997-2001. Data are averages from eight monitoring stations in the city.

Examples

## Not run: 
data(palermo)

o<-lognlm(pm10 ~ temp + hum, data=palermo)



## End(Not run)
## Not run: 
data(palermo)

o<-lognlm(pm10 ~ temp + hum, data=palermo)



## End(Not run)

PM2.5 and PM10 measurements in Paris in 2019

Description

Hourly time series of PM2.5 and PM10 measurements in Paris in 2019

Usage

data("paris")data("paris")

Format

A data frame with 647 observations on the following 4 variables.

utc: date and time of measurements
pm25: The PM2.5 measurements
pm10: The PM10 measurements
hours: numeric, the measurement hours

Details

Ambient particulate matter measurements (PM 2.5 and PM 10) measured by reference-grade instruments for Paris - Centre, hourly measurements for July 2019. Non-physical measurements (zero values and outliers with PM2.5 greater than 2xPM10) were removed, as mass of PM2.5 is a fraction of mass of PM10 by definition.

Source

European Environmental Agency, through Open AQ (https://openaq.org/#/location/4146). Thanks to Vito Ilacqua (EPA) for pointing out that.

Examples


data(paris)
plot(pm10~pm25, data=paris)
o<-lm(pm10~pm25, data=paris) #negative intercept! it's meaningless..
o1<-lognlm(pm10 ~ pm25, data=paris, lik=TRUE)
abline(coef=coef(o),  col=2)
abline(coef=coef(o1), col=3)
data(paris)
plot(pm10~pm25, data=paris)
o<-lm(pm10~pm25, data=paris) #negative intercept! it's meaningless..
o1<-lognlm(pm10 ~ pm25, data=paris, lik=TRUE)
abline(coef=coef(o),  col=2)
abline(coef=coef(o1), col=3)

Print method for the lognlm class

Description

Printing the most important feautures of a 'lognlm' model.

Usage

## S3 method for class 'lognlm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'lognlm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

`x`	object of class `segmented`
`digits`	number of digits to be printed
`...`	arguments passed to other functions

Author(s)

Vito M.R. Muggeo

Summarizing model fits for log Normal regression

Description

summary method for class lognlm.

Usage

## S3 method for class 'lognlm'
summary(object, ...)

## S3 method for class 'summary.lognlm'
print(x, digits = max(3L, getOption("digits") - 3L), 
       signif.stars = getOption("show.signif.stars"), ...)

## S3 method for class 'lognlm'
summary(object, ...)

## S3 method for class 'summary.lognlm'
print(x, digits = max(3L, getOption("digits") - 3L), 
       signif.stars = getOption("show.signif.stars"), ...)

Arguments

`object`	object of class `"lognreg"`.
`x`	a `summary.segmented` object produced by `summary.segmented()`.
`digits`	controls number of digits printed in output.
`signif.stars`	logical, should stars be printed on summary tables of coefficients?
`...`	further arguments to be passed to `vcov`, for instance `sandw=TRUE`.

Details

These functions compute and print some useful information relevant to "lognlm" fits, including point estimates, standard errors and p-values.

Value

A list (similar to one returned by lognlm with additional components, such as the estimate standard errors and corresponding p-values.

Author(s)

Vito Muggeo

Examples

## Not run: 
n=20
s=.2
set.seed(10)      #just to get reproducible results..

#covariates
x<-seq(.1,10,l=n) 
z<-rnorm(n)

#response
mu<- 10+.5*x- z  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o<- lognlm(y~x+z) #the model
summary(o, sandw=TRUE)

## End(Not run)
## Not run: 
n=20
s=.2
set.seed(10)      #just to get reproducible results..

#covariates
x<-seq(.1,10,l=n) 
z<-rnorm(n)

#response
mu<- 10+.5*x- z  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o<- lognlm(y~x+z) #the model
summary(o, sandw=TRUE)

## End(Not run)

Covariance matrix for lognlm fits

Description

Computes covariance matrix of parameter estimates from a lognlm fit via the sandwich formula.

Usage

## S3 method for class 'lognlm'
vcov(object, emp = FALSE, exH = TRUE, se = FALSE, ...)
## S3 method for class 'lognlm'
vcov(object, emp = FALSE, exH = TRUE, se = FALSE, ...)

Arguments

`object`	a fitted model object of class `"lognlm"` returned by `lognlm()`
`emp`	logical; if `TRUE`, the ‘meat’ (i.e the information matrix) is computed empirically by the outer product of the individual score contributions.
`exH`	logical; if `TRUE` the expected (rather than the observed) hessian is used in the sandwich formula.
`se`	logical; if `TRUE` the square root of the elements of the main diagonal are returned (rather than the whole matrix).
`...`	additional arguments.

Details

If object has been obtained via lognlm(.., lik=TRUE) the returned covariance matrix (or standard errors only) refers to regression coefficients and the log response standard deviation. Otherwise (if lik=FALSE has been set), it includes entries relevant to regression coefficients only. The var-covariance matrix comes from the sandwich formula using expected (if exH=TRUE) or the observed (if exH=FALSE) hessian at solution. Some simulations under correct model specification show that emp=TRUE and exH=FALSE lead to somewhat more unstable standard errors.

Value

The variance-covariance matrix of the parameter estimates, if se=FALSE; otherwise the square root of the main diagonal entries.

Note

Currently for likelihood-based fits, exH=FALSE and emp=TRUE are always set.

Author(s)

Vito Muggeo

Examples


n=50
s=.3

#covariates
x<-seq(.1,10,l=n) 
z<-rnorm(n)
#response
mu<- 10+.5*x- z  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o<- lognlm(y~x+z, lik=TRUE) #the model
vcov(o) #the full covariance matrix 
vcov(o, se=TRUE) #st.errs only 

 
n=50
s=.3

#covariates
x<-seq(.1,10,l=n) 
z<-rnorm(n)
#response
mu<- 10+.5*x- z  #linear regression function
y<-rlnorm(n, log(mu)-s^2/2, s) #data..

o<- lognlm(y~x+z, lik=TRUE) #the model
vcov(o) #the full covariance matrix 
vcov(o, se=TRUE) #st.errs only

Package 'logNormReg'

Help Index

log Normal Linear Regression

Description

Details

Acknowledgement

Author(s)

References

See Also

Confidence intervals for the parameters in log normal regression

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Log Likelihood for log Normal linear regression

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Multiple linear regression with log Normal errors

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

The fitter function for log Normal Linear Models

Description

Usage

Arguments

Details

Value

Note

Author(s)

See Also

Examples

Air quality in Palermo (Italy), 1997-2001

Description

Usage

Format

Details

Examples

PM2.5 and PM10 measurements in Paris in 2019

Description

Usage

Format

Details

Source

Examples

Print method for the lognlm class

Description

Usage

Arguments

Author(s)

See Also

Summarizing model fits for log Normal regression

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Covariance matrix for lognlm fits

Description

Usage

Arguments

Details