Title: | Non-Crossing Additive Regression Quantiles and Non-Parametric Growth Charts |
---|---|
Description: | Fits non-crossing regression quantiles as a function of linear covariates and multiple smooth terms, including varying coefficients, via B-splines with L1-norm difference penalties. Random intercepts and variable selection are allowed via the lasso penalties. The smoothing parameters are estimated as part of the model fitting, see Muggeo and others (2021) <doi:10.1177/1471082X20929802>. Monotonicity and concavity constraints on the fitted curves are allowed, see Muggeo and others (2013) <doi:10.1007/s10651-012-0232-1>, and also <doi:10.13140/RG.2.2.12924.85122> or <doi:10.13140/RG.2.2.29306.21445> some code examples. |
Authors: | Vito M. R. Muggeo [aut, cre] |
Maintainer: | Vito M. R. Muggeo <[email protected]> |
License: | GPL |
Version: | 1.7-1 |
Built: | 2024-11-19 05:11:28 UTC |
Source: | https://github.com/cran/quantregGrowth |
Fits non-crossing regression quantiles as a function of linear covariates and smooth terms via P-splines with difference penalties. Random intercepts and selection of linear variables are allowed via the lasso penalties. Estimation of (possibly adaptive) smoothing/tuning parameters (for the spline terms, the random intercepts and the variables selector) are carried out efficientely as part of model fitting.
Package: | quantregGrowth |
Type: | Package |
Version: | 1.7-1 |
Date: | 2024-05-20 |
License: | GPL |
Package quantregGrowth allows estimation of growth charts via quantile regression. Given a set of percentiles (i.e. probability values), gcrq
estimates non-crossing quantile curves as a flexible function of quantitative covariates (typically age in growth charts), and possibly additional linear terms. To ensure flexibility, B-splines with a difference penalty are employed to estimate non parametrically the curves wherein monotonicity and concavity constraints may be also set. Multiple smooth terms, including varying coefficients, are allowed and the amount of smoothness for each term is efficiently included in the model fitting algorithm, see Muggeo et al. (2021).
plot.gcrq
displays the fitted lines along with observations and poitwise confidence intervals.
Vito M.R. Muggeo
Maintainer: Vito M.R. Muggeo <[email protected]>
Muggeo VMR, Torretta F, Eilers PHC, Sciandra M, Attanasio M (2021). Multiple smoothing parameters selection in additive regression quantiles, Statistical Modelling, 21, 428-448.
Muggeo VMR (2021). Additive Quantile regression with automatic smoothness selection: the R package quantregGrowth.
https://www.researchgate.net/publication/350844895
Muggeo VMR, Sciandra M, Tomasello A, Calvo S (2013). Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology, Environ Ecol Stat, 20, 519-531.
Muggeo VMR (2018). Using the R package quantregGrowth: some examples.
https://www.researchgate.net/publication/323573492
Some references on growth charts (the first two papers employ the so-called LMS method)
Cole TJ, Green P (1992) Smoothing reference centile curves: the LMS method and penalized likelihood. Statistics in Medicine 11, 1305-1319.
Rigby RA, Stasinopoulos DM (2004) Smooth centile curves for skew and kurtotic data modelled using the Box-Cox power exponential distribution. Statistics in Medicine 23, 3053-3076.
Wei Y, Pere A, Koenker R, He X (2006) Quantile regression methods for reference growth charts. Statistics in Medicine 25, 1369-1382.
Some references on regression quantiles
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge.
Cade BS, Noon BR (2003) A gentle introduction to quantile regression for ecologists. Front Ecol Environ 1, 412-420.
#see ?gcrq for some examples
#see ?gcrq for some examples
Computes and returns quantiles as a function of the specified covariate values
charts(fit, k, file = NULL, digits=2, conf.level=0, dataframe=FALSE, transf=NULL, se.type=c("sandw","boot"), ...)
charts(fit, k, file = NULL, digits=2, conf.level=0, dataframe=FALSE, transf=NULL, se.type=c("sandw","boot"), ...)
fit |
The object fit returned by |
k |
Numeric indicating the covariate values. If integer (and scalar, specified via |
file |
If specified, the (path) file name wherein the returned matrix including the quantiles will be written via |
digits |
Number of digits whereby the estimated quantiles are rounded. |
conf.level |
If larger than zero, the pointwise confidence intervals for the estimated quantiles are also returned. If |
dataframe |
Logical. If |
transf |
An optional character string (with |
se.type |
Which covariance matrix should be used, provided that |
... |
Further arguments passed on to |
This function is simply a wrapper for predict.gcrq
A matrix having number of columns equal to the number of quantile curves and number of rows depending k
charts
just works with models having a single smooth term. See predict.gcrq
when the model involves multiple covariates.
Vito Muggeo
## Not run: charts(_fit_, k=1L) #prediction at the minimum of covariate charts(_fit_, k=1) #prediction at covariate value 1. charts(_fit_, k=10L) ## End(Not run)
## Not run: charts(_fit_, k=1L) #prediction at the minimum of covariate charts(_fit_, k=1) #prediction at covariate value 1. charts(_fit_, k=10L) ## End(Not run)
Modelling unspecified nonlinear relationships between covariates and quantiles of the response conditional distribution. Typical example is estimation nonparametric growth charts (via quantile regression). Quantile curves are estimated via B-splines with
a penalty on the spline coefficient differences, while non-crossing and possible monotonicity and concavity restrictions are set to obtain estimates
more biologically plausible. Linear terms can be specified in the model formula. Multiple smooth terms, including varying coefficients, with automatic selection of corresponding smoothing parameters are allowed.
gcrq(formula, tau=c(.1,.25,.5,.75,.9), data, subset, weights, na.action, transf=NULL, y=TRUE, n.boot=0, eps=0.001, display=FALSE, method=c("REML","ML"), df.opt=2, df.nc=FALSE, lambda0=.1, h=0.8, lambda.max=1000, tol=0.01, it.max=15, single.lambda=TRUE, foldid=NULL, nfolds=10, lambda.ridge=0, adjX.constr=TRUE, contrasts=NULL, sparse=FALSE)
gcrq(formula, tau=c(.1,.25,.5,.75,.9), data, subset, weights, na.action, transf=NULL, y=TRUE, n.boot=0, eps=0.001, display=FALSE, method=c("REML","ML"), df.opt=2, df.nc=FALSE, lambda0=.1, h=0.8, lambda.max=1000, tol=0.01, it.max=15, single.lambda=TRUE, foldid=NULL, nfolds=10, lambda.ridge=0, adjX.constr=TRUE, contrasts=NULL, sparse=FALSE)
formula |
a standard R formula to specify the response in the left hand side, and the covariates in the right hand side, such as |
tau |
a numeric vector to specify the quantile curves of interest. Default to probability values |
data |
the dataframe where the variables required by the formula, subset and weights arguments are stored. |
subset |
optional. A vector specifying a subset of observations to be used in the fitting process. |
weights |
optional. A numeric vector specifying weights to be assigned to the observations in the fitting process. Currently unimplemented. |
na.action |
a function which indicates how the possible ‘NA’s are handled. |
transf |
an optional character string (with |
y |
logical. If |
n.boot |
Number of nonparametric (cases resampling) bootstrap samples to be used. If |
eps |
A small positive constant to ensure noncrossing curves (i.e. the minimum distance between two consecutive curves). Use it at your risk! If |
display |
Logical. Should the iterative process be printed? Ignored if no smooth is specified in the formula or if all the smoothing parameters specified in |
method |
character, |
df.opt |
How the model and term-specific degrees of freedom are computed. |
df.nc |
logical. If |
lambda0 |
the starting value for the lambdas to be estimated. Ignored if all the smoothing parameters specified in |
h |
The step halving factor affecting estimation of the smoothing parameters. Lower values lead to slower updates in the lambda values. Ignored if all the smoothing parameters specified in |
lambda.max |
The upper bound for lambda estimation. Ignored if all the smoothing parameters specified in |
tol |
The tolerance value to declare convergence. Ignored if all the smoothing parameters specified in |
it.max |
The maximum number of iterations in lambdas estimation. Ignored if all the smoothing parameters specified in |
single.lambda |
Logical. Should the smoothing parameter (for each smooth term) to be the same across the quantile curves being estimated? Ignored when just a single quantile curve is being estimated. |
foldid |
optional. A numeric vector identifying the group labels to perform cross validation to select the smoothing parameter.
Ignored if the |
nfolds |
optional. If |
lambda.ridge |
Numerical value (typically very small) to stabilize model estimation. |
adjX.constr |
logical. If |
contrasts |
an optional list. See argument |
sparse |
logical. If |
The function fits regression quantiles at specified percentiles given in tau
as a function of
covariates specified in the formula
argument. The formula
may include linear terms and one or several ps
terms to model nonlinear relationships with quantitative covariates, usually age in growth charts. When the lambda
argument
in ps()
is a negative scalar, the smoothing parameter is estimated iteratively as discussed in Muggeo et al. (2021). If a positive scalar, it represents the actual smoothing parameter value.
When the model includes a single ps
term, setting sparse=TRUE
(introduced since version 1.7-0) could reduce the computational time especially when the sample size is large and the basis involves several terms. However when sparse=TRUE
is set, no linear term is allowed and for the smooth term a full and uncentred basis is used (i.e., equivalent to setting dropc=FALSE
and center=FALSE
along with constr.fit=FALSE
in ps()
). Therefore the correct call would be
gcrq(y ~ 0+ps(x), sparse=TRUE) #if y~ps(x), a warning is printed as the intercept is not explicitly removed
which is equivalent to
gcrq(y ~ 0+ps(x, dropc=FALSE, center=FALSE)) #sparse=FALSE is the default
Smoothing parameter selection via 'K-fold' cross validation (CV) is also allowed (but not recommended) if the model includes a single ps
term: lambda
should be a vector of candidate values, and the final fit is returned at the ‘optimal’ lambda value. To select the smoothing parameter via CV, foldid
or nfolds
may be supplied. If provided, foldid
overwrites nfolds
, otherwise foldid
is obtained via random extraction, namely sample(rep(seq(nfolds), length = n))
. However selection of smoothing parameter via CV is allowed only with a unique ps
term in the formula.
This function returns an object of class gcrq
, that is a list with the following components (only the most important are listed)
coefficients |
The matrix of estimated regression parameters; the number of columns equals the number of the fitted quantile curves. |
x |
the design matrix of the final fit (including the dummy rows used by penalty). |
edf.j |
a matrix reporting the edf values for each term at each quantile curve. See the section 'Warning' below. |
rho |
a vector including the values of the objective functions at the solution for each quantile curve. |
fitted.values |
a matrix of fitted quantiles (a column for each |
residuals |
a matrix of residuals (a column for each |
D.matrix |
the penalty matrix (multiplied by the smoothing parameter value). |
D.matrix.nolambda |
the penalty matrix. |
pLin |
number of linear covariates in the model. |
info.smooth |
some information on the smoothing term (if included in the formula via |
BB |
further information on the smoothing term (if present in the formula via |
Bderiv |
if the smooth term is included, the first derivative of the B spline basis. |
boot.coef |
The array including the estimated coefficients at different bootstrap samples (provided that |
y |
the response vector (if |
contrasts |
the contrasts used, when the model contains a factor. |
xlevels |
the levels of the factors (when included) used in fitting. |
taus |
a vector of values between 0 and 1 indicating the estimated quantile curves. |
call |
the matched call. |
Variable selection is still at an experimental stage.
When including linear terms, the covariates are shifted such that
.
The options 'REML'
or 'ML'
of the argument method
, refer to how the degrees of freedom are computed to update the lambda estimates.
Currently, standard errors are obtained via the sandwich formula or the nonparametric bootstrap (case resampling). Both methods ignore uncertainty in the smoothing parameter selection.
Since version 1.2-1, computation of the approximate edf can account for the noncrossing constraints by specifying df.nc=TRUE
. That could affect model estimation when the smoothing parameter(s) have to be estimated, because the term specific edf are used to update the lambda value(s). When lambda is not being estimated (it is fixed or there is no ps
term in the formula), parameter estimate is independent of the df.nc
value. The summary.gcrq
method reports if the edf account for the noncrossing constraints.
Using ps(.., center=TRUE)
in the formula leads to lower uncertainty in the fitted curve while guaranteeing noncrossing constraints.
Currently, decomposition of Bsplines (i.e. ps(..,decom=TRUE)
) is incompatible with shape (monotonicity and concavity) restrictions and even with noncrossing constraints.
This function is based upon the package quantreg by R. Koenker.
Currently methods specific to the class "gcrq"
are print.gcrq
, summary.gcrq
, vcov.gcrq
, plot.gcrq
, predict.gcrq
, AIC.gcrq
, and logLik.gcrq
.
If the sample is not large, and/or the basis rank is large (i.e. a large number of columns) and/or there are relatively few distinct values in the covariate distribution, the fitting algorithm may fail returning error messages like the following
> Error info = 20 in stepy2: singular design
To remedy it, it suffices to change some arguments in ps()
: to decrease ndx
or deg
(even by a small amount) or
to increase (even by a small amount) the lambda value. Sometimes even by changing slightly the tau probability value (for instance from 0.80 to 0.79) can bypass the aforementioned errors.
Vito M. R. Muggeo, [email protected]
V.M.R. Muggeo, F. Torretta, P.H.C. Eilers, M. Sciandra, M. Attanasio (2021). Multiple smoothing parameters selection in additive regression quantiles, Statistical Modelling, 21: 428-448.
V. M. R. Muggeo (2021). Additive Quantile regression with automatic smoothness selection: the R package quantregGrowth. https://www.researchgate.net/publication/350844895
V. M. R. Muggeo, M. Sciandra, A. Tomasello, S. Calvo (2013). Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology, Environ Ecol Stat, 20, 519-531.
V. M. R. Muggeo (2018). Using the R package quantregGrowth: some examples.
https://www.researchgate.net/publication/323573492
## Not run: ##=== Example 1: an additive model from ?mgcv::gam d<-mgcv::gamSim(n=200, eg=1) o<-gcrq(y ~ ps(x0) + ps(x1)+ ps(x2) + ps(x3), data=d, tau=.5, n.boot=50) plot(o, res=TRUE, col=2, conf.level=.9, shade=TRUE, split=TRUE) ##=== Example 2: some simple examples involving just a single smooth data(growthData) #load data tauss<-seq(.1,.9,by=.1) #fix the percentiles of interest m1<-gcrq(y~ps(x), tau=tauss, data=growthData) #lambda estimated.. m2<-gcrq(y~ps(x, lambda=0), tau=tauss, data=growthData) #unpenalized.. very wiggly curves #strongly penalized models m3<-gcrq(y~ps(x, lambda=1000, d=2), tau=tauss, data=growthData) #linear m4<-gcrq(y~ps(x, lambda=1000, d=3), tau=tauss, data=growthData) #quadratic #penalized model with monotonicity restrictions m5<-gcrq(y~ps(x, monotone=1, lambda=10), tau=tauss, data=growthData) #monotonicity constraints,lambda estimated, and varying penalty m6<-gcrq(y~ps(x, monotone=1, lambda=10, var.pen="(1:k)"), tau=tauss, data=growthData) m6a<-gcrq(y~ps(x, monotone=1, lambda=10, var.pen="(1:k)^2"), tau=tauss, data=growthData) par(mfrow=c(2,3)) plot(m1, res=TRUE, col=-1) plot(m2, pch=20, res=TRUE) plot(m3, add=TRUE, lty=2, col=4) plot(m4, pch=20, res=TRUE) plot(m5, pch=4, res=TRUE, legend=TRUE, col=2) plot(m6, lwd=2, col=3) plot(m6a, lwd=2, col=4) #select lambda via 'K-fold' CV (only with a single smooth term) m7<-gcrq(y~ps(x, lambda=seq(0,10,l=20)), tau=tauss, data=growthData) par(mfrow=c(1,2)) plot(m7, cv=TRUE) #display CV score versus lambda values plot(m7, res=TRUE, grid=list(x=5, y=8), col=4) #fit at the best lambda (by CV) ##=== Example 3: VC models n=50 x<-1:n/n mu0<-10+sin(2*pi*x) mu1<- 7+4*x y<-c(mu0,mu1)+rnorm(2*n)*.2 #small noise.. just to illustrate.. x<-c(x,x) z<-rep(0:1, each=n) # approach 1: a smooth in each *factor* level g<-factor(z) o <-gcrq(y~ g+ps(x,by=g), tau=.5) predict(o, newdata=data.frame(x=c(.3,.7), g=factor(c(0,1)))) par(mfrow=c(1,2)) plot(x[1:50],mu0,type="l") plot(o, term=1, add=TRUE) points(c(.3,.7), predict(o, newdata=data.frame(x=c(.3,.7), g=factor(c(0,0)))), pch=4, lwd=3,col=2) plot(x[1:50],mu1,type="l") plot(o, term=2, add=T, shift=coef(o)["g1",], col=3) #note the argument 'shift' points(c(.3,.7), predict(o, newdata=data.frame(x=c(.3,.7), g=factor(c(1,1)))), pch=4, lwd=3,col=3) # approach 2: a general smooth plus the (smooth) 'interaction' with a continuous covariate.. b1 <-gcrq(y~ ps(x) + z+ ps(x,by=z), tau=.5) par(mfrow=c(1,2)) plot(x[1:50],mu0,type="l") plot(b1, add=TRUE, term=1) #plot the 1st smooth term #plot the 2nd smooth of 'b1' (which is actually the difference mu1-mu0) plot(x[1:50], mu1-mu0, type="l") plot(b1, term=2, add=TRUE, interc=FALSE, shift=coef(b1)["z",]) ##=== Example 4: random intercepts example n=50 x<-1:n/n set.seed(69) z<- sample(1:15, size=n, replace=TRUE) #table(z) #true model: linear effect + 3 non-null coeffs when z= 3, 7, and 13 y<-2*x+ I(z==3)- I(z==7) + 2*I(z==13) + rnorm(n)*.15 id<-factor(z) o <-gcrq(y~x+ps(id), tau=.5) plot(o, term=1) #plot the subject-specific intercepts #== variable selection n=50 p=30 p1<-10 X<-matrix(rnorm(n*p,5),n,p) b<-rep(0,p) id<-sample(1:p, p1) b[id]<-round(runif(p1,.5,2),2) b[id]<-b[id]* sign(ifelse(runif(p1)<.5,1,-1)) lp <- drop(tcrossprod(X,t(b))) y <- 2+lp+rnorm(n)*1.5 gcrq(y~ps(X), tau=.7) ## End(Not run)
## Not run: ##=== Example 1: an additive model from ?mgcv::gam d<-mgcv::gamSim(n=200, eg=1) o<-gcrq(y ~ ps(x0) + ps(x1)+ ps(x2) + ps(x3), data=d, tau=.5, n.boot=50) plot(o, res=TRUE, col=2, conf.level=.9, shade=TRUE, split=TRUE) ##=== Example 2: some simple examples involving just a single smooth data(growthData) #load data tauss<-seq(.1,.9,by=.1) #fix the percentiles of interest m1<-gcrq(y~ps(x), tau=tauss, data=growthData) #lambda estimated.. m2<-gcrq(y~ps(x, lambda=0), tau=tauss, data=growthData) #unpenalized.. very wiggly curves #strongly penalized models m3<-gcrq(y~ps(x, lambda=1000, d=2), tau=tauss, data=growthData) #linear m4<-gcrq(y~ps(x, lambda=1000, d=3), tau=tauss, data=growthData) #quadratic #penalized model with monotonicity restrictions m5<-gcrq(y~ps(x, monotone=1, lambda=10), tau=tauss, data=growthData) #monotonicity constraints,lambda estimated, and varying penalty m6<-gcrq(y~ps(x, monotone=1, lambda=10, var.pen="(1:k)"), tau=tauss, data=growthData) m6a<-gcrq(y~ps(x, monotone=1, lambda=10, var.pen="(1:k)^2"), tau=tauss, data=growthData) par(mfrow=c(2,3)) plot(m1, res=TRUE, col=-1) plot(m2, pch=20, res=TRUE) plot(m3, add=TRUE, lty=2, col=4) plot(m4, pch=20, res=TRUE) plot(m5, pch=4, res=TRUE, legend=TRUE, col=2) plot(m6, lwd=2, col=3) plot(m6a, lwd=2, col=4) #select lambda via 'K-fold' CV (only with a single smooth term) m7<-gcrq(y~ps(x, lambda=seq(0,10,l=20)), tau=tauss, data=growthData) par(mfrow=c(1,2)) plot(m7, cv=TRUE) #display CV score versus lambda values plot(m7, res=TRUE, grid=list(x=5, y=8), col=4) #fit at the best lambda (by CV) ##=== Example 3: VC models n=50 x<-1:n/n mu0<-10+sin(2*pi*x) mu1<- 7+4*x y<-c(mu0,mu1)+rnorm(2*n)*.2 #small noise.. just to illustrate.. x<-c(x,x) z<-rep(0:1, each=n) # approach 1: a smooth in each *factor* level g<-factor(z) o <-gcrq(y~ g+ps(x,by=g), tau=.5) predict(o, newdata=data.frame(x=c(.3,.7), g=factor(c(0,1)))) par(mfrow=c(1,2)) plot(x[1:50],mu0,type="l") plot(o, term=1, add=TRUE) points(c(.3,.7), predict(o, newdata=data.frame(x=c(.3,.7), g=factor(c(0,0)))), pch=4, lwd=3,col=2) plot(x[1:50],mu1,type="l") plot(o, term=2, add=T, shift=coef(o)["g1",], col=3) #note the argument 'shift' points(c(.3,.7), predict(o, newdata=data.frame(x=c(.3,.7), g=factor(c(1,1)))), pch=4, lwd=3,col=3) # approach 2: a general smooth plus the (smooth) 'interaction' with a continuous covariate.. b1 <-gcrq(y~ ps(x) + z+ ps(x,by=z), tau=.5) par(mfrow=c(1,2)) plot(x[1:50],mu0,type="l") plot(b1, add=TRUE, term=1) #plot the 1st smooth term #plot the 2nd smooth of 'b1' (which is actually the difference mu1-mu0) plot(x[1:50], mu1-mu0, type="l") plot(b1, term=2, add=TRUE, interc=FALSE, shift=coef(b1)["z",]) ##=== Example 4: random intercepts example n=50 x<-1:n/n set.seed(69) z<- sample(1:15, size=n, replace=TRUE) #table(z) #true model: linear effect + 3 non-null coeffs when z= 3, 7, and 13 y<-2*x+ I(z==3)- I(z==7) + 2*I(z==13) + rnorm(n)*.15 id<-factor(z) o <-gcrq(y~x+ps(id), tau=.5) plot(o, term=1) #plot the subject-specific intercepts #== variable selection n=50 p=30 p1<-10 X<-matrix(rnorm(n*p,5),n,p) b<-rep(0,p) id<-sample(1:p, p1) b[id]<-round(runif(p1,.5,2),2) b[id]<-b[id]* sign(ifelse(runif(p1)<.5,1,-1)) lp <- drop(tcrossprod(X,t(b))) y <- 2+lp+rnorm(n)*1.5 gcrq(y~ps(X), tau=.7) ## End(Not run)
The growthData
data frame has 200 rows and 3 columns.
data(growthData)
data(growthData)
A data frame with 200 observations on the following 3 variables.
x
the supposed ‘age’ variable.
y
the supposed growth variable (e.g. weight).
z
an additional variable to be considered in the model.
Simulated data to illustrate capabilities of the package.
data(growthData) with(growthData, plot(x,y))
data(growthData) with(growthData, plot(x,y))
The function returns the log-likelihood value(s) evaluated at the estimated coefficients
## S3 method for class 'gcrq' logLik(object, summ=TRUE, ...) ## S3 method for class 'gcrq' AIC(object, ..., k=2, bondell=FALSE)
## S3 method for class 'gcrq' logLik(object, summ=TRUE, ...) ## S3 method for class 'gcrq' AIC(object, ..., k=2, bondell=FALSE)
object |
A |
summ |
If |
k |
Optional numeric specifying the penalty of the edf in the AIC formula. |
bondell |
Logical. If |
... |
optional arguments (nothing in |
The 'logLikelihood' is computed by assuming an asymmetric Laplace distribution for the response as in logLik.rq
, namely , where
is the minimized objective function. When there are multiple quantile curves
(and
summ=TRUE
) the formula is
AIC.gcrq
simply returns -2*logLik + k*edf
where k
is 2 or log(n)
.
The log likelihood(s) of the model fit object
Vito Muggeo
Bondell HD, Reich BJ, Wang H (2010) Non-crossing quantile regression curve estimation, Biometrika, 97: 825-838.
## logLik(o) #a unique value (o is the fit object from gcrq) ## logLik(o, summ=FALSE) #vector of the log likelihood values ## AIC(o, k=-1) #BIC
## logLik(o) #a unique value (o is the fit object from gcrq) ## logLik(o, summ=FALSE) #vector of the log likelihood values ## AIC(o, k=-1) #BIC
These are internal functions of package quantregGrowth
and should be not
called by the user.
ncross.rq.fitXB(y, x, B=NULL, X=NULL, taus, monotone=FALSE, concave=FALSE, nomiBy=NULL, byVariabili=NULL, ndx=10, deg=3, dif=3, lambda=0, eps=.0001, var.pen=NULL, penMatrix=NULL, lambda.ridge=0, dropcList=FALSE, decomList=FALSE, vcList=FALSE, dropvcList=FALSE, centerList=FALSE, ridgeList=FALSE, ps.matrix.list=FALSE, colmeansB=NULL, Bconstr=NULL, adjX.constr=TRUE, adList=FALSE, it.j=10, myeps=NULL, ...) ncross.rq.fitXBsparse(y, x, B=NULL, X=NULL, taus, monotone=FALSE, concave=FALSE, nomiBy=NULL, byVariabili=NULL, ndx=10, deg=3, dif=3, lambda=0, eps=.0001, var.pen=NULL, penMatrix=NULL, lambda.ridge=0, dropcList=FALSE, decomList=FALSE, vcList=FALSE, dropvcList=FALSE, centerList=FALSE, ridgeList=FALSE, ps.matrix.list=FALSE, colmeansB=NULL, Bconstr=NULL, adjX.constr=TRUE, adList=FALSE, it.j=10, myeps=NULL, ...) ncross.rq.fitX(y, X = NULL, taus, adjX.constr=TRUE, lambda.ridge = 0, eps = 1e-04, ...) gcrq.rq.cv(y, B, X, taus, monotone, concave, ndx, lambda, deg, dif, var.pen=NULL, penMatrix=NULL, lambda.ridge=0, dropcList=FALSE, decomList=FALSE, vcList=vcList, dropvcList=FALSE, nfolds=10, foldid=NULL, eps=.0001, sparse=FALSE, ...)
ncross.rq.fitXB(y, x, B=NULL, X=NULL, taus, monotone=FALSE, concave=FALSE, nomiBy=NULL, byVariabili=NULL, ndx=10, deg=3, dif=3, lambda=0, eps=.0001, var.pen=NULL, penMatrix=NULL, lambda.ridge=0, dropcList=FALSE, decomList=FALSE, vcList=FALSE, dropvcList=FALSE, centerList=FALSE, ridgeList=FALSE, ps.matrix.list=FALSE, colmeansB=NULL, Bconstr=NULL, adjX.constr=TRUE, adList=FALSE, it.j=10, myeps=NULL, ...) ncross.rq.fitXBsparse(y, x, B=NULL, X=NULL, taus, monotone=FALSE, concave=FALSE, nomiBy=NULL, byVariabili=NULL, ndx=10, deg=3, dif=3, lambda=0, eps=.0001, var.pen=NULL, penMatrix=NULL, lambda.ridge=0, dropcList=FALSE, decomList=FALSE, vcList=FALSE, dropvcList=FALSE, centerList=FALSE, ridgeList=FALSE, ps.matrix.list=FALSE, colmeansB=NULL, Bconstr=NULL, adjX.constr=TRUE, adList=FALSE, it.j=10, myeps=NULL, ...) ncross.rq.fitX(y, X = NULL, taus, adjX.constr=TRUE, lambda.ridge = 0, eps = 1e-04, ...) gcrq.rq.cv(y, B, X, taus, monotone, concave, ndx, lambda, deg, dif, var.pen=NULL, penMatrix=NULL, lambda.ridge=0, dropcList=FALSE, decomList=FALSE, vcList=vcList, dropvcList=FALSE, nfolds=10, foldid=NULL, eps=.0001, sparse=FALSE, ...)
y |
the responses vector. see |
x |
the covariate supposed to have a nonlinear relationship. |
B |
the B-spline basis. |
X |
the design matrix for the linear parameters. |
taus |
the percentiles of interest. |
monotone |
numerical value (-1/0/+1) to define a non-increasing, unconstrained, and non-decreasing flexible fit, respectively. |
concave |
numerical value (-1/0/+1) to possibly define concave or convex fits. |
nomiBy |
useful for VC models (when |
byVariabili |
useful for VC models (when |
ndx |
number of internal intervals within the covariate range, see |
deg |
spline degree, see |
dif |
difference order of the spline coefficients in the penalty term. |
lambda |
smoothing parameter value(s), see |
eps |
tolerance value. |
var.pen |
Varying penalty, see |
penMatrix |
Specified penalty matrix, see |
lambda.ridge |
a (typically very small) value, see |
dropcList |
see |
decomList |
see |
vcList |
to indicate if the smooth is VC or not, see |
dropvcList |
see |
centerList |
see |
ridgeList |
see |
ps.matrix.list |
nothing relevant for the user. |
colmeansB |
see |
Bconstr |
see |
foldid |
vector (optional) to perform cross validation, see the same arguments in |
nfolds |
number of folds for crossvalidation, see the same arguments in |
cv |
returning cv scores; see the same arguments in |
adjX.constr |
logical to shift the linear covariates. Appropriate only with linear terms. |
adList |
see |
it.j |
Ignore. |
myeps |
Ignore. |
sparse |
logical, meaning if sparse computations have to be used. |
... |
optional. |
These functions are called by gcrq
to fit growth charts based on regression
quantiles with non-crossing and monotonicity restrictions. The computational methods are based on the package
quantreg by R. Koenker and details are described in the reference paper.
A list of fit information.
Vito M. R. Muggeo
##See ?gcrq
##See ?gcrq
Displaying the estimated growth charts from a gcrq
fit.
## S3 method for class 'gcrq' plot(x, term=NULL, add = FALSE, res = FALSE, conf.level=0, axis.tau=FALSE, interc=TRUE, se.interc=FALSE, legend = FALSE, select.tau, deriv = FALSE, cv = FALSE, transf=NULL, lambda0=FALSE, shade=FALSE, overlap=NULL, rug=FALSE, overall.eff=TRUE, grid=NULL, smoos=NULL, split=FALSE, shift=0, type=c("sandw","boot"), n.points=NULL, ...)
## S3 method for class 'gcrq' plot(x, term=NULL, add = FALSE, res = FALSE, conf.level=0, axis.tau=FALSE, interc=TRUE, se.interc=FALSE, legend = FALSE, select.tau, deriv = FALSE, cv = FALSE, transf=NULL, lambda0=FALSE, shade=FALSE, overlap=NULL, rug=FALSE, overall.eff=TRUE, grid=NULL, smoos=NULL, split=FALSE, shift=0, type=c("sandw","boot"), n.points=NULL, ...)
x |
a fitted |
term |
the variable name or its index in the formula entering the model. Can be vector. Both linear ad spline terms (i.e. included in the model via |
interc |
Should the smooth term be plotted along with the model intercept (provided it is included in the model)? Of course such argument is ignored if the smooth term has been called via |
se.interc |
logical. If |
add |
logical. If |
res |
logical. If |
conf.level |
logical. If larger than zero, pointwise confidence intervals for the fitted quantile curve are also shown (at the confidence level specified by |
axis.tau |
logical. If |
legend |
logical. If |
select.tau |
an optional numeric vector to draw only some of the fitted quantiles. Percentile values or integers 1 to |
deriv |
logical. If |
cv |
logical. If |
transf |
An optional character string (with |
lambda0 |
logical. If |
shade |
logical. If |
overlap |
NULL or numeric (scalar or vector). If provided and different from |
rug |
logical. If |
overall.eff |
logical. If the smooth term has been called via |
grid |
if provided, a grid of horizontal and vertical lines is drawn. |
smoos |
logical, indicating if the residuals (provided that |
split |
logical. If there are multiple terms (both smooth and linear) and |
shift |
Numerical value(s) to be added to the curve(s) to be plotted. If vector with length equal to the number of quantile curves to plot, the |
type |
If |
n.points |
On how many values the plotted lines should rely on? If |
... |
Additional graphical parameters: |
Takes a "gcrq" object and diplays the fitted quantile curves as a function of the covariate specified in term
. If conf.level
>0 pointwise confidence intervals are also displayed. When the object contains the component cv
, plot.gcrq
can display cross-validation scores against the lambda values, see argument cv
. If a single quantile curve is being displayed, the default 'ylab' includes the relevant edf value (leaving out the basis intercept). If axis.tau=TRUE
and the fit includes several quantile curves, plot.gcrq()
portrays the estimated coefficients versus the probability values. If term
refers to a categorical variable, the point estimates against the categories are plotted (conf.level
is ignored).
The function simply generates a new plot or adds fitted curves to an existing one.
Plotting non-crossing curves could depend on the arguments 'interc' and 'shift', in turn depending on how the model has been specified. Take care about that!
Vito M. R. Muggeo
## Not run: ## use the fits from ?gcrq #The additive model plot(o, res=TRUE, col=2, conf.level=.9, shade=TRUE, split=TRUE) par(mfrow=c(2,2)) plot(m5, select.tau=c(.1,.5,.9), overlap=0.6, legend=TRUE) plot(m5, grid=list(x=8,y=5), lty=1) #a 8 times 5 grid.. plot(m7, cv=TRUE) #display CV score versus lambda values plot(m7, res=TRUE, grid=list(x=5, y=8), col=4) #fitted curves at the best lambda value ## End(Not run)
## Not run: ## use the fits from ?gcrq #The additive model plot(o, res=TRUE, col=2, conf.level=.9, shade=TRUE, split=TRUE) par(mfrow=c(2,2)) plot(m5, select.tau=c(.1,.5,.9), overlap=0.6, legend=TRUE) plot(m5, grid=list(x=8,y=5), lty=1) #a 8 times 5 grid.. plot(m7, cv=TRUE) #display CV score versus lambda values plot(m7, res=TRUE, grid=list(x=5, y=8), col=4) #fitted curves at the best lambda value ## End(Not run)
Takes a "gcrq" objects and computes fitted values
## S3 method for class 'gcrq' predict(object, newdata, se.fit=FALSE, transf=NULL, xreg, type=c("sandw","boot"), ...)
## S3 method for class 'gcrq' predict(object, newdata, se.fit=FALSE, transf=NULL, xreg, type=c("sandw","boot"), ...)
object |
a fitted |
newdata |
a dataframe including all the covariates of the model. The smooth term is represented by a covariate
and proper basis functions will be build accordingly. If omitted, the fitted values are used. Ignored if |
se.fit |
logical. If |
transf |
An optional character string (with |
xreg |
the design matrix for which predictions are requested. If provided, |
type |
If |
... |
arguments passed to other functions |
predict.gcrq
computes fitted quantiles as a function of observations included in newdata
or xreg
.
Either newdata
or xreg
have to be supplied, but newdata
is ignored
when xreg
is provided.
If se.fit=FALSE
, a matrix of fitted values with number of rows equal to number of rows of input data
and number of columns depending on the number of fitted quantile curves (i.e length of taus
). If se.fit=TRUE
, a list of matrices (fitted values and standard errors).
Vito M.R. Muggeo
##see ?gcrq ## predict(m1, newdata=data.frame(x=c(.3,.7)))
##see ?gcrq ## predict(m1, newdata=data.frame(x=c(.3,.7)))
Printing the most important feautures of a gcrq model.
## S3 method for class 'gcrq' print(x, digits = max(3, getOption("digits") - 4), ...)
## S3 method for class 'gcrq' print(x, digits = max(3, getOption("digits") - 4), ...)
x |
object of class |
digits |
number of digits to be printed |
... |
arguments passed to other functions |
Vito M.R. Muggeo
Function used to define the smooth term (via P-splines) within the gcrq formula. The function actually does not evaluate a (spline) smooth, but simply it passes relevant information to proper fitter functions.
ps(..., lambda = -1, d = 3, by=NULL, ndx = NULL, deg = 3, knots=NULL, monotone = 0, concave = 0, var.pen = NULL, pen.matrix=NULL, dropc=TRUE, center=TRUE, K=NULL, decom=FALSE, constr.fit=TRUE, shared.pen=FALSE, st=FALSE, ad=0)
ps(..., lambda = -1, d = 3, by=NULL, ndx = NULL, deg = 3, knots=NULL, monotone = 0, concave = 0, var.pen = NULL, pen.matrix=NULL, dropc=TRUE, center=TRUE, K=NULL, decom=FALSE, constr.fit=TRUE, shared.pen=FALSE, st=FALSE, ad=0)
... |
The covariate supposed to have a nonlinear relationship with the quantile curve(s) being estimated. A B-spline is built, and a (difference) penalty is applied. In growth charts this variable is typically the age. If the covariate is a factor, category-specific coefficients are estimated subject to a lasso penalty. See the last example in ?gcrq. A matrix of (continuous) covariates can be also supplied to perfom variable selection (among its columns). |
lambda |
A supplied smoothing parameter for the smooth term. If it is negative scalar, the smoothing parameter is estimated iteratively as discussed in Muggeo et al. (2021). If a positive scalar, it represents the actual smoothing parameter. If it is a vector, cross validation is performed to select the ‘best’ value. See Details in |
d |
The difference order of the penalty. Default to 3 Ignored if |
by |
if different from |
ndx |
The number of intervals of the covariate range used to build the B-spline basis. Non-integer values are rounded by |
deg |
The degree of the spline polynomial. Default to 3. The B-spline basis is composed by |
knots |
The knots locations. If |
monotone |
Numeric value to set up monotonicity restrictions on the first derivative of fitted smooth function
|
concave |
Numeric value to set up monotonicity restrictions on the second derivative of fitted smooth function
|
var.pen |
A character indicating the varying penalty. See Details. |
pen.matrix |
if provided, a penalty matrix |
dropc |
logical. Should the first column of the B-spline basis be dropped for the basis identifiability? Default to |
center |
logical. If |
K |
A scalar tuning the selection of wiggliness of the smoothed curve when |
decom |
logical. If |
constr.fit |
logical. If |
shared.pen |
logical. If |
st |
logical. If |
ad |
a positive number to carry out a form of adaptive lasso. More specifically, at each step of the iterative algorithm, the penalty is |
If a numeric variable has been supplied, ps()
builds a B-spline basis with number of columns equal to ndx+deg
(or length(knots)-deg-1
). However, unless dropc=FALSE
is specified, the first column is removed for identifiability, and the spline coefficients are penalized via differences of order d
; d=0
leads to a penalty on the coefficients themselves. If pen.matrix
is supplied, d
is ignored. Since versions 1.5-0 and 1.6-0, a factor or matrix can be supplied.
lambda
is the tuning parameter, fixed or to be estimated. When lambda
=0 an unpenalized (and typically wiggly) fit is obtained, and as lambda increases the curve gets smoother till a d-1
degree polynomial. At 'intermediate' lambda values, the fitted curve is a piecewise polynomial of degree d-1
.
It is also possible to put a varying penalty via the argument var.pen
. Namely for a
constant smoothing (var.pen=NULL
) the penalty is where
is the k-th difference (of order
d
) of the spline coefficients. For instance if ,
where the
are the spline coefficients.
When a varying penalty is set, the penalty becomes
where the weights
depend on
var.pen
; for instance var.pen="((1:k)^2)"
results in . See models
m6
and m6a
in the examples of gcrq
.
If decom=TRUE
, the smooth can be plotted with or without the fixed part, see overall.eff
in the function plot.gcrq
.
The function simply returns the covariate with added attributes relevant to smooth term.
For shape-constrained fits, use constr.fit=FALSE
only if you are using a single full and uncentred basis, namely something like gcrq(y~0+ps(x, center=FALSE, dropc=FALSE, monotone=1, constr.fit=FALSE),..)
.
Vito M. R. Muggeo
Muggeo VMR, Torretta F, Eilers PHC, Sciandra M, Attanasio M (2021). Multiple smoothing parameters selection in additive regression quantiles, Statistical Modelling, 21, 428-448.
For a general discussion on using B-spline and penalties in regression model see
Eilers PHC, Marx BD. (1996) Flexible smoothing with B-splines and penalties. Statistical Sciences, 11:89-121.
##see ?gcrq ##gcrq(y ~ ps(x),..) #it works (default: center = TRUE, dropc = TRUE) ##gcrq(y ~ 0 + ps(x, center = TRUE, dropc = FALSE)) #it does NOT work ##gcrq(y ~ 0 + ps(x, center = FALSE, dropc = FALSE)) #it works
##see ?gcrq ##gcrq(y ~ ps(x),..) #it works (default: center = TRUE, dropc = TRUE) ##gcrq(y ~ 0 + ps(x, center = TRUE, dropc = FALSE)) #it does NOT work ##gcrq(y ~ 0 + ps(x, center = FALSE, dropc = FALSE)) #it works
Age, height and weight in a sample of 1424 Italian children born in Sicily in the eighties
data("SiChildren")
data("SiChildren")
A data frame with 1424 observations on the following 3 variables.
age
age in years
height
child height (in centimeter)
weight
child weight (in kilo)
Data refer on the usual antropometric measures of Italian boys born in Sicily in the first years of 80s. Data have been kindly provided by prof M. Chiodi
Gattuccio F., and Pirronello S., and Chiodi M (1988) Possibilita' di identificazione di tipologie evolutive del periodo puberale: proposta di una metodica pr finalita' predittive, Rivista di pediatria preventiva e sociale nipiologia, 189-199
data(SiChildren) ## see the package vignette for an example using such dataset
data(SiChildren) ## see the package vignette for an example using such dataset
summary and print methods for class gcrq
## S3 method for class 'gcrq' summary(object, type=c("sandw","boot"), digits = max(3, getOption("digits") - 3), signif.stars =getOption("show.signif.stars"), ...)
## S3 method for class 'gcrq' summary(object, type=c("sandw","boot"), digits = max(3, getOption("digits") - 3), signif.stars =getOption("show.signif.stars"), ...)
object |
An object of class |
type |
Which covariance matrix should be used to compute the estimate standard errors? |
digits |
controls number of digits printed in output. |
signif.stars |
Should significance stars be printed? |
... |
further arguments. |
summary.gcrq
returns some information on the fitted quantile curve at different probability values, such as the estimates, standard errors, values of check (objective) function values at solution. Currently there is no print.summary.gcrq
method, so
summary.gcrq
itself prints results.
The SIC returned by print.gcrq
and summary.gcrq
is computed as , where
is the usual asymmetric sum of residuals (in absolute value). For multiple
quantiles it is
. Note that computation of SIC in
AIC.gcrq
relies on the Laplace assumption for the response.
Vito M.R. Muggeo
## see ?gcrq ##summary(o)
## see ?gcrq ##summary(o)
Returns the variance-covariance matrix of the parameter estimates of a fitted gcrq model object.
## S3 method for class 'gcrq' vcov(object, term, type=c("sandw","boot"), ...)
## S3 method for class 'gcrq' vcov(object, term, type=c("sandw","boot"), ...)
object |
a fitted model object of class |
term |
if specified, the returned covariance matrix includes entries relevant to parameter estimates for that 'term' only. If missing, the returned matrices refer to all model parameter estimates. Currently |
type |
Which cov matrix should be returned? |
... |
additional arguments. |
Bootstrap-based covariance matrix, i.e. type="boot"
, is computable only if the object fit has been obtained by specifying n.boot>0
in gcrq()
.
A list (its length equal the length of tau
specified in gcrq
) of square matrices. Namely the list includes the covariance matrices of the parameter estimates for each regression quantile curve.
Vito Muggeo