Title: | GEE Solver for Correlated Nominal or Ordinal Multinomial Responses |
---|---|
Description: | GEE solver for correlated nominal or ordinal multinomial responses using a local odds ratios parameterization. |
Authors: | Anestis Touloumis [aut, cre] |
Maintainer: | Anestis Touloumis |
License: | GPL-2 or GPL-3 |
Version: | 1.9.1 |
Built: | 2025-01-13 05:35:00 UTC |
Source: | https://github.com/anestistouloumis/multgee |
Rheumatoid self-assessment scores for 302 patients, measured on a five-level ordinal response scale at three follow-up times.
arthritis
A data frame with 906 observations on the following 7 variables:
id |
Patient identifier variable. |
y |
Self-assessment score of rheumatoid arthritis measured on a five-level ordinal response scale. |
sex |
Coded as (1) for female and (2) for male. |
age |
Recorded at the baseline. |
trt |
Treatment group variable, coded as (1) for the placebo group and (2) for the drug group. |
baseline |
Self-assessment score of rheumatoid arthritis at the baseline. |
time |
Follow-up time recorded in months. |
Lipsitz, S.R., Kim, K. and Zhao, L. (1994) Analysis of repeated categorical data using generalized estimating equations. Statistics in Medicine 13, 1149–1163.
data(arthritis)
str(arthritis)
Computes confidence intervals for one or more parameters in a fitted LORgee model.
## S3 method for class 'LORgee'
confint(object, parm, level = 0.95, method = "robust", ...)
object |
a fitted model LORgee object. |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required. |
method |
character indicating whether the sandwich (robust) covariance matrix (method = "robust") or the model-based (naive) covariance matrix (method = "naive") should be used. |
... |
additional argument(s) for methods. |
The (Wald-type) confidence intervals are calculated using either the sandwich (robust) or the model-based (naive) covariance matrix.
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in % (by default 2.5% and 97.5%).
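As a minimal base-R sketch of the Wald-type interval described above (estimate ± z × standard error, where z is the 1 - (1-level)/2 standard normal quantile), the hypothetical helper wald_ci() below takes a named coefficient vector and a covariance matrix. It is not the package's confint code; for a fitted model one would presumably pass coef(fitmod) together with vcov(fitmod, method = "robust") or vcov(fitmod, method = "naive").

```r
# Hypothetical helper (not part of multgee): Wald-type confidence intervals
# from a named coefficient vector and a covariance matrix (robust or naive).
wald_ci <- function(est, vcov_mat, level = 0.95) {
  se <- sqrt(diag(vcov_mat))                 # standard errors
  z <- qnorm(1 - (1 - level) / 2)            # about 1.96 for level = 0.95
  ci <- cbind(est - z * se, est + z * se)
  probs <- c((1 - level) / 2, 1 - (1 - level) / 2)
  # Label the columns "2.5 %" and "97.5 %" (for the default level)
  colnames(ci) <- paste(format(100 * probs, trim = TRUE, scientific = FALSE), "%")
  rownames(ci) <- names(est)
  ci
}

# Toy inputs, made up for illustration only
est <- c(a = 1, b = -0.5)
V <- diag(c(0.04, 0.01))
wald_ci(est, V)
wald_ci(est, V, level = 0.90)
```

Swapping the robust matrix for the naive one changes only the standard errors, which is the only difference between the two method options.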
data(arthritis)
fitmod <- ordLORgee(formula = y ~ factor(time) + factor(trt) + factor(baseline),
                    data = arthritis, id = id, LORstr = "uniform", repeated = time)
confint(fitmod)
Reports commonly used criteria for variable selection and for selecting the "working" association structure for one or several fitted models from the multgee package.
gee_criteria(object, ...)
object |
an object of the class LORgee. |
... |
optionally more objects of the class LORgee. |
The Quasi Information Criterion (QIC), the Correlation Information Criterion (CIC) and the Rotnitzky and Jewell Criterion (RJC) are used for selecting the best association structure. The QICu criterion is used for selecting the best subset of covariates. When choosing among GEE models with different association structures but the same subset of covariates, the model with the smallest value of QIC, CIC or RJC should be preferred. When choosing among GEE models with different numbers of covariates, the model with the smallest QICu value should be preferred.
A vector or matrix with the QIC, QICu, CIC, RJC and the number of regression parameters (including intercepts).
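The criteria can be sketched from one common form of their published definitions, assuming a quasi-likelihood value Q at the estimates, the robust covariance matrix V_R and the inverse Ω of a model-based covariance matrix (Pan (2001) uses the one obtained under an independence working structure): QIC = -2Q + 2 tr(ΩV_R), QICu = -2Q + 2p, CIC = tr(ΩV_R) and RJC = sqrt((1 - tr(ΩV_R)/p)^2 + (1 - tr((ΩV_R)^2)/p)^2). The helper gee_crit_sketch() is hypothetical, and gee_criteria may compute its inputs differently.

```r
# Hypothetical helper (not part of multgee): association-structure criteria.
# omega:        inverse of a model-based (naive) covariance matrix
# robust:       sandwich (robust) covariance matrix
# quasi_loglik: quasi-likelihood at the estimates (assumed to be given)
gee_crit_sketch <- function(omega, robust, quasi_loglik) {
  p <- ncol(robust)                      # number of regression parameters
  M <- omega %*% robust
  tr_M <- sum(diag(M))
  tr_M2 <- sum(diag(M %*% M))
  c(QIC  = -2 * quasi_loglik + 2 * tr_M,
    QICu = -2 * quasi_loglik + 2 * p,
    CIC  = tr_M,
    RJC  = sqrt((1 - tr_M / p)^2 + (1 - tr_M2 / p)^2),
    p    = p)
}

# When the robust and model-based matrices coincide, M is the identity:
# CIC equals p, RJC equals 0 and QIC equals QICu (up to rounding).
naive <- diag(c(0.5, 0.2, 0.1))
gee_crit_sketch(solve(naive), naive, quasi_loglik = -100)
```

Read this way, CIC and RJC measure how far the robust covariance drifts from the model-based one, which is why they are used to compare working association structures rather than covariate subsets.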
Anestis Touloumis
Hin, L.Y. and Wang, Y.G. (2009) Working correlation structure identification in generalized estimating equations. Statistics in Medicine 28, 642–658.
Pan, W. (2001) Akaike's information criterion in generalized estimating equations. Biometrics 57, 120–125.
Rotnitzky, A. and Jewell, N.P. (1990) Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika 77, 485–497.
data(arthritis)
fitmod <- ordLORgee(formula = y ~ factor(time) + factor(trt) + factor(baseline),
                    data = arthritis, id = id, repeated = time, LORstr = "uniform")
fitmod1 <- update(fitmod, formula = . ~ . + age + factor(sex))
gee_criteria(fitmod, fitmod1)
Housing status for 362 severely mentally ill homeless subjects measured at baseline and at three follow-up times.
housing
A data frame with 1448 observations on the following 4 variables:
id |
Subject identifier variable. |
y |
Housing status response, coded as (1) for street living, (2) for community living and (3) for independent housing. |
time |
Time recorded in months. |
sec |
Section 8 rent certificate indicator. |
Hurlburt, M.S., Wood, P.A. and Hough, R.L. (1996) Providing independent housing for the homeless mentally ill: a novel approach to evaluating longitudinal housing patterns. Journal of Community Psychology 24, 291–310.
data(housing)
str(housing)
Utility function to assess the underlying association pattern.
intrinsic.pars(y = y, data = parent.frame(), id = id, repeated = NULL, rscale = "ordinal")
y |
a vector that identifies the response vector of the desired marginal model. |
data |
an optional data frame containing the variables provided in y, id and repeated. |
id |
a vector that identifies the clusters. |
repeated |
an optional vector that identifies the order of observations within each cluster. |
rscale |
a character string that indicates the nature of the response scale. Options include "ordinal" and "nominal". |
Simulation studies in Touloumis et al. (2013) suggested that if the range of the intrinsic parameter estimates is small, then simple local odds ratios structures should adequately approximate the association pattern. Otherwise, more complicated structures should be employed.
The intrinsic parameters are estimated under the heterogeneous linear-by-linear association model (Agresti, 2013) for ordinal response categories and under the RC-G(1) model (Becker and Clogg, 1989) with homogeneous score parameters for nominal response categories.
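To illustrate what such an association parameter measures (this is not how intrinsic.pars computes it): under a linear-by-linear association model log μ_jk = λ + λ_j^X + λ_k^Y + φ u_j v_k with equally spaced scores u_j = j and v_k = k, every local odds ratio of the two-way table equals exp(φ). In the heterogeneous version each time pair has its own φ. The hypothetical helper estimate_phi() below fits the model to a single two-way table with a Poisson log-linear model in base R.

```r
# Hypothetical helper: estimate phi of a linear-by-linear association model
# for ONE two-way table (e.g. responses at one time pair), using equally
# spaced scores u_j = j and v_k = k.
estimate_phi <- function(counts) {
  df <- expand.grid(row = seq_len(nrow(counts)), col = seq_len(ncol(counts)))
  df$count <- as.vector(counts)        # column-major, matches expand.grid order
  df$uv <- df$row * df$col             # linear-by-linear term u_j * v_k
  # Non-integer counts only trigger warnings in the AIC computation
  fit <- suppressWarnings(
    glm(count ~ factor(row) + factor(col) + uv, family = poisson(),
        data = df, control = glm.control(epsilon = 1e-12, maxit = 100))
  )
  unname(coef(fit)["uv"])
}

# An exact table with phi = log(1.5): every local odds ratio equals 1.5,
# whatever the (arbitrary) row and column effects
phi <- log(1.5)
tab <- outer(1:4, 1:4, function(i, j) exp(0.1 * i - 0.2 * j + phi * i * j))
tab <- 10000 * tab / sum(tab)
estimate_phi(tab)   # close to log(1.5)
```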
A detailed description of the arguments id and repeated can be found in the Details section of nomLORgee or ordLORgee.
Returns a numerical vector with the estimated intrinsic parameters.
Anestis Touloumis
Agresti, A. (2013) Categorical Data Analysis. New York: John Wiley and Sons, Inc., 3rd Edition.
Becker, M. and Clogg, C. (1989) Analysis of sets of two-way contingency tables using association models. Journal of the American Statistical Association 84, 142–151.
Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633–640.
data(arthritis)
intrinsic.pars(y, arthritis, id, time, rscale = "ordinal")
## The intrinsic parameters do not vary much. The 'uniform' local odds ratios
## structure might be a good approximation for the association pattern.
set.seed(1)
data(housing)
intrinsic.pars(y, housing, id, time, rscale = "nominal")
## The intrinsic parameters vary. The 'RC' local odds ratios structure
## might be a good approximation for the association pattern.
Control variables for the iterative proportional fitting procedure function ipfp.
ipfp.control(tol = 1e-06, maxit = 200)
tol |
positive convergence tolerance. The algorithm converges when the absolute difference between the observed and the given row or column totals is less than or equal to tol. |
maxit |
positive integer that indicates the maximum number of iterations. |
Currently, the function ipfp is internal.
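Since ipfp is internal, its code is not documented here. The following is a generic textbook sketch of iterative proportional fitting, assuming that ipfp performs the usual alternating row and column rescaling with the tol and maxit stopping rule described above; ipf_sketch() is hypothetical. A key property is that rescaling rows and columns leaves every odds ratio of the seed table unchanged, which is presumably what makes IPF useful for building tables with a prescribed local odds ratios structure.

```r
# Hypothetical sketch of iterative proportional fitting (IPF): rescale the
# rows and columns of a positive seed table until its margins match the
# target totals. Returns the last iterate if maxit is reached.
ipf_sketch <- function(seed, row_totals, col_totals, tol = 1e-06, maxit = 200) {
  fitted <- seed
  for (iter in seq_len(maxit)) {
    fitted <- fitted * (row_totals / rowSums(fitted))              # match row totals
    fitted <- sweep(fitted, 2, col_totals / colSums(fitted), "*")   # match column totals
    gap <- max(abs(rowSums(fitted) - row_totals),
               abs(colSums(fitted) - col_totals))
    if (gap <= tol) break
  }
  fitted
}

seed <- matrix(c(2, 1, 1, 1), nrow = 2)   # odds ratio 2 * 1 / (1 * 1) = 2
out <- ipf_sketch(seed, row_totals = c(0.3, 0.7), col_totals = c(0.6, 0.4))
rowSums(out)
colSums(out)
out[1, 1] * out[2, 2] / (out[1, 2] * out[2, 1])   # still 2, up to rounding
```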
Anestis Touloumis
Control variables for the GEE solver in the nomLORgee and ordLORgee functions.
LORgee_control(tolerance = 0.001, maxiter = 15, verbose = FALSE, TRACE = FALSE)
tolerance |
positive convergence tolerance. The algorithm converges when the maximum of the absolute relative difference in parameter estimates is less than or equal to tolerance. |
maxiter |
positive integer that indicates the maximum number of iterations in the Fisher-scoring iterative algorithm. |
verbose |
logical that indicates if output should be printed at each iteration. |
TRACE |
logical that indicates if the parameter estimates and the convergence criterion at each iteration should be saved. |
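A sketch of the stopping rule described for tolerance, assuming the relative difference is taken with respect to the previous estimates. The helper converged() is hypothetical and, unlike a production solver might, does not guard against previous estimates equal to zero. With maxiter = 1, as in the example below, the solver stops after a single update regardless of this check, which is what yields a one-step estimator.

```r
# Hypothetical helper: maximum absolute relative difference between
# successive parameter estimates, compared with the tolerance.
converged <- function(beta_new, beta_old, tolerance = 0.001) {
  max(abs((beta_new - beta_old) / beta_old)) <= tolerance
}

converged(c(1.0004, -2.001), c(1, -2))   # relative changes 4e-4 and 5e-4: TRUE
converged(c(1.01, -2), c(1, -2))         # relative change 0.01: FALSE
```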
Anestis Touloumis
data(arthritis)
fitmod <- ordLORgee(y ~ factor(trt) + factor(baseline) + factor(time),
                    data = arthritis, id = id, repeated = time)
## A one-step GEE estimator
fitmod1 <- update(fitmod, control = LORgee_control(maxiter = 1))
coef(fitmod)
coef(fitmod1)
Utility function to create a square probability matrix that satisfies the specified local odds ratios structure.
matrixLOR(x)
x |
a square matrix with positive entries that describes the desired local odds ratios matrix. |
It is designed to ease the construction of the argument LORterm in the nomLORgee and ordLORgee functions.
Returns a square probability matrix that satisfies the local odds ratios structure defined by x.
Caution is needed for local odds ratios close to zero.
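The local odds ratios of a J × J table p are θ_ij = p_ij p_(i+1)(j+1) / (p_i(j+1) p_(i+1)j). The sketch below computes them and, as a hypothetical alternative to matrixLOR(), builds one probability table with prescribed local odds ratios by setting log p_ij to the sum of log θ_ab over a < i and b < j and then normalizing. The margins of that table are whatever this construction produces; matrixLOR() may impose different margins. The log() step also shows why local odds ratios close to zero call for caution.

```r
# Hypothetical helper: local odds ratios of a two-way table p.
local_odds_ratios <- function(p) {
  J <- nrow(p)
  K <- ncol(p)
  p[-J, -K, drop = FALSE] * p[-1, -1, drop = FALSE] /
    (p[-J, -1, drop = FALSE] * p[-1, -K, drop = FALSE])
}

# Hypothetical construction: x holds the (J-1) x (K-1) local odds ratios;
# log p[i, j] = sum of log x[a, b] over a < i and b < j, then normalize.
table_from_lor <- function(x) {
  m <- matrix(0, nrow(x) + 1, ncol(x) + 1)
  m[-1, -1] <- log(x)                             # breaks down as x -> 0
  s <- t(apply(apply(m, 2, cumsum), 1, cumsum))   # two-dimensional cumulative sums
  p <- exp(s)
  p / sum(p)
}

x <- matrix(c(2, 3, 1.5, 0.5), nrow = 2)
p <- table_from_lor(x)
local_odds_ratios(p)   # reproduces x, up to rounding
```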
Anestis Touloumis
## Illustrating the construction of a "fixed" local odds ratios structure
## using the arthritis dataset. Here, we assume a uniform local odds ratios
## structure equal to 2 for each time pair.
## Create the uniform local odds ratios structure.
lorterm <- matrixLOR(matrix(2, 4, 4))
## Create the LORterm argument: one row (the vectorized 5 x 5 table) per time pair.
lorterm <- matrix(c(lorterm), nrow = 3, ncol = 25, byrow = TRUE)
## Fit the marginal model.
data(arthritis)
fitmod <- ordLORgee(y ~ factor(trt) + factor(time) + factor(baseline),
                    data = arthritis, id = id, repeated = time,
                    LORstr = "fixed", LORterm = lorterm)
fitmod
Solving the generalized estimating equations for correlated nominal multinomial responses assuming a baseline category logit model for the marginal probabilities.
nomLORgee(formula = formula(data), data = parent.frame(), id = id,
          repeated = NULL, bstart = NULL, LORstr = "time.exch", LORem = "3way",
          LORterm = NULL, add = 0, homogeneous = TRUE, control = LORgee_control(),
          ipfp.ctrl = ipfp.control(), IM = "solve")
formula |
a formula expression as for other regression models for multinomial responses. An intercept term must be included. |
data |
an optional data frame containing the variables provided in formula, id and repeated. |
id |
a vector that identifies the clusters. |
repeated |
an optional vector that identifies the order of observations within each cluster. |
bstart |
a vector that includes an initial estimate for the marginal regression parameter vector. |
LORstr |
a character string that indicates the marginalized local odds
ratios structure. Options include |
LORem |
a character string that indicates if the marginalized local
odds ratios structure is estimated simultaneously ( |
LORterm |
a matrix that satisfies the user-defined local odds ratios structure. It is ignored unless LORstr = "fixed". |
add |
a positive constant to be added to each cell of the full marginalized contingency table in the presence of zero observed counts. |
homogeneous |
a logical that indicates homogeneous score parameters
when |
control |
a vector that specifies the control variables for the GEE solver. |
ipfp.ctrl |
a vector that specifies the control variables for the function ipfp. |
IM |
a character string that indicates the method used for inverting a
matrix. Options include |
The data must be provided in case level or, equivalently, in ‘long’ format. See details about the ‘long’ format in the function reshape.
A term of the form offset(expression) is allowed in the right hand side of formula.
The default set for the response categories is $\{1, \ldots, J\}$, where $J$ is the maximum observed response category. Otherwise, the function recodes the observed response categories onto this set. The $J$-th response category is treated as baseline.
The default set for the id labels is $\{1, \ldots, N\}$, where $N$ is the sample size. Otherwise, the function recodes the given labels onto this set.
The argument repeated can be ignored only when data is written in such a way that the $t$-th observation in each cluster is recorded at the $t$-th measurement occasion. If this is not the case, then the user must provide repeated. The suggested set for the levels of repeated is $\{1, \ldots, T\}$, where $T$ is the number of observed levels. Otherwise, the function recodes the given levels onto this set.
The variables id and repeated do not need to be pre-sorted. Instead, the function reshapes data in ascending order of id and repeated.
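The recoding and sorting described above can be mimicked in base R as follows. This is an illustration of the described behavior on a small hypothetical long-format data frame, not the package's internal code; in particular, recoding levels in their sorted order is an assumption.

```r
# Hypothetical long-format data: two clusters (id) observed at months 0 and 6
long <- data.frame(id = c(20, 10, 20, 10),
                   time = c(6, 6, 0, 0),
                   y = c(3, 1, 2, 2))
# Recode id and repeated (here: time) onto 1, 2, ... in sorted order
long$id_code <- match(long$id, sort(unique(long$id)))        # 10 -> 1, 20 -> 2
long$time_code <- match(long$time, sort(unique(long$time)))  # 0 -> 1, 6 -> 2
# Sort in ascending order of id, then of repeated
long <- long[order(long$id_code, long$time_code), ]
long
```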
The fitted marginal baseline category logit model is
$$\log \left[ \frac{\Pr(Y_{it} = j \mid x_{it})}{\Pr(Y_{it} = J \mid x_{it})} \right] = \beta_{j0} + \beta_j^{\top} x_{it}, \quad j = 1, \ldots, J - 1,$$
where $Y_{it}$ is the $t$-th multinomial response for cluster $i$, $x_{it}$ is the associated covariates vector, $\beta_{j0}$ is the $j$-th response category specific intercept and $\beta_j$ is the $j$-th response category specific parameter vector.
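A sketch of the marginal probabilities implied by a baseline category logit model log[Pr(Y = j | x) / Pr(Y = J | x)] = β_j0 + β_j'x for j = 1, ..., J - 1: with the baseline category's linear predictor set to zero, Pr(Y = j | x) is proportional to exp(β_j0 + β_j'x). The helper bcl_probs() and its parameter values are hypothetical.

```r
# Hypothetical helper: category probabilities of a baseline category logit
# model; category J is the baseline, so its linear predictor is 0.
bcl_probs <- function(intercepts, slopes, x) {
  # intercepts: length J - 1; slopes: (J - 1) x length(x) matrix
  eta <- c(intercepts + slopes %*% x, 0)
  exp(eta) / sum(exp(eta))
}

# Made-up parameter values for J = 3 categories and two covariates
pr <- bcl_probs(intercepts = c(0.5, -1),
                slopes = matrix(c(1, 0, -0.5, 2), nrow = 2),
                x = c(1, 0.3))
pr
log(pr[1:2] / pr[3])   # recovers the linear predictors 1.35 and -0.4
```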
The LORterm argument must be an $L \times J^2$ matrix, where $L$ is the number of level pairs of repeated. These are ordered as $(1,2), (1,3), \ldots, (1,T), (2,3), \ldots, (T-1,T)$ and the rows of LORterm are supposed to preserve this order. Each row is assumed to contain the vectorized form of a probability table that satisfies the desired local odds ratios structure.
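The row layout of LORterm can be sketched in base R, assuming the time pairs are ordered (1,2), (1,3), ..., (1,T), (2,3), ..., (T-1,T), which is the column order that combn() returns. The uniform placeholder tables are hypothetical; in practice each row would be the vectorized output of, for example, matrixLOR().

```r
# Sketch: one row per time pair, each row a J x J probability table
# vectorized column by column (as c() does).
T_occ <- 3                        # number of measurement occasions
J <- 5                            # number of response categories
pairs <- t(combn(T_occ, 2))       # rows: (1,2), (1,3), (2,3)
# Placeholder tables (independence); replace with the desired structure
tables <- lapply(seq_len(nrow(pairs)), function(k) matrix(1 / J^2, J, J))
lorterm <- do.call(rbind, lapply(tables, c))
dim(lorterm)   # 3 rows (time pairs) and 25 columns (J^2 cells)
```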
Returns an object of the class "LORgee". This has components:
call |
the matched call. |
title |
title for the GEE model. |
version |
the current version of the GEE solver. |
link |
the marginal link function. |
local.odds.ratios |
the marginalized local odds ratios structure variables. |
terms |
the |
contrasts |
the |
nobs |
the number of observations. |
convergence |
the values of the convergence variables. |
coefficients |
the estimated regression parameter vector of the marginal model. |
linear.pred |
the estimated linear predictor of the
marginal regression model. The |
fitted.values |
the estimated fitted
values of the marginal regression model. The |
residuals |
the residuals of the marginal regression model based on the
binary responses. The |
y |
the multinomial response variables. |
id |
the |
max.id |
the number of clusters. |
clusz |
the number of observations within each cluster. |
robust.variance |
the estimated sandwich (robust) covariance matrix. |
naive.variance |
the estimated model-based (naive) covariance matrix. |
xnames |
the regression coefficients' symbolic names. |
categories |
the number of observed response categories. |
occasions |
the levels of the repeated variable. |
LORgee_control |
the control values for the GEE solver. |
ipfp.control |
the control values for the function ipfp. |
inverse.method |
the method used for inverting matrices. |
adding.constant |
the value used for add. |
pvalue |
the p-value based on a Wald test that no covariates are statistically significant. |
Generic coef, summary, print, fitted and residuals methods are available. The pvalue of the null model corresponds to the hypothesis that all regression parameters other than the intercepts are zero, tested with the Wald test statistic.
Anestis Touloumis
Touloumis, A. (2011) GEE for multinomial responses. PhD dissertation, University of Florida.
Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633–640.
Touloumis, A. (2015) R Package multgee: A Generalized Estimating Equations Solver for Multinomial Responses. Journal of Statistical Software 64, 1–14.
For an ordinal response scale use the function ordLORgee.
## See the interpretation in Touloumis (2011).
data(housing)
fitmod <- nomLORgee(y ~ factor(time) * sec, data = housing, id = id, repeated = time)
summary(fitmod)
Solving the generalized estimating equations for correlated ordinal multinomial responses assuming a cumulative link model or an adjacent categories logit model for the marginal probabilities.
ordLORgee(formula = formula(data), data = parent.frame(), id = id,
          repeated = NULL, link = "logit", bstart = NULL, LORstr = "category.exch",
          LORem = "3way", LORterm = NULL, add = 0, homogeneous = TRUE,
          restricted = FALSE, control = LORgee_control(),
          ipfp.ctrl = ipfp.control(), IM = "solve")
formula |
a formula expression as for other regression models for multinomial responses. An intercept term must be included. |
data |
an optional data frame containing the variables provided in formula, id and repeated. |
id |
a vector that identifies the clusters. |
repeated |
an optional vector that identifies the order of observations within each cluster. |
link |
a character string that specifies the link function. Options
include |
bstart |
a vector that includes an initial estimate for the marginal regression parameter vector. |
LORstr |
a character string that indicates the marginalized local odds
ratios structure. Options include |
LORem |
a character string that indicates if the marginalized local
odds ratios structure is estimated simultaneously ( |
LORterm |
a matrix that satisfies the user-defined local odds ratios structure. It is ignored unless LORstr = "fixed". |
add |
a positive constant to be added to each cell of the full marginalized contingency table in the presence of zero observed counts. |
homogeneous |
a logical that indicates homogeneous score parameters
when |
restricted |
a logical that indicates monotone score parameters when
|
control |
a vector that specifies the control variables for the GEE solver. |
ipfp.ctrl |
a vector that specifies the control variables for the function ipfp. |
IM |
a character string that indicates the method used for inverting a
matrix. Options include |
The data must be provided in case level or, equivalently, in ‘long’ format. See details about the ‘long’ format in the function reshape.
A term of the form offset(expression) is allowed in the right hand side of formula.
The default set for the response categories is $\{1, \ldots, J\}$, where $J$ is the maximum observed response category. Otherwise, the function recodes the observed response categories onto this set. The $J$-th response category is omitted.
The default set for the id labels is $\{1, \ldots, N\}$, where $N$ is the sample size. Otherwise, the function recodes the given labels onto this set.
The argument repeated can be ignored only when data is written in such a way that the $t$-th observation in each cluster is recorded at the $t$-th measurement occasion. If this is not the case, then the user must provide repeated. The suggested set for the levels of repeated is $\{1, \ldots, T\}$, where $T$ is the number of observed levels. Otherwise, the function recodes the given levels onto this set.
The variables id and repeated do not need to be pre-sorted. Instead, the function reshapes data in ascending order of id and repeated.
The fitted marginal cumulative link model is
$$F^{-1}\left[\Pr(Y_{it} \le j \mid x_{it})\right] = \beta_{j0} + \beta^{\top} x_{it}, \quad j = 1, \ldots, J - 1,$$
where $Y_{it}$ is the $t$-th multinomial response for cluster $i$, $x_{it}$ is the associated covariates vector, $F$ is the cumulative distribution function determined by link, $\beta_{j0}$ is the $j$-th response category specific intercept and $\beta$ is the marginal regression parameter vector excluding intercepts.
The marginal adjacent categories logit model
$$\log \left[ \frac{\Pr(Y_{it} = j \mid x_{it})}{\Pr(Y_{it} = j + 1 \mid x_{it})} \right] = \beta_{j0} + \beta^{\top} x_{it}, \quad j = 1, \ldots, J - 1,$$
is fitted if and only if link = "acl". In contrast to a marginal cumulative link model, here the intercepts do not need to be monotone increasing.
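The category probabilities implied by the two ordinal marginal models can be sketched for a single covariate vector. The sketch follows the sign convention F^{-1}[Pr(Y ≤ j | x)] = β_j0 + β'x and log[Pr(Y = j | x) / Pr(Y = j + 1 | x)] = β_j0 + β'x (some other software, such as MASS::polr, subtracts the linear predictor instead). The helpers and parameter values are hypothetical.

```r
# Hypothetical helper: cumulative link model, Pr(Y <= j) = F(beta_j0 + beta'x);
# cdf = plogis gives the logit link, pnorm the probit link.
cumulative_probs <- function(intercepts, beta, x, cdf = plogis) {
  cum <- cdf(intercepts + sum(beta * x))   # Pr(Y <= j), j = 1, ..., J - 1
  diff(c(0, cum, 1))                       # Pr(Y = j), j = 1, ..., J
}

# Hypothetical helper: adjacent categories logit model,
# log(Pr(Y = j) / Pr(Y = j + 1)) = beta_j0 + beta'x.
acl_probs <- function(intercepts, beta, x) {
  eta <- intercepts + sum(beta * x)
  # log Pr(Y = j) - log Pr(Y = J) is the sum of eta_k for k = j, ..., J - 1
  log_ratio <- c(rev(cumsum(rev(eta))), 0)
  exp(log_ratio) / sum(exp(log_ratio))
}

x <- c(1, 0)
beta <- c(0.4, -0.2)
cumulative_probs(c(-1, 0, 1.5), beta, x)               # intercepts must increase
cumulative_probs(c(-1, 0, 1.5), beta, x, cdf = pnorm)  # probit version
acl_probs(c(0.3, -0.7, 0.2), beta, x)                  # intercepts need not be ordered
```

The diff() step is where the monotonicity requirement comes from: decreasing cumulative intercepts would produce negative category probabilities, whereas the adjacent categories version is a normalized exponential and stays valid for any intercepts.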
The LORterm argument must be an $L \times J^2$ matrix, where $L$ is the number of level pairs of repeated. These are ordered as $(1,2), (1,3), \ldots, (1,T), (2,3), \ldots, (T-1,T)$ and the rows of LORterm are supposed to preserve this order. Each row is assumed to contain the vectorized form of a probability table that satisfies the desired local odds ratios structure.
Returns an object of the class "LORgee". This has components:
call |
the matched call. |
title |
title for the GEE model. |
version |
the current version of the GEE solver. |
link |
the marginal link function. |
local.odds.ratios |
the marginalized local odds ratios structure variables. |
terms |
the |
contrasts |
the |
nobs |
the number of observations. |
convergence |
the values of the convergence variables. |
coefficients |
the estimated regression parameter vector of the marginal model. |
linear.pred |
the estimated linear predictor of the marginal regression
model. The |
fitted.values |
the estimated fitted values of the marginal regression
model. The |
residuals |
the residuals of the marginal regression model. The
|
y |
the multinomial response variables. |
id |
the |
max.id |
the number of clusters. |
clusz |
the number of observations within each cluster. |
robust.variance |
the estimated sandwich (robust) covariance matrix. |
naive.variance |
the estimated model-based (naive) covariance matrix. |
xnames |
the regression coefficients' symbolic names. |
categories |
the number of observed response categories. |
occasions |
the levels of the repeated variable. |
LORgee_control |
the control values for the GEE solver. |
ipfp.control |
the control values for the function ipfp. |
inverse.method |
the method used for inverting matrices. |
adding.constant |
the value used for add. |
pvalue |
the p-value based on a Wald test that no covariates are statistically significant. |
Generic coef, summary, print, fitted and residuals methods are available. The pvalue of the null model corresponds to the hypothesis that all regression parameters other than the intercepts are zero, tested with the Wald test statistic.
Anestis Touloumis
Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633–640.
Touloumis, A. (2015) R Package multgee: A Generalized Estimating Equations Solver for Multinomial Responses. Journal of Statistical Software 64, 1–14.
For a nominal response scale use the function nomLORgee.
data(arthritis)
intrinsic.pars(y, arthritis, id, time)
fitmod <- ordLORgee(formula = y ~ factor(time) + factor(trt) + factor(baseline),
                    data = arthritis, id = id, repeated = time, LORstr = "uniform")
summary(fitmod)
Returns the variance-covariance matrix of the main parameters of a fitted model LORgee object.
## S3 method for class 'LORgee'
vcov(object, method = "robust", ...)
object |
a fitted model LORgee object. |
method |
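character indicating whether the sandwich (robust) covariance matrix (method = "robust") or the model-based (naive) covariance matrix (method = "naive") should be returned. |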
... |
additional argument(s) for methods. |
By default, the estimated sandwich (robust) covariance matrix is returned; method = "naive" returns the estimated model-based (naive) covariance matrix.
A matrix of the estimated covariances between the parameter estimates in the linear predictor of the GEE model. This should have row and column names corresponding to the parameter names given by the coef method.
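In standard GEE theory (Liang and Zeger form), the two matrices are related as follows: the naive matrix is A^{-1} with A = Σ_i D_i' V_i^{-1} D_i, and the robust one is A^{-1} B A^{-1} with B = Σ_i u_i u_i' and u_i = D_i' V_i^{-1} (y_i - μ_i). The sketch below computes both from per-cluster inputs; sandwich_sketch() and its random inputs are hypothetical, and how the package builds D_i and the multinomial working covariance V_i is not shown here.

```r
# Hypothetical generic sandwich sketch. For each cluster i:
# D[[i]] = derivative of the mean vector w.r.t. beta,
# V[[i]] = working covariance matrix, r[[i]] = residuals y_i - mu_i.
sandwich_sketch <- function(D, V, r) {
  p <- ncol(D[[1]])
  bread <- matrix(0, p, p)
  meat <- matrix(0, p, p)
  for (i in seq_along(D)) {
    W <- solve(V[[i]])
    bread <- bread + t(D[[i]]) %*% W %*% D[[i]]
    u <- t(D[[i]]) %*% W %*% r[[i]]   # cluster contribution to the estimating equation
    meat <- meat + u %*% t(u)
  }
  naive <- solve(bread)
  list(naive = naive, robust = naive %*% meat %*% naive)
}

set.seed(1)
D <- replicate(4, matrix(rnorm(6), 3, 2), simplify = FALSE)
V <- replicate(4, diag(3), simplify = FALSE)
r <- replicate(4, rnorm(3), simplify = FALSE)
str(sandwich_sketch(D, V, r))
```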
data(arthritis)
fitmod <- ordLORgee(formula = y ~ factor(time) + factor(trt) + factor(baseline),
                    data = arthritis, id = id, repeated = time, LORstr = "uniform")
vcov(fitmod, method = "robust")
vcov(fitmod, method = "naive")
Compares two nested GEE models by carrying out a Wald test.
waldts(object0, object1)
object0 |
A GEE model of the class "LORgee" under the null hypothesis. |
object1 |
A GEE model of the class "LORgee" under the alternative hypothesis. |
The two GEE models implied by object0 and object1 must be nested.
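A generic Wald test for nested models checks whether the coefficients present only in the larger model are jointly zero: W = b' V_b^{-1} b, compared with a chi-squared distribution with as many degrees of freedom as there are extra coefficients. The sketch below assumes the larger model's estimates and a covariance matrix with matching row and column names (as vcov() output has); wald_test_sketch() is hypothetical, and which covariance matrix waldts() uses is an assumption.

```r
# Hypothetical helper: Wald test of H0: beta_extra = 0.
wald_test_sketch <- function(est, V, extra) {
  b <- est[extra]
  W <- drop(t(b) %*% solve(V[extra, extra, drop = FALSE]) %*% b)
  df <- length(b)
  c(statistic = W, df = df, p.value = pchisq(W, df, lower.tail = FALSE))
}

est <- c("(Intercept)" = 0.2, x1 = 0.5, x2 = -0.3)   # made-up values
V <- diag(c(0.01, 0.04, 0.09))
dimnames(V) <- list(names(est), names(est))
wald_test_sketch(est, V, extra = c("x1", "x2"))     # statistic 7.25 on 2 df
```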
Anestis Touloumis
data(housing)
set.seed(1)
fitmod1 <- nomLORgee(y ~ factor(time) * sec, data = housing, id = id, repeated = time)
set.seed(1)
fitmod0 <- update(fitmod1, formula = y ~ factor(time) + sec)
waldts(fitmod0, fitmod1)