Title: | Simulates Correlated Multinomial Responses |
---|---|
Description: | Simulates correlated multinomial responses conditional on a marginal model specification. |
Authors: | Anestis Touloumis [aut, cre] |
Maintainer: | Anestis Touloumis <[email protected]> |
License: | GPL-3 |
Version: | 1.9.1 |
Built: | 2025-01-12 04:50:49 UTC |
Source: | https://github.com/anestistouloumis/simcormultres |
Simulates correlated binary responses assuming a regression model for the marginal probabilities.
rbin(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)
rbin(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
intercepts |
numerical (or numeric vector of length |
betas |
numerical vector or matrix containing the value of the marginal
regression parameter vector associated with the covariates (i.e., excluding
|
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
link |
character string indicating the link function in the marginal
model. Options include |
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal model is
where is the cumulative distribution
function determined by
link
. For subject ,
is the
-th binary response and
is the associated covariates
vector. Finally,
and
are the intercept and
regression parameter vector at the
-th measurement occasion.
The binary response is obtained by extending the approach of
Emrich and Piedmonte (1991) as suggested in Touloumis (2016).
When for all
, then
intercepts
should be provided as a single number. Otherwise, intercepts
must be
provided as a numeric vector such that the -th element corresponds to
the intercept at measurement occasion
.
betas
should be provided as a numeric vector only when
for all
. Otherwise,
betas
must be
provided as a numeric matrix with clsize
rows such that the
-th row contains the value of
. In either case,
betas
should reflect the order of the terms implied by
xformula
.
The appropriate use of xformula
is xformula = ~ covariates
,
where covariates
indicate the linear predictor as in other marginal
regression models.
The optional argument xdata
should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in Touloumis (2016). To import
simulated values for the latent random vectors without utilizing the NORTA
method, the user can employ the
rlatent
argument. In this case,
element () of
rlatent
represents the realization of
.
Returns a list that has components:
Ysim |
the simulated binary
responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Emrich, L. J. and Piedmonte, M. R. (1991) A method for generating high-dimensional multivariate binary variates. The American Statistician 45, 302–304.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
rmult.bcl
for simulating correlated nominal
responses, rmult.clm
, rmult.crm
and
rmult.acl
for simulating correlated ordinal responses.
## See Example 3.5 in the Vignette. set.seed(123) sample_size <- 5000 cluster_size <- 4 beta_intercepts <- 0 beta_coefficients <- 0.2 latent_correlation_matrix <- toeplitz(c(1, 0.9, 0.9, 0.9)) x <- rep(rnorm(sample_size), each = cluster_size) simulated_binary_dataset <- rbin(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") library(gee) binary_gee_model <- gee(y ~ x, family = binomial("probit"), id = id, data = simulated_binary_dataset$simdata) summary(binary_gee_model)$coefficients ## See Example 3.6 in the Vignette. set.seed(8) library(evd) simulated_latent_variables1 <- rmvevd(sample_size, dep = sqrt(1 - 0.9), model = "log", d = cluster_size) simulated_latent_variables2 <- rmvevd(sample_size, dep = sqrt(1 - 0.9), model = "log", d = cluster_size) simulated_latent_variables <- simulated_latent_variables1 - simulated_latent_variables2 simulated_binary_dataset <- rbin(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, rlatent = simulated_latent_variables) binary_gee_model <- gee(y ~ x, family = binomial("logit"), id = id, data = simulated_binary_dataset$simdata) summary(binary_gee_model)$coefficients
## See Example 3.5 in the Vignette. set.seed(123) sample_size <- 5000 cluster_size <- 4 beta_intercepts <- 0 beta_coefficients <- 0.2 latent_correlation_matrix <- toeplitz(c(1, 0.9, 0.9, 0.9)) x <- rep(rnorm(sample_size), each = cluster_size) simulated_binary_dataset <- rbin(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") library(gee) binary_gee_model <- gee(y ~ x, family = binomial("probit"), id = id, data = simulated_binary_dataset$simdata) summary(binary_gee_model)$coefficients ## See Example 3.6 in the Vignette. set.seed(8) library(evd) simulated_latent_variables1 <- rmvevd(sample_size, dep = sqrt(1 - 0.9), model = "log", d = cluster_size) simulated_latent_variables2 <- rmvevd(sample_size, dep = sqrt(1 - 0.9), model = "log", d = cluster_size) simulated_latent_variables <- simulated_latent_variables1 - simulated_latent_variables2 simulated_binary_dataset <- rbin(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, rlatent = simulated_latent_variables) binary_gee_model <- gee(y ~ x, family = binomial("logit"), id = id, data = simulated_binary_dataset$simdata) summary(binary_gee_model)$coefficients
Simulates correlated ordinal responses assuming an adjacent-category logit model for the marginal probabilities.
rmult.acl(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), cor.matrix = cor.matrix, rlatent = NULL)
rmult.acl(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
intercepts |
numerical vector or matrix containing the intercepts of the marginal adjacent-category logit model. |
betas |
numerical vector or matrix containing the value of the marginal regression parameter vector. |
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal adjacent-category logit model is
For subject ,
is the
-th ordinal response
and
is the associated covariates vector. Also
is the
-th category-specific intercept at the
-th measurement
occasion and
is the regression
parameter vector at the
-th measurement occasion.
The ordinal response is obtained by utilizing the threshold
approach described in the Vignette. This approach is based on the connection
between baseline-category and adjacent-category logit models.
When for all
, then
intercepts
should be provided as a numerical vector. Otherwise, intercepts
must
be a numerical matrix such that row contains the category-specific
intercepts at the
-th measurement occasion.
betas
should be provided as a numeric vector only when
for all
. Otherwise,
betas
must be
provided as a numeric matrix with clsize
rows such that the
-th row contains the value of
. In either case,
betas
should reflect the order of the terms implied by
xformula
.
The appropriate use of xformula
is xformula = ~ covariates
,
where covariates
indicate the linear predictor as in other marginal
regression models.
The optional argument xdata
should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in the Vignette. To import
simulated values for the latent random vectors without utilizing the NORTA
method, the user can employ the
rlatent
argument. In this case,
row corresponds to subject
and columns
should contain the
realization of
, respectively, for
.
Returns a list that has components:
Ysim |
the simulated
nominal responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633–640.
rbin
for simulating correlated binary responses,
rmult.clm
and rmult.crm
for simulating
correlated ordinal responses, and rmult.bcl
for simulating
nominal responses.
## See Example 3.4 in the Vignette. beta_intercepts <- c(3, 1, 2) beta_coefficients <- c(1, 1) sample_size <- 500 cluster_size <- 3 set.seed(321) x1 <- rep(rnorm(sample_size), each = cluster_size) x2 <- rnorm(sample_size * cluster_size) xdata <- data.frame(x1, x2) identity_matrix <- diag(4) equicorrelation_matrix <- toeplitz(c(1, rep(0.95, cluster_size - 1))) latent_correlation_matrix <- kronecker(equicorrelation_matrix, identity_matrix) simulated_ordinal_dataset <- rmult.acl(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~ x1 + x2, xdata = xdata, cor.matrix = latent_correlation_matrix) suppressPackageStartupMessages(library("multgee")) ordinal_gee_model <- ordLORgee(y ~ x1 + x2, data = simulated_ordinal_dataset$simdata, id = id, repeated = time, LORstr = "time.exch", link = "acl") round(coef(ordinal_gee_model), 2)
## See Example 3.4 in the Vignette. beta_intercepts <- c(3, 1, 2) beta_coefficients <- c(1, 1) sample_size <- 500 cluster_size <- 3 set.seed(321) x1 <- rep(rnorm(sample_size), each = cluster_size) x2 <- rnorm(sample_size * cluster_size) xdata <- data.frame(x1, x2) identity_matrix <- diag(4) equicorrelation_matrix <- toeplitz(c(1, rep(0.95, cluster_size - 1))) latent_correlation_matrix <- kronecker(equicorrelation_matrix, identity_matrix) simulated_ordinal_dataset <- rmult.acl(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~ x1 + x2, xdata = xdata, cor.matrix = latent_correlation_matrix) suppressPackageStartupMessages(library("multgee")) ordinal_gee_model <- ordLORgee(y ~ x1 + x2, data = simulated_ordinal_dataset$simdata, id = id, repeated = time, LORstr = "time.exch", link = "acl") round(coef(ordinal_gee_model), 2)
Simulates correlated nominal responses assuming a baseline-category logit model for the marginal probabilities.
rmult.bcl(clsize = clsize, ncategories = ncategories, betas = betas, xformula = formula(xdata), xdata = parent.frame(), cor.matrix = cor.matrix, rlatent = NULL)
rmult.bcl(clsize = clsize, ncategories = ncategories, betas = betas, xformula = formula(xdata), xdata = parent.frame(), cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
ncategories |
integer indicating the number of nominal response categories. |
betas |
numerical vector or matrix containing the value of the marginal regression parameter vector. |
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal baseline category logit model is
For subject ,
is the
-th nominal response
and
is the associated covariates vector. Also
is the
-th category-specific intercept at the
-th measurement
occasion and
is the
-th category-specific regression
parameter vector at the
-th measurement occasion.
The nominal response is obtained by extending the principle of
maximum random utility (McFadden, 1974) as suggested in
Touloumis (2016).
betas
should be provided as a numeric vector only when
and
for all
.
Otherwise,
betas
must be provided as a numeric matrix with
clsize
rows such that the -th row contains the value of
(
). In either case,
betas
should reflect the order of the
terms implied by xformula
.
The appropriate use of xformula
is xformula = ~ covariates
,
where covariates
indicate the linear predictor as in other marginal
regression models.
The optional argument xdata
should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in Touloumis (2016). In this
case, the algorithm forces
cor.matrix
to respect the assumption of
choice independence. To import simulated values for the latent random
vectors without utilizing the NORTA method, the user can employ the
rlatent
argument. In this case, row corresponds to subject
and columns
should contain the
realization of
, respectively, for
.
Returns a list that has components:
Ysim |
the simulated
nominal responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
McFadden, D. (1974) Conditional logit analysis of qualitative choice behavior. New York: Academic Press, 105–142.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633–640.
rbin
for simulating correlated binary responses,
rmult.clm
, rmult.crm
and
rmult.acl
for simulating correlated ordinal responses.
## See Example 3.1 in the Vignette. betas <- c(1, 3, 2, 1.25, 3.25, 1.75, 0.75, 2.75, 2.25, 0, 0, 0) sample_size <- 500 categories_no <- 4 cluster_size <- 3 set.seed(1) x1 <- rep(rnorm(sample_size), each = cluster_size) x2 <- rnorm(sample_size * cluster_size) xdata <- data.frame(x1, x2) equicorrelation_matrix <- toeplitz(c(1, rep(0.95, cluster_size - 1))) identity_matrix <- diag(categories_no) latent_correlation_matrix <- kronecker(equicorrelation_matrix, identity_matrix) simulated_nominal_dataset <- rmult.bcl(clsize = cluster_size, ncategories = categories_no, betas = betas, xformula = ~ x1 + x2, xdata = xdata, cor.matrix = latent_correlation_matrix) suppressPackageStartupMessages(library("multgee")) nominal_gee_model <- nomLORgee(y ~ x1 + x2, data = simulated_nominal_dataset$simdata, id = id, repeated = time, LORstr = "time.exch") round(coef(nominal_gee_model), 2)
## See Example 3.1 in the Vignette. betas <- c(1, 3, 2, 1.25, 3.25, 1.75, 0.75, 2.75, 2.25, 0, 0, 0) sample_size <- 500 categories_no <- 4 cluster_size <- 3 set.seed(1) x1 <- rep(rnorm(sample_size), each = cluster_size) x2 <- rnorm(sample_size * cluster_size) xdata <- data.frame(x1, x2) equicorrelation_matrix <- toeplitz(c(1, rep(0.95, cluster_size - 1))) identity_matrix <- diag(categories_no) latent_correlation_matrix <- kronecker(equicorrelation_matrix, identity_matrix) simulated_nominal_dataset <- rmult.bcl(clsize = cluster_size, ncategories = categories_no, betas = betas, xformula = ~ x1 + x2, xdata = xdata, cor.matrix = latent_correlation_matrix) suppressPackageStartupMessages(library("multgee")) nominal_gee_model <- nomLORgee(y ~ x1 + x2, data = simulated_nominal_dataset$simdata, id = id, repeated = time, LORstr = "time.exch") round(coef(nominal_gee_model), 2)
Simulates correlated ordinal responses assuming a cumulative link model for the marginal probabilities.
rmult.clm(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)
rmult.clm(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
intercepts |
numerical vector or matrix containing the intercepts of the marginal cumulative link model. |
betas |
numerical vector or matrix containing the value of the marginal
regression parameter vector associated with the covariates (i.e., excluding
|
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
link |
character string indicating the link function in the marginal
cumulative link model. Options include |
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal cumulative link model is
where is the
cumulative distribution function determined by
link
. For subject
,
is the
-th ordinal response and
is
the associated covariates vector. Finally,
is the
-th category-specific intercept at the
-th measurement
occasion and
is the
-th category-specific regression
parameter vector at the
-th measurement occasion.
The ordinal response is obtained by extending the approach of
McCullagh (1980) as suggested in Touloumis (2016).
When for all
, then
intercepts
should be provided as a numerical vector. Otherwise, intercepts
must
be a numerical matrix such that row contains the category-specific
intercepts at the
-th measurement occasion.
betas
should be provided as a numeric vector only when
for all
. Otherwise,
betas
must be
provided as a numeric matrix with clsize
rows such that the
-th row contains the value of
. In either case,
betas
should reflect the order of the terms implied by
xformula
.
The appropriate use of xformula
is xformula = ~ covariates
,
where covariates
indicate the linear predictor as in other marginal
regression models.
The optional argument xdata
should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in Touloumis (2016). To import
simulated values for the latent random vectors without utilizing the NORTA
method, the user can employ the
rlatent
argument. In this case,
element () of
rlatent
represents the realization of
.
Returns a list that has components:
Ysim |
the simulated
ordinal responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
McCullagh, P. (1980) Regression models for ordinal data. Journal of the Royal Statistical Society B 42, 109–142.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633–640.
rmult.bcl
for simulating correlated nominal
responses, rmult.crm
and rmult.acl
for
simulating correlated ordinal responses and rbin
for
simulating correlated binary responses.
## See Example 3.2 in the Vignette. set.seed(12345) sample_size <- 500 cluster_size <- 4 beta_intercepts <- c(-1.5, -0.5, 0.5, 1.5) beta_coefficients <- matrix(c(1, 2, 3, 4), 4, 1) x <- rep(rnorm(sample_size), each = cluster_size) latent_correlation_matrix <- toeplitz(c(1, 0.85, 0.5, 0.15)) simulated_ordinal_dataset <- rmult.clm(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") head(simulated_ordinal_dataset$simdata, n = 8) ## Same sampling scheme except that the parameter vector is time-stationary. set.seed(12345) simulated_ordinal_dataset <- rmult.clm(clsize = cluster_size, betas = 1, xformula = ~x, cor.matrix = latent_correlation_matrix, intercepts = beta_intercepts, link = "probit") ## Fit a GEE model (Touloumis et al., 2013) to estimate the regression ## coefficients. library(multgee) ordinal_gee_model <- ordLORgee(y ~ x, id = id, repeated = time, link = "probit", data = simulated_ordinal_dataset$simdata) coef(ordinal_gee_model)
## See Example 3.2 in the Vignette. set.seed(12345) sample_size <- 500 cluster_size <- 4 beta_intercepts <- c(-1.5, -0.5, 0.5, 1.5) beta_coefficients <- matrix(c(1, 2, 3, 4), 4, 1) x <- rep(rnorm(sample_size), each = cluster_size) latent_correlation_matrix <- toeplitz(c(1, 0.85, 0.5, 0.15)) simulated_ordinal_dataset <- rmult.clm(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") head(simulated_ordinal_dataset$simdata, n = 8) ## Same sampling scheme except that the parameter vector is time-stationary. set.seed(12345) simulated_ordinal_dataset <- rmult.clm(clsize = cluster_size, betas = 1, xformula = ~x, cor.matrix = latent_correlation_matrix, intercepts = beta_intercepts, link = "probit") ## Fit a GEE model (Touloumis et al., 2013) to estimate the regression ## coefficients. library(multgee) ordinal_gee_model <- ordLORgee(y ~ x, id = id, repeated = time, link = "probit", data = simulated_ordinal_dataset$simdata) coef(ordinal_gee_model)
Simulates correlated ordinal responses assuming a continuation-ratio model for the marginal probabilities.
rmult.crm(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)
rmult.crm(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
intercepts |
numerical vector or matrix containing the intercepts of the marginal continuation-ratio model. |
betas |
numerical vector or matrix containing the value of the marginal
regression parameter vector associated with the covariates (i.e., excluding
|
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
link |
character string indicating the link function of the marginal
continuation-ratio model. Options include |
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal continuation-ratio model is
where is the
cumulative distribution function determined by
link
. For subject
,
is the
-th multinomial response and
is the associated covariates vector. Finally,
is the
-th category-specific intercept at the
-th measurement
occasion and
is the
-th category-specific regression
parameter vector at the
-th measurement occasion.
The ordinal response is determined by extending the latent
variable threshold approach of Tutz (1991) as suggested in
Touloumis (2016).
When for all
, then
intercepts
should be provided as a numerical vector. Otherwise, intercepts
must
be a numerical matrix such that row contains the category-specific
intercepts at the
-th measurement occasion.
betas
should be provided as a numeric vector only when
for all
. Otherwise,
betas
must be
provided as a numeric matrix with clsize
rows such that the
-th row contains the value of
. In either case,
betas
should reflect the order of the terms implied by
xformula
.
The appropriate use of xformula
is xformula = ~ covariates
,
where covariates
indicate the linear predictor as in other marginal
regression models.
The optional argument xdata
should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in Touloumis (2016). In this
case, the algorithm forces
cor.matrix
to respect the local
independence assumption. To import simulated values for the latent random
vectors without utilizing the NORTA method, the user can employ the
rlatent
argument. In this case, row corresponds to subject
and columns
should contain the
realization of
, respectively, for
.
Returns a list that has components:
Ysim |
the simulated
ordinal responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal (forthcoming).
Tutz, G. (1991) Sequential models in categorical regression. Computational Statistics & Data Analysis 11, 275–295.
rmult.bcl
for simulating correlated nominal
responses, rmult.clm
and rmult.acl
for simulating
correlated ordinal responses and rbin
for simulating
correlated binary responses.
## See Example 3.3 in the Vignette. set.seed(1) sample_size <- 500 cluster_size <- 4 beta_intercepts <- c(-1.5, -0.5, 0.5, 1.5) beta_coefficients <- 1 x <- rnorm(sample_size * cluster_size) categories_no <- 5 identity_matrix <- diag(1, (categories_no - 1) * cluster_size) equicorrelation_matrix <- toeplitz(c(0, rep(0.24, categories_no - 2))) ones_matrix <- matrix(1, cluster_size, cluster_size) latent_correlation_matrix <- identity_matrix + kronecker(equicorrelation_matrix, ones_matrix) simulated_ordinal_dataset <- rmult.crm(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") head(simulated_ordinal_dataset$Ysim)
## See Example 3.3 in the Vignette. set.seed(1) sample_size <- 500 cluster_size <- 4 beta_intercepts <- c(-1.5, -0.5, 0.5, 1.5) beta_coefficients <- 1 x <- rnorm(sample_size * cluster_size) categories_no <- 5 identity_matrix <- diag(1, (categories_no - 1) * cluster_size) equicorrelation_matrix <- toeplitz(c(0, rep(0.24, categories_no - 2))) ones_matrix <- matrix(1, cluster_size, cluster_size) latent_correlation_matrix <- identity_matrix + kronecker(equicorrelation_matrix, ones_matrix) simulated_ordinal_dataset <- rmult.crm(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") head(simulated_ordinal_dataset$Ysim)
Utility function to simulate random vectors with predefined marginal distributions via the NORTA method.
rnorta(R = R, cor.matrix = cor.matrix, distr = distr, qparameters = NULL)
rnorta(R = R, cor.matrix = cor.matrix, distr = distr, qparameters = NULL)
R |
integer indicating the sample size. |
cor.matrix |
matrix indicating the correlation matrix of the multivariate normal distribution employed in the NORTA method. |
distr |
character string vector of length |
qparameters |
list of |
Checks are made to ensure that cor.matrix
is a positive definite
correlation matrix. The positive definiteness of cor.matrix
is
assessed via eigenvalues.
The -th character string in
distr
indicates the quantile
function of the -th marginal distribution. See
Distributions
for the most common distributions. Quantile
functions supported by other R packages are allowed provided that these
packages have been uploaded first. However, note that no checks are made to
ensure that the character strings in distr
correspond to valid names
of quantile functions.
If qparameters = NULL
then the default parameter values for the
quantile functions specified by distr
are used. Otherwise,
qparameters
should be provided as a list of ncol(cor.matrix)
lists such that the -th list contains the desired parameter values of
the
-th quantile function.
Returns R
random vectors of size ncol(cor.matrix)
with
marginal distributions specified by distr
(and qparameters
).
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
## An example with standard logistic as marginal distribution. set.seed(1) sample_size <- 1000 latent_correlation_matrix <- toeplitz(c(1, rep(0.8, 2))) latent_correlation_matrix common_marginal_distribution <- rep("qlogis", 3) simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = common_marginal_distribution) ## The following lines exemplify the NORTA method. set.seed(1) simulated_normal_responses <- rsmvnorm(R = sample_size, cor.matrix = latent_correlation_matrix) norta_simulated <- qlogis(pnorm(simulated_normal_responses)) all(simulated_logistic_responses == norta_simulated) ## Change the marginal distributions to standard normal, standard ## logistic and standard extreme value distribution. set.seed(1) different_marginal_distributions <- c("qnorm", "qlogis", "qgumbel") simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = different_marginal_distributions) cor(simulated_logistic_responses) colMeans(simulated_logistic_responses) apply(simulated_logistic_responses, 2, sd) ## Same as above but using parameter values other than the default ones. set.seed(1) qpars <- list(c(mean = 1, sd = 9), c(location = 2, scale = 1), c(loc = 3, scale = 1)) simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = different_marginal_distributions, qparameters = qpars) cor(simulated_logistic_responses) colMeans(simulated_logistic_responses) apply(simulated_logistic_responses, 2, sd)
## An example with standard logistic as marginal distribution. set.seed(1) sample_size <- 1000 latent_correlation_matrix <- toeplitz(c(1, rep(0.8, 2))) latent_correlation_matrix common_marginal_distribution <- rep("qlogis", 3) simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = common_marginal_distribution) ## The following lines exemplify the NORTA method. set.seed(1) simulated_normal_responses <- rsmvnorm(R = sample_size, cor.matrix = latent_correlation_matrix) norta_simulated <- qlogis(pnorm(simulated_normal_responses)) all(simulated_logistic_responses == norta_simulated) ## Change the marginal distributions to standard normal, standard ## logistic and standard extreme value distribution. set.seed(1) different_marginal_distributions <- c("qnorm", "qlogis", "qgumbel") simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = different_marginal_distributions) cor(simulated_logistic_responses) colMeans(simulated_logistic_responses) apply(simulated_logistic_responses, 2, sd) ## Same as above but using parameter values other than the default ones. set.seed(1) qpars <- list(c(mean = 1, sd = 9), c(location = 2, scale = 1), c(loc = 3, scale = 1)) simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = different_marginal_distributions, qparameters = qpars) cor(simulated_logistic_responses) colMeans(simulated_logistic_responses) apply(simulated_logistic_responses, 2, sd)
Utility function to simulate continuous random vectors from a multivariate normal distribution such that all marginal distributions are univariate standard normal.
rsmvnorm(R = R, cor.matrix = cor.matrix)
rsmvnorm(R = R, cor.matrix = cor.matrix)
R |
integer indicating the sample size. |
cor.matrix |
matrix indicating the correlation matrix of the multivariate normal distribution. |
Checks are made to ensure that cor.matrix
is a positive definite
correlation matrix. The positive definiteness of cor.matrix
is
assessed via eigenvalues.
Returns R
random vectors of size ncol(cor.matrix)
.
Anestis Touloumis
## Simulating 10000 bivariate random vectors with correlation parameter ## equal to 0.4. set.seed(1) sample_size <- 10000 correlation_matrix <- toeplitz(c(1, 0.4)) simulated_normal_responses <- rsmvnorm(R = sample_size, cor.matrix = correlation_matrix) colMeans(simulated_normal_responses) apply(simulated_normal_responses, 2, sd) cor(simulated_normal_responses)
## Simulating 10000 bivariate random vectors with correlation parameter ## equal to 0.4. set.seed(1) sample_size <- 10000 correlation_matrix <- toeplitz(c(1, 0.4)) simulated_normal_responses <- rsmvnorm(R = sample_size, cor.matrix = correlation_matrix) colMeans(simulated_normal_responses) apply(simulated_normal_responses, 2, sd) cor(simulated_normal_responses)
A simulated dataset to explore the association between the correlation parameter of bivariate normally distributed random variables used in the intermediate step of the NORTA method and the correlation parameter of the resulting non-normal random responses generated by the NORTA method for all the threshold approached implemented in this package.
simulation
simulation
A data frame with 100 rows and 4 columns:
numeric indicating the true value of the correlation parameter.
numeric indicating the simulated correlation parameter when the marginal distribution of each of the latent variables is normal.
numeric indicating the simulated correlation parameter when the marginal distribution of each of the latent variables is logistic.
numeric indicating the simulated correlation parameter when the marginal distribution of each of the latent variables is Gumbel.
plot(rho - normal ~ rho, data = simulation, type = "l", col = "blue", ylim = c(0, 0.016), ylab = expression(rho - bar(rho)[sim]), xlab = expression(rho)) points(rho - logistic ~ rho, data = simulation, type = "l", col = "red") points(rho - gumbel ~ rho, data = simulation, type = "l", col = "grey") legend("topright", legend = c("Normal", "Logistic", "Gumbel"), col = c("blue", "red", "grey"), pch = "l" )
plot(rho - normal ~ rho, data = simulation, type = "l", col = "blue", ylim = c(0, 0.016), ylab = expression(rho - bar(rho)[sim]), xlab = expression(rho)) points(rho - logistic ~ rho, data = simulation, type = "l", col = "red") points(rho - gumbel ~ rho, data = simulation, type = "l", col = "grey") legend("topright", legend = c("Normal", "Logistic", "Gumbel"), col = c("blue", "red", "grey"), pch = "l" )