| Title: | Simulates Correlated Multinomial Responses |
|---|---|
| Description: | Simulates correlated multinomial responses conditional on a marginal model specification. |
| Authors: | Anestis Touloumis [aut, cre] (ORCID: <https://orcid.org/0000-0002-5965-1639>) |
| Maintainer: | Anestis Touloumis <[email protected]> |
| License: | GPL-3 |
| Version: | 1.9.1 |
| Built: | 2026-06-02 10:28:02 UTC |
| Source: | https://github.com/anestistouloumis/simcormultres |
Simulates correlated binary responses assuming a regression model for the marginal probabilities.
rbin(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)rbin(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
intercepts |
numerical (or numeric vector of length |
betas |
numerical vector or matrix containing the value of the marginal
regression parameter vector associated with the covariates (i.e., excluding
|
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
link |
character string indicating the link function in the marginal
model. Options include |
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal model is
where is the cumulative distribution
function determined by link. For subject , is the
-th binary response and is the associated covariates
vector. Finally, and are the intercept and
regression parameter vector at the -th measurement occasion.
The binary response is obtained by extending the approach of
Emrich and Piedmonte (1991) as suggested in Touloumis (2016).
When for all , then intercepts
should be provided as a single number. Otherwise, intercepts must be
provided as a numeric vector such that the -th element corresponds to
the intercept at measurement occasion .
betas should be provided as a numeric vector only when
for all . Otherwise, betas must be
provided as a numeric matrix with clsize rows such that the
-th row contains the value of . In either case,
betas should reflect the order of the terms implied by
xformula.
The appropriate use of xformula is xformula = ~ covariates,
where covariates indicate the linear predictor as in other marginal
regression models.
The optional argument xdata should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in Touloumis (2016). To import
simulated values for the latent random vectors without utilizing the NORTA
method, the user can employ the rlatent argument. In this case,
element () of rlatent represents the realization of
.
Returns a list that has components:
Ysim |
the simulated binary
responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Emrich, L. J. and Piedmonte, M. R. (1991) A method for generating high-dimensional multivariate binary variates. The American Statistician 45, 302–304.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
rmult.bcl for simulating correlated nominal
responses, rmult.clm, rmult.crm and
rmult.acl for simulating correlated ordinal responses.
## See Example 3.5 in the Vignette. set.seed(123) sample_size <- 5000 cluster_size <- 4 beta_intercepts <- 0 beta_coefficients <- 0.2 latent_correlation_matrix <- toeplitz(c(1, 0.9, 0.9, 0.9)) x <- rep(rnorm(sample_size), each = cluster_size) simulated_binary_dataset <- rbin(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") library(gee) binary_gee_model <- gee(y ~ x, family = binomial("probit"), id = id, data = simulated_binary_dataset$simdata) summary(binary_gee_model)$coefficients ## See Example 3.6 in the Vignette. set.seed(8) library(evd) simulated_latent_variables1 <- rmvevd(sample_size, dep = sqrt(1 - 0.9), model = "log", d = cluster_size) simulated_latent_variables2 <- rmvevd(sample_size, dep = sqrt(1 - 0.9), model = "log", d = cluster_size) simulated_latent_variables <- simulated_latent_variables1 - simulated_latent_variables2 simulated_binary_dataset <- rbin(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, rlatent = simulated_latent_variables) binary_gee_model <- gee(y ~ x, family = binomial("logit"), id = id, data = simulated_binary_dataset$simdata) summary(binary_gee_model)$coefficients## See Example 3.5 in the Vignette. set.seed(123) sample_size <- 5000 cluster_size <- 4 beta_intercepts <- 0 beta_coefficients <- 0.2 latent_correlation_matrix <- toeplitz(c(1, 0.9, 0.9, 0.9)) x <- rep(rnorm(sample_size), each = cluster_size) simulated_binary_dataset <- rbin(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") library(gee) binary_gee_model <- gee(y ~ x, family = binomial("probit"), id = id, data = simulated_binary_dataset$simdata) summary(binary_gee_model)$coefficients ## See Example 3.6 in the Vignette. set.seed(8) library(evd) simulated_latent_variables1 <- rmvevd(sample_size, dep = sqrt(1 - 0.9), model = "log", d = cluster_size) simulated_latent_variables2 <- rmvevd(sample_size, dep = sqrt(1 - 0.9), model = "log", d = cluster_size) simulated_latent_variables <- simulated_latent_variables1 - simulated_latent_variables2 simulated_binary_dataset <- rbin(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, rlatent = simulated_latent_variables) binary_gee_model <- gee(y ~ x, family = binomial("logit"), id = id, data = simulated_binary_dataset$simdata) summary(binary_gee_model)$coefficients
Simulates correlated ordinal responses assuming an adjacent-category logit model for the marginal probabilities.
rmult.acl(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), cor.matrix = cor.matrix, rlatent = NULL)rmult.acl(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
intercepts |
numerical vector or matrix containing the intercepts of the marginal adjacent-category logit model. |
betas |
numerical vector or matrix containing the value of the marginal regression parameter vector. |
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal adjacent-category logit model is
For subject , is the -th ordinal response
and is the associated covariates vector. Also
is the -th category-specific intercept at the -th measurement
occasion and is the regression
parameter vector at the -th measurement occasion.
The ordinal response is obtained by utilizing the threshold
approach described in the Vignette. This approach is based on the connection
between baseline-category and adjacent-category logit models.
When for all , then intercepts
should be provided as a numerical vector. Otherwise, intercepts must
be a numerical matrix such that row contains the category-specific
intercepts at the -th measurement occasion.
betas should be provided as a numeric vector only when
for all . Otherwise, betas must be
provided as a numeric matrix with clsize rows such that the
-th row contains the value of . In either case,
betas should reflect the order of the terms implied by
xformula.
The appropriate use of xformula is xformula = ~ covariates,
where covariates indicate the linear predictor as in other marginal
regression models.
The optional argument xdata should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in the Vignette. To import
simulated values for the latent random vectors without utilizing the NORTA
method, the user can employ the rlatent argument. In this case,
row corresponds to subject and columns
should contain the
realization of , respectively, for
.
Returns a list that has components:
Ysim |
the simulated
nominal responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633–640.
rbin for simulating correlated binary responses,
rmult.clm and rmult.crm for simulating
correlated ordinal responses, and rmult.bcl for simulating
nominal responses.
## See Example 3.4 in the Vignette. beta_intercepts <- c(3, 1, 2) beta_coefficients <- c(1, 1) sample_size <- 500 cluster_size <- 3 set.seed(321) x1 <- rep(rnorm(sample_size), each = cluster_size) x2 <- rnorm(sample_size * cluster_size) xdata <- data.frame(x1, x2) identity_matrix <- diag(4) equicorrelation_matrix <- toeplitz(c(1, rep(0.95, cluster_size - 1))) latent_correlation_matrix <- kronecker(equicorrelation_matrix, identity_matrix) simulated_ordinal_dataset <- rmult.acl(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~ x1 + x2, xdata = xdata, cor.matrix = latent_correlation_matrix) suppressPackageStartupMessages(library("multgee")) ordinal_gee_model <- ordLORgee(y ~ x1 + x2, data = simulated_ordinal_dataset$simdata, id = id, repeated = time, LORstr = "time.exch", link = "acl") round(coef(ordinal_gee_model), 2)## See Example 3.4 in the Vignette. beta_intercepts <- c(3, 1, 2) beta_coefficients <- c(1, 1) sample_size <- 500 cluster_size <- 3 set.seed(321) x1 <- rep(rnorm(sample_size), each = cluster_size) x2 <- rnorm(sample_size * cluster_size) xdata <- data.frame(x1, x2) identity_matrix <- diag(4) equicorrelation_matrix <- toeplitz(c(1, rep(0.95, cluster_size - 1))) latent_correlation_matrix <- kronecker(equicorrelation_matrix, identity_matrix) simulated_ordinal_dataset <- rmult.acl(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~ x1 + x2, xdata = xdata, cor.matrix = latent_correlation_matrix) suppressPackageStartupMessages(library("multgee")) ordinal_gee_model <- ordLORgee(y ~ x1 + x2, data = simulated_ordinal_dataset$simdata, id = id, repeated = time, LORstr = "time.exch", link = "acl") round(coef(ordinal_gee_model), 2)
Simulates correlated nominal responses assuming a baseline-category logit model for the marginal probabilities.
rmult.bcl(clsize = clsize, ncategories = ncategories, betas = betas, xformula = formula(xdata), xdata = parent.frame(), cor.matrix = cor.matrix, rlatent = NULL)rmult.bcl(clsize = clsize, ncategories = ncategories, betas = betas, xformula = formula(xdata), xdata = parent.frame(), cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
ncategories |
integer indicating the number of nominal response categories. |
betas |
numerical vector or matrix containing the value of the marginal regression parameter vector. |
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal baseline category logit model is
For subject , is the -th nominal response
and is the associated covariates vector. Also
is the -th category-specific intercept at the -th measurement
occasion and is the -th category-specific regression
parameter vector at the -th measurement occasion.
The nominal response is obtained by extending the principle of
maximum random utility (McFadden, 1974) as suggested in
Touloumis (2016).
betas should be provided as a numeric vector only when
and for all .
Otherwise, betas must be provided as a numeric matrix with
clsize rows such that the -th row contains the value of
(). In either case, betas should reflect the order of the
terms implied by xformula.
The appropriate use of xformula is xformula = ~ covariates,
where covariates indicate the linear predictor as in other marginal
regression models.
The optional argument xdata should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in Touloumis (2016). In this
case, the algorithm forces cor.matrix to respect the assumption of
choice independence. To import simulated values for the latent random
vectors without utilizing the NORTA method, the user can employ the
rlatent argument. In this case, row corresponds to subject
and columns
should contain the
realization of , respectively, for
.
Returns a list that has components:
Ysim |
the simulated
nominal responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
McFadden, D. (1974) Conditional logit analysis of qualitative choice behavior. New York: Academic Press, 105–142.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633–640.
rbin for simulating correlated binary responses,
rmult.clm, rmult.crm and
rmult.acl for simulating correlated ordinal responses.
## See Example 3.1 in the Vignette. betas <- c(1, 3, 2, 1.25, 3.25, 1.75, 0.75, 2.75, 2.25, 0, 0, 0) sample_size <- 500 categories_no <- 4 cluster_size <- 3 set.seed(1) x1 <- rep(rnorm(sample_size), each = cluster_size) x2 <- rnorm(sample_size * cluster_size) xdata <- data.frame(x1, x2) equicorrelation_matrix <- toeplitz(c(1, rep(0.95, cluster_size - 1))) identity_matrix <- diag(categories_no) latent_correlation_matrix <- kronecker(equicorrelation_matrix, identity_matrix) simulated_nominal_dataset <- rmult.bcl(clsize = cluster_size, ncategories = categories_no, betas = betas, xformula = ~ x1 + x2, xdata = xdata, cor.matrix = latent_correlation_matrix) suppressPackageStartupMessages(library("multgee")) nominal_gee_model <- nomLORgee(y ~ x1 + x2, data = simulated_nominal_dataset$simdata, id = id, repeated = time, LORstr = "time.exch") round(coef(nominal_gee_model), 2)## See Example 3.1 in the Vignette. betas <- c(1, 3, 2, 1.25, 3.25, 1.75, 0.75, 2.75, 2.25, 0, 0, 0) sample_size <- 500 categories_no <- 4 cluster_size <- 3 set.seed(1) x1 <- rep(rnorm(sample_size), each = cluster_size) x2 <- rnorm(sample_size * cluster_size) xdata <- data.frame(x1, x2) equicorrelation_matrix <- toeplitz(c(1, rep(0.95, cluster_size - 1))) identity_matrix <- diag(categories_no) latent_correlation_matrix <- kronecker(equicorrelation_matrix, identity_matrix) simulated_nominal_dataset <- rmult.bcl(clsize = cluster_size, ncategories = categories_no, betas = betas, xformula = ~ x1 + x2, xdata = xdata, cor.matrix = latent_correlation_matrix) suppressPackageStartupMessages(library("multgee")) nominal_gee_model <- nomLORgee(y ~ x1 + x2, data = simulated_nominal_dataset$simdata, id = id, repeated = time, LORstr = "time.exch") round(coef(nominal_gee_model), 2)
Simulates correlated ordinal responses assuming a cumulative link model for the marginal probabilities.
rmult.clm(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)rmult.clm(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
intercepts |
numerical vector or matrix containing the intercepts of the marginal cumulative link model. |
betas |
numerical vector or matrix containing the value of the marginal
regression parameter vector associated with the covariates (i.e., excluding
|
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
link |
character string indicating the link function in the marginal
cumulative link model. Options include |
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal cumulative link model is
where is the
cumulative distribution function determined by link. For subject
, is the -th ordinal response and is
the associated covariates vector. Finally, is the
-th category-specific intercept at the -th measurement
occasion and is the -th category-specific regression
parameter vector at the -th measurement occasion.
The ordinal response is obtained by extending the approach of
McCullagh (1980) as suggested in Touloumis (2016).
When for all , then intercepts
should be provided as a numerical vector. Otherwise, intercepts must
be a numerical matrix such that row contains the category-specific
intercepts at the -th measurement occasion.
betas should be provided as a numeric vector only when
for all . Otherwise, betas must be
provided as a numeric matrix with clsize rows such that the
-th row contains the value of . In either case,
betas should reflect the order of the terms implied by
xformula.
The appropriate use of xformula is xformula = ~ covariates,
where covariates indicate the linear predictor as in other marginal
regression models.
The optional argument xdata should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in Touloumis (2016). To import
simulated values for the latent random vectors without utilizing the NORTA
method, the user can employ the rlatent argument. In this case,
element () of rlatent represents the realization of
.
Returns a list that has components:
Ysim |
the simulated
ordinal responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
McCullagh, P. (1980) Regression models for ordinal data. Journal of the Royal Statistical Society B 42, 109–142.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
Touloumis, A., Agresti, A. and Kateri, M. (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics 69, 633–640.
rmult.bcl for simulating correlated nominal
responses, rmult.crm and rmult.acl for
simulating correlated ordinal responses and rbin for
simulating correlated binary responses.
## See Example 3.2 in the Vignette. set.seed(12345) sample_size <- 500 cluster_size <- 4 beta_intercepts <- c(-1.5, -0.5, 0.5, 1.5) beta_coefficients <- matrix(c(1, 2, 3, 4), 4, 1) x <- rep(rnorm(sample_size), each = cluster_size) latent_correlation_matrix <- toeplitz(c(1, 0.85, 0.5, 0.15)) simulated_ordinal_dataset <- rmult.clm(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") head(simulated_ordinal_dataset$simdata, n = 8) ## Same sampling scheme except that the parameter vector is time-stationary. set.seed(12345) simulated_ordinal_dataset <- rmult.clm(clsize = cluster_size, betas = 1, xformula = ~x, cor.matrix = latent_correlation_matrix, intercepts = beta_intercepts, link = "probit") ## Fit a GEE model (Touloumis et al., 2013) to estimate the regression ## coefficients. library(multgee) ordinal_gee_model <- ordLORgee(y ~ x, id = id, repeated = time, link = "probit", data = simulated_ordinal_dataset$simdata) coef(ordinal_gee_model)## See Example 3.2 in the Vignette. set.seed(12345) sample_size <- 500 cluster_size <- 4 beta_intercepts <- c(-1.5, -0.5, 0.5, 1.5) beta_coefficients <- matrix(c(1, 2, 3, 4), 4, 1) x <- rep(rnorm(sample_size), each = cluster_size) latent_correlation_matrix <- toeplitz(c(1, 0.85, 0.5, 0.15)) simulated_ordinal_dataset <- rmult.clm(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") head(simulated_ordinal_dataset$simdata, n = 8) ## Same sampling scheme except that the parameter vector is time-stationary. set.seed(12345) simulated_ordinal_dataset <- rmult.clm(clsize = cluster_size, betas = 1, xformula = ~x, cor.matrix = latent_correlation_matrix, intercepts = beta_intercepts, link = "probit") ## Fit a GEE model (Touloumis et al., 2013) to estimate the regression ## coefficients. library(multgee) ordinal_gee_model <- ordLORgee(y ~ x, id = id, repeated = time, link = "probit", data = simulated_ordinal_dataset$simdata) coef(ordinal_gee_model)
Simulates correlated ordinal responses assuming a continuation-ratio model for the marginal probabilities.
rmult.crm(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)rmult.crm(clsize = clsize, intercepts = intercepts, betas = betas, xformula = formula(xdata), xdata = parent.frame(), link = "logit", cor.matrix = cor.matrix, rlatent = NULL)
clsize |
integer indicating the common cluster size. |
intercepts |
numerical vector or matrix containing the intercepts of the marginal continuation-ratio model. |
betas |
numerical vector or matrix containing the value of the marginal
regression parameter vector associated with the covariates (i.e., excluding
|
xformula |
formula expression as in other marginal regression models but without including a response variable. |
xdata |
optional data frame containing the variables provided in
|
link |
character string indicating the link function of the marginal
continuation-ratio model. Options include |
cor.matrix |
matrix indicating the correlation matrix of the
multivariate normal distribution when the NORTA method is employed
( |
rlatent |
matrix with |
The formulae are easier to read from either the Vignette or the Reference Manual (both available here).
The assumed marginal continuation-ratio model is
where is the
cumulative distribution function determined by link. For subject
, is the -th multinomial response and
is the associated covariates vector. Finally,
is the -th category-specific intercept at the -th measurement
occasion and is the -th category-specific regression
parameter vector at the -th measurement occasion.
The ordinal response is determined by extending the latent
variable threshold approach of Tutz (1991) as suggested in
Touloumis (2016).
When for all , then intercepts
should be provided as a numerical vector. Otherwise, intercepts must
be a numerical matrix such that row contains the category-specific
intercepts at the -th measurement occasion.
betas should be provided as a numeric vector only when
for all . Otherwise, betas must be
provided as a numeric matrix with clsize rows such that the
-th row contains the value of . In either case,
betas should reflect the order of the terms implied by
xformula.
The appropriate use of xformula is xformula = ~ covariates,
where covariates indicate the linear predictor as in other marginal
regression models.
The optional argument xdata should be provided in “long” format.
The NORTA method is the default option for simulating the latent random
vectors denoted by in Touloumis (2016). In this
case, the algorithm forces cor.matrix to respect the local
independence assumption. To import simulated values for the latent random
vectors without utilizing the NORTA method, the user can employ the
rlatent argument. In this case, row corresponds to subject
and columns
should contain the
realization of , respectively, for
.
Returns a list that has components:
Ysim |
the simulated
ordinal responses. Element ( |
simdata |
a data frame that includes the simulated
response variables (y), the covariates specified by |
rlatent |
the latent random variables denoted by
|
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal (forthcoming).
Tutz, G. (1991) Sequential models in categorical regression. Computational Statistics & Data Analysis 11, 275–295.
rmult.bcl for simulating correlated nominal
responses, rmult.clm and rmult.acl for simulating
correlated ordinal responses and rbin for simulating
correlated binary responses.
## See Example 3.3 in the Vignette. set.seed(1) sample_size <- 500 cluster_size <- 4 beta_intercepts <- c(-1.5, -0.5, 0.5, 1.5) beta_coefficients <- 1 x <- rnorm(sample_size * cluster_size) categories_no <- 5 identity_matrix <- diag(1, (categories_no - 1) * cluster_size) equicorrelation_matrix <- toeplitz(c(0, rep(0.24, categories_no - 2))) ones_matrix <- matrix(1, cluster_size, cluster_size) latent_correlation_matrix <- identity_matrix + kronecker(equicorrelation_matrix, ones_matrix) simulated_ordinal_dataset <- rmult.crm(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") head(simulated_ordinal_dataset$Ysim)## See Example 3.3 in the Vignette. set.seed(1) sample_size <- 500 cluster_size <- 4 beta_intercepts <- c(-1.5, -0.5, 0.5, 1.5) beta_coefficients <- 1 x <- rnorm(sample_size * cluster_size) categories_no <- 5 identity_matrix <- diag(1, (categories_no - 1) * cluster_size) equicorrelation_matrix <- toeplitz(c(0, rep(0.24, categories_no - 2))) ones_matrix <- matrix(1, cluster_size, cluster_size) latent_correlation_matrix <- identity_matrix + kronecker(equicorrelation_matrix, ones_matrix) simulated_ordinal_dataset <- rmult.crm(clsize = cluster_size, intercepts = beta_intercepts, betas = beta_coefficients, xformula = ~x, cor.matrix = latent_correlation_matrix, link = "probit") head(simulated_ordinal_dataset$Ysim)
Utility function to simulate random vectors with predefined marginal distributions via the NORTA method.
rnorta(R = R, cor.matrix = cor.matrix, distr = distr, qparameters = NULL)rnorta(R = R, cor.matrix = cor.matrix, distr = distr, qparameters = NULL)
R |
integer indicating the sample size. |
cor.matrix |
matrix indicating the correlation matrix of the multivariate normal distribution employed in the NORTA method. |
distr |
character string vector of length |
qparameters |
list of |
Checks are made to ensure that cor.matrix is a positive definite
correlation matrix. The positive definiteness of cor.matrix is
assessed via eigenvalues.
The -th character string in distr indicates the quantile
function of the -th marginal distribution. See
Distributions for the most common distributions. Quantile
functions supported by other R packages are allowed provided that these
packages have been uploaded first. However, note that no checks are made to
ensure that the character strings in distr correspond to valid names
of quantile functions.
If qparameters = NULL then the default parameter values for the
quantile functions specified by distr are used. Otherwise,
qparameters should be provided as a list of ncol(cor.matrix)
lists such that the -th list contains the desired parameter values of
the -th quantile function.
Returns R random vectors of size ncol(cor.matrix) with
marginal distributions specified by distr (and qparameters).
Anestis Touloumis
Cario, M. C. and Nelson, B. L. (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
Li, S. T. and Hammond, J. L. (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man and Cybernetics 5, 557–561.
Touloumis, A. (2016) Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal 8, 79–91.
## An example with standard logistic as marginal distribution. set.seed(1) sample_size <- 1000 latent_correlation_matrix <- toeplitz(c(1, rep(0.8, 2))) latent_correlation_matrix common_marginal_distribution <- rep("qlogis", 3) simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = common_marginal_distribution) ## The following lines exemplify the NORTA method. set.seed(1) simulated_normal_responses <- rsmvnorm(R = sample_size, cor.matrix = latent_correlation_matrix) norta_simulated <- qlogis(pnorm(simulated_normal_responses)) all(simulated_logistic_responses == norta_simulated) ## Change the marginal distributions to standard normal, standard ## logistic and standard extreme value distribution. set.seed(1) different_marginal_distributions <- c("qnorm", "qlogis", "qgumbel") simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = different_marginal_distributions) cor(simulated_logistic_responses) colMeans(simulated_logistic_responses) apply(simulated_logistic_responses, 2, sd) ## Same as above but using parameter values other than the default ones. set.seed(1) qpars <- list(c(mean = 1, sd = 9), c(location = 2, scale = 1), c(loc = 3, scale = 1)) simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = different_marginal_distributions, qparameters = qpars) cor(simulated_logistic_responses) colMeans(simulated_logistic_responses) apply(simulated_logistic_responses, 2, sd)## An example with standard logistic as marginal distribution. set.seed(1) sample_size <- 1000 latent_correlation_matrix <- toeplitz(c(1, rep(0.8, 2))) latent_correlation_matrix common_marginal_distribution <- rep("qlogis", 3) simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = common_marginal_distribution) ## The following lines exemplify the NORTA method. set.seed(1) simulated_normal_responses <- rsmvnorm(R = sample_size, cor.matrix = latent_correlation_matrix) norta_simulated <- qlogis(pnorm(simulated_normal_responses)) all(simulated_logistic_responses == norta_simulated) ## Change the marginal distributions to standard normal, standard ## logistic and standard extreme value distribution. set.seed(1) different_marginal_distributions <- c("qnorm", "qlogis", "qgumbel") simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = different_marginal_distributions) cor(simulated_logistic_responses) colMeans(simulated_logistic_responses) apply(simulated_logistic_responses, 2, sd) ## Same as above but using parameter values other than the default ones. set.seed(1) qpars <- list(c(mean = 1, sd = 9), c(location = 2, scale = 1), c(loc = 3, scale = 1)) simulated_logistic_responses <- rnorta(R = sample_size, cor.matrix = latent_correlation_matrix, distr = different_marginal_distributions, qparameters = qpars) cor(simulated_logistic_responses) colMeans(simulated_logistic_responses) apply(simulated_logistic_responses, 2, sd)
Utility function to simulate continuous random vectors from a multivariate normal distribution such that all marginal distributions are univariate standard normal.
rsmvnorm(R = R, cor.matrix = cor.matrix)rsmvnorm(R = R, cor.matrix = cor.matrix)
R |
integer indicating the sample size. |
cor.matrix |
matrix indicating the correlation matrix of the multivariate normal distribution. |
Checks are made to ensure that cor.matrix is a positive definite
correlation matrix. The positive definiteness of cor.matrix is
assessed via eigenvalues.
Returns R random vectors of size ncol(cor.matrix).
Anestis Touloumis
## Simulating 10000 bivariate random vectors with correlation parameter ## equal to 0.4. set.seed(1) sample_size <- 10000 correlation_matrix <- toeplitz(c(1, 0.4)) simulated_normal_responses <- rsmvnorm(R = sample_size, cor.matrix = correlation_matrix) colMeans(simulated_normal_responses) apply(simulated_normal_responses, 2, sd) cor(simulated_normal_responses)## Simulating 10000 bivariate random vectors with correlation parameter ## equal to 0.4. set.seed(1) sample_size <- 10000 correlation_matrix <- toeplitz(c(1, 0.4)) simulated_normal_responses <- rsmvnorm(R = sample_size, cor.matrix = correlation_matrix) colMeans(simulated_normal_responses) apply(simulated_normal_responses, 2, sd) cor(simulated_normal_responses)
A simulated dataset to explore the association between the correlation parameter of bivariate normally distributed random variables used in the intermediate step of the NORTA method and the correlation parameter of the resulting non-normal random responses generated by the NORTA method for all the threshold approached implemented in this package.
simulationsimulation
A data frame with 100 rows and 4 columns:
numeric indicating the true value of the correlation parameter.
numeric indicating the simulated correlation parameter when the marginal distribution of each of the latent variables is normal.
numeric indicating the simulated correlation parameter when the marginal distribution of each of the latent variables is logistic.
numeric indicating the simulated correlation parameter when the marginal distribution of each of the latent variables is Gumbel.
plot(rho - normal ~ rho, data = simulation, type = "l", col = "blue", ylim = c(0, 0.016), ylab = expression(rho - bar(rho)[sim]), xlab = expression(rho)) points(rho - logistic ~ rho, data = simulation, type = "l", col = "red") points(rho - gumbel ~ rho, data = simulation, type = "l", col = "grey") legend("topright", legend = c("Normal", "Logistic", "Gumbel"), col = c("blue", "red", "grey"), pch = "l" )plot(rho - normal ~ rho, data = simulation, type = "l", col = "blue", ylim = c(0, 0.016), ylab = expression(rho - bar(rho)[sim]), xlab = expression(rho)) points(rho - logistic ~ rho, data = simulation, type = "l", col = "red") points(rho - gumbel ~ rho, data = simulation, type = "l", col = "grey") legend("topright", legend = c("Normal", "Logistic", "Gumbel"), col = c("blue", "red", "grey"), pch = "l" )