Sample Covariance Matrix

Suppose that the goal is to estimate the covariance matrix Σ based on a sample of N independent and identically distributed p-dimensional random vectors X₁, X₂, …, X_N. The sample covariance matrix is defined by

$$ \mathbf S = \frac{1}{N - 1} \sum_{i=1}^{N} (\mathbf X_i - \bar {\mathbf X})(\mathbf X_i - \bar {\mathbf X})^{T} $$

where $\bar {\mathbf X} = \sum_{i=1}^{N} \mathbf X_{i}/N$ is the sample mean vector.

Although S is a natural estimator of the covariance matrix Σ, it is known to be problematic in high-dimensional settings, that is, when the number of features p (the dimension of each of the N vectors) is much larger than the sample size N. For example, S is singular in high-dimensional settings because its rank is at most N − 1, whereas Σ is a positive-definite matrix.
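As a minimal sketch in base R, the definition of S can be checked against `cov()` and its rank deficiency observed when p > N. The data are simulated, the names `X`, `p` and `N` are illustrative, and features are assumed to be stored in rows, as in the colon data used in the Example section.

```r
# Simulated data: p = 10 features (rows) measured on N = 5 subjects (columns).
set.seed(1)
p <- 10
N <- 5
X <- matrix(rnorm(p * N), nrow = p, ncol = N)

# Sample mean vector and centered data (xbar is recycled down each column).
xbar <- rowMeans(X)
Xc <- X - xbar

# Sample covariance matrix S = sum_i (X_i - xbar)(X_i - xbar)^T / (N - 1).
S <- tcrossprod(Xc) / (N - 1)

# Agrees with base R's cov(), which expects observations in rows.
agrees_with_cov <- isTRUE(all.equal(S, cov(t(X))))

# The rank of S is at most N - 1, so S is singular whenever p >= N.
rank_S <- qr(S)$rank
```

Here `rank_S` cannot exceed N − 1 = 4, although S is a 10 × 10 matrix.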

Steinian Estimators

A simple solution is to consider covariance estimators of the form

$$ \mathbf S^{*}_{T} = (1 - \lambda_{T}) \mathbf S + \lambda_{T} \mathbf T $$

where T is a known positive-definite target covariance matrix and 0 < λ_T < 1 is the optimal shrinkage intensity. The advantages of S_T^* include that it is: (i) non-singular, (ii) well-conditioned, (iii) invariant to permutations of the order of the p variables, (iv) robust to departures from a multivariate normal model, (v) not necessarily sparse, (vi) expressed in closed form, and (vii) computationally cheap regardless of p.
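The convex combination can be sketched in base R as follows. This is only an illustration: the data are simulated, the identity matrix is used as the target, and `lambda` is a hypothetical fixed value rather than the intensity estimated by the package.

```r
# Simulated high-dimensional data: p = 10 features (rows), N = 5 subjects.
set.seed(1)
p <- 10
N <- 5
X <- matrix(rnorm(p * N), nrow = p, ncol = N)
S <- cov(t(X))  # singular because p > N

# Hypothetical inputs: identity target and an arbitrary shrinkage intensity.
target_matrix <- diag(p)
lambda <- 0.2

# Shrinkage estimator S* = (1 - lambda) * S + lambda * T.
S_star <- (1 - lambda) * S + lambda * target_matrix

# Smallest eigenvalues: numerically zero for S, at least lambda for S*.
min_eig_S <- min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)
min_eig_S_star <- min(eigen(S_star, symmetric = TRUE, only.values = TRUE)$values)

# S* has a finite condition number, whereas S is numerically singular.
cond_S_star <- kappa(S_star, exact = TRUE)
```

Because S is positive semi-definite and the target is positive-definite, every eigenvalue of `S_star` is strictly positive for any 0 < λ < 1.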

In practice, the optimal shrinkage intensity λ_T is unknown and needs to be estimated by minimizing a risk function, such as the expectation of the Frobenius norm of the difference between S_T^* and Σ. This package implements the estimation procedures for λ_T described in Touloumis (2015).

Target Covariance Matrix

Let s₁₁², s₂₂², …, s_pp² be the corresponding diagonal elements of the sample covariance matrix S, that is the sample variances of the p features.

The diagonal target covariance matrix T_D is a diagonal matrix whose diagonal elements are equal to the sample variances

$$ \mathbf T_{D} = \begin{bmatrix} s_{11}^{2} & 0 & \ldots & 0 \\ 0 & s_{22}^{2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \ldots & 0 & s_{pp}^{2} \\ \end{bmatrix}. $$

The spherical target covariance matrix T_S is the diagonal matrix

$$ \mathbf T_{S} = \begin{bmatrix} s^{2} & 0 & \ldots & 0 \\ 0 & s^{2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \ldots & 0 & s^{2}\\ \end{bmatrix}, $$

where s² is the average of the sample variances

$$ s^2 = \frac{1}{p} \sum_{k=1}^{p} s_{kk}^{2}. $$

The identity target covariance matrix T_I is the p × p identity matrix

$$ \mathbf T_{I} = \mathbf I_{p} = \begin{bmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \ldots & 0 & 1 \\ \end{bmatrix}. $$
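A minimal base R sketch of the three targets, assuming a simulated data matrix `X` with features in rows; the object names are illustrative and not part of the package:

```r
# Simulated data: p = 4 features (rows), N = 6 subjects (columns).
set.seed(1)
p <- 4
N <- 6
X <- matrix(rnorm(p * N), nrow = p, ncol = N)

# Sample variances of the p features (diagonal elements of S).
sample_variances <- apply(X, 1, var)

# Diagonal target: the sample variances on the diagonal.
target_diagonal <- diag(sample_variances)

# Spherical target: the average sample variance on the diagonal.
target_spherical <- mean(sample_variances) * diag(p)

# Identity target: the p x p identity matrix.
target_identity <- diag(p)
```

The diagonal and spherical targets have the same trace, since the spherical target replaces each sample variance by their average.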

Positive-definiteness of the Target Matrices

The identity covariance target matrix T_I is always positive-definite.
The spherical covariance target matrix T_S is positive-definite provided that at least one of the p sample variances is not 0.
The diagonal covariance target matrix T_D is positive-definite provided that none of the p sample variances is equal to 0.

An error is returned when T_D or T_S is not positive-definite. In this case, the user should either remove all the features (rows) whose sample variance is 0 or use a different target matrix (e.g. T_I).
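The conditions above can be checked before fitting. The following base R sketch uses simulated data with one deliberately constant feature; the variable names are illustrative and the check is not the package's internal code.

```r
# Simulated data with features in rows; feature 2 is constant across subjects.
set.seed(1)
X <- matrix(rnorm(4 * 6), nrow = 4, ncol = 6)
X[2, ] <- 3

# Sample variances of the features; the constant feature has variance 0.
sample_variances <- apply(X, 1, var)
zero_variance <- sample_variances == 0

# T_D needs every variance to be positive; T_S needs at least one.
diagonal_ok <- all(sample_variances > 0)
spherical_ok <- any(sample_variances > 0)

# Drop the zero-variance features (rows) so the diagonal target is usable.
X_clean <- X[!zero_variance, , drop = FALSE]
```

After the constant row is removed, all remaining sample variances are positive, so any of the three targets can be used with `X_clean`.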

Selection of Target Matrix

In practice, to select a suitable target covariance matrix, one can inspect the estimated optimal shrinkage intensities for the three possible target matrices. If these differ substantially, one can choose the target matrix with the largest λ value. Otherwise, the choice of the target matrix can be based on examining the p sample variances.

The identity target matrix T_I is sensible when all the values of the p sample variances are close to 1: s₁₁² ≈ s₂₂² ≈ … ≈ s_pp² ≈ 1.

The spherical target covariance matrix T_S is sensible when the range of the p sample variances is small: s₁₁² ≈ s₂₂² ≈ … ≈ s_pp².

The diagonal target covariance matrix T_D is sensible when the values of the p sample variances vary significantly.

Hence, the target matrix selection should be based on inspecting the optimal shrinkage intensities and the range and average of the p sample variances.
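The two summaries of the sample variances can also be computed by hand. This base R sketch uses simulated data and assumes that "range" means the largest minus the smallest sample variance, which may differ from how the package defines it.

```r
# Simulated data: 50 features (rows), 8 subjects (columns), small variances.
set.seed(1)
X <- matrix(rnorm(50 * 8, sd = 0.3), nrow = 50, ncol = 8)

# Sample variances of the features.
sample_variances <- apply(X, 1, var)

# Assumed definition of the range: maximum minus minimum sample variance.
variance_range <- diff(range(sample_variances))

# Average sample variance (the diagonal value of the spherical target).
variance_average <- mean(sample_variances)
```

A small `variance_range` points towards the spherical target, with the identity target being reasonable only if `variance_average` is also close to 1.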

Example

The colon cancer data, analyzed in Touloumis (2015), consists of two tissue groups: the normal tissue group and the tumor tissue group.

library("ShrinkCovMat")
data("colon")
normal_group <- colon[, 1:40]
dim(normal_group)
#> [1] 2000   40
tumor_group <- colon[, 41:62]
dim(tumor_group)
#> [1] 2000   22

For each of the 40 subjects in the normal group, expression levels were measured for 2000 genes. To select the target matrix for the covariance matrix of the normal group, we use the function targetselection:

targetselection(normal_group)
#> ESTIMATED SHRINKAGE INTENSITIES WITH TARGET MATRIX THE 
#> Spherical matrix : 0.1401 
#> Identity  matrix : 0.1125 
#> Diagonal  matrix : 0.14 
#> 
#> SAMPLE VARIANCES 
#> Range   : 0.4714 
#> Average : 0.0882

The estimated optimal shrinkage intensity for the spherical matrix is slightly larger than the other two. In addition, the sample variances appear to be of similar magnitude and their average is smaller than 1. Thus, the spherical matrix seems to be the most appropriate target for the covariance matrix. The resulting covariance matrix estimate is:

estimated_covariance_normal <- shrinkcovmat(normal_group, target = "spherical")
estimated_covariance_normal
#> SHRINKAGE ESTIMATION OF THE COVARIANCE MATRIX 
#> 
#> Estimated Optimal Shrinkage Intensity = 0.1401 
#> 
#> Estimated Covariance Matrix [1:5,1:5] =
#>        [,1]   [,2]   [,3]   [,4]   [,5]
#> [1,] 0.0396 0.0107 0.0101 0.0214 0.0175
#> [2,] 0.0107 0.0499 0.0368 0.0171 0.0040
#> [3,] 0.0101 0.0368 0.0499 0.0147 0.0045
#> [4,] 0.0214 0.0171 0.0147 0.0523 0.0091
#> [5,] 0.0175 0.0040 0.0045 0.0091 0.0483
#> 
#> Target Matrix [1:5,1:5] =
#>        [,1]   [,2]   [,3]   [,4]   [,5]
#> [1,] 0.0882 0.0000 0.0000 0.0000 0.0000
#> [2,] 0.0000 0.0882 0.0000 0.0000 0.0000
#> [3,] 0.0000 0.0000 0.0882 0.0000 0.0000
#> [4,] 0.0000 0.0000 0.0000 0.0882 0.0000
#> [5,] 0.0000 0.0000 0.0000 0.0000 0.0882

We follow a similar procedure to estimate the covariance matrix of the tumor group:

targetselection(tumor_group)
#> ESTIMATED SHRINKAGE INTENSITIES WITH TARGET MATRIX THE 
#> Spherical matrix : 0.1956 
#> Identity  matrix : 0.1705 
#> Diagonal  matrix : 0.1955 
#> 
#> SAMPLE VARIANCES 
#> Range   : 0.4226 
#> Average : 0.0958
estimated_covariance_tumor <- shrinkcovmat(tumor_group, target = "spherical")
estimated_covariance_tumor
#> SHRINKAGE ESTIMATION OF THE COVARIANCE MATRIX 
#> 
#> Estimated Optimal Shrinkage Intensity = 0.1956 
#> 
#> Estimated Covariance Matrix [1:5,1:5] =
#>        [,1]   [,2]   [,3]   [,4]   [,5]
#> [1,] 0.0490 0.0179 0.0170 0.0195 0.0052
#> [2,] 0.0179 0.0450 0.0265 0.0092 0.0034
#> [3,] 0.0170 0.0265 0.0465 0.0084 0.0031
#> [4,] 0.0195 0.0092 0.0084 0.0498 0.0036
#> [5,] 0.0052 0.0034 0.0031 0.0036 0.0361
#> 
#> Target Matrix [1:5,1:5] =
#>        [,1]   [,2]   [,3]   [,4]   [,5]
#> [1,] 0.0958 0.0000 0.0000 0.0000 0.0000
#> [2,] 0.0000 0.0958 0.0000 0.0000 0.0000
#> [3,] 0.0000 0.0000 0.0958 0.0000 0.0000
#> [4,] 0.0000 0.0000 0.0000 0.0958 0.0000
#> [5,] 0.0000 0.0000 0.0000 0.0000 0.0958

Compatibility

Version 2.0.0 introduces the function shrinkcovmat, which in the next release of ShrinkCovMat will replace the deprecated functions shrinkcovmat.identity, shrinkcovmat.equal and shrinkcovmat.unequal. The table below illustrates the changes:

Deprecated functions since v2.0.0 and their replacements in newer versions.
Deprecated	Replacement
`shrinkcovmat.identity(data)`	`shrinkcovmat(data, target = 'identity')`
`shrinkcovmat.equal(data)`	`shrinkcovmat(data, target = 'spherical')`
`shrinkcovmat.unequal(data)`	`shrinkcovmat(data, target = 'diagonal')`

How To Cite

citation("ShrinkCovMat")
#> To cite 'ShrinkCovMat' in publications, please use:
#> 
#>   Touloumis A. (2015). "Nonparametric Stein-type Shrinkage Covariance
#>   Matrix Estimators in High-Dimensional Settings." _Computational
#>   Statistics & Data Analysis_, *83*, 251-261.
#>   <https://www.sciencedirect.com/science/article/pii/S0167947314003107>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {Nonparametric Stein-type Shrinkage Covariance Matrix Estimators in High-Dimensional Settings},
#>     author = {{Touloumis A.}},
#>     year = {2015},
#>     journal = {Computational Statistics & Data Analysis},
#>     volume = {83},
#>     pages = {251-261},
#>     url = {https://www.sciencedirect.com/science/article/pii/S0167947314003107},
#>   }
