Worked Example: msPCA on mtcars

Overview

This vignette shows the basic workflow of msPCA on the built-in mtcars dataset. We compute two sparse principal components, inspect the solution, and compare the sparse result with dense PCA.

Install and load

Install the package directly from CRAN.

install.packages("msPCA")

You can then load the package as usual.

library(msPCA)

Fit two sparse PCs

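We work with the correlation matrix of mtcars and ask for two 4-sparse principal components (ks = c(4, 4), that is, four non-zero loadings each) under the default orthogonality constraint (feasibilityConstraintType = 0). In the output below, only the variables that receive a non-zero loading in at least one component are listed.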

Sigma <- cor(datasets::mtcars)

set.seed(42)
res <- mspca(Sigma, r = 2, ks = c(4, 4), feasibilityConstraintType = 0, verbose = FALSE)
print_mspca(res, Sigma)
#> 
#> msPCA solution:
#> 2 sparse PCs 
#> Pct. of variance explained: 32.5 28.0 
#> Num. of non-zero loadings :  4 4 
#> Sparse PCs 
#>        [,1]   [,2]
#> mpg  -0.499  0.000
#> cyl   0.495  0.000
#> disp  0.510  0.000
#> hp    0.000 -0.518
#> wt    0.495  0.000
#> qsec  0.000  0.506
#> vs    0.000  0.494
#> carb  0.000 -0.482
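
As a sanity check, the printed loadings can be re-entered by hand and the reported percentages approximately reproduced in base R. The sketch below assumes that the variance explained by a component x is x' Sigma x divided by the trace of Sigma; that definition is our assumption, not something the package states here. The loadings are the three-decimal values printed above, so agreement is only up to rounding.

```r
# Correlation matrix used in the fit above.
Sigma <- cor(datasets::mtcars)

# Loadings copied by hand from the printed solution (rounded to three decimals).
X <- matrix(0, nrow = ncol(Sigma), ncol = 2,
            dimnames = list(colnames(Sigma), c("PC1", "PC2")))
X[c("mpg", "cyl", "disp", "wt"), "PC1"] <- c(-0.499, 0.495, 0.510, 0.495)
X[c("hp", "qsec", "vs", "carb"), "PC2"] <- c(-0.518, 0.506, 0.494, -0.482)

# Each component should have four non-zero loadings and roughly unit norm.
colSums(X != 0)
sqrt(colSums(X^2))

# Assumed definition: share of total variance captured by each component.
var_share <- diag(t(X) %*% Sigma %*% X) / sum(diag(Sigma))
round(100 * var_share, 1)
```

The two shares should land close to the 32.5 and 28.0 reported by print_mspca().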

Orthogonality versus zero correlation

Sparse PCA typically requires a constraint to avoid redundancy between the PCs. Traditionally, this is done by enforcing orthogonality of the loading vectors, which is the default in mspca. Another notion of non-redundancy is zero pairwise correlation between the PCs. The package supports both options, and the choice can lead to different solutions when the variables are strongly correlated.

The argument feasibilityConstraintType selects the constraint. feasibilityConstraintType = 0 (default) enforces orthogonality of the loading vectors. feasibilityConstraintType = 1 instead enforces zero pairwise correlation between the resulting components.
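
The difference between the two notions can be seen on a toy example in base R. In the sketch below, Sigma_toy, A, and B are illustrative names that do not come from the package: orthogonality is read off the off-diagonal entry of X'X, and the covariance between components off the off-diagonal entry of X' Sigma X.

```r
# Toy 2 x 2 correlation matrix with two strongly correlated variables.
Sigma_toy <- matrix(c(1.0, 0.8,
                      0.8, 1.0), nrow = 2, byrow = TRUE)

# Pair A: coordinate axes. The loadings are orthogonal, but the
# components are correlated because the variables are.
A <- cbind(c(1, 0), c(0, 1))

# Pair B: eigenvectors of Sigma_toy. The loadings are orthogonal and
# the components are uncorrelated.
B <- cbind(c(1, 1), c(1, -1)) / sqrt(2)

# Orthogonality of the loading vectors: off-diagonal entry of X'X.
ortho_A <- crossprod(A)[1, 2]
ortho_B <- crossprod(B)[1, 2]

# Covariance between the components: off-diagonal entry of X' Sigma X.
cov_A <- (t(A) %*% Sigma_toy %*% A)[1, 2]
cov_B <- (t(B) %*% Sigma_toy %*% B)[1, 2]

c(ortho_A = ortho_A, cov_A = cov_A, ortho_B = ortho_B, cov_B = cov_B)
```

Both pairs are orthogonal, but only pair B yields uncorrelated components. With sparse loadings, the eigenvector pair is generally not available, which is why the two constraints can select different solutions.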

res_corr <- mspca(Sigma, r = 2, ks = c(4, 4), feasibilityConstraintType = 1, verbose = FALSE)
print_mspca(res_corr, Sigma)
#> 
#> msPCA solution:
#> 2 sparse PCs 
#> Pct. of variance explained: 24.7 22.8 
#> Num. of non-zero loadings :  4 4 
#> Sparse PCs 
#>        [,1]   [,2]
#> hp    0.312  0.000
#> drat  0.000 -0.337
#> wt    0.000  0.087
#> qsec -0.674  0.000
#> vs   -0.279  0.000
#> am    0.000 -0.624
#> gear  0.000 -0.700
#> carb  0.609  0.000

Diagnostics

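The package provides helper functions for checking feasibility and summarizing variance explained. Below, we run the same checks on each fitted solution, evaluating both constraint types so that the two solutions can be compared directly. A violation of 0, or very close to 0, indicates that the corresponding constraint is satisfied.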

cat("Diagnostics for res (feasibilityConstraintType = 0)\n")
#> Diagnostics for res (feasibilityConstraintType = 0)
feasibility_violation_off(Sigma, res$x_best, feasibilityConstraintType = 0)
#> [1] 0
feasibility_violation_off(Sigma, res$x_best, feasibilityConstraintType = 1)
#> [1] 2.335602
fraction_variance_explained(Sigma, res$x_best)
#> [1] 0.6043866
fraction_variance_explained_perPC(Sigma, res$x_best)
#> [1] 0.3245835 0.2798031

cat("\nDiagnostics for res_corr (feasibilityConstraintType = 1)\n")
#> 
#> Diagnostics for res_corr (feasibilityConstraintType = 1)
feasibility_violation_off(Sigma, res_corr$x_best, feasibilityConstraintType = 0)
#> [1] 0
feasibility_violation_off(Sigma, res_corr$x_best, feasibilityConstraintType = 1)
#> [1] 9.62078e-05
fraction_variance_explained(Sigma, res_corr$x_best)
#> [1] 0.4753908
fraction_variance_explained_perPC(Sigma, res_corr$x_best)
#> [1] 0.2472306 0.2281602
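
The feasibility checks for the zero-correlation solution can be approximated by hand in base R. The sketch below assumes that, for two components, the relevant quantity is the absolute off-diagonal entry of X'X (orthogonality) or of X' Sigma X (correlation); this is our reading, not documented behaviour of feasibility_violation_off(). The helper offdiag_abs is hypothetical, and the loadings are the rounded values printed earlier.

```r
Sigma <- cor(datasets::mtcars)

# Loadings of the zero-correlation solution, copied from the printed output
# (three decimals, so the checks below hold only up to rounding).
X_corr <- matrix(0, nrow = ncol(Sigma), ncol = 2,
                 dimnames = list(colnames(Sigma), c("PC1", "PC2")))
X_corr[c("hp", "qsec", "vs", "carb"), "PC1"] <- c(0.312, -0.674, -0.279, 0.609)
X_corr[c("drat", "wt", "am", "gear"), "PC2"] <- c(-0.337, 0.087, -0.624, -0.700)

# Hypothetical helper (not part of msPCA): absolute off-diagonal entry of
# X' M X. With M = identity this measures loading orthogonality; with
# M = Sigma it measures the covariance between the two components.
offdiag_abs <- function(X, M = diag(nrow(X))) {
  abs((t(X) %*% M %*% X)[1, 2])
}

ortho_gap <- offdiag_abs(X_corr)         # disjoint supports, so exactly 0
cov_gap   <- offdiag_abs(X_corr, Sigma)  # small, limited by rounding
c(orthogonality = ortho_gap, covariance = cov_gap)
```

Because the two components have disjoint supports, the orthogonality gap is exactly zero, while the covariance gap is small but limited by the three-decimal rounding of the loadings.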

Comparison with dense PCA

For reference, the first two dense principal components explain more variance (about 84% of the total, versus about 60% and 48% for the two sparse solutions above), but they are not sparse.

pca_res <- prcomp(datasets::mtcars, scale. = TRUE)
fraction_variance_explained(Sigma, pca_res$rotation[, 1:2])
#> [1] 0.8417153
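
The dense figure can be cross-checked without the package. A minimal sketch, assuming only base R: PCA on standardized data corresponds to an eigen-decomposition of the correlation matrix, so the share explained by the first two components is the sum of the two largest eigenvalues divided by the sum of all eigenvalues. The tolerance used to count non-zero loadings is an arbitrary choice.

```r
Sigma <- cor(datasets::mtcars)

# Eigenvalues of the correlation matrix are the variances of the dense PCs.
eig <- eigen(Sigma, symmetric = TRUE)
dense_share <- sum(eig$values[1:2]) / sum(eig$values)
dense_share

# Count loadings above a small tolerance in each of the first two dense PCs.
nonzero_per_pc <- colSums(abs(eig$vectors[, 1:2]) > 1e-8)
nonzero_per_pc
```

Here dense_share should agree with the value above, and nonzero_per_pc shows how many of the 11 variables each dense component loads on.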

Interpretation

Sparse PCA typically trades some explained variance for a much more interpretable loading pattern. For a quick summary of the fitted components, print_mspca() is usually the most useful first diagnostic.
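
To put a number on the trade-off, the reported fractions can be expressed relative to the dense baseline. The sketch below only rearranges figures already printed in this vignette; the names fve and retained are illustrative.

```r
# Fractions of variance explained, copied from the outputs reported above.
fve <- c(sparse_orthogonal   = 0.6043866,
         sparse_uncorrelated = 0.4753908,
         dense_pca           = 0.8417153)

# Share of the dense two-component variance retained by each solution.
retained <- fve / fve[["dense_pca"]]
round(retained, 3)
```

On these figures, the orthogonal sparse solution retains roughly 72% of the variance captured by two dense components, and the zero-correlation solution roughly 56%, while each component uses only 4 of the 11 variables.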