An R Package Exploration

December 22, 2023

If you’re working with non-normally distributed data, such as counts or binary measurements, and need dimension reduction in the spirit of principal component analysis (PCA), the glmpca R package may be what you need. In this article, we’ll look at what the package does and how it can help you explore the latent structure of such data.

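The glmpca R package implements GLM-PCA, a generalization of PCA to data that do not follow a normal distribution. Standard PCA implicitly assumes Gaussian noise because it minimizes squared reconstruction error. That assumption fits count data poorly, for example the UMI counts produced by single-cell RNA-seq. GLM-PCA instead models each observed value with an exponential-family distribution, such as Poisson or negative binomial for counts or Bernoulli for binary data. It places the low-rank structure on the scale of the link function, as in a generalized linear model (GLM). This lets you uncover patterns that may be obscured when counts are log-transformed and passed to standard PCA.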
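To make the idea concrete, here is a minimal base-R sketch of rank-L Poisson GLM-PCA. It is not the glmpca package's implementation and rests on several assumptions:

- The model is log E[Y] = a + U V^T, with a per-feature intercept a, loadings U, factors V, and features as the rows of Y.
- An L2 penalty on U and V is added for numerical stability.
- The penalized negative Poisson log-likelihood is minimized with R's general-purpose optim() rather than glmpca's own optimizer.

The names poisson_glmpca_sketch, lambda, and maxit are hypothetical.

```r
# Hypothetical sketch of rank-L Poisson GLM-PCA in base R.
# NOT the glmpca package's algorithm: the model log E[Y] = a + U %*% t(V),
# the L2 penalty on U and V, and the use of optim() are all assumptions.
poisson_glmpca_sketch <- function(Y, L = 2, lambda = 1, maxit = 500) {
  n <- nrow(Y)  # features
  m <- ncol(Y)  # observations
  # split the flat parameter vector into intercepts, loadings and factors
  unpack <- function(p) {
    list(a = p[seq_len(n)],
         U = matrix(p[n + seq_len(n * L)], n, L),
         V = matrix(p[n + n * L + seq_len(m * L)], m, L))
  }
  # linear predictor on the log scale; a is recycled down each column
  linpred <- function(s) s$a + s$U %*% t(s$V)
  # penalized negative Poisson log-likelihood, dropping the constant log(y!)
  nll <- function(p) {
    s <- unpack(p)
    eta <- linpred(s)
    sum(exp(eta) - Y * eta) + lambda / 2 * (sum(s$U^2) + sum(s$V^2))
  }
  # analytic gradient, using d nll / d eta = exp(eta) - Y
  grad <- function(p) {
    s <- unpack(p)
    R <- exp(linpred(s)) - Y
    c(rowSums(R),
      R %*% s$V + lambda * s$U,
      t(R) %*% s$U + lambda * s$V)
  }
  # start at the intercept-only fit plus small random loadings and factors
  p0 <- c(log(rowMeans(Y) + 1e-8), rnorm((n + m) * L, sd = 0.1))
  fit <- optim(p0, nll, grad, method = "BFGS", control = list(maxit = maxit))
  s <- unpack(fit$par)
  list(factors = s$V, loadings = s$U, intercepts = s$a,
       objective = fit$value, initial = nll(p0))
}

# usage on data simulated like the glmpca example in this article
set.seed(1)
mu <- matrix(exp(rnorm(100 * 20)), nrow = 100)
mu[, 1:10] <- mu[, 1:10] * exp(rnorm(100))
Y <- matrix(rpois(length(mu), mu), nrow = nrow(mu))
res <- poisson_glmpca_sketch(Y, L = 2)
dim(res$factors)  # 20 2
```

Standard PCA corresponds to swapping the Poisson likelihood for a Gaussian one with an identity link, that is, minimizing squared error. The sketch changes only the loss and the link, which is the core of the GLM-PCA idea.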

To get started, install the package from CRAN:

install.packages("glmpca")

To use the latest development version instead, install it from the GitHub repository (this requires the remotes package):

remotes::install_github("willtownes/glmpca")

Once the package is installed, the main function, glmpca(), takes a matrix of counts (or binary data) with features as rows and observations as columns, along with the number of latent dimensions L to estimate. It returns the estimated factors and loadings. You can then visualize the factors with standard R plotting functions to look for patterns and clusters in your data.
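Because glmpca() expects features in rows and observations in columns, it can help to check the input before fitting. The helper below is a hypothetical sketch and not part of glmpca. It verifies that the matrix holds non-negative whole numbers and drops features with no counts at all. Under a Poisson model, such a feature's intercept estimate tends to negative infinity, so it carries no usable signal. Dropping these rows is a defensive choice made here as an assumption.

```r
# Hypothetical helper (not part of glmpca): prepare a count matrix with
# features as rows and observations as columns for a Poisson-type fit.
prepare_counts <- function(Y) {
  Y <- as.matrix(Y)
  # counts must be non-negative whole numbers
  stopifnot(is.numeric(Y), all(Y >= 0), all(Y == round(Y)))
  # drop features (rows) that are zero in every observation
  keep <- rowSums(Y) > 0
  Y[keep, , drop = FALSE]
}

# usage: the first feature is all zero and is removed
Y <- matrix(c(0, 0, 0,
              1, 4, 2,
              3, 0, 5), nrow = 3, byrow = TRUE)
Yc <- prepare_counts(Y)
dim(Yc)  # 2 3
```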

For example, suppose you have a count dataset with two clusters of observations and want to visualize its latent structure. The following code simulates such a dataset, fits GLM-PCA with two latent dimensions, and plots the estimated factors:

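library(glmpca)

set.seed(1)  # make the simulated data reproducible

# create a simple count dataset with 100 features (rows) and 20 observations (columns)
mu <- matrix(exp(rnorm(100 * 20)), nrow = 100)
# multiply each feature's mean in the first 10 observations by a random factor, creating a second cluster
mu[, 1:10] <- mu[, 1:10] * exp(rnorm(100))
clust <- rep(c("red", "black"), each = 10)  # cluster colour for each observation
Y <- matrix(rpois(prod(dim(mu)), mu), nrow = nrow(mu))  # Poisson counts with means mu

# fit GLM-PCA with L = 2 latent dimensions and plot the factors
res <- glmpca(Y, L = 2)
factors <- res$factors  # one row per observation (column of Y)
plot(factors[, 1], factors[, 2], col = clust, pch = 19)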

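In the resulting plot, the two simulated clusters should appear as separate groups of points along the estimated latent dimensions. The same workflow applies to real data: fit glmpca() to the count matrix, then inspect the factors for clusters and the loadings to see which features drive each dimension.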
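For comparison, you may want to run a conventional baseline on the same simulated data: standard PCA on log-transformed counts, computed with base R's prcomp(). This is the "traditional PCA" route that GLM-PCA is meant to improve on. The choice of log1p() as the transform and the seed are assumptions for illustration.

```r
# Standard PCA baseline on log-transformed counts (for comparison only).
# Data simulated as in the glmpca example; the seed is an assumption.
set.seed(1)
mu <- matrix(exp(rnorm(100 * 20)), nrow = 100)
mu[, 1:10] <- mu[, 1:10] * exp(rnorm(100))
clust <- rep(c("red", "black"), each = 10)
Y <- matrix(rpois(length(mu), mu), nrow = nrow(mu))

# prcomp() treats rows as observations, so transpose to 20 observations x 100 features
pca <- prcomp(t(log1p(Y)), center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:2]  # scores on the first two principal components
plot(pcs[, 1], pcs[, 2], col = clust, pch = 19,
     xlab = "PC1 (log1p counts)", ylab = "PC2 (log1p counts)")
```

Placing this plot next to the glmpca factor plot is a quick, informal way to see whether the GLM-based model separates the clusters more cleanly on your data.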

For more in-depth guidance and examples, see the package vignettes. If you work with Bioconductor or Seurat objects, the scry and Seurat-wrappers packages, respectively, provide interfaces to GLM-PCA that fit into those workflows.

If you encounter a problem or have suggestions for improvement, you can file a bug report or feature request on the issue tracker of the glmpca GitHub repository.

In conclusion, the glmpca R package extends PCA to non-normally distributed data such as counts. It models the data with a generalized linear model instead of assuming Gaussian noise, so it can reveal latent structure that standard PCA on transformed data may obscure. If you work with count or binary data, it is worth a try.

References:

glmpca GitHub repository
glmpca Python implementation
“Feature Selection and Dimension Reduction based on a Multinomial Model” (doi:10.1186/s13059-019-1861-6)
glmpca CRAN package

Note: The glmpca package and its associated materials are subject to their respective licenses. Please refer to the documentation and license information provided in the repositories for the package and its dependencies.
