Analysis of the residuals

Workspace setup

Load the package in the workspace.

library(sgdGMF)

Load other useful packages in the workspace.

library(ggplot2)
library(ggpubr)
library(reshape2)

Ant traits data

Load the ant traits data in the workspace and define the response matrix Y and covariate matrices X and Z.

# install.packages("mvabund")
# data(antTraits, package = "mvabund")

load(url("https://raw.githubusercontent.com/cran/mvabund/master/data/antTraits.RData"))

Y = as.matrix(antTraits$abund)
X = as.matrix(antTraits$env[,-3])
Z = matrix(1, nrow = ncol(Y), ncol = 1)

n = nrow(Y)
m = ncol(Y)

Model specification

Set the model family to Poisson since the response matrix contain count data.

family = poisson()

Select the optimal number of latent factors using the function , which employs an adjusted eigenvalue thresholding method to identify the optimal elbow point of a screeplot.

ncomp = sgdgmf.rank(Y = Y, X = X, Z = Z, family = family)$ncomp
cat("Selected rank: ", ncomp)
#> Selected rank:  2

Model estimation

Estimate a Poisson GMF model using iterated least squares.

gmf = sgdgmf.fit(Y, X, Z, ncomp = ncomp, family = family, method = "airwls")

Model validation

Compute the deviance residuals of the model the estimated matrix factorization. Additionally, compute the spectrum of such a residual matrix.

res = residuals(gmf, spectrum = TRUE, ncomp = 20)

Compare the residuals of two competing models: VGLM and GMF. Notice that VGLM is a particular case of GMF of which only include the regression effects and does not include a residual matrix factorization in the linear predictor.

ggpubr::ggarrange(
  plot(gmf, type = "res-idx"),
  plot(gmf, type = "res-fit"),
  plot(gmf, type = "hist"),
  plot(gmf, type = "qq"),
  nrow = 2, ncol = 2, align = "hv")

We now have a look to the spectrum of the residual matrices, i.e., the eigenvalues of the corresponding covariance matrix. However, instead of analyzing the actual values of the eigenvalues, we normalize them in such a way to plot the percentage of variance explained by each principal component.

ggpubr::ggarrange(
  screeplot(gmf, cumulative = FALSE, proportion = TRUE),
  screeplot(gmf, cumulative = TRUE, proportion = TRUE),
  nrow = 1, ncol = 2, align = "hv")

Observations vs fitted values

Plot the deviance and Pearson residuals using a heatmap. This could be helpful to graphically detect if there are some structured patterns in the matrix that have not been captured by the model.

plt.dev = image(gmf, type = "deviance", resid = TRUE, symmetric = TRUE)
plt.prs = image(gmf, type = "pearson", resid = TRUE, symmetric = TRUE)

ggpubr::ggarrange(
  plt.dev + labs(x = "Species", y = "Environments", title = "Deviance residuals"), 
  plt.prs + labs(x = "Species", y = "Environments", title = "Pearson residuals"),
  nrow = 1, ncol = 2, common.legend = FALSE, legend = "bottom", align = "hv")