The Zero-Inflated Poisson Log-Normal (ZIPLN) model is a model for count data with excess zeros. It assumes that the observed counts come from a mixture of a degenerate distribution at zero and a Poisson Log-Normal (PLN) model, so that an observed zero can arise from either component. Both components depend on explanatory covariates. For example, in microbial community abundance data, a species may be absent or may have a number of occurrences that follows a PLN distribution, depending on the experimental conditions and covariates. When the observed abundances depend on a large number of covariates, variable selection becomes necessary to improve interpretability, reduce noise, and simplify the model. This paper extends the Smooth Information Criterion (SIC) to the ZIPLN model. The proposed method penalizes an Evidence Lower Bound (ELBO) of the marginal log-likelihood of the ZIPLN model with a smooth approximation of the L0-norm. The resulting algorithm combines ε-telescoping, an optimization scheme specific to the SIC, with a Variational Expectation-Maximization (VEM) algorithm. Our aim is to induce sparsity in the regression coefficients of the covariates that explain the random occurrences in the ZIPLN model. To support our proposal, we conduct simulation studies and apply the method to data from a study of microbial communities involved in milk production.
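The sketch below illustrates the two ingredients named above, a smooth L0 surrogate and ε-telescoping, on a deliberately simplified problem. It makes several assumptions. It uses the surrogate φ_ε(β) = β²/(β² + ε²), which equals 0 at β = 0 and tends to 1 for β ≠ 0 as ε → 0. It replaces the ZIPLN ELBO with a squared-error loss under an orthonormal design, so the problem decouples across coefficients. It fixes the penalty weight λ = 1 and the ε schedule arbitrarily. It does not implement the VEM step or the ZIPLN model, and all function names are hypothetical.

```python
# Toy sketch: smooth L0 penalty + epsilon-telescoping (NOT the ZIPLN/VEM method).
# Assumption: loss 0.5 * (b_j - z_j)^2 per coefficient, i.e. least squares
# with an orthonormal design, so each coefficient can be optimised separately.


def smooth_l0(beta, eps):
    """Smooth surrogate of the L0 norm: sum_j b_j^2 / (b_j^2 + eps^2)."""
    return sum(b * b / (b * b + eps * eps) for b in beta)


def telescope_schedule(eps_start, eps_end, steps):
    """Geometric sequence of eps values decreasing from eps_start to eps_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    ratio = (eps_end / eps_start) ** (1.0 / (steps - 1))
    return [eps_start * ratio ** k for k in range(steps)]


def objective(b, z, lam, eps):
    """Per-coefficient penalised loss: 0.5 (b - z)^2 + lam * b^2 / (b^2 + eps^2)."""
    return 0.5 * (b - z) ** 2 + lam * b * b / (b * b + eps * eps)


def newton_1d(b, z, lam, eps, iters=50, tol=1e-12):
    """Damped Newton minimisation of objective(., z, lam, eps) started at b."""
    e2 = eps * eps
    for _ in range(iters):
        d = b * b + e2
        grad = (b - z) + lam * 2.0 * b * e2 / d ** 2
        hess = 1.0 + lam * 2.0 * e2 * (e2 - 3.0 * b * b) / d ** 3
        # Newton direction when the curvature is positive, gradient step otherwise.
        step = grad / hess if hess > 0 else grad
        f0 = objective(b, z, lam, eps)
        t = 1.0
        # Backtracking: halve the step until the objective does not increase.
        while t > 1e-10:
            b_new = b - t * step
            if objective(b_new, z, lam, eps) <= f0:
                break
            t *= 0.5
        else:
            return b  # no descent found: treat b as a stationary point
        if abs(b_new - b) < tol:
            return b_new
        b = b_new
    return b


def sic_telescope(z, lam, eps_start=10.0, eps_end=1e-4, steps=30):
    """Minimise the penalised loss for a decreasing sequence of eps values,
    warm-starting each stage from the previous stage's solution."""
    beta = list(z)  # start from the unpenalised (least-squares) estimate
    for eps in telescope_schedule(eps_start, eps_end, steps):
        beta = [newton_1d(b, zj, lam, eps) for b, zj in zip(beta, z)]
    return beta


if __name__ == "__main__":
    z = [3.0, 0.2, -2.5, 0.05]  # hypothetical unpenalised estimates
    beta = sic_telescope(z, lam=1.0)
    print([round(b, 3) for b in beta])
    print(round(smooth_l0(beta, 1e-4), 3))  # approximately the number of nonzeros
```

Warm-starting is what makes this work. When ε is small, the surrogate is nearly flat away from zero. Minimizing directly at ε = 1e-4 from the unpenalized estimate therefore leaves a small coefficient such as 0.2 stuck near 0.2. The telescoped path instead shrinks it to essentially zero while the large coefficients return to their unpenalized values. For these inputs the result matches hard thresholding at |z| = √(2λ), the exact L0-penalized solution of this toy problem.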