Sparse Poisson Log Normal Model for Zero-Inflated Multivariate Count Data

Abstract

The Zero-Inflated Poisson Log-Normal model (ZIPLN) is a model for count data with excess zeros. It assumes that the observed zeros come from a mixture of a degenerate distribution at zero and a Poisson Log-Normal model (PLN), with both components influenced by explanatory covariates. For example, in abundance data from microbial communities, a species may be absent under some experimental conditions and covariates, or it may have a number of occurrences that follows a PLN distribution. When the observed abundances depend on a large number of covariates, variable selection becomes necessary to improve interpretability, reduce noise, and simplify the model. This paper extends the Smooth Information Criterion (SIC) to the ZIPLN model. The proposed method penalizes an Evidence Lower Bound (ELBO) of the marginal log-likelihood of the ZIPLN model using a smooth approximation of the L0-norm. The resulting algorithm combines ε-telescoping, an algorithm specific to the SIC, with a Variational Expectation Maximisation (VEM) algorithm. Our study aims to induce sparsity in the regressors explaining the random occurrences of the ZIPLN model. To support our proposal, we conduct simulation studies and apply the method to data from a study of the microbial communities involved in the milk production process.
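To make the model description concrete, the following is a minimal Python sketch, not the paper's implementation. It simulates counts from a zero-inflated Poisson log-normal mixture and evaluates one common smooth surrogate of the L0-norm, θ² / (θ² + ε²), for a decreasing sequence of ε, which is the idea behind ε-telescoping. All names (`simulate_zipln`, `smooth_l0`, `B`, `B0`, `Sigma`), the logistic link for the zero-inflation probability, the exact form of the surrogate, and the choice of which coefficient matrix to penalize are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def simulate_zipln(X, B, B0, Sigma, rng):
    """Draw an (n x p) count matrix from a zero-inflated Poisson log-normal model.

    Assumed parameterisation (illustrative only):
      Z_i  ~ N(X_i B, Sigma)                 latent Gaussian layer
      W_ij ~ Bernoulli(logistic(X_i B0_j))   structural-zero indicator
      Y_ij = 0 if W_ij = 1, else Poisson(exp(Z_ij))
    """
    n, p = X.shape[0], B.shape[1]
    Z = X @ B + rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    pi = 1.0 / (1.0 + np.exp(-(X @ B0)))
    W = rng.random((n, p)) < pi
    Y = np.where(W, 0, rng.poisson(np.exp(Z)))
    return Y, W


def smooth_l0(theta, eps):
    """Smooth surrogate of the L0-norm: sum_j theta_j^2 / (theta_j^2 + eps^2)."""
    theta = np.asarray(theta, dtype=float)
    return float(np.sum(theta**2 / (theta**2 + eps**2)))


rng = np.random.default_rng(0)
n, d, p = 200, 3, 4
X = rng.normal(size=(n, d))
B = rng.normal(scale=0.3, size=(d, p))

# Hypothetical sparse zero-inflation coefficients: only the first covariate matters.
B0 = np.zeros((d, p))
B0[0, :] = 1.0

Y, W = simulate_zipln(X, B, B0, 0.25 * np.eye(p), rng)

# As eps shrinks, the surrogate approaches the number of nonzero coefficients.
penalties = [smooth_l0(B0, eps) for eps in (1.0, 0.1, 0.01, 0.001)]
```

In a full fitting procedure, a penalty of this kind would be subtracted from the ELBO and the optimisation repeated over the decreasing ε sequence, with each fit warm-started from the previous one; that loop is omitted here because the abstract does not describe its details.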

Publication
Submitted paper

KIOYE Togo Jean Yves
PhD student

My research interests include statistical learning and big data.