In the framework of model-based clustering, a model allowing for several latent class variables is proposed. This model assumes that the distribution of the observed data factorizes into several independent blocks of variables, each following its own mixture distribution. Each mixture is a latent class model, i.e. the variables within a block are assumed conditionally independent given the block's class. The proposed model includes variable selection as a special case and can handle mixed-type data. Its simplicity makes it possible to estimate the partition of the variables into blocks and the mixture parameters simultaneously, thus avoiding the need to run an EM algorithm for each possible partition of the variables into blocks. The number of blocks, the number of clusters within each block, and the partition of the variables into blocks are chosen with the BIC and MICL criteria, for which an efficient optimisation procedure is proposed. The performance of the model is studied on simulated and real data, where it is shown to give a rich interpretation of the dataset at hand: an analysis of the partition of the variables into blocks and an analysis of the clusters produced by each block of variables.
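
The factorization described in the abstract can be sketched as a formula. The notation is an assumption introduced here for illustration, not taken from the talk: an observation $x = (x_1, \dots, x_d)$, a partition of the variable indices into $B$ blocks $\Omega_1, \dots, \Omega_B$, and, for block $b$, $G_b$ components with proportions $\pi_{bk}$ and per-variable parameters $\alpha_{jk}$.

$$
p(x; \theta) \;=\; \prod_{b=1}^{B} \; \sum_{k=1}^{G_b} \pi_{bk} \prod_{j \in \Omega_b} p(x_j; \alpha_{jk}),
\qquad \sum_{k=1}^{G_b} \pi_{bk} = 1 \ \text{for each } b .
$$

The outer product expresses independence between blocks, the sum is the mixture within a block, and the inner product is the conditional independence assumption of the latent class model. Under this notation, a block with a single component ($G_b = 1$) carries no clustering information, which is one way to read the variable-selection special case.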

# A Tractable Multi-Partitions Clustering

Dates:

Tuesday, January 23, 2018 - 11:00 to 12:00

Location:

Inria Lille - Nord Europe, salle plénière

Speaker(s):

Vincent Vandewalle

Affiliation(s):

Université de Lille, Département de statistique et informatique décisionnelle