Rank data, in which each row is a complete or partial ranking of the available items (columns), is ubiquitous. It can represent, for instance, user preferences, gene expression levels, and the outcomes of sports events. While rank data has been analysed in the data mining literature, mining patterns in such data has so far received little attention.

In this talk, I will discuss matrix-factorisation-based methods for pattern set mining in rank data. First, I will present a general framework called Semiring Rank Matrix Factorisation. Rather than relying on traditional linear algebra for matrix factorisation, the framework employs semiring theory, which results in a more elegant way of aggregating rankings.
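To make the idea of swapping linear algebra for a semiring concrete, here is a minimal sketch of a matrix product whose addition and multiplication are supplied by the caller. The max-times semiring, the function name `semiring_matmul`, and the toy matrices `C` and `D` are illustrative assumptions, not necessarily the choices made in the talk.

```python
from typing import Callable, List

Matrix = List[List[float]]


def semiring_matmul(
    a: Matrix,
    b: Matrix,
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
    zero: float,
) -> Matrix:
    """Matrix product with + and * replaced by the semiring's operations."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    n_cols = len(b[0])
    result = []
    for row in a:
        out_row = []
        for j in range(n_cols):
            acc = zero  # additive identity of the chosen semiring
            for k in range(inner):
                acc = add(acc, mul(row[k], b[k][j]))
            out_row.append(acc)
        result.append(out_row)
    return result


# Hypothetical toy data: 3 rows, 2 patterns, 4 columns.
C = [[1, 0], [1, 1], [0, 1]]       # which rows use which pattern
D = [[4, 3, 0, 0], [0, 2, 4, 1]]   # pattern rank vectors, 0 = unranked

# Ordinary linear algebra: overlapping patterns are summed.
linear = semiring_matmul(C, D, add=lambda x, y: x + y,
                         mul=lambda x, y: x * y, zero=0)

# Max-times semiring (assumed here): overlapping patterns keep the highest rank.
# zero=0 is the identity for max only because all values are non-negative.
max_times = semiring_matmul(C, D, add=max, mul=lambda x, y: x * y, zero=0)

print(linear[1])     # [4, 5, 4, 1]
print(max_times[1])  # [4, 3, 4, 1]
```

In this toy case the second row uses both patterns. The ordinary product yields 5, which lies outside the rank range 1 to 4, whereas the max-times product stays within it. This is one way to see why a semiring can aggregate rankings more naturally.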

Subsequently, I will introduce two instantiations of the framework: Sparse RMF and ranked tiling. Sparse RMF mines a set of sparse rank vectors that succinctly summarise a given rank matrix and show the main categories of rankings. Ranked tiling discovers a set of data regions in a rank matrix that have high ranks. Such regions are interesting because they can reveal local associations between subsets of the rows and subsets of the columns of the given matrix.
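As a rough illustration of what a region with "high ranks" could mean, the sketch below scores a tile (a set of rows and a set of columns) by summing how far each covered rank exceeds a threshold. The scoring rule, the threshold `theta`, and the helper names are assumptions made for this example. The talk's exact formulation may differ.

```python
from typing import List

Matrix = List[List[float]]


def tile_score(m: Matrix, rows: List[int], cols: List[int], theta: float) -> float:
    """Sum of (rank - theta) over every cell covered by the tile."""
    return sum(m[r][c] - theta for r in rows for c in cols)


def best_cols_for_rows(m: Matrix, rows: List[int], theta: float) -> List[int]:
    """For fixed rows, keep exactly the columns that add a positive amount to the score."""
    n_cols = len(m[0])
    return [c for c in range(n_cols)
            if sum(m[r][c] - theta for r in rows) > 0]


# Hypothetical rank matrix: each row ranks 4 items, 4 = highest rank.
M = [
    [4, 3, 1, 2],
    [4, 3, 2, 1],
    [1, 2, 4, 3],
]
theta = 2.5   # assumed threshold: the midpoint of the rank range
rows = [0, 1]

cols = best_cols_for_rows(M, rows, theta)
score = tile_score(M, rows, cols, theta)
print(cols, score)  # [0, 1] 4.0
```

Under this assumed score, rows 0 and 1 together with columns 0 and 1 form a tile: both rows rank those two items highly, which is the kind of local row-column association described above.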

Finally, I will discuss how to use ranked tiling to formally define the concept of driver pathways, from which we can find cancer subtypes, i.e., groups of tumour samples having the same molecular mechanism driving tumorigenesis.

# T. Le Van (Inria Magnet): Semiring Rank Matrix Factorisation

Dates: Thursday, April 6, 2017 - 10:00 to 11:00

Location: Inria B31

Speaker(s): Thanh Le Van

Affiliation(s): Inria Magnet