To facilitate training with gradients, supervised learning methods often replace the selection of a single element within a set of outputs by the prediction of a probability distribution over this set (using, e.g., the softmax operator). In this talk, we will understand this transformation as a functional smoothing of the output selection mechanism. Engineering this Nesterov smoothing yields new modelling perspectives. First, we will observe that selecting an output within a combinatorial set (e.g. a sequence of tags) is often solved using dynamic programming (DP) algorithms. Smoothing turns DP algorithms into differentiable operators, which may predict sparse probabilities over the output set. Second, we will design a smoothing that takes into account a cost function defined on the output set. This approach transforms the softmax operator into a cost-informed geometric softmax, which can additionally predict distributions over a continuous set.
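To make the first two ideas of the abstract concrete, here is a minimal NumPy sketch. It assumes the standard regularized-max formulation of Nesterov smoothing, max_Omega(theta) = max over p in the simplex of <theta, p> - Omega(p), whose gradient is the maximizing p. With a negative-entropy Omega the gradient is the softmax; with a squared-l2 Omega it is the Euclidean projection onto the simplex (sparsemax), which can contain exact zeros. The same smoothed max is then plugged into a Viterbi-style recursion for a linear-chain tagging model. The function names (`max_entropy`, `max_sq`, `smoothed_viterbi`) and the unary/transition parametrization `(U, A)` are illustrative assumptions, not the speaker's code, and the cost-informed geometric softmax is not covered here.

```python
import numpy as np

# Illustrative sketch (assumed formulation, not the speaker's code):
# max_Omega(theta) = max_{p in simplex} <theta, p> - Omega(p); gradient = argmax p.

def max_hard(theta, gamma=None):
    """Unsmoothed max; its 'gradient' is a one-hot argmax (gamma is ignored)."""
    p = np.zeros_like(theta)
    p[np.argmax(theta)] = 1.0
    return float(np.max(theta)), p

def max_entropy(theta, gamma=1.0):
    """Negative-entropy smoothing: value gamma*logsumexp(theta/gamma), gradient softmax."""
    z = theta / gamma
    m = np.max(z)  # shift for numerical stability
    value = gamma * (m + np.log(np.sum(np.exp(z - m))))
    grad = np.exp(z - m)
    grad /= grad.sum()
    return value, grad

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = k[u - (css - 1) / k > 0][-1]  # largest index kept in the support
    tau = (css[rho - 1] - 1) / rho
    return np.maximum(v - tau, 0.0)

def max_sq(theta, gamma=1.0):
    """Squared-l2 smoothing: gradient is sparsemax(theta/gamma), possibly sparse."""
    p = project_simplex(theta / gamma)
    value = theta @ p - 0.5 * gamma * (p @ p)
    return value, p

def smoothed_viterbi(U, A, smax, gamma=1.0):
    """Viterbi recursion with the max replaced by a smoothed max `smax`.

    U: (T, S) unary scores; A: (S, S) transition scores A[prev, next].
    Returns the smoothed value and its gradient w.r.t. U, shape (T, S).
    """
    T, S = U.shape
    v = np.zeros((T, S))
    q = np.zeros((T, S, S))  # q[t, s, :] = local gradient over previous states
    v[0] = U[0]
    for t in range(1, T):
        for s in range(S):
            val, q[t, s] = smax(v[t - 1] + A[:, s], gamma)
            v[t, s] = U[t, s] + val
    value, q_final = smax(v[-1], gamma)
    # Backward pass (chain rule): e[t, s] = d value / d v[t, s] = d value / d U[t, s]
    e = np.zeros((T, S))
    e[-1] = q_final
    for t in range(T - 1, 0, -1):
        e[t - 1] = e[t] @ q[t]
    return value, e

theta = np.array([1.0, 0.5, -1.0])
print(max_entropy(theta)[1])  # dense: every entry is strictly positive
print(max_sq(theta)[1])       # sparse: 0.75, 0.25, 0.0

rng = np.random.default_rng(0)
U, A = rng.normal(size=(4, 3)), rng.normal(size=(3, 3))
for smax in (max_hard, max_entropy, max_sq):
    value, grad = smoothed_viterbi(U, A, smax)
    print(smax.__name__, value, grad.round(3))
```

With the entropy regularizer the recursion reduces to the (temperature-scaled) forward algorithm, so its value is the log-partition function over all tag sequences and the gradient holds the per-position marginals. With the squared-l2 regularizer the gradient can contain exact zeros, and with `max_hard` the value is the ordinary Viterbi score.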

# A. Mensch (ENS Paris): Functional smoothing for sparse and cost-informed prediction of output distributions

Dates:

Thursday, November 14, 2019 - 11:00 to 12:00

Location:

Inria B21

Speaker(s):

Arthur Mensch

Speaker's URL: