Page 13 - Contributed Paper Session (CPS) - Volume 4
CPS2101 Bertail Patrice et al.
machine-learning communities (see [5] for instance), no rigorous asymptotic
framework for statistical recovery of the NMF, even in a simple parametric
setup, has been given yet in the statistical learning literature. It is the goal of
this paper to formulate NMF as an identifiable statistical problem, for which
M-estimation techniques in a semiparametric context yield consistent
estimates. We will compute the efficient scores (see the terminology in [3] for
instance) and the efficiency bound, and propose new efficient semiparametric
estimation methods based on an estimated version of the efficient score. We will
see that the NMF model has some strong links with the dimension reduction
method considered in single index models so that the recent paper by [8] is
also of interest for our work.
It is next shown how to use popular model selection methods in order to
choose the number of latent vectors involved in the NMF representation.
Consistency of the maximum-penalized-likelihood estimator is proved in this
context, when the penalty term is the Bayesian Information Criterion. Finally,
these approaches are illustrated by preliminary simulation results.
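As a schematic of the BIC-penalized selection just described, the snippet below picks the number of latent vectors minimizing the BIC. The maximized log-likelihoods and parameter counts are hypothetical placeholders; in the paper's setting they would come from the semiparametric NMF fit:

```python
import numpy as np

n = 200  # sample size (illustrative)

# Hypothetical maximized log-likelihoods per candidate number K of latent vectors.
log_lik = {1: -510.0, 2: -430.0, 3: -425.0, 4: -423.5}
# Hypothetical parameter counts per candidate K.
n_params = {K: K * 5 for K in log_lik}

# BIC(K) = -2 * max log-likelihood + (number of parameters) * log n
bic = {K: -2.0 * log_lik[K] + n_params[K] * np.log(n) for K in log_lik}

# Select the K with the smallest penalized criterion.
K_hat = min(bic, key=bic.get)
print(K_hat)
```

With these placeholder values, the gain in fit from K = 2 to K = 3 is too small to offset the BIC penalty, so the smaller model is retained.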
2. Background theory and concepts
In the following, for any (p, q) ∈ (N∗)², we denote by Mpq(R+) the space of p
× q matrices with nonnegative entries, by det(M) the determinant of any square
matrix M with real entries, and by A^t the transpose of any rectangular matrix
A. ||·|| is the Euclidean norm on R^F. The indicator function of any event E is
denoted by I{E}. Finally, we use Φ_F for the characteristic function of any
probability distribution F on R^F and ”⇒” for the convergence in distribution. If
a rectangular matrix A is full rank, we denote by A^{−1} the Moore-Penrose
generalized pseudo-inverse of A; refer to [1]. Recall that we have A^{−1} =
(A^t A)^{−1} A^t, denoting by M^{−1} the standard inverse of a square matrix M and by
Q^t the transpose of any matrix Q, see [4] for instance.
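As a quick numerical sanity check of the identity above, the closed form (A^t A)^{−1} A^t can be compared against a library pseudo-inverse. The matrix A below is an arbitrary illustrative choice with full column rank, and NumPy is assumed as tooling, not something the paper itself uses:

```python
import numpy as np

# Hypothetical full-column-rank 4 x 2 matrix A (illustrative values).
A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0],
              [2.0, 0.5]])

# Moore-Penrose pseudo-inverse via the closed form (A^t A)^{-1} A^t,
# valid when A has full column rank.
A_pinv_formula = np.linalg.inv(A.T @ A) @ A.T

# NumPy's SVD-based pseudo-inverse, for comparison.
A_pinv_svd = np.linalg.pinv(A)

# The two computations agree, and the pseudo-inverse is a left inverse of A.
print(np.allclose(A_pinv_formula, A_pinv_svd))
print(np.allclose(A_pinv_formula @ A, np.eye(2)))
```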
Let F ≥ 1 be the dimension of the space where the observations lie. The
NMF task can be formulated as follows. One observes (column) vectors vi =
(v1i, ..., vFi), 1 ≤ i ≤ n, with nonnegative coefficients: ∀(f,i) ∈ {1, ..., F}×{1, ..., n}, vfi
≥ 0. It is believed that these data can be ’well described’ by the conical hull
generated by K ≤ F linearly independent vectors W.1, ..., W.K lying in the positive
orthant, that is,

C = { ∑_{k=1}^{K} hk W.k : h1, ..., hK ≥ 0 }   (1)

with Wfk ≥ 0 for all (f, k) ∈ {1, …, F} × {1, …, K}.
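Membership in such a conical hull can be certified numerically by nonnegative least squares: a zero residual means the observed vector is an exact conic combination of the columns of W. The matrices below are illustrative, and SciPy's `nnls` is an assumed tool, not part of the paper:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical basis: K = 2 nonnegative, linearly independent columns in R^F, F = 3.
W = np.array([[1.0, 0.0],
              [0.5, 1.0],
              [0.0, 2.0]])

# A point built inside the cone: v = W h with nonnegative weights h.
h_true = np.array([2.0, 0.5])
v = W @ h_true

# nnls solves min_h ||W h - v|| subject to h >= 0; a (numerically) zero
# residual certifies that v lies in the conical hull of the columns of W.
h_hat, residual = nnls(W, v)
print(h_hat, residual)
```

Since W has full column rank and v was built from nonnegative weights, the recovered weights coincide with the ones used to generate v.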
Assume that the observed data are i.i.d. copies of the random vector:
v = Wh (2)
where W ∈ MFK(R+) and h is a random column vector of length K with
distribution G(dh) supported by the positive orthant of R^K. In the following we
2 | ISI WSC 2019
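A minimal sketch of sampling i.i.d. copies from model (2), assuming, purely for illustration, exponential marginals for G and a small hand-picked nonnegative W:

```python
import numpy as np

rng = np.random.default_rng(0)

F, K, n = 3, 2, 500  # ambient dimension, latent dimension, sample size (illustrative)

# Hypothetical nonnegative mixing matrix W with linearly independent columns.
W = np.array([[1.0, 0.0],
              [0.5, 1.0],
              [0.0, 2.0]])

# G(dh): here i.i.d. standard exponential marginals on the positive orthant
# (an assumed choice; the paper only requires support in the positive orthant).
H = rng.exponential(scale=1.0, size=(K, n))

# i.i.d. copies of model (2): each column is v_i = W h_i.
V = W @ H

print(V.shape)         # (3, 500)
print(V.min() >= 0.0)  # all entries are nonnegative
```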